Download

The download module of PPI Origami lets you download files from their authoritative sources. PPI Origami works best when you designate one folder on your filesystem for keeping all original, untransformed datasets (we’ll call that the “raw folder”). You’ll refer to this folder in the process module, where “raw” files are transformed and saved in a “processed” folder.
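As a minimal sketch of the folder layout described above, the following helper creates a raw folder and a processed folder side by side. The helper name `make_data_folders` and the subfolder names `raw` and `processed` are assumptions made for illustration; PPI Origami only needs the paths you pass to it.

```python
from pathlib import Path


def make_data_folders(base: Path) -> tuple[Path, Path]:
    """Create (if needed) and return the raw and processed folders under base."""
    raw_folder = base / "raw"              # hypothetical name for the raw folder
    processed_folder = base / "processed"  # hypothetical name for the processed folder
    raw_folder.mkdir(parents=True, exist_ok=True)
    processed_folder.mkdir(parents=True, exist_ok=True)
    return raw_folder, processed_folder
```

The returned raw folder is what you would pass as RAW_FOLDER to the download commands on this page.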

You can find a description of all the possible download commands by running:

ppi_origami download --help

Information specific to arguments of commands can be found by running the command with the help flag:

ppi_origami download COMMAND --help

This information is reproduced on this page.
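If you drive the CLI from a Python script, it can help to assemble the argument list in one place. The sketch below only builds the arguments; the helper `build_download_command` is hypothetical and not part of PPI Origami, and the folder `data/raw` is a placeholder.

```python
import shlex


def build_download_command(command: str, folder: str, *positional: object, **flags: object) -> list[str]:
    """Assemble an argument list of the form `ppi_origami download COMMAND FOLDER ...`."""
    args = ["ppi_origami", "download", command, folder]
    # Positional arguments (e.g. TAXON for dscript) follow the folder.
    args += [str(value) for value in positional]
    for name, value in flags.items():
        if value is not None:  # leave out options that should keep their default
            args += [f"--{name}", str(value)]
    return args


biogrid_args = build_download_command("biogrid", "data/raw", version="4.4.224")
print(shlex.join(biogrid_args))
# To actually run the download: subprocess.run(biogrid_args, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems when the folder path contains spaces.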

class ppi_origami.__main__.Download
static biogrid(raw_folder: Path, version: str = '4.4.224')

Download the BioGRID PPI dataset to the raw folder. Namely, it downloads the BIOGRID-ORGANISM-{version}.mitab.zip file from the BioGRID release archive.

You can call this function from the CLI using:

ppi_origami download biogrid RAW_FOLDER --version 4.4.224
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The BioGRID version to download, defaults to “4.4.224”.

Returns:

None
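The archive name described above follows directly from the version string. This sketch reproduces the documented pattern; the assumption that the file ends up directly inside the raw folder under its original name is mine and should be checked against your own raw folder.

```python
from pathlib import Path


def biogrid_archive_name(version: str = "4.4.224") -> str:
    """Return the archive name for a BioGRID release, per the pattern on this page."""
    return f"BIOGRID-ORGANISM-{version}.mitab.zip"


def expected_biogrid_path(raw_folder: Path, version: str = "4.4.224") -> Path:
    # Assumption: the archive is saved directly inside raw_folder, unrenamed.
    return raw_folder / biogrid_archive_name(version)
```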

static dscript(raw_folder: Path, taxon: int)

Download a D-SCRIPT PPI dataset to the raw folder. D-SCRIPT specifies datasets for H. sapiens, M. musculus, D. melanogaster, S. cerevisiae, C. elegans, and E. coli.

These organisms correspond, respectively, to the NCBI taxon IDs 9606, 10090, 7227, 4932, 6239, and 511145. These are the only valid values of the taxon argument.

You can call this function from the CLI using:

ppi_origami download dscript RAW_FOLDER TAXON
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • taxon (int) – The NCBI taxon ID of the organism whose links you wish to download. Must be one of 9606, 10090, 7227, 4932, 6239, or 511145.

Raises:

ValueError – A ValueError is raised when taxon is not one of 9606, 10090, 7227, 4932, 6239, or 511145.

Returns:

None
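To catch an invalid taxon before starting a download, a script can mirror the documented check. This is a sketch of that validation, not the package's own code; `DSCRIPT_TAXA` and `check_dscript_taxon` are hypothetical names, and the ID-to-organism pairing follows the order given above.

```python
# NCBI taxon IDs accepted by the dscript command, per the description above.
DSCRIPT_TAXA = {
    9606: "H. sapiens",
    10090: "M. musculus",
    7227: "D. melanogaster",
    4932: "S. cerevisiae",
    6239: "C. elegans",
    511145: "E. coli",
}


def check_dscript_taxon(taxon: int) -> int:
    """Return taxon unchanged if D-SCRIPT provides a dataset for it, else raise ValueError."""
    if taxon not in DSCRIPT_TAXA:
        valid = ", ".join(str(t) for t in DSCRIPT_TAXA)
        raise ValueError(f"taxon must be one of {valid}; got {taxon!r}")
    return taxon
```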

static hippie(raw_folder: Path, version: str = 'current')

Download the HIPPIE PPI dataset to the raw folder. You may specify the version to download. A value of “current” downloads the latest version. To download an older version, supply a version number like “2.2”.

You can call this function from the CLI using:

ppi_origami download hippie RAW_FOLDER --version current
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The version of the HIPPIE dataset to download, defaults to “current”.

Returns:

None

static oma(raw_folder: Path)

Download orthology data from OMA to the raw folder. Specifically, it downloads the orthology data in a gzipped XML format (oma-groups.orthoXML.xml.gz), as well as mappings between OMA identifiers and UniProt identifiers (oma-uniprot.txt.gz).

You can call this function from the CLI using:

ppi_origami download oma RAW_FOLDER
Parameters:

raw_folder (pathlib.Path) – The folder to download the dataset to.

Returns:

None

static string_aliases(raw_folder: Path, version: str = '12.0')

Download the STRING aliases dataset. These are mappings between STRING identifiers and other identifiers (most importantly, UniProt).

You can call this function from the CLI using:

ppi_origami download string_aliases RAW_FOLDER --version 12.0
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The version of the STRING database to download.

Returns:

None

Download the STRING links dataset. Downloads the protein.links.{version}.txt.gz file.

You can call this function from the CLI using:

ppi_origami download string_links RAW_FOLDER --version 12.0 --taxon 9606
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The version of the STRING database to download.

  • taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms. Defaults to None.

Returns:

None

Download the detailed STRING links dataset. Downloads the protein.links.detailed.{version}.txt.gz file.

You can call this function from the CLI using:

ppi_origami download string_links_detailed RAW_FOLDER --version 12.0 --taxon 9606
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The version of the STRING database to download.

  • taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms. Defaults to None.

Returns:

None

Download the detailed STRING physical links dataset. Downloads the protein.physical.links.detailed.{version}.txt.gz file. This dataset only includes PPIs with evidence of physical interactions, and provides more information than string_physical_links.

You can call this function from the CLI using:

ppi_origami download string_physical_detailed_links RAW_FOLDER --version 12.0 --taxon 9606
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The version of the STRING database to download.

  • taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms.

Returns:

None

Download the STRING physical links dataset. Downloads the protein.physical.links.{version}.txt.gz file. This dataset only includes PPIs with evidence of physical interactions.

You can call this function from the CLI using:

ppi_origami download string_physical_links RAW_FOLDER --version 12.0 --taxon 9606
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • version (str) – The version of the STRING database to download.

  • taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms.

Returns:

None
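The four STRING link commands differ along two axes: whether only physical interactions are included, and whether the detailed file is downloaded. Note that the command names are not symmetric (string_links_detailed versus string_physical_detailed_links). The lookup below is a sketch for picking the right command in a script; the helper name and the default version of "12.0" (the value used in the examples above) are assumptions, and the file name patterns are copied from this page.

```python
STRING_LINK_COMMANDS = {
    # (physical, detailed) -> (command name, file name pattern), as listed on this page
    (False, False): ("string_links", "protein.links.{version}.txt.gz"),
    (False, True): ("string_links_detailed", "protein.links.detailed.{version}.txt.gz"),
    (True, False): ("string_physical_links", "protein.physical.links.{version}.txt.gz"),
    (True, True): (
        "string_physical_detailed_links",
        "protein.physical.links.detailed.{version}.txt.gz",
    ),
}


def pick_string_command(physical: bool, detailed: bool, version: str = "12.0") -> tuple[str, str]:
    """Return the download command name and the documented file name pattern, filled in."""
    command, pattern = STRING_LINK_COMMANDS[(physical, detailed)]
    return command, pattern.format(version=version)
```

The file name actually written to the raw folder may differ from the filled-in pattern (for example, when a taxon is given), so treat the second value as a description rather than a guaranteed path.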

static uniprot_delac(raw_folder: Path)

Download UniProt deleted accessions to the raw folder. More info can be found in the UniProtKB Manual.

You can call this function from the CLI using:

ppi_origami download uniprot_delac RAW_FOLDER
Parameters:

raw_folder (pathlib.Path) – The folder to download the dataset to.

Returns:

None

static uniprot_id_mapping(raw_folder: Path)

Download UniProt ID mappings to the raw folder. More info can be found on UniProt.org.

You can call this function from the CLI using:

ppi_origami download uniprot_id_mapping RAW_FOLDER
Parameters:

raw_folder (pathlib.Path) – The folder to download the dataset to.

Returns:

None
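Once downloaded, an ID mapping file can be inspected with a few lines of Python. The sketch below assumes the gzipped, tab-separated, three-column layout of UniProt's idmapping.dat (accession, ID type, identifier); this page does not state which mapping file is downloaded or what it is named, so the path you pass in and the layout are assumptions to verify against your raw folder.

```python
import gzip
from collections import defaultdict
from pathlib import Path


def load_id_mapping(path: Path, id_type: str) -> dict[str, list[str]]:
    """Map each UniProt accession to its identifiers of the given type."""
    mapping: dict[str, list[str]] = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # skip lines that do not match the assumed layout
            accession, kind, identifier = fields
            if kind == id_type:
                mapping[accession].append(identifier)
    return dict(mapping)
```

The full mapping file is large, so filtering by one ID type while streaming, as done here, keeps memory use manageable.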

static uniprot_sec_ac(raw_folder: Path)

Download UniProt secondary accessions to the raw folder.

You can call this function from the CLI using:

ppi_origami download uniprot_sec_ac RAW_FOLDER
Parameters:

raw_folder (pathlib.Path) – The folder to download the dataset to.

Returns:

None

static uniprot_seqs_db(processed_folder: Path, taxon: int | None = None)

Download UniProt sequences and save them to a LevelDB file in the processed folder. If taxon is not None, only sequences for that taxon are downloaded.

You can call this function from the CLI using:

ppi_origami download uniprot_seqs_db PROCESSED_FOLDER --taxon 9606
Parameters:
  • processed_folder (pathlib.Path) – The folder to save the database to.

  • taxon (Optional[int]) – The NCBI taxon ID of the organism whose sequences you wish to download. Omit to download sequences for all organisms. Defaults to None.

Returns:

None

static uniprot_seqs_fasta(raw_folder: Path)

Download UniProt sequences in the FASTA format to the raw folder.

You can call this function from the CLI using:

ppi_origami download uniprot_seqs_fasta RAW_FOLDER
Parameters:

raw_folder (pathlib.Path) – The folder to download the dataset to.

Returns:

None
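For a quick look at downloaded sequences without extra dependencies, a small FASTA reader is enough. This is a generic sketch rather than anything PPI Origami provides; it assumes UniProt-style headers of the form `sp|ACCESSION|ENTRY_NAME description`, and both function names are hypothetical.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def uniprot_accession(header: str) -> str:
    """Extract the accession from a UniProt-style header, else return the first token."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    tokens = header.split()
    return tokens[0] if tokens else ""
```

Because `read_fasta` accepts any iterable of lines, it works the same on an open text file, a `gzip.open(path, "rt")` handle, or a list of strings.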

static uniref(raw_folder: Path, threshold: int)

Download the UniRef dataset from UniProt. You must specify a similarity threshold among the three available options: 50%, 90%, and 100%.

You can call this function from the CLI using:

ppi_origami download uniref RAW_FOLDER THRESHOLD
Parameters:
  • raw_folder (pathlib.Path) – The folder to download the dataset to.

  • threshold (int) – The UniRef identity threshold. Must be one of 50, 90, or 100.

Returns:

None
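The threshold maps onto the three UniRef cluster databases (UniRef50, UniRef90, and UniRef100). This page does not say how the command reacts to other values, so the sketch below is a pre-check a calling script might run itself; `uniref_cluster_name` is a hypothetical helper.

```python
UNIREF_THRESHOLDS = (50, 90, 100)  # the three options listed above


def uniref_cluster_name(threshold: int) -> str:
    """Return the UniRef database name for an identity threshold given in percent."""
    if threshold not in UNIREF_THRESHOLDS:
        raise ValueError(f"threshold must be one of {UNIREF_THRESHOLDS}; got {threshold!r}")
    return f"UniRef{threshold}"
```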