Download
The download module of PPI Origami allows you to download files from their authoritative sources. PPI Origami works best when you designate one folder on your filesystem for keeping all original, untransformed datasets (we’ll call that the “raw folder”). You’ll refer to this folder in the process module, where “raw” files will be transformed and saved in a “processed” folder.
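The folder layout described above can be set up ahead of time. The following is a minimal sketch, not part of PPI Origami: the helper name `make_data_folders` and the folder names `raw` and `processed` are assumptions chosen for illustration, and any two folders of your choosing would work.

```python
import tempfile
from pathlib import Path


def make_data_folders(base: Path) -> tuple[Path, Path]:
    """Create (if needed) and return a raw and a processed folder under base.

    The folder names are illustrative assumptions, not required by PPI Origami.
    """
    raw_folder = base / "raw"              # original, untransformed downloads
    processed_folder = base / "processed"  # output of later transformations
    for folder in (raw_folder, processed_folder):
        folder.mkdir(parents=True, exist_ok=True)
    return raw_folder, processed_folder


# Demonstrate in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    raw, processed = make_data_folders(Path(tmp))
    print(raw.name, processed.name)
```

The returned paths can then be passed wherever a command expects RAW_FOLDER or PROCESSED_FOLDER.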
You can find a description of all the possible download commands by running:
ppi_origami download --help
Information specific to arguments of commands can be found by running the command with the help flag:
ppi_origami download COMMAND --help
This information is reproduced on this page.
- class ppi_origami.__main__.Download
- static biogrid(raw_folder: Path, version: str = '4.4.224')
Download the BioGRID PPI dataset to the raw folder. Namely, it downloads the BIOGRID-ORGANISM-{version}.mitab.zip file from the BioGRID release archive.
You can call this function from the CLI using:
ppi_origami download biogrid RAW_FOLDER --version 4.4.224
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The BioGRID version to download, defaults to “4.4.224”.
- Returns:
None
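To make the file name pattern above concrete, here is a small sketch that fills in the {version} placeholder. The helper `biogrid_archive_name` is hypothetical and only formats a string. It does not contact BioGRID or check that the release exists.

```python
def biogrid_archive_name(version: str = "4.4.224") -> str:
    """Return the archive file name for a BioGRID release.

    Hypothetical helper: it only substitutes the version into the
    documented pattern BIOGRID-ORGANISM-{version}.mitab.zip.
    """
    return f"BIOGRID-ORGANISM-{version}.mitab.zip"


print(biogrid_archive_name())  # BIOGRID-ORGANISM-4.4.224.mitab.zip
```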
- static dscript(raw_folder: Path, taxon: int)
Download a D-SCRIPT PPI dataset to the raw folder. D-SCRIPT specifies datasets for H. sapiens, M. musculus, D. melanogaster, S. cerevisiae, C. elegans, and E. coli.
These organisms correspond, respectively, to the NCBI taxon IDs 9606, 10090, 7227, 4932, 6239, and 511145. These are the only valid values of the taxon argument.
You can call this function from the CLI using:
ppi_origami download dscript RAW_FOLDER TAXON
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
taxon (int) – The NCBI taxon ID of the organism whose links you wish to download. Must be one of 9606, 10090, 7227, 4932, 6239, or 511145.
- Raises:
ValueError – A ValueError is raised when taxon is not one of 9606, 10090, 7227, 4932, 6239, or 511145.
- Returns:
None
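The taxon restriction can be checked before invoking the download. This is a sketch under stated assumptions: the names `DSCRIPT_TAXA` and `check_dscript_taxon` are hypothetical, and the check only mirrors the documented behaviour (a ValueError for any other ID) rather than reproducing the library's own code.

```python
# NCBI taxon IDs accepted by the dscript command, as listed above.
DSCRIPT_TAXA = {
    9606: "H. sapiens",
    10090: "M. musculus",
    7227: "D. melanogaster",
    4932: "S. cerevisiae",
    6239: "C. elegans",
    511145: "E. coli",
}


def check_dscript_taxon(taxon: int) -> int:
    """Return taxon unchanged if it is a valid D-SCRIPT taxon, else raise."""
    if taxon not in DSCRIPT_TAXA:
        valid = ", ".join(str(t) for t in DSCRIPT_TAXA)
        raise ValueError(f"taxon must be one of {valid}; got {taxon}")
    return taxon


print(DSCRIPT_TAXA[check_dscript_taxon(7227)])  # D. melanogaster
```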
- static hippie(raw_folder: Path, version: str = 'current')
Download the HIPPIE PPI dataset to the raw folder. You may specify the version to download. A value of “current” results in the latest version being downloaded. To download an older version, supply a version number such as “2.2”.
You can call this function from the CLI using:
ppi_origami download hippie RAW_FOLDER --version current
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The version of the HIPPIE dataset to download, defaults to “current”.
- Returns:
None
- static oma(raw_folder: Path)
Download orthology data from OMA to the raw folder. Specifically, it downloads the orthology data in a gzipped XML format (oma-groups.orthoXML.xml.gz), as well as mappings between OMA identifiers and UniProt identifiers (oma-uniprot.txt.gz).
You can call this function from the CLI using:
ppi_origami download oma RAW_FOLDER
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
- Returns:
None
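After running the command, you may want to confirm that both files are present. The sketch below assumes the files are saved directly inside the raw folder under the names given above, which this page does not state explicitly. The helper `missing_oma_files` is hypothetical.

```python
import tempfile
from pathlib import Path

# File names taken from the description above.
OMA_FILES = ("oma-groups.orthoXML.xml.gz", "oma-uniprot.txt.gz")


def missing_oma_files(raw_folder: Path) -> list[str]:
    """Return the expected OMA file names that are not present in raw_folder.

    Assumes the files sit directly in raw_folder (an assumption, not
    something this page confirms).
    """
    return [name for name in OMA_FILES if not (raw_folder / name).is_file()]


# Demonstration on an empty temporary folder: both files are reported missing.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_oma_files(Path(tmp)))
```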
- static string_aliases(raw_folder: Path, version: str = '12.0')
Download the STRING aliases dataset. These are mappings between STRING identifiers and other identifiers (most importantly, UniProt).
You can call this function from the CLI using:
ppi_origami download string_aliases RAW_FOLDER --version 12.0
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The version of the STRING database to download.
- Returns:
None
- static string_links(raw_folder: Path, version: str = '12.0', taxon: int | None = None)
Download the STRING links dataset. Downloads the protein.links.{version}.txt.gz file.
You can call this function from the CLI using:
ppi_origami download string_links RAW_FOLDER --version 12.0 --taxon 9606
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The version of the STRING database to download.
taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms. Defaults to None.
- Returns:
None
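When scripting several downloads, it can help to assemble the CLI invocation shown above programmatically. The following sketch only builds the argument list and does not run anything. The function name `string_links_command` is hypothetical, and the flags are exactly those in the CLI example for this command. The optional taxon flag is left out when no taxon is given, matching "omit for all organisms".

```python
import shlex
from pathlib import Path
from typing import Optional


def string_links_command(
    raw_folder: Path, version: str = "12.0", taxon: Optional[int] = None
) -> list[str]:
    """Build the argument list for `ppi_origami download string_links`."""
    cmd = [
        "ppi_origami", "download", "string_links",
        str(raw_folder), "--version", version,
    ]
    if taxon is not None:
        # Only restrict to one organism when a taxon ID was supplied.
        cmd += ["--taxon", str(taxon)]
    return cmd


print(shlex.join(string_links_command(Path("raw"), taxon=9606)))
# ppi_origami download string_links raw --version 12.0 --taxon 9606
```

A list of arguments like this can be handed to `subprocess.run` without going through a shell, which avoids quoting problems when the folder path contains spaces.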
- static string_links_detailed(raw_folder: Path, version: str = '12.0', taxon: int | None = None)
Download the detailed STRING links dataset. Downloads the protein.links.detailed.{version}.txt.gz file.
You can call this function from the CLI using:
ppi_origami download string_links_detailed RAW_FOLDER --version 12.0 --taxon 9606
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The version of the STRING database to download.
taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms. Defaults to None.
- Returns:
None
- static string_physical_detailed_links(raw_folder: Path, version: str = '12.0', taxon: int | None = None)
Download the detailed STRING physical links dataset. Downloads the protein.physical.links.detailed.{version}.txt.gz file. This dataset only includes PPIs with evidence of physical interactions, and provides more information than string_physical_links.
You can call this function from the CLI using:
ppi_origami download string_physical_detailed_links RAW_FOLDER --version 12.0 --taxon 9606
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The version of the STRING database to download.
taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms. Defaults to None.
- Returns:
None
- static string_physical_links(raw_folder: Path, version: str = '12.0', taxon: int | None = None)
Download the STRING physical links dataset. Downloads the protein.physical.links.{version}.txt.gz file. This dataset only includes PPIs with evidence of physical interactions.
You can call this function from the CLI using:
ppi_origami download string_physical_links RAW_FOLDER --version 12.0 --taxon 9606
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
version (str) – The version of the STRING database to download.
taxon (Optional[int]) – The NCBI taxon ID of the organism whose links you wish to download. Omit for all organisms. Defaults to None.
- Returns:
None
- static uniprot_delac(raw_folder: Path)
Download UniProt deleted accessions to the raw folder. More info can be found in the UniProtKB Manual.
You can call this function from the CLI using:
ppi_origami download uniprot_delac RAW_FOLDER
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
- Returns:
None
- static uniprot_id_mapping(raw_folder: Path)
Download UniProt ID mappings to the raw folder. More info can be found on UniProt.org.
You can call this function from the CLI using:
ppi_origami download uniprot_id_mapping RAW_FOLDER
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
- Returns:
None
- static uniprot_sec_ac(raw_folder: Path)
Download UniProt secondary accessions to the raw folder.
You can call this function from the CLI using:
ppi_origami download uniprot_sec_ac RAW_FOLDER
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
- Returns:
None
- static uniprot_seqs_db(processed_folder: Path, taxon: int | None = None)
Download UniProt sequences and save them to a LevelDB file in the processed folder. If taxon is not None, only sequences for the specified taxon are downloaded.
You can call this function from the CLI using:
ppi_origami download uniprot_seqs_db PROCESSED_FOLDER --taxon 9606
- Parameters:
processed_folder (pathlib.Path) – The folder to save the database to.
taxon (Optional[int]) – The NCBI taxon ID of the organism whose sequences you wish to download. Omit to download sequences for all organisms. Defaults to None.
- Returns:
None
- static uniprot_seqs_fasta(raw_folder: Path)
Download UniProt sequences in the FASTA format to the raw folder.
You can call this function from the CLI using:
ppi_origami download uniprot_seqs_fasta RAW_FOLDER
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
- Returns:
None
- static uniref(raw_folder: Path, threshold: int)
Download the UniRef dataset from UniProt. You must specify a similarity threshold among the three available options: 50%, 90%, and 100%.
You can call this function from the CLI using:
ppi_origami download uniref RAW_FOLDER THRESHOLD
- Parameters:
raw_folder (pathlib.Path) – The folder to download the dataset to.
threshold (int) – The UniRef identity threshold. Must be one of 50, 90, or 100.
- Returns:
None
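Since only three thresholds are available, a caller can validate the value up front. This is a sketch under assumptions: the helper `uniref_database_name` is hypothetical, and raising a ValueError for other values is this example's choice, as this page does not say how the command itself reacts to an invalid threshold. The returned names (UniRef50, UniRef90, UniRef100) are the usual names of the three UniRef databases.

```python
# The three identity thresholds listed above, in percent.
UNIREF_THRESHOLDS = (50, 90, 100)


def uniref_database_name(threshold: int) -> str:
    """Map an identity threshold to the corresponding UniRef database name.

    Raises ValueError for unsupported thresholds (this sketch's own choice).
    """
    if threshold not in UNIREF_THRESHOLDS:
        raise ValueError(
            f"threshold must be one of {UNIREF_THRESHOLDS}; got {threshold}"
        )
    return f"UniRef{threshold}"


print(uniref_database_name(90))  # UniRef90
```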