Analysis
The analysis module of PPI Origami allows you to validate and analyze datasets.
You can find a description of all the available analysis commands by running:
ppi_origami analysis --help
For information on the arguments of a specific command, run that command with the help flag:
ppi_origami analysis COMMAND --help
This information is reproduced on this page.
- class ppi_origami.__main__.Analysis
- static verify_rapppid(dataset_path: Path, taxon_ids: List[int] | None = None, sample_fraction: float | None = None, skip_protein_overlap: bool = False, skip_gotoh: bool = False, skip_sw: bool = False, skip_taxa_check: bool = False)
Runs sanity checks on a RAPPPID dataset. Specifically, it can check:
Whether the protein splits overlap.
Whether proteins observed in interaction pairs belonging to different splits overlap.
Whether proteins belong to a set of expected taxa.
Whether sequences in different splits are similar, using the Gotoh algorithm, the Smith-Waterman algorithm, or both.
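To make the idea of split overlap concrete, here is a minimal sketch of a pairwise overlap check on sets of protein IDs. It is not PPI Origami's implementation: the function name split_overlaps, the split names, and the toy IDs are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def split_overlaps(splits: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Hypothetical helper: return the protein IDs shared by each pair of splits.

    Pairs of splits with no shared proteins are omitted from the result.
    """
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = ids_a & ids_b  # set intersection
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy protein IDs, for illustration only.
splits = {
    "train": {"P1", "P2", "P3"},
    "val": {"P4", "P5"},
    "test": {"P3", "P6"},
}
overlaps = split_overlaps(splits)
print(overlaps)  # {('train', 'test'): {'P3'}}
```

An empty result means no protein appears in more than one split. Any entry names the two splits involved and the proteins they share.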
You can call this function from the CLI using:
ppi_origami analysis verify_rapppid DATASET_PATH --taxon_ids [9606,10090] --sample_fraction 0.05 --skip_protein_overlap False --skip_gotoh False --skip_sw False --skip_taxa_check False
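If you want to script this check, one option is to assemble the command line in Python. The sketch below only builds and prints the command; it does not run it. The helper build_verify_command and the file name dataset.h5 are hypothetical. The flag names and the bracketed taxon list follow the CLI example above.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_verify_command(
    dataset_path: Path,
    taxon_ids: Optional[List[int]] = None,
    sample_fraction: Optional[float] = None,
    skip_sw: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble argv for `ppi_origami analysis verify_rapppid`."""
    argv = ["ppi_origami", "analysis", "verify_rapppid", str(dataset_path)]
    if taxon_ids is not None:
        # Same bracketed, comma-separated form as in the CLI example.
        argv += ["--taxon_ids", "[" + ",".join(str(t) for t in taxon_ids) + "]"]
    if sample_fraction is not None:
        argv += ["--sample_fraction", str(sample_fraction)]
    if skip_sw:
        argv += ["--skip_sw", "True"]
    return argv


# dataset.h5 is a placeholder path, not a file shipped with PPI Origami.
command = build_verify_command(Path("dataset.h5"), [9606, 10090], 0.05, skip_sw=True)
print(shlex.join(command))  # shell-quoted form, safe to paste into a terminal
```

The resulting list can be passed to subprocess.run(command, check=True) once PPI Origami is installed. Flags left at their defaults are omitted.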
- Parameters:
dataset_path (pathlib.Path) – The path to the RAPPPID dataset.
taxon_ids (Optional[List[int]]) – A list of taxon IDs that proteins are required to belong to. If proteins from any other taxa are detected, the test fails. Defaults to None.
sample_fraction (Optional[float]) – The fraction of dataset samples to test for sequence similarity and for verifying protein organisms. Samples are chosen at random. Set to 1 or None to test the whole dataset. Defaults to None.
skip_protein_overlap (bool) – Skip the protein overlap test.
skip_gotoh (bool) – Skip checking the sequence similarity with the Gotoh algorithm.
skip_sw (bool) – Skip checking the sequence similarity with the Smith-Waterman algorithm.
skip_taxa_check (bool) – Skip checking the protein’s organism.
- Returns:
None
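As an illustration of the sample_fraction semantics described above (a random subset, with 1 or None meaning the whole dataset), here is a minimal sketch. The function sample_items and its seed argument are assumptions. How PPI Origami actually draws or seeds its sample is not documented here.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_items(
    items: Sequence[T],
    sample_fraction: Optional[float] = None,
    seed: int = 0,  # hypothetical: fixed seed so the sketch is reproducible
) -> List[T]:
    """Hypothetical helper: return a random subset holding `sample_fraction` of `items`."""
    if sample_fraction is None or sample_fraction >= 1:
        return list(items)  # 1 or None means "use everything"
    if sample_fraction < 0:
        raise ValueError("sample_fraction must be non-negative")
    rng = random.Random(seed)
    k = round(len(items) * sample_fraction)
    return rng.sample(list(items), k)  # sampling without replacement


pairs = list(range(100))  # stand-in for 100 dataset samples
subset = sample_items(pairs, sample_fraction=0.05)
print(len(subset))  # 5
```

A small fraction such as 0.05 keeps the sequence similarity checks fast, at the cost of possibly missing rare problems that a full run would catch.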