Data#
Pretrained Weights#
You can download the pre-trained weights used in the INTREPPPID manuscript from the GitHub releases page.
Precomputed Datasets#
You can download precomputed datasets from the sources below:
Zenodo (DOI: 10.5281/zenodo.10594149)
All datasets are made available under the Creative Commons Attribution-ShareAlike 4.0 International license.
Dataset Format#
INTREPPPID requires datasets to be stored as HDF5 files.
Each INTREPPPID dataset must have the following hierarchical structure:
intrepppid.h5
├── orthologs
├── sequences
│
├── splits
│   ├── test
│   ├── train
│   └── val
│
└── interactions
    ├── c1
    │   ├── c1_train
    │   ├── c1_val
    │   └── c1_test
    │
    ├── c2
    │   ├── c2_train
    │   ├── c2_val
    │   └── c2_test
    │
    └── c3
        ├── c3_train
        ├── c3_val
        └── c3_test
Only one of the “c” folders under “interactions” needs to be present, so long as it is the dataset you specify in the train step with the --c_type flag.
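The layout rule above can be checked before training. The following is a minimal sketch, not part of INTREPPPID itself. It assumes that `--c_type` takes the values 1, 2 or 3, and that the nodes under each `cN` folder are named `cN_train`, `cN_val` and `cN_test`. The function names `required_paths`, `missing_paths` and `list_hdf5_paths` are hypothetical, and the last one relies on the third-party `h5py` package.

```python
def required_paths(c_type):
    """Return the node paths expected for one C-type (assumed to be 1, 2 or 3)."""
    if c_type not in (1, 2, 3):
        raise ValueError("c_type must be 1, 2 or 3")
    subsets = ("train", "val", "test")
    paths = ["/orthologs", "/sequences"]
    paths += [f"/splits/{s}" for s in subsets]
    # Only the interactions folder selected by c_type is checked.
    paths += [f"/interactions/c{c_type}/c{c_type}_{s}" for s in subsets]
    return paths


def missing_paths(present, c_type):
    """List the required paths that are absent from `present`."""
    present = set(present)
    return [p for p in required_paths(c_type) if p not in present]


def list_hdf5_paths(filename):
    """Collect every group and dataset path in an HDF5 file (needs h5py)."""
    import h5py  # third-party; imported lazily so the checks above run without it

    names = []
    with h5py.File(filename, "r") as f:
        # visit() passes each member name, relative to the root, to the callback.
        f.visit(names.append)
    return ["/" + name for name in names]
```

For example, `missing_paths(list_hdf5_paths("intrepppid.h5"), c_type=3)` would return an empty list for a file that contains everything needed to train on the C3 dataset, under the naming assumptions stated above.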
Here is the schema for the tables:
| Field Name | Type | Example | Description |
|---|---|---|---|
| | | | The OMA Group ID of the protein in the |
| | | | The UniProt accession of a protein with OMA Group ID |

| Field Name | Type | Example | Description |
|---|---|---|---|
| | | | The UniProt accession that corresponds to the amino acid sequence in the |
| | | | The amino acid sequence indicated by the |

| Field Name | Type | Example | Description |
|---|---|---|---|
| | | | The UniProt accession of the first protein in the interaction pair. |
| | | | The UniProt accession of the second protein in the interaction pair. |
| | | | The UniProt accession of the anchor protein for the orthologous locality loss. |
| | | | The OMA Group ID of the anchor protein, from which a positive protein can be chosen for the orthologous locality loss. |
| | | | Label indicating whether |