Getting Started¶
Let’s get our feet wet by running PPI.bio on our local computers. This runs just fine on my ThinkPad T495 laptop, so you don’t need a supercomputer to run it.
You will, however, need to have the following things installed:
Python 3 (with pip)
Git
(Optional) virtualenv
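You can quickly check that Python, pip, and Git are available from your terminal:
python --version
python -m pip --version
git --version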
Step One - Clone the Repo¶
git clone https://github.com/jszym/ppi.bio ppi_bio
Step Two - Set up your environment¶
Optionally, begin by creating a virtual environment for RAPPPID Online:
python -m virtualenv venv
source venv/bin/activate
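If you are on Windows, the virtual environment is activated with a script rather than the source command above (from a Command Prompt):
venv\Scripts\activate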
Next, change into the repository directory and install all the required packages using pip.
cd ppi_bio
python -m pip install -r requirements.txt
Step Three - Configure your .env file¶
We need to configure the local deployment using a .env file located in the rapppid_org directory.
There is a handy example .env file distributed with the PPI.bio code base. We can begin by copying it like so:
cp ppi_bio/.env.example ppi_bio/.env
Consult the configuration documentation for what to change in this file.
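As a rough illustration only, a local .env often ends up with values along these lines; the keys shown here are hypothetical stand-ins for standard Django settings, so treat .env.example and the configuration documentation as the actual reference:
# Illustrative values only; the real keys are listed in .env.example
DEBUG=True
SECRET_KEY=replace-me-with-a-long-random-string
ALLOWED_HOSTS=localhost,127.0.0.1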
Step Four - Migrate the Database¶
Next, we need to migrate our database, which will create all the tables we need. To do so, run
python manage.py migrate
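If you want to double-check that everything was applied, Django’s built-in showmigrations command lists each migration and marks the ones that have run:
python manage.py showmigrations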
Step Five - Create your admin account¶
We’ll be using the admin interface in a minute, so we’ll need to create an admin account to log in with. That’s easy to do with the following one-liner:
python manage.py createsuperuser
Then, just follow the on-screen instructions.
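If you prefer a scripted, non-interactive setup, recent Django versions (3.0+) can also read the credentials from environment variables; the username, e-mail, and password below are placeholders:
DJANGO_SUPERUSER_USERNAME=admin \
DJANGO_SUPERUSER_EMAIL=admin@example.com \
DJANGO_SUPERUSER_PASSWORD=change-me \
python manage.py createsuperuser --noinput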
Step Six - Run the job runner¶
RAPPPID Online requires the Django Q job runner to be running in a separate process to handle all the long-running ML tasks.
Open up another terminal window/tab and run the following:
python manage.py qcluster
Step Seven - Seed the database¶
Warning
This can take a long time, depending on the species.
We will need to download and load the UniProt cDNA protein sequences for the organisms for which you wish to make proteome predictions. Type in the following command:
python manage.py proteome_seed 9606
9606 is the NCBI taxon code for humans, so this command seeds the database with human protein sequences. To do the same for other species, you will first need to create an Organism object that corresponds to your desired species; you can do this in the admin panel.
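For example, assuming you have already created an Organism entry for the house mouse, whose NCBI taxon code is 10090, seeding its proteome looks just like the human case:
python manage.py proteome_seed 10090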
Step Eight - Precompute the INTREPPPID/RAPPPID embeddings¶
Warning
This step is even longer than the previous one; it can take more than a day on under-powered laptops or servers. What I’ve had to do for PPI.bio is compute the embeddings on a faster desktop computer, dump the database to an SQL file, and then restore the database on the PPI.bio server.
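As a rough sketch of that dump-and-restore workflow, assuming a PostgreSQL backend and a database named ppi_bio (both of which depend on your .env configuration):
pg_dump ppi_bio > ppi_bio_dump.sql   # on the fast machine, after computing the embeddings
psql ppi_bio < ppi_bio_dump.sql      # on the server, into a freshly created empty database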
Warning
Do not run this before the previous step (seeding) has completed. Otherwise, it will quit early and you’ll end up with proteins that are missing their corresponding INTREPPPID/RAPPPID vectors.
In order to make proteome-wide predictions, PPI.bio will need to pre-compute the vectors for the protein sequences in the database. To do this, run the following command:
python manage.py proteome_embed 9606 distinct-stylishly
This will pre-compute the embeddings for all the human sequences using the “distinct-stylishly” weights, which come installed out-of-the-box and were trained on human PPI data from v11.5 of the STRING database.
You can install other INTREPPPID/RAPPPID weights by adding a new Weights object through the Django Admin panel.
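For example, if you had seeded the mouse proteome (taxon 10090) and added a hypothetical Weights entry named my-new-weights, the corresponding command would be:
python manage.py proteome_embed 10090 my-new-weights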
Step Nine - Run the test server¶
You can run the test server locally with the following command:
python manage.py runserver
This will output a URL that you can visit to reach your local PPI.bio server.
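By default the development server listens on http://127.0.0.1:8000/. If you want it reachable from other machines on your network, Django lets you bind an explicit address and port:
python manage.py runserver 0.0.0.0:8000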