For more information, please see our RiboNN paper.
-
System requirements:
This code has been tested on a system with 4 CPUs, 16 GB of RAM, and one NVIDIA A10 GPU running Ubuntu 20.04 (with CUDA Toolkit 11.3 installed). The required software packages are listed in environment.yml.
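Before training or predicting, you can verify that the GPU is visible with a quick check like the one below (a minimal sketch; it assumes PyTorch is among the packages installed from environment.yml, consistent with the state_dict weights mentioned later):
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))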
-
To install project requirements:
sudo apt install make

# install mamba (https://github.com/conda-forge/miniforge) into "miniforge3/" in the home directory
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh -b

# clone the repo
git clone https://github.com/Sanofi-GitHub/RiboNN.git && cd RiboNN

# install the RiboNN environment
make install

# activate the RiboNN environment
~/miniforge3/bin/mamba shell init
source ~/.bashrc
mamba activate RiboNN

# create a directory for storing pretrained model weights
mkdir models
Note: Depending on your network speed, it may take a few minutes to install the required packages.
-
To download pre-trained model weights for transfer learning or making predictions: Please download weights.zip from https://zenodo.org/records/15375573, unzip it, and put its contents in the models directory.
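If you prefer to script the download, something like the following works (a sketch; the direct file URL follows Zenodo's usual /records/<id>/files/<name> pattern but is an assumption, so verify it on the record page):
import urllib.request
import zipfile

# Assumed direct link; confirm the exact filename on the Zenodo record page
url = "https://zenodo.org/records/15375573/files/weights.zip"
urllib.request.urlretrieve(url, "weights.zip")

# Unpack the archive; check afterwards that the extracted files sit
# directly under models/ rather than in a nested subdirectory
with zipfile.ZipFile("weights.zip") as zf:
    zf.extractall("models")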
-
To train the RiboNN model from scratch:
- Put the training data in a tab-separated text file in the "data" folder, which already contains an example training data file. The tab-separated text file should have columns named "tx_id" (unique transcript IDs), "utr5_sequence", "cds_sequence" (including start and stop codons), and "utr3_sequence". Alternatively, the file may have columns named "tx_id", "tx_sequence" (full transcript sequences containing the 5'UTR, CDS, and 3'UTR), "utr5_size" (lengths of the 5'UTRs), and "cds_size" (lengths of the CDSs). See the sketch after this list for one way to build such a file.
- Edit the path to the training data ("tx_info_path") and other hyperparameters defined in the config/conf.yml file.
- Edit the code below line 18 of src/main.py to control how the model will be trained.
- Run
make train
at the terminal to start the training process.
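As referenced above, here is one way to write a training file with the expected columns (a minimal sketch using pandas; the sequences and file name are toy placeholders, and any target/label columns your configuration expects must be added according to config/conf.yml):
import pandas as pd

# Toy example with the four documented sequence columns; real training data
# will also need the measured target values referenced in config/conf.yml
df = pd.DataFrame({
    "tx_id": ["tx_0001"],
    "utr5_sequence": ["GGGAGACGCAACAGC"],
    "cds_sequence": ["ATGGCTGCTTCTTAA"],  # includes start (ATG) and stop (TAA) codons
    "utr3_sequence": ["TCTAGAGTCGACCTG"],
})
df.to_csv("data/my_training_data.txt", sep="\t", index=False)
# Then point "tx_info_path" in config/conf.yml at data/my_training_data.txt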
-
To do transfer learning (using pretrained multi-task models saved in the models directory):
- Put the training data in a tab-separated text file in the "data" folder, which already contains an example training data file. The tab-separated text file should have columns named "tx_id" (unique transcript IDs), "utr5_sequence", "cds_sequence" (including start and stop codons), and "utr3_sequence". Alternatively, the file may have columns named "tx_id", "tx_sequence" (full transcript sequences containing the 5'UTR, CDS, and 3'UTR), "utr5_size" (lengths of the 5'UTRs), and "cds_size" (lengths of the CDSs). A sketch for deriving the single-sequence format from the three-part format follows this list.
- Edit the path to the training data ("tx_info_path") and other hyperparameters defined in the config/conf.yml file.
- Edit the code below line 126 of src/main.py to control how the model will be trained.
- Run
make transfer_learning
at the terminal to start the training process.
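As referenced above, the alternative single-sequence format can be derived from the three-part format like so (a sketch; the column names follow the description above, while the file names are the hypothetical ones from the earlier example):
import pandas as pd

df = pd.read_csv("data/my_training_data.txt", sep="\t")

# Concatenate the parts into a full transcript and record the segment lengths
alt = pd.DataFrame({
    "tx_id": df["tx_id"],
    "tx_sequence": df["utr5_sequence"] + df["cds_sequence"] + df["utr3_sequence"],
    "utr5_size": df["utr5_sequence"].str.len(),
    "cds_size": df["cds_sequence"].str.len(),
})
alt.to_csv("data/my_training_data_alt.txt", sep="\t", index=False)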
-
To make predictions after downloading the model weights (state_dict) into the models directory:
- Please create a tab-separated text file with columns named "tx_id" (unique transcript IDs), "utr5_sequence", "cds_sequence" (including start and stop codons), and "utr3_sequence". Alternatively, the file may have columns named "tx_id", "tx_sequence" (full transcript sequences containing the 5'UTR, CDS, and 3'UTR), "utr5_size" (lengths of the 5'UTRs), and "cds_size" (lengths of the CDSs).
- Save the text file as "prediction_input.txt" in the "data" folder. An example input file can be found in the "data" folder.
- (Optional) Edit the code below line 173 of src/main.py to control how the model will be used for prediction.
- Run
make predict
at the terminal.
- The predictions will be automatically written to a tab-separated file named "prediction_output.txt" in the "results" folder. Pre-existing files with the same name will be overwritten.
Note: Input transcripts with 5'UTRs longer than 1,381 nt or combined CDS and 3'UTR lengths greater than 11,937 nt will be excluded from the output.
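To see in advance which transcripts these limits will drop, a pre-check along the following lines can be run on the three-part input format (a sketch; the thresholds come from the note above):
import pandas as pd

df = pd.read_csv("data/prediction_input.txt", sep="\t")

# Flag transcripts that exceed the documented size limits
too_long = (df["utr5_sequence"].str.len() > 1381) | (
    (df["cds_sequence"].str.len() + df["utr3_sequence"].str.len()) > 11937
)
print(df.loc[too_long, "tx_id"].tolist())  # these will be excluded from the output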