Getting Started with sRNAnalyzer
Download and Install Dependencies
Make sure you have python 2.6 or later and perl 5 or later installed.
Download and install bowtie. Make sure the bowtie
command is in your system path. Note: bowtie 2 is not supported in sRNAnalyzer.
Download and install the fastx_toolkit,
following the instructions on the website. Use version 0.0.14. Make sure the
fastx_collapser command is in your system path.
Download and install cutadapt.
This requires python 2.6 or later and a C compiler. The easiest way to install cutadapt is using
pip following the instructions on the cutadapt website. Make sure the
cutadapt command is in
your system path.
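Before moving on, it can help to confirm that every dependency is reachable from the shell. The sketch below is one way to do that. The function name missing_commands is hypothetical and not part of sRNAnalyzer, and the command names in the usage comment are taken from the steps above (python may be installed under a different name on your system, so adjust as needed).

```shell
#!/bin/sh
# Print the name of each command that cannot be found on the PATH.
# Returns 0 if every command was found, 1 otherwise.
# (Hypothetical helper, not part of sRNAnalyzer.)
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage with the tools named above:
#   missing_commands perl python bowtie fastx_collapser cutadapt
```

Any name the function prints still needs to be installed or added to the system path.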
Download and Set Up sRNAnalyzer
Download sRNAnalyzer. Unzip the downloaded archive.
You may want to add the sRNAnalyzer directory to your system
PATH so you can use the sRNAnalyzer commands
directly. Next, we need to download some databases for alignment. There are three options for databases to download: a small RNA database,
a database with human DNA and RNA, as well as some bacterial sequences, and the NCBI non-human database. The latter two databases are quite
large (> 70GB uncompressed), so it is recommended to begin with the sRNA database. The installation procedure for all three databases is the same.
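Adding the sRNAnalyzer directory to the PATH, as suggested above, might look like the following sketch. The helper name add_to_path is hypothetical, and /Downloads/sRNAnalyzer is only a placeholder; substitute the folder where you unzipped the archive.

```shell
# Append a directory to PATH unless it is already listed.
# (Hypothetical helper, not part of sRNAnalyzer.)
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Example usage (placeholder location):
#   add_to_path /Downloads/sRNAnalyzer
```

To keep the setting across sessions, the call can be placed in a shell startup file such as ~/.profile or ~/.bashrc.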
First, download one of the databases and unzip the archive. Open the
DB_config file and change the line
base: Insert the path to this folder here
by inserting the full path to the folder. For example,
Looking at the
DB_config, you should see a list of database names with paths. These databases are the ones that
you can use in your pipeline now. It is also possible to add many new databases to the pipeline by downloading or building
bowtie indexes and specifying their location in the database configuration file. For more information, see the
Configuration File Documentation.
Now you're ready to begin using the pipeline.
Using the Pipeline
To use the pipeline, we need to create a pipeline configuration file, which specifies preprocessing settings, such as adapter sequences, and alignment settings, such as database order and maximum mismatch allowances. Go to the Config Docs to learn how to create a configuration file with the settings required for your project.
A typical pipeline configuration file is shown below:
preprocess:
    kit: NEB
    gzip: true
    stop-oligo: false
alignment:
    type: single
    human_miRNA: 2
    human_miRNA_sub: 2
    human_piRNA: 2
    human_snoRNA: 2
Using a terminal, change to the directory containing the fastq or fastq.gz files you wish to process. To run preprocessing, run the command
/Downloads/sRNAnalyzer/preprocess.pl --config pipeline_config.yaml
where pipeline_config.yaml is your pipeline configuration file, and
/Downloads/ is replaced with wherever your sRNAnalyzer
folder is located. Or, if you have added the sRNAnalyzer directory to the system
PATH, simply use
preprocess.pl --config pipeline_config.yaml
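Because preprocessing works on the files in the current working directory, a quick check that input files are actually present can save a failed run. The sketch below assumes the inputs end in .fastq or .fastq.gz; the function name list_fastq_inputs is hypothetical and not part of sRNAnalyzer.

```shell
# List the FASTQ inputs in a directory (default: current directory), one per line.
# Assumes the extensions .fastq and .fastq.gz; adjust the patterns if yours differ.
# (Hypothetical helper, not part of sRNAnalyzer.)
list_fastq_inputs() {
    dir="${1:-.}"
    for f in "$dir"/*.fastq "$dir"/*.fastq.gz; do
        # An unmatched glob is left as a literal pattern, so test for existence.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Example usage:
#   if [ -n "$(list_fastq_inputs)" ]; then
#       preprocess.pl --config pipeline_config.yaml
#   else
#       echo "No FASTQ files found in $(pwd)" >&2
#   fi
```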
The preprocessing will generate
sample_Processed.fa files in which adapters have been trimmed, low-quality reads
filtered out, and identical reads collapsed. Additional report files are also generated with information about adapter trimming and read
To perform the alignment, ensure that your database and pipeline configuration files are properly set up. After downloading the initial human small RNA databases, the databases available for alignment, which can be specified in the pipeline configuration file, are:
human_miRNA
human_miRNA_sub
human_piRNA
human_snoRNA
virus_miRNA
plant_miRNA
all_miRNA
all_miRNA_sub
Then, making sure that you are in the directory containing the _Processed.fa files you wish to align, run the command
/Downloads/sRNAnalyzer/align.pl /home/data pipeline_config.yaml DB_config.conf
or
align.pl /home/data pipeline_config.yaml DB_config.conf
if you have added sRNAnalyzer to the system PATH.
In the command,
pipeline_config.yaml is the pipeline
configuration file and
DB_config.conf is the database configuration file.
The align command will output several files, including feature files, profile files, a read distribution file, and an unmatched sequences file.
The next step in the pipeline is the summarization of the results of the alignment in order to prepare for statistical analysis of the data. An example summarization command is,
summarize.pl DB_config.conf --project my_project
This command will sum the feature and profile results from individual samples into result files for all
samples. Here, my_project is the name of the project, so all of the result files will start with the prefix
my_project_. The general form of the summarize command is,
summarize.pl <db-config-file> <sample-order-file> --project <project-name>
where the db-config-file is required, and the sample-order-file and project-name are both optional.
db-config-file is the database configuration file discussed above, and the
sample-order-file specifies the order of the samples in the result files. If the sample order file is not provided,
the order is alphabetical. The
summarize.pl command has two additional options. Use the
--miRNA flag if you would like to summarize miRNA separately and get information about possible
miRNA SNPs. Use the
--exogenous flag if you would like to summarize exogenous reads, including summarizing by taxonomy information.
Note that the
--exogenous option is only available if the MainDBs or NCBI_NonHuman databases are installed.
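Putting the three steps together, a run could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name srna_pipeline_commands is hypothetical, the sRNAnalyzer scripts are assumed to be on the PATH, and the arguments simply mirror the example commands above (including /home/data as the first argument to align.pl). The function only prints the commands, so they can be reviewed before anything is executed.

```shell
# Print the preprocess, align, and summarize commands for one project.
# Arguments: pipeline config file, database config file, project name.
# (Hypothetical helper; the commands mirror the examples in this guide.)
srna_pipeline_commands() {
    pipeline_cfg="$1"
    db_cfg="$2"
    project="$3"
    echo "preprocess.pl --config $pipeline_cfg"
    echo "align.pl /home/data $pipeline_cfg $db_cfg"
    echo "summarize.pl $db_cfg --project $project"
}

# Example usage, run from the directory that holds the input files:
#   srna_pipeline_commands pipeline_config.yaml DB_config.conf my_project
# To execute the printed commands in order, stopping at the first failure:
#   srna_pipeline_commands pipeline_config.yaml DB_config.conf my_project | sh -e
```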