Getting Started with sRNAnalyzer
==========================
Download and Install Dependencies
---------------------------------
Make sure you have Python 2.6 or later and Perl 5 or later installed.

Download and install [bowtie](http://bowtie-bio.sourceforge.net/index.shtml). Make sure the `bowtie`
command is in your system path. Note: bowtie 2 is not supported in sRNAnalyzer.

Download and install version 0.0.14 of the [fastx_toolkit](http://hannonlab.cshl.edu/fastx_toolkit/download.html),
following the instructions on the website. Make sure the `fastx_collapser` command is in your
system path.

Download and install [cutadapt](https://cutadapt.readthedocs.io/en/stable/installation.html).
This requires python 2.6 or later and a C compiler. The easiest way to install cutadapt is using
`pip` following the instructions on the cutadapt website. Make sure the `cutadapt` command is in
your system path.
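
Before continuing, it can help to confirm that every required command is reachable. The following is a minimal Python 3 sketch, not part of sRNAnalyzer, that uses the standard library's `shutil.which` to look up each command named above on the `PATH`. The helper name `check_dependencies` is hypothetical.

```python
import shutil

# Commands that this guide asks you to place on the system PATH.
REQUIRED_COMMANDS = ["perl", "bowtie", "fastx_collapser", "cutadapt"]


def check_dependencies(commands=REQUIRED_COMMANDS):
    """Map each command name to its full path, or to None if it is not on PATH."""
    return {cmd: shutil.which(cmd) for cmd in commands}


if __name__ == "__main__":
    # Print a simple report; a None entry means the command was not found.
    for cmd, path in check_dependencies().items():
        print(f"{cmd:16} {path or 'NOT FOUND'}")
```

Note that a `bowtie2` installation provides a `bowtie2` command, which this check does not look for, since bowtie 2 is not supported.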

Download and Setup sRNAnalyzer
--------------------------------
Go to (insert website) and download sRNAnalyzer, then unzip the downloaded archive.
You may want to add the sRNAnalyzer directory to your system `PATH` so that you can run the sRNAnalyzer commands
directly.

Next, download the databases used for alignment. Go to (insert website) to download the
prebuilt bowtie indexes for human small RNA alignment. Open the `DB_config.conf` file and change the line

`base: Insert the path to this folder here`
 
by replacing the placeholder with the full path to the folder containing the indexes. For example,
 
`base: /databases/bowtie/indexes/sRNA_DBs`
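
If you prefer to script this step, the Python sketch below rewrites the `base:` line of `DB_config.conf`. It assumes only what is shown above, namely that the file contains a line beginning with `base:`; the rest of the file is left untouched. The function names `set_db_base` and `update_config_file` are hypothetical.

```python
import re
from pathlib import Path

# Matches the "base:" line, allowing leading whitespace (assumed layout).
BASE_LINE = re.compile(r"^(\s*base:).*$", re.MULTILINE)


def set_db_base(config_text, base_dir):
    """Return config_text with the value on the first 'base:' line replaced by base_dir."""
    # A function replacement avoids backslash-escape surprises in Windows paths.
    new_text, count = BASE_LINE.subn(lambda m: f"{m.group(1)} {base_dir}", config_text, count=1)
    if count == 0:
        raise ValueError("no 'base:' line found in the database configuration")
    return new_text


def update_config_file(path, base_dir):
    """Rewrite the 'base:' line of the configuration file at path, in place."""
    config_path = Path(path)
    config_path.write_text(set_db_base(config_path.read_text(), base_dir))
```

For example, `update_config_file("DB_config.conf", "/databases/bowtie/indexes/sRNA_DBs")` produces the line shown above.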
 
The rest of `DB_config.conf` lists database names with their paths. These are the databases you can use
in your pipeline. You can also add new databases to the pipeline by downloading or building
bowtie indexes and specifying their location in the database configuration file. For more information, see the
[Configuration File Documentation](ConfigDocs.html).

Now you're ready to begin using the pipeline.

Using the Pipeline
------------------
To use the pipeline, you need to create a pipeline configuration file, which specifies preprocessing settings, such as
adapter sequences, and alignment settings, such as database order and maximum mismatch allowances. See the
[Config Docs](ConfigDocs.html) to learn how to create a configuration file with the settings required for your project.

A typical pipeline configuration file is shown below:

    preprocess:
        kit:        NEB
        gzip:       true
        stop-oligo: false
    
    alignment:
        type: single
        human_miRNA:     2
        human_miRNA_sub: 2
        human_piRNA:     2
        human_snoRNA:    2
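
As an illustration of how the `alignment` section encodes database order and mismatch allowances, here is a minimal Python sketch that reads the two-level `section:` / `key: value` layout shown above without any third-party library. It is a hypothetical helper, not sRNAnalyzer's own parser, and handles only this simple layout (no lists, quoting, or comments). It assumes that every entry under `alignment:` other than `type` is a database name mapped to a maximum mismatch count; the 0-3 range check mirrors the limit of bowtie 1's `-v` mode, which sRNAnalyzer may or may not use.

```python
def parse_pipeline_config(text):
    """Parse a two-level 'section:' / 'key: value' layout into nested dicts (order preserved)."""
    config, section = {}, None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank or whitespace-only lines
        key, _, value = raw.strip().partition(":")
        if not raw[0].isspace():
            section = config.setdefault(key, {})  # an unindented line starts a section
        elif section is None:
            raise ValueError(f"indented line outside a section: {raw!r}")
        else:
            section[key] = value.strip()
    return config


def alignment_plan(config):
    """Return [(database, max_mismatches), ...] in the order listed under 'alignment'."""
    plan = []
    for name, value in config["alignment"].items():
        if name == "type":
            continue  # read type (e.g. single), not a database
        mismatches = int(value)
        if not 0 <= mismatches <= 3:
            raise ValueError(f"{name}: mismatch allowance {mismatches} outside 0-3")
        plan.append((name, mismatches))
    return plan
```

For the example file above, `alignment_plan` yields the four human databases in the order listed, each with a limit of 2 mismatches.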

### Preprocessing ###
Using a terminal, change to the directory containing the fastq or fastq.gz files you wish to process.
To run preprocessing, use the command

    /Downloads/sRNAnalyzer/preprocess.pl --config pipeline_config.yaml

where `pipeline_config.yaml` is your pipeline configuration file and `/Downloads/` is replaced with the location of your
sRNAnalyzer folder. If you have added the sRNAnalyzer directory to your system `PATH`, simply use

    preprocess.pl --config pipeline_config.yaml

Preprocessing generates `sample_Processed.fa` files in which adapters have been trimmed, low-quality reads
filtered out, and identical reads collapsed. Additional report files are also generated with information about
adapter trimming and read quality.
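
To make these steps concrete, the following Python sketch applies simplified versions of them to a few reads: exact-match 3' adapter trimming, a minimum-length filter, and collapsing of identical sequences. It is an illustration only, not what sRNAnalyzer runs: cutadapt tolerates sequencing errors when matching adapters, quality filtering is omitted here, and the adapter sequence and 15-nt minimum length are assumed example values rather than sRNAnalyzer settings. The `>rank-count` FASTA headers imitate the style written by `fastx_collapser`.

```python
from collections import Counter

EXAMPLE_ADAPTER = "AGATCGGAAGAGCACACGTCT"  # example 3' adapter; use the one for your kit


def trim_adapter(read, adapter=EXAMPLE_ADAPTER, min_overlap=3):
    """Remove the adapter and everything after it (exact matching only)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise look for a partial adapter (a prefix of it) at the very end of the read,
    # trying the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def collapse(reads, min_length=15):
    """Trim, length-filter and collapse reads into FASTA records with '>rank-count' headers."""
    counts = Counter(t for t in (trim_adapter(r) for r in reads) if len(t) >= min_length)
    records = [
        f">{rank}-{count}\n{seq}"
        for rank, (seq, count) in enumerate(counts.most_common(), start=1)  # most abundant first
    ]
    return "\n".join(records) + ("\n" if records else "")
```

Collapsing means later steps align each distinct sequence once, with its read count carried in the header.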


### Alignment ###
To perform the alignment, make sure that your database and pipeline configuration files are set up properly.
After you download the initial human small RNA databases, the following databases are available for alignment
and can be specified in the pipeline configuration file:
 
    human_miRNA
    human_miRNA_sub
    human_piRNA
    human_snoRNA
    virus_miRNA
    plant_miRNA
    all_miRNA
    all_miRNA_sub
    
To run the alignment, use the command

    /Downloads/sRNAnalyzer/align.pl /home/data pipeline_config.yaml DB_config.conf

or

    align.pl /home/data pipeline_config.yaml DB_config.conf

if you have added sRNAnalyzer to the system `PATH`.

In the command, `/home/data` is where the processed .fa files are located, `pipeline_config.yaml` is the pipeline
configuration file and `DB_config.conf` is the database configuration file.
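
If you drive the pipeline from a script, the positional argument order is easy to mix up. The Python 3.8+ sketch below uses a hypothetical helper, `build_align_command`, to assemble the `align.pl` arguments in the order documented above, and by default only prints the command; pass `dry_run=False` to execute it with `subprocess.run`. It assumes `align.pl` is directly executable, as in the commands above.

```python
import os
import shlex
import subprocess


def build_align_command(data_dir, pipeline_config, db_config, srnanalyzer_dir=None):
    """Return the align.pl argument list: <data-dir> <pipeline-config> <db-config>."""
    script = "align.pl" if srnanalyzer_dir is None else os.path.join(srnanalyzer_dir, "align.pl")
    return [script, data_dir, pipeline_config, db_config]


def run_align(data_dir, pipeline_config, db_config, srnanalyzer_dir=None, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = build_align_command(data_dir, pipeline_config, db_config, srnanalyzer_dir)
    print(shlex.join(cmd))  # shell-quoted form, handy for logs
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return cmd
```

Passing `srnanalyzer_dir="/Downloads/sRNAnalyzer"` corresponds to the first form of the command; omitting it relies on `align.pl` being on the `PATH`.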

The align command will output several files, including feature files, profile files, a read distribution file, and an
unmatched sequences file.
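
Because the pipeline configuration lists databases in a specific order, one plausible reading (an assumption, not stated in this guide) is that reads are aligned to each database in turn, reads that match are assigned to it, and the rest pass to the next database, with whatever remains reported as unmatched. The Python sketch below illustrates that idea using a toy ungapped mismatch search in place of bowtie, and counts how many distinct sequences each database absorbs, similar in spirit to a read distribution. All names and sequences in it are hypothetical test data.

```python
def best_mismatches(read, reference):
    """Fewest mismatches over all ungapped placements of read inside reference (None if too long)."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, reference[start:start + len(read)]))
        if best is None or mm < best:
            best = mm
    return best


def find_hit(read, references, max_mismatches):
    """Return the feature with the fewest mismatches (ties: first listed), or None."""
    best_feature, best_mm = None, max_mismatches + 1
    for feature, reference in references.items():
        mm = best_mismatches(read, reference)
        if mm is not None and mm < best_mm:
            best_feature, best_mm = feature, mm
    return best_feature


def sequential_align(reads, databases):
    """databases: [(name, max_mismatches, {feature: sequence}), ...] in configured order.

    Returns (assignments, distribution, unmatched) for distinct read sequences.
    """
    remaining, assignments, distribution = list(reads), {}, {}
    for db_name, max_mm, references in databases:
        unmatched = []
        for read in remaining:
            feature = find_hit(read, references, max_mm)
            if feature is None:
                unmatched.append(read)  # try the next database
            else:
                assignments[read] = (db_name, feature)
        distribution[db_name] = len(remaining) - len(unmatched)
        remaining = unmatched
    return assignments, distribution, remaining
```

A real run would weight each collapsed sequence by its read count; this sketch counts distinct sequences only.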

### Summarization ###
The next step in the pipeline is the summarization of the results of the alignment in order to prepare
for statistical analysis of the data. An example summarization command is,
    
    summarize.pl DB_config.conf --project my_project
    
This command sums the feature and profile results from the individual samples into result files covering all
samples. `my_project` is the name of the project, so all of the result files will start with the prefix
`my_project_`. The general form of the summarize command is:

    summarize.pl <db-config-file> [<sample-order-file>] [--project <project-name>]

where `db-config-file` is required, and `sample-order-file` and `project-name` are optional.
The `db-config-file` is the database configuration file discussed above, and the `sample-order-file`
specifies the order of the samples in the result files. If no sample order file is provided,
the samples are ordered alphabetically.
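
Conceptually, summarization turns per-sample counts into one table with a column per sample. The Python sketch below shows that idea under stated assumptions: each sample's feature results are represented as a `{feature: count}` dict (the actual sRNAnalyzer file formats are not described here), columns follow an optional sample order or are sorted alphabetically otherwise, as described above, and features absent from a sample get 0. The function names, the one-name-per-line order file, and the tab-separated layout are hypothetical.

```python
def summarize(per_sample_counts, sample_order=None):
    """Merge {sample: {feature: count}} into a tab-separated table (features x samples)."""
    samples = list(sample_order) if sample_order else sorted(per_sample_counts)
    missing = [s for s in samples if s not in per_sample_counts]
    if missing:
        raise ValueError(f"samples in order list but without results: {missing}")
    features = sorted({f for counts in per_sample_counts.values() for f in counts})
    lines = ["feature\t" + "\t".join(samples)]
    for feature in features:
        row = [str(per_sample_counts[s].get(feature, 0)) for s in samples]  # 0 if absent
        lines.append(feature + "\t" + "\t".join(row))
    return "\n".join(lines) + "\n"


def read_sample_order(path):
    """Read one sample name per line, ignoring blank lines (assumed file layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

With no order given, `summarize({"sampleB": {...}, "sampleA": {...}})` places `sampleA` before `sampleB`, matching the alphabetical default.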

