Configuration File Documentation
================================
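To set up the sRNAnalyzer pipeline, two configuration files are required: a database
configuration file and a pipeline configuration file. The database configuration file tells
sRNAnalyzer where the alignment databases are located, and the pipeline configuration file
defines the preprocessing and alignment settings for the pipeline.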

Database Configuration File
---------------------------
The database configuration file defines the names of the alignment databases and tells sRNAnalyzer where
these databases are located. The base attribute must be an absolute path; all other paths are
relative to the base path. Each of the other paths should point to a bowtie index, **including** the
prefix of the index files. An example configuration file is shown below.

    base: /DBs/bowtie/indexes/
    human_miRNA: miRBase/hairpin_hsa_anno
    human_piRNA: piRBase/piR_human_v1.0

From this configuration file, we can now use the names `human_miRNA` and `human_piRNA` in the
pipeline configuration file defined below, since sRNAnalyzer can find the bowtie indexes corresponding
to these database names.
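
The way database names map to index prefixes can be illustrated with a short Python sketch. This is not sRNAnalyzer's own code: the helper names `parse_database_config` and `resolve_index_prefixes` are hypothetical, and the hand-written parser is a stand-in that only handles the flat `name: path` lines shown in the example above.

```python
import posixpath


def parse_database_config(text):
    """Parse flat 'name: value' lines into a dict (hypothetical helper)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition(":")
        entries[name.strip()] = value.strip()
    return entries


def resolve_index_prefixes(entries):
    """Join every database path onto the absolute base path."""
    base = entries.get("base")
    if base is None or not posixpath.isabs(base):
        raise ValueError("'base' must be an absolute path")
    return {
        name: posixpath.join(base, relative)
        for name, relative in entries.items()
        if name != "base"
    }


EXAMPLE = """\
base: /DBs/bowtie/indexes/
human_miRNA: miRBase/hairpin_hsa_anno
human_piRNA: piRBase/piR_human_v1.0
"""

prefixes = resolve_index_prefixes(parse_database_config(EXAMPLE))
for name, prefix in prefixes.items():
    print(name, "->", prefix)
```

Each resulting value is an index prefix rather than a single file, which is why the configured paths must include the prefix of the index files.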

Pipeline Configuration File
---------------------------
The pipeline configuration file specifies the settings for the preprocessing and alignment modules
of the pipeline. The file is written in [YAML](http://www.yaml.org/start.html), a human-readable
format.

An example config.yaml file is shown below:

    preprocess:
        kit:        NEB
        gzip:       true
        stop-oligo: false
        barcode:    sampleBarcode
    
    alignment:
        type: single
        human_miRNA:     2
        human_miRNA_sub: 2
        human_piRNA:     2
        human_snoRNA:    2

Preprocess Options
------------------
**kit** - specifies which sRNA library construction kit was used, so that the adapters
can be trimmed properly. Options are "NEB", "Illumina", and "Bioo". Required if adapter-3p
and adapter-5p are not provided. The adapter sequences for these kits are:

- Illumina - 3' TGGAATTCTCGGGTGCCAAG, 5' GTTCAGAGTTCTACAGTCCGACGATC
- NEB - 3' AGATCGGAAGAGCACACGTCT, 5' GTTCAGAGTTCTACAGTCCGACGATC
- Bioo - 3' NNNNTGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC, 5' GTTCAGAGTTCTACAGTCCGACGATC

**adapter-3p** - specifies the 3 prime adapter sequence to be trimmed. Required if the **kit** option
is not provided.

**adapter-5p** - specifies the 5 prime adapter sequence to be trimmed. Required if the **kit** option
is not provided.
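
The relationship between **kit**, **adapter-3p**, and **adapter-5p** can be sketched in Python as follows. The function name `resolve_adapters` is hypothetical, and this is not the pipeline's actual implementation. In particular, the source does not say what happens when a kit and explicit adapters are both given; this sketch assumes the explicit adapters win.

```python
# Adapter sequences (3', 5') for each supported kit, copied from the list above.
KIT_ADAPTERS = {
    "Illumina": ("TGGAATTCTCGGGTGCCAAG", "GTTCAGAGTTCTACAGTCCGACGATC"),
    "NEB": ("AGATCGGAAGAGCACACGTCT", "GTTCAGAGTTCTACAGTCCGACGATC"),
    "Bioo": ("NNNNTGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC", "GTTCAGAGTTCTACAGTCCGACGATC"),
}


def resolve_adapters(preprocess):
    """Return (adapter_3p, adapter_5p) for a parsed 'preprocess' section."""
    adapter_3p = preprocess.get("adapter-3p")
    adapter_5p = preprocess.get("adapter-5p")
    # Assumption: when both adapters are given explicitly, they take precedence.
    if adapter_3p and adapter_5p:
        return adapter_3p, adapter_5p
    kit = preprocess.get("kit")
    if kit is None:
        raise ValueError("provide either 'kit' or both 'adapter-3p' and 'adapter-5p'")
    if kit not in KIT_ADAPTERS:
        raise ValueError("unknown kit: %r" % kit)
    return KIT_ADAPTERS[kit]


print(resolve_adapters({"kit": "NEB"}))
print(resolve_adapters({"adapter-3p": "ACGT", "adapter-5p": "TGCA"}))
```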

**gzip** - if this option is set to "true", the pipeline will read gzipped .fastq.gz files instead of
plain .fastq files. Optional (default is false).

**stop-oligo** - if this option is set to "true", stop-oligo sequences will be trimmed. Optional
(default is false).

**barcode** - specifies the sample barcode file to use when reading barcodes. Optional. 

**min-length** - the minimum length of reads to keep. Optional. Default is 15.
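
As a rough illustration of how the optional settings behave, the Python sketch below fills in the documented defaults for any option left out of the `preprocess` section. The names `with_defaults` and `keep_read` are hypothetical, and treating **min-length** as an inclusive threshold is an assumption rather than documented behavior.

```python
# Defaults taken from the option descriptions above.
PREPROCESS_DEFAULTS = {
    "gzip": False,
    "stop-oligo": False,
    "min-length": 15,
}


def with_defaults(preprocess):
    """Return a copy of the section with documented defaults filled in."""
    merged = dict(PREPROCESS_DEFAULTS)
    merged.update(preprocess)  # values from the config file override defaults
    return merged


def keep_read(sequence, options):
    """Assumption: a read is kept when its length is at least min-length."""
    return len(sequence) >= options["min-length"]


settings = with_defaults({"kit": "NEB", "gzip": True, "barcode": "sampleBarcode"})
print(settings)
print(keep_read("ACGTACGTACGTACG", settings))
```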

Alignment Options
-----------------
Each row in the alignment section should be formatted as:

    DATABASE_NAME: MAX_MISMATCH

For example,

    human_miRNA: 2

The pipeline aligns reads to the databases in the order in which they appear in the config file.
The database names must be those defined in the database configuration file, as described above.

**type** - specifies whether to use single assignment or multiple assignment for read mapping.
Can be "single" or "multiple". It is recommended that multiple assignment be used only for small RNA
mapping. Optional (default is single assignment). Note that with the pre-built sRNA indexes, the
`human_miRNA_mult` database should be used for multiple assignment, and the `human_miRNA` and
`human_miRNA_sub` databases for single assignment.

**cores** - the number of cores for bowtie to use for alignment. Default is 15.
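
To make the structure of the alignment section concrete, the following Python sketch separates the `type` and `cores` settings from the database rows and keeps the databases in file order. The function name `split_alignment_section` is hypothetical, and the sketch assumes the section has already been parsed into a mapping that preserves the order of the file (as a Python `dict` does for insertion order).

```python
# Keys in the alignment section that are settings rather than database names.
SETTING_KEYS = ("type", "cores")


def split_alignment_section(alignment):
    """Return (assignment_type, cores, [(database_name, max_mismatch), ...])."""
    assignment = alignment.get("type", "single")  # documented default
    if assignment not in ("single", "multiple"):
        raise ValueError("type must be 'single' or 'multiple'")
    cores = int(alignment.get("cores", 15))  # documented default
    # Every remaining row is DATABASE_NAME: MAX_MISMATCH, kept in file order.
    databases = [
        (name, int(max_mismatch))
        for name, max_mismatch in alignment.items()
        if name not in SETTING_KEYS
    ]
    return assignment, cores, databases


alignment = {
    "type": "single",
    "human_miRNA": 2,
    "human_miRNA_sub": 2,
    "human_piRNA": 2,
    "human_snoRNA": 2,
}

assignment, cores, databases = split_alignment_section(alignment)
print(assignment, cores)
for name, max_mismatch in databases:
    print(name, max_mismatch)
```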

