In this exercise we will compute total and strand-specific coverage from RNA-Seq alignments. The coverage files will be examined visually in IGV, and strand information compared with known annotations.
Data for this tutorial was provided by Brian Haas and can be found in the "data/rna" directory. This dataset was derived from S.Pombesamples and includes RNA-Seq strand-specific paired-end reads, a genome fasta reference file, annotations, and genome annotations
igvtools requires Java version 1.6 or later, so before proceeding verify that Java is installed by executing the following from a command terminal.
java -version
Download the latest version of igvtools from www.broadinstitute.org/igv/download and save the zip archive to your home directory. Some browsers, notably safari, will unzip the archive automatically. In this case just move the IGVToolsfolder to your home directory and skip the unzip step below.
Open a command terminal in your home directory and unzip the archive (excute "cd" to insure you are in the home directory)
unzip igvtools_2.3.37.zip
Verify the installation by executing the following
~/IGVTools/igvtools
You should see a version and usage string followed by a summary of commands.
Now add the igvtools command to your path environment variable.This is optional but makes typing the command convenient. If you skip this step you will need to type the full path (or ~/IGVTools/igvtool) to execute the program. The syntax depends on the shell you are running.
For csh and tcsh
set path = ( $path ~/IGVTools)
For bash:
PATH=$PATH:=IGVTools
If you aren't sure which shell you are running try both commands above (one should work and the other is harmless).
Verify the path setting by executing
igvtools
To compute a total coverage file in "tdf"format first open a terminal in the data/rna directory then execute the following:
igvtools count Sp_ds.bam Sp_ds.bam.tdf genome.fa
This should produce a file called "Sp_ds.bam.tdf" containing coverage data for the entire bam.Now compute a strand-specific coverage file. This is possible because the RNA library used for the sample dataset was strand preserving. Specifically, for this library, the strand of the first read of each pair corresponds to the strand of the fragment sequenced. Use the optional "--strands" arugment to separate the coverage count by strand
count --strands first Sp_ds.bam Sp_ds.bam.strands.tdf genome.fa
Launch igv and load the custom genome, annotations, and bam file.
Note: the total coverage file is loaded automatically.
Load the strand specific coverage file "Sp_ds.bam.strands.tdf"
Zoom in on a peak until alignments are shown
Color alignments by strand by right-clicking in the alignment track, then selecting "Color alignments by > first-of-pair strand".
Verify that coverage tracks and alignments correspond with the known strands of the gene annotations.