Computing Coverage with igvtools

In this exercise we will compute total and strand-specific coverage from RNA-Seq alignments. The coverage files will be examined visually in IGV, and strand information compared with known annotations.

Data

Data for this tutorial was provided by Brian Haas and can be found in the "data/rna" directory. This dataset was derived from S.Pombesamples and includes RNA-Seq strand-specific paired-end reads, a genome fasta reference file, annotations, and genome annotations

Prerequisites

igvtools requires Java version 1.6 or later, so before proceeding verify that Java is installed by executing the following from a command terminal.

java -version

Install igvtools

Download the latest version of igvtools from www.broadinstitute.org/igv/download and save the zip archive to your home directory. Some browsers, notably safari, will unzip the archive automatically. In this case just move the IGVToolsfolder to your home directory and skip the unzip step below.

Open a command terminal in your home directory and unzip the archive (excute "cd" to insure you are in the home directory)

unzip igvtools_2.3.37.zip

Verify the installation by executing the following

~/IGVTools/igvtools

You should see a version and usage string followed by a summary of commands.

Set the path variable

Now add the igvtools command to your path environment variable.This is optional but makes typing the command convenient. If you skip this step you will need to type the full path (or ~/IGVTools/igvtool) to execute the program. The syntax depends on the shell you are running.

For csh and tcsh

set path = ( $path ~/IGVTools)

For bash:

PATH=$PATH:=IGVTools

If you aren't sure which shell you are running try both commands above (one should work and the other is harmless).

Verify the path setting by executing

igvtools

Compute total and strand-specific coverage

To compute a total coverage file in "tdf"format first open a terminal in the data/rna directory then execute the following:

igvtools count Sp_ds.bam Sp_ds.bam.tdf genome.fa

This should produce a file called "Sp_ds.bam.tdf" containing coverage data for the entire bam.Now compute a strand-specific coverage file. This is possible because the RNA library used for the sample dataset was strand preserving. Specifically, for this library, the strand of the first read of each pair corresponds to the strand of the fragment sequenced. Use the optional "--strands" arugment to separate the coverage count by strand

count --strands first Sp_ds.bam Sp_ds.bam.strands.tdf genome.fa

View the coverage files in IGV

Launch igv and load the custom genome, annotations, and bam file.

From the menu "Genomes > Load Genome from File..."select "genome.fa"in the data/rna folder.
From the menu "File > Load from File..." select "genes.bed" in the data/rna folder
From the menu "File > Load from File..." select "Sp_ds.bam" in the data/rna folder

Note: the total coverage file is loaded automatically.

Load the strand specific coverage file "Sp_ds.bam.strands.tdf"

Zoom in on a peak until alignments are shown

Color alignments by strand by right-clicking in the alignment track, then selecting "Color alignments by > first-of-pair strand".

Verify that coverage tracks and alignments correspond with the known strands of the gene annotations.