Problem Set for BioPerl

Preparation:

1. Download uniprot_sprot:

curl -O "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"

2. Decompress the file:

gunzip uniprot_sprot.fasta.gz

PROBLEM 1: Bio::DB::Fasta

1. Write a script to retrieve all IDs from uniprot_sprot.fasta using the get_all_ids method from Bio::DB::Fasta.

2. Search through the list of IDs for all IDs that contain the term "HDAC".

3. Print the sequences for these proteins, in FASTA format.
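One possible starting point for the three steps above is sketched below. It is not the only way to solve the problem. The sub names matching_ids and print_matches are made up for this example, and the sequence is printed on a single line rather than wrapped.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Return, sorted, the IDs in the indexed FASTA file that contain $term.
# (matching_ids is a hypothetical helper name, not part of BioPerl.)
sub matching_ids {
    my ($db, $term) = @_;
    my @ids    = grep { index($_, $term) >= 0 } $db->get_all_ids;
    my @sorted = sort @ids;
    return @sorted;
}

# Print every matching record in FASTA format to STDOUT.
sub print_matches {
    my ($fasta, $term) = @_;
    # Bio::DB::Fasta builds an index file next to the FASTA file on first use.
    my $db = Bio::DB::Fasta->new($fasta);
    for my $id (matching_ids($db, $term)) {
        # header() returns the header line without the leading ">".
        print ">", $db->header($id), "\n", $db->seq($id), "\n";
    }
}

# Only run when both arguments are given on the command line.
print_matches(@ARGV) if @ARGV == 2;
```

Saved under a name of your choice (hdac.pl here is just an example), it could be run as:

perl hdac.pl uniprot_sprot.fasta HDAC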

PROBLEM 2: Bio::SeqIO

1. Write a script using Bio::SeqIO to retrieve the CDS translation from a GenBank file.
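A minimal sketch of one approach, assuming the translation is stored in the /translation qualifier of each CDS feature (CDS features without that qualifier are skipped). The sub name cds_translations and the choice of /protein_id as the label are assumptions made for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect [label, protein] pairs for every CDS feature that has /translation.
# (cds_translations is a hypothetical helper name.)
sub cds_translations {
    my ($genbank_file) = @_;
    my $in = Bio::SeqIO->new(-file => $genbank_file, -format => 'genbank');
    my @found;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            next unless $feat->has_tag('translation');
            my ($protein) = $feat->get_tag_values('translation');
            # Label with /protein_id when present, else the record's ID.
            my ($label) = $feat->has_tag('protein_id')
                ? $feat->get_tag_values('protein_id')
                : ($seq->display_id);
            push @found, [$label, $protein];
        }
    }
    return @found;
}

# Only run when a GenBank file is given on the command line.
if (@ARGV) {
    print ">$_->[0]\n$_->[1]\n" for cds_translations($ARGV[0]);
}
```

Returning the pairs from a sub rather than printing inside the loop keeps the parsing separate from the output format, so the same sub can feed a FASTA writer or a table.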

PROBLEM 3: Bio::SearchIO

* On the command line, run the following command to format the file for use as a BLAST database:

makeblastdb -in uniprot_sprot.fasta -dbtype prot

To BLAST a query sequence against Swiss-Prot, run:

blastp -query query.fasta -db uniprot_sprot.fasta -evalue 1e-10 -out query_v_sprot.blastout

You can find additional information on BLAST+ at: http://www.ncbi.nlm.nih.gov/books/NBK1763/

4. Run BLAST with your 3 protein sequences from earlier.
* Use uniprot_sprot.fasta as your database.
* Run BLAST with an E-value cutoff of 1e-10.

5. Parse your BLAST output. For hits with a "significance" less than or equal to 1e-50, retrieve every HSP and print the following fields in tab-delimited format:
* QUERY Name
* HIT Name
* HSP Evalue
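
The parsing step could be sketched as follows. This assumes the BLAST report is in the default text output format, as produced by the blastp command above, and that the quoted "significance" refers to the hit object's significance method. The sub name report_hsps and the variable $max_significance are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SearchIO;

my $max_significance = 1e-50;   # hit-level cutoff from step 5

# Return one tab-delimited row (query, hit, HSP E-value) per HSP of every
# hit whose significance is at or below $cutoff.
# (report_hsps is a hypothetical helper name.)
sub report_hsps {
    my ($blast_file, $cutoff) = @_;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $blast_file);
    my @rows;
    while (my $result = $in->next_result) {        # one result per query
        while (my $hit = $result->next_hit) {
            next unless $hit->significance <= $cutoff;
            while (my $hsp = $hit->next_hsp) {
                push @rows,
                    join("\t", $result->query_name, $hit->name, $hsp->evalue);
            }
        }
    }
    return @rows;
}

# Only run when a BLAST report is given on the command line.
if (@ARGV) {
    print "$_\n" for report_hsps($ARGV[0], $max_significance);
}
```

The three nested loops follow the structure of a BLAST report: a result per query, hits within each result, and HSPs within each hit. The cutoff is applied to the hit, not to the individual HSPs, so a weaker HSP from a strong hit is still printed.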