Problem Set for BioPerl
Preparation:
1. Download uniprot_sprot:
curl -O "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"
2. Unzip the file.
gunzip uniprot_sprot.fasta.gz
PROBLEM 1: Bio::DB::Fasta
1. Write a script to retrieve all IDs from uniprot_sprot.fasta using the get_all_ids method from Bio::DB::Fasta.
2. Search the list for all IDs that contain the term "HDAC".
3. Print the sequences for these proteins, in FASTA format.
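One possible approach to the three steps above is sketched below. It is a minimal sketch rather than a reference solution, and it assumes that uniprot_sprot.fasta sits in the current directory and that a case-sensitive match on "HDAC" is what is wanted. Bio::DB::Fasta builds an index file next to the FASTA file on first use, so the first run will be slower than later ones.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::Fasta;
use Bio::SeqIO;

# Assumption: the unzipped FASTA file is in the current directory.
my $fasta = 'uniprot_sprot.fasta';

# Open (and, on first use, index) the FASTA file.
my $db = Bio::DB::Fasta->new($fasta);

# Writer that prints FASTA-formatted records to standard output.
my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);

for my $id ($db->get_all_ids) {
    next unless $id =~ /HDAC/;            # keep only IDs containing "HDAC"
    my $seq = $db->get_Seq_by_id($id);    # fetch the sequence object by ID
    $out->write_seq($seq);
}
```

By default Bio::DB::Fasta uses the first word of each header line as the ID, so a Swiss-Prot entry is matched on a string such as sp|Q13547|HDAC1_HUMAN.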
PROBLEM 2: Bio::SeqIO
1. Write a script using Bio::SeqIO to retrieve the CDS translation from a GenBank file.
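A minimal sketch of one way to do this follows. It assumes the GenBank file is given as the first command-line argument and that each CDS feature of interest carries a /translation qualifier; features without one are skipped. Labelling each protein with its /protein_id (falling back to the record ID) is an illustrative choice, not part of the problem statement.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Assumption: the GenBank file name is passed on the command line.
my $file = shift or die "Usage: $0 file.gb\n";

my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';       # only CDS features
        next unless $feat->has_tag('translation');     # skip CDS without a translation

        # get_tag_values returns a list; take the first value.
        my ($protein) = $feat->get_tag_values('translation');

        # Illustrative label: the protein_id if present, else the record ID.
        my ($label) = $feat->has_tag('protein_id')
            ? $feat->get_tag_values('protein_id')
            : $seq->display_id;

        print ">$label\n$protein\n";
    }
}
```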
PROBLEM 3: Bio::SearchIO
* On the command line, run the following command to format the file for use as a BLAST database:
makeblastdb -in uniprot_sprot.fasta -dbtype prot
To BLAST a sequence of your choice against Swiss-Prot:
blastp -query query.fasta -db uniprot_sprot.fasta -evalue 1e-10 -out query_v_sprot.blastout
You can find additional information on BLAST+ at: http://www.ncbi.nlm.nih.gov/books/NBK1763/
4. Run BLAST with your three protein sequences from earlier.
* Use uniprot_sprot.fasta as your database.
* Run BLAST with an E-value cutoff of 1e-10.
5. Parse your BLAST output. For each hit with a "significance" less than or equal to 1e-50, retrieve every HSP and print the following fields in tab-delimited format:
* QUERY Name
* HIT Name
* HSP Evalue
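The parsing step can be sketched with Bio::SearchIO as follows. This is a minimal sketch under stated assumptions: the report is in the default pairwise text format produced by blastp, and the file name defaults to query_v_sprot.blastout (the name used in the example command above) unless another one is given on the command line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SearchIO;

# Assumption: default file name taken from the example blastp command.
my $file = shift // 'query_v_sprot.blastout';

my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);

# One result per query sequence in the report.
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        # Keep only hits whose significance is at most 1e-50.
        next unless $hit->significance <= 1e-50;

        # Print one tab-delimited line per HSP of the hit.
        while (my $hsp = $hit->next_hsp) {
            print join("\t", $result->query_name, $hit->name, $hsp->evalue), "\n";
        }
    }
}
```

The filter is applied at the hit level and every HSP of a passing hit is printed, so individual HSP E-values in the output may be larger than 1e-50.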