Problem Set: BioPerl

Useful links for Problem Set: BioPerl
  1. BioPerl
  2. BLAST Command Line Applications User Manual

Preparation (from the command line):
 
1.  Download uniprot_sprot:
 
  curl -O "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"
 
2. Unzip the file.
 
  gunzip uniprot_sprot.fasta.gz
 
 
PROBLEM 1: Bio::DB::Fasta
 
1. Write a script to retrieve all IDs from uniprot_sprot.fasta using the Bio::DB::Fasta method:
   $db->get_all_primary_ids
 
2. Search the list of IDs for all IDs that contain the term "HDAC".
 
3. Print the sequences for these proteins, in FASTA format.
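
One way to sanity-check the script's output is to count the matching IDs directly in the FASTA file from the shell. The sketch below assumes that Bio::DB::Fasta takes each ID from the first whitespace-delimited word of the header line (its default behaviour). It uses a small stand-in file, sample.fasta, with hypothetical records so that it runs without the full download; point it at uniprot_sprot.fasta for the real check.

```shell
# Stand-in FASTA file with hypothetical records (not real UniProt entries).
cat > sample.fasta <<'EOF'
>sp|P00001|HDAC1_TEST Hypothetical histone deacetylase 1
MAQTQGTRRKVCYYYDGD
>sp|P00002|ACTB_TEST Hypothetical actin (description mentions HDAC binding)
MDDDIAALVVDNGSGMCK
>sp|P00003|HDAC2_TEST Hypothetical histone deacetylase 2
MAYSQGGGKKKVCYYYDG
EOF

# Keep header lines, strip ">" and everything after the first space to get
# the ID, then keep only the IDs that contain "HDAC".
hdac_ids=$(grep '^>' sample.fasta | sed 's/^>//; s/ .*//' | grep 'HDAC')
printf '%s\n' "$hdac_ids"

# Number of matching IDs; compare it with the number of sequences your script prints.
hdac_count=$(printf '%s\n' "$hdac_ids" | grep -c 'HDAC')
echo "$hdac_count"
```

The second record mentions HDAC only in its description, not in its ID, so it is not counted. A script that searches the ID list should skip it as well.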
 
 
PROBLEM 2: Bio::SeqIO
 
1. Write a script using Bio::SeqIO to retrieve the CDS translation from a GenBank file.
 
 
PROBLEM 3: Bio::SearchIO
 
1. On the command line, run the following command to format the file for use as a BLAST database:
  makeblastdb -in uniprot_sprot.fasta -dbtype prot
 
2. Here is how you can BLAST your favorite sequence against Swiss-Prot:
 
  blastp -query query.fasta -db uniprot_sprot.fasta -evalue 1e-10 -out query_v_sprot.blastout
 
   Note: You can find additional information on BLAST+ at: http://www.ncbi.nlm.nih.gov/books/NBK1763/
 
 
3. Run BLAST with three of your protein sequences from Problem 1.
      * Use uniprot_sprot.fasta as your database.
      * Run BLAST with an e-value cut-off of 1e-10.
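
The three runs differ only in the query and output file names, so they can be driven by a small shell loop. This is a minimal sketch, assuming the three sequences were saved one per file as hdac1.fasta, hdac2.fasta and hdac3.fasta (hypothetical names). The helper function print_blast_commands is also just an illustration; it prints each command rather than running it, so you can inspect the commands first.

```shell
# Hypothetical helper: print one blastp command per query file given as an argument.
print_blast_commands() {
  for query in "$@"; do
    # Derive the output name from the query name,
    # e.g. hdac1.fasta -> hdac1_v_sprot.blastout
    out="${query%.fasta}_v_sprot.blastout"
    echo "blastp -query $query -db uniprot_sprot.fasta -evalue 1e-10 -out $out"
  done
}

# Hypothetical query files, one sequence each, saved from Problem 1.
print_blast_commands hdac1.fasta hdac2.fasta hdac3.fasta
```

Once the printed commands look right, pipe the output of print_blast_commands to sh to run them. This works as long as the file names contain no spaces.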
 
4. Parse your BLAST output with Bio::SearchIO. For hits with a "significance"
   (the value returned by $hit->significance) less than or equal to 1e-50,
   retrieve every HSP and print the following fields in tab-delimited format:
      * QUERY Name
      * HIT Name
      * HSP Evalue
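
To cross-check the parser, the same three columns can be produced by BLAST+ itself in tabular form and filtered with awk. This is a sketch of an independent check, not a substitute for the Bio::SearchIO script. It assumes BLAST was rerun with -outfmt "6 qseqid sseqid evalue" (one line per HSP), and it treats a hit's significance as the smallest HSP e-value for that query/hit pair. The file hsps.tsv holds made-up rows so the example runs on its own.

```shell
# Made-up tabular rows (query, hit, HSP e-value), tab-separated, one row per HSP.
printf 'q1\thitA\t1e-80\n'  >  hsps.tsv
printf 'q1\thitA\t2e-12\n'  >> hsps.tsv
printf 'q1\thitB\t3e-20\n'  >> hsps.tsv
printf 'q2\thitC\t4e-55\n'  >> hsps.tsv

# Pass 1 (NR == FNR): remember the smallest e-value seen for each query/hit pair.
# Pass 2: print every HSP row whose pair has a best e-value <= 1e-50.
filtered=$(awk -F'\t' '
  NR == FNR { k = $1 FS $2
              if (!(k in best) || $3 + 0 < best[k]) best[k] = $3 + 0
              next }
  best[$1 FS $2] <= 1e-50
' hsps.tsv hsps.tsv)
printf '%s\n' "$filtered"
```

Both hitA rows are kept, including the weaker 2e-12 HSP, because the filter applies to the hit as a whole. The hitB row is dropped because its best e-value is above 1e-50.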
