Problem Set for BioPerl

Preparation:

	1.  Download uniprot_sprot:

		curl -O "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"

	2. Unzip the file.

		gunzip uniprot_sprot.fasta.gz
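
	Before moving on, it can help to confirm that the download and decompression worked. The following is a minimal sketch, not part of the original instructions: count_fasta_records is a hypothetical helper name, and it assumes that every record in the file starts with a ">" header line.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# which start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Only run the check when the file is present in the current directory.
if [ -f uniprot_sprot.fasta ]; then
    echo "uniprot_sprot.fasta contains $(count_fasta_records uniprot_sprot.fasta) sequences"
fi
```

	A count of zero, or a "No such file" error, suggests the download or the gunzip step did not complete.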


PROBLEM 1: Bio::DB::Fasta

	1. Write a script that retrieves all IDs from uniprot_sprot.fasta using the get_all_ids method of Bio::DB::Fasta.

	2. Search the list of IDs for every ID that contains the term "HDAC".

	3. Print the sequences for these proteins, in FASTA format.
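
	To cross-check the IDs your script finds, you can compare them against a plain shell pipeline. This is only a sketch under an assumption: that Bio::DB::Fasta is used with its default ID parsing, where the ID is the text between ">" and the first whitespace on the header line. The helper name list_ids_matching is hypothetical.

```shell
# Hypothetical helper: print the IDs (first word of each header line,
# without the leading ">") that contain a given term.
#   $1 = term to search for, $2 = FASTA file
list_ids_matching() {
    awk '/^>/ { print substr($1, 2) }' "$2" | grep -F -- "$1"
}

# Only run the check when the file is present in the current directory.
if [ -f uniprot_sprot.fasta ]; then
    list_ids_matching HDAC uniprot_sprot.fasta | wc -l
fi
```

	The number printed should match the number of IDs your Perl script selects. Headers that mention the term only in the description are deliberately not counted, because the description is not part of the ID.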


PROBLEM 2: Bio::SeqIO

	1. Write a script using Bio::SeqIO to retrieve the CDS translation from a GenBank file.

	You can download a GenBank file from http://courses/problem_sets/sequence.gb
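
	As a rough check on your script's output, you can count how many translations the file contains. This sketch assumes the downloaded file is saved as sequence.gb and that each translation appears as a /translation= qualifier in the feature table; count_cds_translations is a hypothetical helper name.

```shell
# Hypothetical helper: count /translation= qualifiers in a GenBank file.
count_cds_translations() {
    grep -c '/translation=' "$1"
}

# Only run the check when the file is present in the current directory.
if [ -f sequence.gb ]; then
    echo "sequence.gb has $(count_cds_translations sequence.gb) translation(s)"
fi
```

	Your script should print the same number of protein sequences.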


PROBLEM 3: Bio::SearchIO


	1. On the command line, run the following command to format the file for use as a BLAST database:

		makeblastdb -in uniprot_sprot.fasta -dbtype prot

	   Here is how you can BLAST your favorite sequence against Swiss-Prot:

		blastp -query query.fasta -db uniprot_sprot.fasta -evalue 1e-10 -out query_v_sprot.blastout

	   You can find additional information on BLAST+ at: http://www.ncbi.nlm.nih.gov/books/NBK1763/

	2. Run BLAST with your 3 protein sequences from earlier.
		* Use uniprot_sprot.fasta as your database.
		* Run BLAST with an E-value cutoff of 1e-10.

	3. Parse your BLAST output with Bio::SearchIO. For each hit with a "significance" less than or equal to 1e-50, retrieve every HSP and print the following in tab-delimited format:
		* Query name
		* Hit name
		* HSP E-value
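
	One way to cross-check your parser is to rerun blastp with the tabular output option (-outfmt 6) and filter that file in the shell. The sketch below rests on assumptions that are not in the original instructions: the tabular output is saved under the hypothetical name query_v_sprot.tab, the columns are in the default -outfmt 6 order (query ID in column 1, subject ID in column 2, E-value in column 11), and a hit's "significance" is approximated by the smallest E-value among its HSPs. The helper name filter_hsps is also hypothetical.

```shell
# Hypothetical helper: print query, hit and E-value for every HSP of each
# hit whose best (smallest) E-value is at or below a cutoff.
#   $1 = tabular BLAST file (-outfmt 6), $2 = cutoff (default 1e-50)
filter_hsps() {
    # The file is read twice: the first pass records the best E-value per
    # query/hit pair, the second pass prints the HSPs of qualifying hits.
    awk -F'\t' -v cutoff="${2:-1e-50}" '
        NR == FNR {
            key = $1 FS $2
            if (!(key in best) || $11 + 0 < best[key]) best[key] = $11 + 0
            next
        }
        best[$1 FS $2] <= cutoff + 0 { print $1 "\t" $2 "\t" $11 }
    ' "$1" "$1"
}

# Only run the check when the tabular file is present.
if [ -f query_v_sprot.tab ]; then
    filter_hsps query_v_sprot.tab 1e-50
fi
```

	Because Bio::SearchIO reads the hit significance from the BLAST report itself, small differences from this approximation are possible; treat the shell output as a sanity check rather than a reference answer.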