Problem Set for BioPerl

Preparation:

1. Download uniprot_sprot:

    curl -O "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"

2. Unzip the file:

    gunzip uniprot_sprot.fasta.gz

PROBLEM 1: Bio::DB::Fasta

1. Write a script to retrieve all IDs from uniprot_sprot.fasta using the get_all_ids method from Bio::DB::Fasta.
2. Search through the list of IDs for all IDs that contain the term "HDAC".
3. Print the sequences for these proteins in FASTA format.

PROBLEM 2: Bio::SeqIO

1. Write a script using Bio::SeqIO to retrieve the CDS translation from a GenBank file. You can download a GenBank file from http://courses/problem_sets/sequence.gb

PROBLEM 3: Bio::SearchIO

1. On the command line, run the following command to format the file for use as a BLAST database:

    makeblastdb -in uniprot_sprot.fasta -dbtype prot

   Here is how you can BLAST your favorite sequence against Swiss-Prot:

    blastp -query query.fasta -db uniprot_sprot.fasta -evalue 1e-10 -out query_v_sprot.blastout

   You can find additional information on BLAST+ at: http://www.ncbi.nlm.nih.gov/books/NBK1763/

2. Run BLAST with your 3 protein sequences from earlier.
   * Use uniprot_sprot.fasta as your database.
   * Run BLAST with an e-value cut-off of 1e-10.

3. Parse your BLAST output. For hits with "significance" less than or equal to 1e-50, retrieve every HSP and print the following in tab-delimited format:
   * QUERY Name
   * HIT Name
   * HSP Evalue
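
As a cross-check for Problem 1 (not a replacement for the Bio::DB::Fasta script), the list of matching IDs can also be produced directly on the command line. The sketch below assumes that an ID is the first word after the ">" on each header line, which is the usual default for Bio::DB::Fasta. The helper name fasta_ids_matching and the three-record sample file are made up for illustration; the sample sequences are placeholders, not real protein sequences.

```shell
# Print every FASTA ID (first word after '>') that contains a search term.
# Hypothetical helper for checking your BioPerl output; not part of BioPerl.
fasta_ids_matching() {
    term="$1"
    fasta="$2"
    awk -v term="$term" '
        /^>/ {
            id = substr($1, 2)               # drop the leading ">"
            if (index(id, term) > 0) print id
        }
    ' "$fasta"
}

# Toy input in the UniProt header style (sequences are placeholders).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
>sp|Q13547|HDAC1_HUMAN Histone deacetylase 1 OS=Homo sapiens
ACDEFGHIKL
>sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens
ACDEFGHIKL
>sp|Q92769|HDAC2_HUMAN Histone deacetylase 2 OS=Homo sapiens
ACDEFGHIKL
EOF

fasta_ids_matching HDAC "$sample"
```

On the real file, running fasta_ids_matching HDAC uniprot_sprot.fasta | wc -l gives a count that should agree with the number of IDs your script finds, provided the script also matches on the ID alone and not on the description text.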