Preparation (from the command line):

1. Download uniprot_sprot:
   curl -O "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"
2. Unzip the file:
   gunzip uniprot_sprot.fasta.gz

PROBLEM 1: Bio::DB::Fasta

1. Write a script that retrieves all IDs from uniprot_sprot.fasta using the Bio::DB::Fasta method $db->get_all_primary_ids.
2. Search through the list of IDs for all IDs that contain the term "HDAC".
3. Print the sequences for these proteins in FASTA format.

PROBLEM 2: Bio::SeqIO

1. Write a script using Bio::SeqIO to retrieve the CDS translations from a GenBank file.

PROBLEM 3: Bio::SearchIO

1. On the command line, run the following command to format the file for use as a BLAST database:
   makeblastdb -in uniprot_sprot.fasta -dbtype prot
2. Here is how you can BLAST your favorite sequence against Swiss-Prot:
   blastp -query query.fasta -db uniprot_sprot.fasta -evalue 1e-10 -out query_v_sprot.blastout
   **You can find additional information on BLAST+ at: http://www.ncbi.nlm.nih.gov/books/NBK1763/
3. Run BLAST with 3 of your protein sequences from Problem 1.
   * Use uniprot_sprot.fasta as your database.
   * Run BLAST with an e-value cut-off of 1e-10.
4. Parse your BLAST output. For hits with "significance" less than or equal to 1e-50, retrieve every HSP and print it in tab-delimited format:
   * QUERY Name
   * HIT Name
   * HSP Evalue
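For Problem 1, one possible shape for the script is sketched below; it is not a reference solution. It assumes BioPerl is installed and that the FASTA path is passed on the command line. The helper names `matching_ids` and `print_matches` are hypothetical, the match is a plain case-sensitive substring test on the primary ID (for Swiss-Prot headers that is the `sp|ACCESSION|ENTRY_NAME` token), and records are written with `$db->header` and `$db->seq` wrapped at an arbitrary 60 residues per line rather than through Bio::SeqIO.

```perl
#!/usr/bin/env perl
# Problem 1 sketch: list all primary IDs with Bio::DB::Fasta, keep those
# containing "HDAC", and print the matching records in FASTA format.
use strict;
use warnings;
use Bio::DB::Fasta;

# Hypothetical helper: IDs in $db whose text contains $term (case-sensitive).
sub matching_ids {
    my ($db, $term) = @_;
    my @hits = grep { index($_, $term) >= 0 } $db->get_all_primary_ids;
    return sort @hits;
}

# Hypothetical helper: write each matching record to $fh as FASTA and
# return the number of records written.
sub print_matches {
    my ($fasta_file, $term, $fh) = @_;
    # The first call builds an index file next to the FASTA; later calls reuse it.
    my $db  = Bio::DB::Fasta->new($fasta_file);
    my @ids = matching_ids($db, $term);
    for my $id (@ids) {
        (my $header = $db->header($id)) =~ s/^>//;   # ID plus description
        my $seq = $db->seq($id);
        $seq =~ s/(.{1,60})/$1\n/g;                   # wrap at 60 residues (arbitrary)
        print {$fh} ">$header\n$seq";
    }
    return scalar @ids;
}

# Run only when called as a script with a FASTA path, e.g. uniprot_sprot.fasta.
if (!caller && @ARGV) {
    my $count = print_matches($ARGV[0], 'HDAC', \*STDOUT);
    warn "$count IDs contain 'HDAC'\n";
}
```

Saved under a hypothetical name such as `hdac_fasta.pl`, it would be run as `perl hdac_fasta.pl uniprot_sprot.fasta > hdac.fasta`; the match count goes to STDERR so it does not mix with the FASTA output.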
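For Problem 2, the CDS translations live in the `/translation` qualifier of each CDS feature, so a Bio::SeqIO loop over the features is enough. The sketch below assumes the GenBank path comes from the command line; the helper name `cds_translations` is hypothetical, and labelling each protein by the first of `protein_id`, `locus_tag` or `gene` that is present is an arbitrary choice. CDS features without a `/translation` (for example some pseudogenes) are skipped.

```perl
#!/usr/bin/env perl
# Problem 2 sketch: read a GenBank file with Bio::SeqIO and print the
# /translation qualifier of every CDS feature as FASTA.
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return [label, translation] pairs for every CDS in $file.
sub cds_translations {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @cds;
    while (my $rec = $in->next_seq) {
        for my $feat ($rec->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            next unless $feat->has_tag('translation');   # some CDS lack one
            my ($aa) = $feat->get_tag_values('translation');
            # Assumed label preference: protein_id, then locus_tag, then gene.
            my ($label) = map  { $feat->get_tag_values($_) }
                          grep { $feat->has_tag($_) } qw(protein_id locus_tag gene);
            $label //= $rec->display_id . '_' . $feat->start;
            push @cds, [$label, $aa];
        }
    }
    return @cds;
}

# Run only when called as a script with a GenBank path.
if (!caller && @ARGV) {
    for my $pair (cds_translations($ARGV[0])) {
        my ($label, $aa) = @$pair;
        $aa =~ s/(.{1,60})/$1\n/g;   # wrap at 60 residues
        print ">$label\n$aa";
    }
}
```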
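For Problem 3, step 3, the three query sequences can be pulled out of the same Bio::DB::Fasta index and passed to `blastp` with the flags given in step 2. This is a sketch under assumptions: `write_query` is a hypothetical helper, taking the first three HDAC IDs in sorted order is an arbitrary way to pick "3 of your protein sequences", and it presumes `makeblastdb` has already been run on uniprot_sprot.fasta and that `blastp` is on the PATH.

```perl
#!/usr/bin/env perl
# Problem 3, step 3 sketch: write three HDAC sequences to query.fasta and
# run blastp against Swiss-Prot with an e-value cut-off of 1e-10.
use strict;
use warnings;
use Bio::DB::Fasta;

# Hypothetical helper: write up to $n records whose ID contains $term
# (first $n in sorted order, an arbitrary choice) to $query_file.
sub write_query {
    my ($fasta_file, $term, $n, $query_file) = @_;
    my $db  = Bio::DB::Fasta->new($fasta_file);
    my @ids = grep { index($_, $term) >= 0 } $db->get_all_primary_ids;
    @ids = sort @ids;
    splice(@ids, $n) if @ids > $n;   # keep only the first $n IDs
    open my $out, '>', $query_file or die "Cannot write $query_file: $!";
    for my $id (@ids) {
        (my $header = $db->header($id)) =~ s/^>//;
        print {$out} ">$header\n", $db->seq($id), "\n";
    }
    close $out or die "Cannot close $query_file: $!";
    return @ids;
}

# Run only when called as a script with the Swiss-Prot FASTA path.
if (!caller && @ARGV) {
    my $query = 'query.fasta';
    my @used  = write_query($ARGV[0], 'HDAC', 3, $query);
    print "Query sequences: @used\n";
    # Same blastp options as in step 2; assumes makeblastdb was already run.
    system('blastp', '-query', $query, '-db', $ARGV[0],
           '-evalue', '1e-10', '-out', 'query_v_sprot.blastout') == 0
        or die "blastp failed: $?";
}
```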
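For Problem 3, step 4, Bio::SearchIO walks results, hits and HSPs in nested loops, and `$hit->significance` holds the hit-level E-value that the 1e-50 filter applies to. In the sketch below, `report_hsps` is a hypothetical helper; the format argument is `'blast'` for the default text report written by the command in step 2, and the same function would also accept `'blasttable'` if BLAST were run with `-outfmt 6`. The header row printed in script mode is optional.

```perl
#!/usr/bin/env perl
# Problem 3, step 4 sketch: for every hit with significance <= 1e-50,
# print one tab-delimited line per HSP: query name, hit name, HSP E-value.
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper; $format is 'blast' for the default blastp text output
# (or 'blasttable' for -outfmt 6 tabular output). Returns the rows printed.
sub report_hsps {
    my ($blast_file, $format, $cutoff, $fh) = @_;
    my $in = Bio::SearchIO->new(-file => $blast_file, -format => $format);
    my $rows = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            # Filter on the hit-level significance (E-value) reported by BLAST.
            next unless defined $hit->significance && $hit->significance <= $cutoff;
            while (my $hsp = $hit->next_hsp) {
                print {$fh} join("\t", $result->query_name, $hit->name, $hsp->evalue), "\n";
                $rows++;
            }
        }
    }
    return $rows;
}

# Run only when called as a script, e.g. with query_v_sprot.blastout.
if (!caller && @ARGV) {
    print join("\t", 'QUERY_NAME', 'HIT_NAME', 'HSP_EVALUE'), "\n";   # optional header
    report_hsps($ARGV[0], 'blast', 1e-50, \*STDOUT);
}
```

Because the filter is on the hit's significance, every HSP of a passing hit is printed, including weaker secondary HSPs whose own E-value is above 1e-50; that matches "retrieve every HSP" in the assignment.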