# Databases and Perl DBI # For this problem set, we have installed MySQL servers in each of the machines. # In each MySQL server, you'll find a database named progbio2011. # This database has 3 tables: genes, genes_go, expression # 'genes' table lists the location and evidence class of each gene # 'genes_go' table contains the Gene Ontology terms for genes that have a Gene Ontology annotation # 'expression' table contains the expression level of genes in 4 experiments. 1. Follow the steps below to enter the MySQL shell and use the database progbio2011. - on the unix command line: $ mysql -u root You're now in the MySQL shell. - To use the database progbio2011 mysql> use progbio2011; View the tables in this database mysql> show tables; 2. Use the "explain" command to see the schema of each table mysql> explain genes; - Try out SHOW command to see the SQL-CREATE syntax for each table: mysql> show create table genes; 3. SQL Queries: 3.1. count the number of rows in the gene table 3.2. count the number of genes in the first 100Mb of chr1 from the genes table. 3.3. count the number of distnct genes in the genes_go HINT: count(distinct gene_id) 3.4. Using a COUNT...GROUP BY query, list the number of genes for each evidence class in the genes table 3.5. Using a SQL query that joins the genes_go and expression table, select the exp1 value of genes that have the go_term 'chromatin binding'; 3.6. Try the query above again, but limit it to genes on chr1. 4. Do the following using Perl-DBI 4.1. Using a similar query to #3.5 above, write a Perl DBI script that produces a tab-delimited text file for genes with go_term 'nucleic acid binding'. The text file should have the columns: gene_id, go_term, exp1, exp2, exp3, exp4 4.2. Construct a query to find genes where the expression level in exp4 is greater than exp1. Print out this list of genes. 5. Additional DBI problems 5.1 Compute the gene density on chr10 across 1Mb windows. Assume that Chr10 has a total length of 150Mb. 5.2 Compute the average expression level in each experiment (exp1, exp2, exp3, exp4) for the genes in each GO term.