Gene table
Expression data
Gene Ontology
"What are the classes of highly expressed genes in region 50Mb-55Mb of chromosome 5?"
Through a user-interface
Programmatically through SQL
Structured Query Language
E.g: “select gene_id from gene_table where chromosome = ‘chr2’;”
Programmatically through an API in another programming language
Perl DBI
Java JDBC, C ODBC
Start the MySQL client from the command line, and this will bring up a MySQL shell, connected to the MySQL server on you local machine:
$ mysql –u root
To connect to a remote server, for example the public Ensembl MySQL:
$ mysql –h ensembldb.ensembl.org –P 5306 –u anonymous
In the MySQL shell, each line of commands must terminate with a ‘;’
mysql> show databases;
mysql> create database myTestDB; # myTestDB is the database name
mysql> use myTestDB; # Use database myTestDB
mysql> help;
To quit:
mysql> \q;
To cancel a command:
mysql> \c;
CREATE TABLE tablename ( column_1_name datatype [optional constraint] column_2_name datatype [optional constraint] ... );
CREATE TABLE genes (
`gene_id` char(25) NOT NULL DEFAULT '',
`chr` char(5) NOT NULL DEFAULT '',
`start` int(9) NOT NULL DEFAULT '0',
`end` int(9) NOT NULL DEFAULT '0',
PRIMARY KEY (`gene_id`)
);
mysql> use progbio2012; # Use database progbio2012
mysql> SELECT gene_id from genes; #SELECT Column [columns] from Table
SELECT gene_id, chr, start, end FROM genes limit 10;
SELECT * from genes limit 10;
SELECT gene_id FROM genes WHERE chr=‘chr5’;
SELECT gene_id FROM genes
WHERE chr=‘chr5’ and end >= 50000000 and start <= 55000000;
SELECT gene_id , class
FROM genes
WHERE chr=‘chr5’
and (evidence=‘cdna’ OR evidence= ‘est’);
SELECT gene_id, chr, start, end FROM genes
WHERE chr=‘chr10’
ORDER BY end desc
LIMIT 20;
SELECT DISTINCT evidence from genes;
SELECT COUNT(*) FROM genes;
SELECT COUNT(*) From genes where chr=‘chr5’;
SELECT chr, COUNT(*) FROM genes GROUP BY chr;
Gene table
Expression data
Gene Ontology
SELECT genes_go.gene_id, go_id, day1
FROM genes_go, expression
WHERE genes_go.gene_id = expression.gene_id
and go_id = 'GO:0030528';
DBI is a module that provides access to DBMS in Perl
It hides the nuts and bolts for connecting to each type of DBMS, leaving a consistent interface for connecting to a database
The key object in DBI is the database handle ($dbh), which represents a connection to a DBMS.
Create a database handle
Execute a SQL statement using the database handle
Disconnect the handle
my $dbname = "progbio2012";
my $driver = ‘mysql’;
my $user = 'root’;
my $passwd = '';
my $host = 'localhost';
my $port = 3306;
my $dsn = "DBI:$driver:database=$dbname;host=$host;port=$port";
my $dbh = DBI->connect($dsn,$user,$passwd);
my $dbname = "prog2012";
my $driver=‘SQLite’;
my $dsn = "DBI:$driver:$dbname";
my $dbh = DBI->connect($dsn);
my $sql = “SELECT count(*) From gene”;
my $results_array_ref = $dbh->selectall_arrayref($sql);
For example, to create a table using
=pseudocode $dbh = DBI->connect($dsn) $dbh->do(SQL) $dbh->disconnect =end
#! /usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbname = ‘progbio2012’;
my $user = ‘root’;
my $passwd = ‘’;
my $host = ‘localhost’;
my $port = 3306;
my $dsn = "DBI:mysql:database=$dbname;host=$host;port=$port";
my $dbh = DBI->connect($dsn,$user,$passwd);
my $sql = "CREATE table foo (bar char(10));" ;
$dbh->do($sql);
$dbh->disconnect;
exit;
=pseudocode $dbh = DBI->connect($dsn) $fetched_results = $dbh->fetch SQL query do something with results $dbh->disconnect =end
#! /usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbname = 'progbio2012';
my $user = 'root';
my $passwd = '';
my $host = 'localhost';
my $port = 3306;
my $dsn = "DBI:mysql:database=$dbname;host=$host;port=$port";
my $dbh = DBI->connect($dsn,$user,$passwd);
my $query = "SELECT gene_id, chr, start, end from genes";
my @results = @{$dbh->selectall_arrayref($query)};
foreach my $row_ref ( @results){
my $str = join "\t",@{$row_ref};
print $str,"\n";
}
$dbh->disconnect;
exit;
For SQL queries which will only return a single row,
e.g: SELECT COUNT query
=pseudocode $dbh = DBI->connect($dsn) $fetched_row = $dbh->fetch SQL query do something with results $dbh->disconnect =end
#! /usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbname = 'progbio2012';
my $user = 'root';
my $passwd = '';
my $host = 'localhost';
my $port = 3306;
my $dsn = "DBI:mysql:database=$dbname;host=$host;port=$port";
my $dbh = DBI->connect($dsn,$user,$passwd);
my $query = "SELECT COUNT(*) from genes where chr ='chr10'";
my @row = @{$dbh->selectrow_arrayref($query)};
print join ("\t",@row),"\n";
$dbh->disconnect;
exit;
Fetch the expression level for a list of genes
=pseudocode $dbh = DBI->connect($dsn) $sth = $dbh->prepare(SQL) loop: $sth->execute $sth->fetchrow_array; end loop $dbh->disconnect =end
my $dbh = DBI->connect( $dsn, $user, $passwd );
my $query = "SELECT day1, day2, day3, day4
FROM expression
WHERE gene_id = ?";
my $sth = $dbh->prepare($query);
my $file =shift;
open IN,"<",$file || die ("Can't open file $file $!");
while (my $gene_id = <IN>){
chomp $gene_id;
$sth->execute($gene_id);
my @results = @{ $sth->fetchall_arrayref};
foreach my $row_ref (@results) {
my $str = join "\t", @{$row_ref};
print $gene_id,"\t",$str, "\n";
}
}
close IN;
$dbh->disconnect;
exit;
Placeholders:
=pseudocode @genelist = read from file $dbh = DBI->connect($dsn) $sql = “SELECT day1 FROM expression WHERE gene_id = ?” $sth = $dbh->prepare($sql) loop through genelist: $sth->execute($gene) $expr = $sth->fetchrow_array print $expr end loop $dbh->disconnect =end
SelectAll:
=pseudocode @genelist = read from file $dbh = DBI->connect($dsn) $sql = “SELECT gene, day1 FROM expression”; %gene_expr hash for each result in $dbh->selectall_arrayref($sql): $gene_expr hash{$gene} = $expr end for each result loop through genelist: if exist in %gene_expr hash print $expr end loop $dbh->disconnect =end
With $dbh->selectall_hashref
my $dsn = "DBI:mysql:database=$dbname;host=$host;port=$port";
my $dbh = DBI->connect($dsn,$user,$passwd);
my $query = "SELECT gene_id, chr, start, end from genes";
my %results = %{$dbh->selectall_hashref($query,'gene_id')};
foreach my $gene (keys %results){
print $gene,"\t";
$results{$gene}->{'chr'},"\t",
$results{$gene}->{'start'},"\t",
$results{$gene}->{'end'},"\n";
}
For loading a text file into a table from the unix command line:
$ mysqlimport --local –u root databasename filename.txt
For dumping data from a table:
$ mysqldump –u root databasename tablename > table.sql
Or for dumping the entire MySQL database:
$ mysqldump –u root databasename > database.sql
Inserting new rows into table
INSERT into genes (gene_id, chr, start, end, evidence)
values (‘foo’,’chrX’,1000,1500,’cdna’);
Updating data in table
UPDATE genes SET gene_id = ‘bar’ WHERE gene_id = ‘foo’;
mysql> SELECT genes_go.gene_id, go_id, day1
from genes_go
join ( expression )
on ( genes_go.gene_id = expression.gene_id )
where go_id = 'GO:0030528';
on the unix command line:
$ mysql –u root
You're now in the MySQL shell.To use the database progbio2012:
mysql> use progbio2012;
To list the tables in the database:
mysql> show tables;
on the unix command line:
mysql> explain genes;
Try out SHOW command to see the SQL-CREATE syntax for each table:
mysql> show create table genes;