Unix Basics: Quick Review
=========================

ls    -- list contents
cd    -- change directory
mkdir -- make a directory
rm    -- remove files; use caution, it is easy to delete more than you would like
head  -- prints the top few lines to the terminal window
tail  -- prints the last few lines to the terminal window
sort  -- sorts the lines
uniq  -- prints the unique lines (only adjacent duplicates are merged, so sort first)
grep  -- finds the lines that contain a pattern
wc    -- counts the number of lines, words, and characters
mv    -- move files
cp    -- copy files
date  -- returns the current date and time
pwd   -- returns the working directory name
ssh   -- remote login
scp   -- remote secure copy
~     -- represents your home directory

man [command] -- manual page for the command

    man ls

try:

    ls -l
    ls -lt

You can string more than one command together with a pipe (|), such that the output of the first command is received by the second command.

    ls -lt | head

You can string more than one command together with a semicolon (;), such that the commands run sequentially but the output of one command is not passed to the next.

    date; some program command; date

You can redirect the output of a command into a file (FILE is the file being searched).

    grep PATTERN FILE > PATTERN.txt

You can append the output of a command to a file.

    grep PATTERN2 FILE >> PATTERN.txt

You can redirect stderr to a file.

    command 2> filename

You can redirect both the output (stdout) and stderr to a file (in bash).

    command &> filename

Text editors: TextWrangler is a good app to start with.

Unix Problem Set
================

1. Log into your machine or account.
2. What is the full path to your home directory?
3. Go up one directory.
   - How many files does it contain?
   - How many directories?
4. Using your text editor (nano is a good one to start with), create a fasta file and name it sequences.fasta. Make sure it ends up in the proper directory, locally or remotely. This is fasta file format:

       >seqName description
       ATGGCGTCTTGGCCTTAAAAGCTC

5. Without using a text editor, examine the contents of the file sequences.fasta.
   - How many lines does this file contain?
   - How many characters? (Hint: check out the options of wc)
   - What is the first line of this file? (Hint: read the man page of head)
   - What are the last 3 lines? (Hint: read the man page of tail)
   - How many sequences are in the file? (Hint: use grep)
6. Rename sequences.fasta to something more informative of the sequences the file contains. (Hint: read the man page for mv)
7. Create a directory called fasta. (Hint: use mkdir)
8. Copy the fasta file that you renamed to the fasta directory. (Hint: use cp)
9. Verify that the file is within the fasta directory. (Hint: use ls fasta/)
10. Delete the original file that you used for copying. (Hint: use rm, be careful)
11. Read the man pages for rm and cp to find out how to remove and copy a directory.
12. Print out your history and redirect it to a file called unixBasics.history.txt.
13. In /home/pfb2014/data there is a file called cuffdiff.txt.
   - The descriptions of each column in the file are below.
   - Look at the first few lines of the file.
   - Sort the file by log fold change 'log2(fold_change)' (column 10), from highest to lowest, and save the result in a new file in your directory called sorted.cuffdiff.out.
   - Sort the file (log fold change, highest to lowest), then print out only the first 100 lines. Save the result in a file called top100.sorted.cuffdiff.out.
   - Sort the file and print only the first column. Get a unique list of the genes, then print only the top 100. Save the result in a file called differentially.expressed.genes.txt.

Cuffdiff file format
--------------------

Column  Column name        Example               Description
1       Tested id          XLOC_000001           A unique identifier describing the transcript, gene, primary transcript, or CDS being tested
2       Tested id          XLOC_000001           A unique identifier describing the transcript, gene, primary transcript, or CDS being tested
3       gene               Lypla1                The gene_name(s) or gene_id(s) being tested
4       locus              chr1:4797771-4835363  Genomic coordinates for easy browsing to the genes or transcripts being tested
5       sample 1           Liver                 Label (or number if no labels provided) of the first sample being tested
6       sample 2           Brain                 Label (or number if no labels provided) of the second sample being tested
7       Test status        NOTEST                Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing
8       FPKMx              8.01089               FPKM of the gene in sample x
9       FPKMy              8.551545              FPKM of the gene in sample y
10      log2(FPKMy/FPKMx)  0.06531               The (base 2) log of the fold change y/x
11      test stat          0.860902              The value of the test statistic used to compute significance of the observed change in FPKM
12      p value            0.389292              The uncorrected p-value of the test statistic
13      q value            0.985216              The FDR-adjusted p-value of the test statistic
14      significant        no                    Can be either "yes" or "no", depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple-testing
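
The quick review lists sort, head, uniq, pipes, and redirection one at a time. As a minimal sketch of how they can be combined on a tab-delimited table, the commands below build a small made-up file and process it by column. The file names (toy.txt and the output files) and the data are invented for illustration and are not taken from cuffdiff.txt. The cut command is not covered in the review; it is assumed here as one common way to print a single column. Check the man pages for sort and cut on your system before relying on any option.

```shell
# Build a small tab-delimited file (hypothetical data, for illustration only).
printf 'geneA\t0.5\ngeneB\t-2.0\ngeneC\t3.25\ngeneB\t1.0\n' > toy.txt

# Sort on column 2 only (-k2,2), as numbers (n), highest first (r),
# telling sort that columns are separated by a tab (-t).
sort -t "$(printf '\t')" -k2,2nr toy.txt > sorted.toy.txt

# Keep only the first 2 lines of the sorted output.
head -n 2 sorted.toy.txt > top2.sorted.toy.txt

# Print column 1 only, then reduce it to a unique list.
# uniq only merges adjacent duplicates, so sort comes first.
cut -f 1 toy.txt | sort | uniq > names.toy.txt
```

A real table may start with a header row, which would be sorted along with the data unless it is handled separately, so look at the first few lines of any file before sorting it.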