Unix Basics: Quick Review
=========================

ls       -- list contents
cd       -- change directory
mkdir    -- make a directory
rm       -- remove files; use caution, it is easy to delete more than you would like
head     -- prints the top few lines to the terminal window
tail     -- prints the last few lines to the terminal window
sort     -- sorts the lines
uniq     -- prints the unique lines
grep     -- finds the lines that contain a pattern
wc       -- counts the number of lines, characters and words
mv       -- move files
cp       -- copy files
date     -- returns the current date and time
pwd      -- prints the full path of the current working directory
ssh      -- remote login
scp      -- remote secure copy
~        -- represents your home directory

man [command] -- manual page for the command
man ls

try:

ls -l

ls -lt

you can string more than one command together with a pipe (|), so that the output of the first command becomes the input of the second command.

ls -lt | head
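
for example, a minimal sketch of a longer pipeline. The three fruit names are made-up input for illustration; sort puts identical lines next to each other, uniq collapses them, and wc -l counts what is left, so the count should be 2.

```shell
# Three made-up lines, two of them identical.
# sort groups identical lines, uniq removes adjacent duplicates,
# and wc -l counts the remaining lines.
printf 'banana\napple\nbanana\n' | sort | uniq | wc -l
```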

you can string more than one command together with a semicolon (;), so that the commands run one after another, but the output of one command is not passed to the next.

date; some program command ; date
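
a small sketch of the difference between a semicolon and a pipe. Here sleep 1 is only a stand-in for "some program command", and the echo text is made up for illustration.

```shell
# Semicolons: each command runs after the previous one finishes.
# The two dates should differ by about one second.
date; sleep 1; date

# Semicolon: both commands print their own output.
echo first; echo second

# Pipe: the output of echo is read by wc, which counts the words instead.
echo first | wc -w
```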

you can redirect the output of a command into a file

grep PATTERN filename > PATTERN.txt


you can append the output of a command to a file
   
    grep PATTERN2 filename >> PATTERN.txt
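
a minimal sketch of > versus >>, run in a scratch directory so no existing files are touched. The file names recipes.txt and matches.txt and their contents are made up for illustration.

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir"

# A small made-up input file.
printf 'apple pie\nbanana bread\napple tart\ncherry cake\n' > recipes.txt

# > creates matches.txt, or overwrites it if it already exists.
grep apple recipes.txt > matches.txt

# >> adds to the end of matches.txt without erasing what is there.
grep cherry recipes.txt >> matches.txt

# Show the result: the two apple lines followed by the cherry line.
cat matches.txt
```

running the first grep again with > would throw away the cherry line, because > always starts the file from scratch.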


you can redirect stderr to a file
   
    command 2> filename

you can redirect the output (stdout) and stderr to a file
   
    command &> filename
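
a minimal sketch of the two output streams. talk is a hypothetical helper defined only for this example; it writes one line to stdout and one to stderr. The file names are also made up. Note that &> is a bash shortcut; "> file 2>&1" does the same thing and also works in other shells.

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical command: one line to stdout, one line to stderr.
talk() {
    echo "normal output"
    echo "error message" >&2
}

# Only stderr goes to the file; stdout still appears in the terminal.
talk 2> errors.txt

# stdout and stderr go to two separate files.
talk > output.txt 2> errors.txt

# Both streams go to the same file (same effect as: talk &> both.txt in bash).
talk > both.txt 2>&1
```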

text editors:
   
    TextWrangler is a good app to start with.

Unix Problem Set
================


    1. Log into your machine or account. 
    2. What is the full path to your home directory?
    3. Go up one directory.
        - How many files does it contain?
        - How many directories?
    4. Using your text editor (nano is a good one to start with) create a fasta file and name it sequences.fasta. Make sure it ends up in the proper directory, locally or remotely.

         This is fasta file format:
         >seqName description
         ATGGCGTCTTGGCCTTAAAAGCTC
 
    5. Without using a text editor examine the contents of the file sequences.fasta.
        - How many lines does this file contain?   
        - How many characters?    (Hint: check out the options of wc)
        - What is the first line of this file?    (Hint: read the man page of head)
        - What are the last 3 lines?    (Hint: read the man page of tail)
        - How many sequences are in the file?    (Hint: use grep)
    

    6. Rename sequences.fasta to something more informative of the sequences the file contains. (Hint: read the man page for mv)


    7. Create a directory called fasta.     (Hint: use mkdir)
    8. Copy the fasta file that you renamed to the fasta directory. (Hint: use cp)
    9. Verify that the file is within the fasta directory.    (Hint: use ls fasta/)
    10. Delete the original file that you used for copying.    (Hint: use rm, be careful)
    11. Read the man page for rm and cp to find out how to remove and copy a directory.
    12. Print out your history and redirect it to a file called unixBasics.history.txt

    13. In /home/pfb2014/data there is a file called: cuffdiff.txt 
        - the descriptions of each column in the file are below
        - look at the first few lines of the file
        - sort the file by log fold change 'log2(fold_change)', from highest to lowest, and save in a new file in your directory called sorted.cuffdiff.out
        - sort the file (log fold change highest to lowest), then print out only the first 100 lines. Save in a file called top100.sorted.cuffdiff.out
        - sort the file, print only the first column. Get a unique list of the genes, then print only the top 100. Save in a file called differentially.expressed.genes.txt


Cuffdiff file format
--------------------
Column number    Column name       Example           Description
1                Tested id         XLOC_000001       A unique identifier describing the transcript, gene, primary transcript, or CDS being tested
2                Gene id           XLOC_000001       A unique identifier for the gene being tested (the gene_id column of the file)
3                gene              Lypla1            The gene_name(s) or gene_id(s) being tested
4                locus             chr1:4797771-4835363    Genomic coordinates for easy browsing to the genes or transcripts being tested.
5                sample 1          Liver             Label (or number if no labels provided) of the first sample being tested
6                sample 2          Brain             Label (or number if no labels provided) of the second sample being tested
7                Test status       NOTEST            Can be one of OK (test successful), NOTEST (not enough alignments for testing),
                                                     LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL,
                                                     when an ill-conditioned covariance matrix or other numerical exception prevents testing.
8                FPKMx             8.01089           FPKM of the gene in sample x
9                FPKMy             8.551545          FPKM of the gene in sample y
10               log2(FPKMy/FPKMx) 0.06531           The (base 2) log of the fold change y/x
11               test stat         0.860902          The value of the test statistic used to compute significance of the observed change in FPKM
12               p value           0.389292          The uncorrected p-value of the test statistic
13               q value           0.985216          The FDR-adjusted p-value of the test statistic
14               significant       no                Can be either "yes" or "no", depending on whether p is greater than the FDR after
                                                     Benjamini-Hochberg correction for multiple-testing