by stajich

RNA-Seq workflow

November 12, 2012 in Problem Sets

I have posted an RNA-Seq pipeline script that you can use to run the RNA-Seq analysis with TopHat and Cufflinks on the yeast 1hr and 15hr fermentation time points.

Try to figure out how many genes are differentially expressed between these two time points. Look in the cuffdiff folder that the pipeline produces.
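One way to approach the count is sketched below. It assumes the cuffdiff folder contains a tab-delimited gene-level table (commonly named gene_exp.diff) whose last column is a yes/no "significant" flag. The function name and the example path are my own placeholders, so check them against what your run actually writes.

```shell
# Count rows flagged "yes" in the last column of a tab-delimited
# cuffdiff results table, skipping the header line.
# Assumption: the last column is the yes/no "significant" flag.
count_significant() {
    # $1: path to the table, e.g. gene_exp.diff
    awk -F'\t' 'NR > 1 && $NF == "yes" { n++ } END { print n + 0 }' "$1"
}

# Example call (hypothetical path, adjust to your output folder):
#   count_significant cuffdiff/gene_exp.diff
```

Looking at the header line of the file first (head -1) is a quick way to confirm that the last column really is the significance flag before trusting the count.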

Remember to run this as a qsub job by doing

qsub -d `pwd`

You don’t have to run it in the same directory as the checked-out script, but you can if you like. The run will create a folder of about 3 GB in total, so make sure you have enough space before you start.
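If you want to check the free space before submitting, here is a minimal sketch. It assumes df reflects the space you can actually use; on a cluster with per-user quotas that may not be true, so treat it as a rough check only. The function name has_space is made up for this example.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 KB free.
has_space() {
    dir=$1
    need_kb=$2
    # df -P prints one POSIX-format line per filesystem;
    # with -k, column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# About 3 GB expressed in kilobytes
if has_space . $((3 * 1024 * 1024)); then
    echo "enough space here"
else
    echo "less than 3 GB free here"
fi
```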

by stajich

Tutorial script with working paths and command lines

November 5, 2012 in Problem Sets

The slides on the website present the proper command line arguments for the java tools.

These examples also remind you to run qsub -I before starting your work.

I have also added modules to the biocluster system, so you no longer have to set the GATK and PICARD environment variables yourself. You would simply do

module load picard
module load GATK

and now you can run GATK with

java -Xmx2g -jar $GATK …

or the Picard tools with

java -Xmx2g -jar $PICARD/SortSam.jar I=In.sam O=Out.bam SORT_ORDER=coordinate

I have written a file which pretty much does the whole tutorial for you in a single shell script. You can experiment with this. One option is to download a different dataset and then compare two strains to each other. When you run GATK UnifiedGenotyper you will want to provide multiple BAM files, one for each strain.
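As a sketch of what providing multiple BAM files looks like, the helper below only prints a command line, with one -I argument per BAM file, so you can inspect it before running anything. The flags -T, -R, -I and -o are the usual GATK ones, but check them against the slides; the function name and every file name here are hypothetical.

```shell
# Print a UnifiedGenotyper command line with one -I argument per BAM file.
# Arguments: reference FASTA, output VCF, then one or more BAM files.
# Nothing is executed; the command is only printed.
ug_command() {
    ref=$1
    out=$2
    shift 2
    # \$GATK stays literal so the printed command uses the module's variable
    cmd="java -Xmx2g -jar \$GATK -T UnifiedGenotyper -R $ref -o $out"
    for bam in "$@"; do
        cmd="$cmd -I $bam"
    done
    printf '%s\n' "$cmd"
}

# Hypothetical file names, for illustration only
ug_command genome.fasta strains.vcf strainA.bam strainB.bam
```

Calling all strains in a single run like this puts them in one VCF file with a genotype column per sample, which makes the strain-to-strain comparison easier than merging separate runs.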

by stajich

Homework instructions

October 8, 2012 in Administrivia, Problem Sets

Homework is due at 11:59PM on Tuesday unless a different time is specified.

It will be graded for completeness (did you get the right answer).
The best way to turn in the homework is to upload your code as a gist and send Dr. Stajich the link to each of your solutions. If you cannot use the gist system for this homework, you can instead email the two scripts to me; however, it will be harder to show you how to fix your code if you do not upload it to a system like GitHub.
Make sure the gist is public, or we cannot see it.
Please write GEN220 homework 2 in your message or subject.

by stajich

Code to get the sequence for problem 1

October 8, 2012 in Problem Sets

So that you don’t have to worry about how to get the sequence for problems 1-4 into your program, here is some code that does it for you. I don’t think it is that informative for you to deal with cut and paste and removing the newlines (though it is good to learn something that is annoying, so that you are potentially more excited about your new tools).

Here it is in a gist or in my class repository for the problems.
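For comparison only, and not the code from the gist, the newline removal on its own can be sketched at the shell like this. The function name join_sequence and the toy sequence are made up for the example.

```shell
# Join a sequence pasted across several lines into a single string
# by deleting newlines, carriage returns and spaces from standard input.
join_sequence() {
    tr -d ' \r\n'
}

# Toy input, for illustration only
printf 'ATGGCC\nTTTTAA\n' | join_sequence
```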

I also mentioned that the second script could be solved using the ‘index’ function. This is true but may be more involved than other solutions. You should think about how you would go through and examine the codons (based on the fact that you should know the start codon) and think about how you test whether or not a particular codon is a stop codon.
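To make the codon-by-codon idea concrete, here is a minimal sketch written at the shell with awk rather than in the language you use for the homework. It assumes a DNA sequence on one line, a known 1-based position for the start codon, and the standard stop codons TAA, TAG and TGA. The function name first_stop and the test sequence are hypothetical.

```shell
# Print the 1-based position of the first in-frame stop codon at or after
# position $2 (the start codon) in sequence $1, or 0 if there is none.
first_stop() {
    printf '%s\n' "$1" | awk -v start="$2" '{
        seq = toupper($0)
        # Walk the reading frame three bases at a time
        for (i = start; i + 2 <= length(seq); i += 3) {
            codon = substr(seq, i, 3)
            if (codon == "TAA" || codon == "TAG" || codon == "TGA") {
                print i
                exit
            }
        }
        print 0
    }'
}

# Toy sequence: codons are ATG GCC TTT TAA, then two leftover bases
first_stop ATGGCCTTTTAAGG 1
```

Stepping by three from the start codon is what keeps the scan in frame: a TAA that straddles two codons is never looked at, which is the part that takes more care if you search with index instead.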