Avatar of stajich

by stajich

Project/Presentation guidelines

November 30, 2012 in Administrivia

Reports: One report per team, 3-4 pages. It should have the following items/sections: title page (title, names and IDs of the team members), Introduction, Data and Method, Results. Describe the question(s) that you are trying to address, the data that you used to address the question(s), your analysis pipeline or workflow, and the main findings from your analysis (results, summarized in table or graphical form as appropriate). Provide a link to the code that you developed for this project, either as a GitHub repository or as an archive of your source code. Provide literature references for any tools, datasets, or background concepts you refer to.

Presentations: 10-15 minutes, one presentation per team. Present the same material as the paper, but summarize for the audience what you were trying to address and what results you were able to find. Discuss your methods and the code you needed to develop to solve the problems in your project. Each team needs to prepare a set of PowerPoint slides, as you would for a lab meeting or conference. One member can represent the team and do the whole presentation, or several members can each talk about a different part of it.


by stajich

RNA-Seq workflow

November 12, 2012 in Problem Sets

I have posted an RNA-Seq pipeline script you can use to run the RNA-Seq analysis with TopHat and Cufflinks on the yeast 1hr and 15hr fermentation time points.

https://github.com/hyphaltip/htbda_perl_class/blob/master/examples/NGS/all_commands_RNASeq.sh

Try to figure out how many genes are differentially expressed between these two time points. Look in the cuffdiff folder that the script produces.
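
One possible way to do the counting, as a rough sketch rather than the official answer: cuffdiff normally writes a tab-delimited gene_exp.diff file whose header includes a column named significant, with yes or no in each row. The function name count_significant is made up for this example, and the column is looked up by name in case your Cufflinks version orders the columns differently.

```shell
# Count the rows marked "yes" in the column whose header is "significant".
# The column is located by header name instead of a fixed position.
count_significant() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "significant") col = i
            next
        }
        col && $col == "yes" { n++ }
        END { print n + 0 }
    ' "$1"
}
```

You would call it as count_significant cuffdiff/gene_exp.diff, assuming the output folder is named cuffdiff and you are in the directory that contains it.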

Remember to run this as a qsub job by doing

qsub -d `pwd` all_commands_RNASeq.sh

You don’t have to run it in the same directory as the checked-out script, but you can if you like. This will create a folder of about 3 GB in total, so make sure you have enough space before you run it.
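
If you are not sure how much room you have, here is a small sketch of a check you could run in the directory where you plan to submit the job. The function name enough_space is invented for this example, and the 3 GB threshold is just the estimate given above converted to kilobytes.

```shell
# Succeed only if the filesystem holding DIR has at least NEEDED_KB kilobytes free.
# df -P prints one header line and then one line per filesystem;
# the fourth field of that line is the available space (in KB because of -k).
enough_space() {
    dir=$1
    needed_kb=$2
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$needed_kb" ]
}

# 3 GB is 3 * 1024 * 1024 = 3145728 KB
if enough_space . 3145728; then
    echo "at least 3 GB free here"
else
    echo "less than 3 GB free here" >&2
fi
```

Note that this only looks at free space on the filesystem. If the cluster also enforces a per-user quota, that limit is not visible to df.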


by stajich

Tutorial script with working paths and command lines

November 5, 2012 in Problem Sets

The slides on the website present the proper command line arguments for the java tools.

These examples also remind you to run qsub -I before starting your work.
I have also added modules to the biocluster system, so you no longer have to set the GATK and PICARD environment variables yourself.
You would simply do
module load picard
module load GATK
and now you can run GATK with
java -Xmx2g -jar $GATK …
or picard tools with
java -Xmx2g -jar $PICARD/SortSam.jar I=In.sam O=Out.bam SORT_ORDER=coordinate
I have written a file which pretty much does the whole tutorial for you in a single shell script. You can experiment with this. One option is to download a different dataset and then compare two strains to each other. When you run GATK UnifiedGenotyper you will want to provide multiple BAM files, one for each strain.
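
As a rough illustration of passing several BAM files, the sketch below only prints a UnifiedGenotyper command line with one -I option per BAM file; it does not run GATK. It assumes the GATK version on the cluster takes the -T, -R, -I and -o options, and the function name ug_command and all file names are placeholders, not files from the tutorial.

```shell
# Print (do not run) a UnifiedGenotyper command with one -I per BAM file.
# Usage: ug_command REFERENCE OUTPUT_VCF BAM [BAM ...]
ug_command() {
    ref=$1
    out=$2
    shift 2
    inputs=""
    for bam in "$@"; do
        inputs="$inputs -I $bam"
    done
    # \$GATK is left unexpanded so the printed command matches what you would type
    echo "java -Xmx2g -jar \$GATK -T UnifiedGenotyper -R $ref$inputs -o $out"
}

# Placeholder file names for two strains
ug_command genome.fasta strains.vcf strainA.bam strainB.bam
```

Printing the command first lets you check the arguments before you spend cluster time on the real run.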

by stajich

Homework instructions

October 8, 2012 in Administrivia, Problem Sets

Homework is due at 11:59PM on Tuesday unless a different time is specified.

It will be graded for completeness (did you get the right answer).
The best way to turn in the homework is to upload your code as a gist at http://gist.github.com and send Dr. Stajich the link to each of your solutions. If you cannot use the gist system for this homework, you can also email the two scripts to me; however, it will be harder to show you how to fix your code if you do not upload it to a system like GitHub.
Make sure the gist is public, or we cannot see it.
Please write GEN220 homework 2 in your message or subject.

by stajich

Code to get the sequence for problem 1

October 8, 2012 in Problem Sets

So that you don’t have to worry about how to get the sequence for problems 1-4 into your program, here is some code that does it for you. I don’t think it is that informative for you to deal with cut and paste and removing the newlines (though it is good to learn something that is annoying, so you are potentially more excited about your new tools).

Here it is in a gist or in my class repository for the problems.

I also mentioned that the second script could be solved using the ‘index’ function. This is true but may be more involved than other solutions. You should think about how you would go through and examine the codons (based on the fact that you should know the start codon) and think about how you test whether or not a particular codon is a stop codon.
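
To make the codon-walking idea concrete without handing you the Perl, here is a sketch in shell and awk instead. It assumes the task is to start at the first ATG and read three bases at a time until a stop codon (TAA, TAG or TGA) turns up; the function name first_orf and that reading of the problem are my assumptions, so adapt the logic to what the problem actually asks.

```shell
# Print the sequence from the first ATG through the first in-frame stop codon.
# If no stop codon is found, everything up to the last complete codon is printed.
# Returns a non-zero status when the sequence contains no ATG.
first_orf() {
    printf '%s\n' "$1" | awk '{
        start = index($0, "ATG")      # 1-based position of the first ATG, 0 if absent
        if (start == 0) exit 1
        orf = ""
        for (i = start; i + 2 <= length($0); i += 3) {
            codon = substr($0, i, 3)  # the next three bases in frame
            orf = orf codon
            if (codon == "TAA" || codon == "TAG" || codon == "TGA") break
        }
        print orf
    }'
}
```

The stop-codon test is the single comparison inside the loop. In Perl you would step through the sequence the same way, using substr to pull out each codon.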


by stajich

Lecture notes and examples in github repository

October 5, 2012 in Administrivia

GitHub is a hosting site for source code repositories. We will try using it in this class as a way to turn in homework so that, if changes need to be made, we can more easily show you what is needed.

You can upload code at this site: https://gist.github.com/ and get a tracked “gist” repository for your answers. This can then be shared with the instructors by emailing us the link. The best scenario is to create a GitHub account, which will let you keep track of all your gists and assignments. I will go over this in class too, but this is something you will want to check out.

All the example code and lectures are available on github in this repository. For example, there are many sample bits of code pulled out from the lectures already available as scripts you can run.

I’m working to get the lecture notes rendered as PDF so you can download them, but I am having a little bit of trouble. For the time being you can read them in the web-rendered form at GitHub; just ignore the lines that have !perl on them. For example, here is a link to lecture 1 as rendered by the GitHub markup.

 

 


by stajich

Week 2 announcements

October 2, 2012 in Administrivia

This week’s lectures will be Wednesday 2-4PM and 2-3PM on Friday (but we will stick around to answer questions) and will cover an introduction to Perl. There are many tutorials out there on the web, many linked at the Resources page. Feel free to browse there in addition to the material that will be covered in class.

Homework problem sets

No problem sets were required to be turned in for last week’s UNIX problems. We will have problem sets for you to work on for this week’s lectures on Wednesday and Friday.

Cluster accounts

Accounts are now ready. If you can come to class early on Wednesday we can get you set up. You are also welcome to stop by Jason’s office on Wednesday and get your account information ahead of time.

Mailing list for the course

To accommodate those who are not registered but are following along in the class, and to encourage discussion among all of us, I have started a mailing list you can try out for asking questions. I will also use it to announce information in addition to what goes on iLearn. Please go there to read the list or subscribe if you want to get the regular emails. You can also just read the list via the web if you don’t want to get more emails; just don’t forget to check it or subscribe to the RSS feed.


by stajich

The lectures and problems from Week 1 are posted

September 28, 2012 in Administrivia

The problem sets and lectures have been updated to present the material on UNIX and command line usage on this site. The same material has been posted on iLearn.

Accounts for biocluster will be issued on Monday Oct 1, and we will be able to provide these to you before class so you can practice.


by stajich

Syllabus available

September 27, 2012 in Resources

The syllabus for the course is available as a Google Doc. We will update it if the order of lectures needs to change. The course will progress through Perl and then data-driven problems.

The course will be organized with alternating 30 minute lectures and 30 minute workshops during the 2hr class period. The problem sets will be assigned in the middle of the week and are due the following week. You will have time to work on many of them during class, and some will require out-of-class effort.

For the projects, you will be expected to work in teams of 3-4 people. The topics will be a blend of research topics you are interested in and the projects we have designed.


by stajich

Class begins Friday Sept 28

September 25, 2012 in Resources

We will kick off the class on Sept 28 with an introduction to UNIX. This may be an intense course for some, since you will be learning both a new language and the command line. There will be problem sets during class and for homework so you can practice and improve your skills.

Focus on learning how to make directories and move in and out of them, and on running some simple programs to search for text (grep) and to count the number of lines in files (wc).
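
As a small taste of what that looks like, here is a sketch you can paste into a terminal. It works in a scratch directory so it will not touch your own files; the directory name practice, the file name seqs.txt and its contents are made up for the example.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir practice          # make a directory
cd practice             # move into it

# A small made-up text file with three lines
printf 'ATGCCC\nGGGTTT\nATGAAA\n' > seqs.txt

grep ATG seqs.txt       # print the lines that contain ATG
wc -l < seqs.txt        # count the lines in the file
grep -c ATG seqs.txt    # count the lines that contain ATG

cd ..                   # move back out of the directory
```

Try changing the search text or adding lines to the file and see how the output of each command changes.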

Check out a few of these tutorials in case that might help you get started