Recap some of the main concepts

  • Scalar variables: Strings and numbers
  • Arrays
  • Hashes
  • remember you can read more and see examples with perldoc -f METHODNAME (e.g. put length in for METHODNAME) You can also read the perl documentation and intro at perl.org/docs.html

Presenter Notes

Scalars and Strings

  • Scalar variables start with $
  • Scalar variables can hold a string or number.

    1 my $single = 'one';
    2 my $double = 2;
    3 my $triple = 33/11;
    
  • Strings can be manipulated with several functions: reverse, lc, uc, length index to find a substring, substr to retrieve a substring

  • you can print out a value with print or printf
  • @array = split($sep,$string) and $string = join($sep,@array) to convert string to array and array to a string

Presenter Notes

Arrays

  • Arrays are used to store multiple values, in an ordered list
  • You can access an item in an array with [] operator like $array[2] to get the value at position 2 in the array. This can be used to retrieve or set a value at that position. For example

    1 my @array = (1,2,3);
    2 print "array is @array\n";
    3 $array[2] = 44;
    4 print "array is @array\n";
    5 print "The 3rd item in the array is $array[2]\n";
    
  • Arrays can be manipulated as well with push, pop, shift, unshift

  • The function sort will return a reordered array
  • The function map can be used to map all values in an array from one value to another

Presenter Notes

Array access

  • Initialize an array in several ways:

    1 my @array = (); #empty array
    2 my @array = (1,2,'seven'); # with values, comma separated
    3 my @array = qw(Kansas Indiana Texas); # quote words
    
  • Retrieve items from an array by assigning to variables

    1 my @array = qw(First Second Third);
    2 my ($one,$two,$three) = @array;
    3 print "$array[1] == $two\n";
    
  • Set items in an array by assigning to variables

    1 my @array = qw(First Second Third);
    2 $array[3] = 'Fourth';
    3 print "$array[3]\n";
    

Presenter Notes

Hashes

  • Also called associate arrays or dictionaries, lookup or store a Key and associate it with a value

  • Use the {} operators to access data in the hash

    1 my %favs;
    2 $favs{'color'} = 'orange';
    3 $favs{'car'}   = 'ford';
    4 $favs{'fruit'} = 'orange';
    5 
    6 print "I like to eat $favs{'fruit'} and drive my
    7 $favs{'color'} $favs{'car'}\n";
    
  • You can access the names of the keys with keys function.

Presenter Notes

Looping

  • for and foreach loops useful for iterating through set of numbers or an array

    1 my @shapes = qw(square triange circle);
    2 foreach my $shape ( @shapes ) {
    3    print "shape is $shapes\n";
    4 }
    5 for( my $i = 0; $i < scalar @shapes; $i++ ) {
    6      print "shape is $shapes[$i]\n";
    7 }
    
  • Can also loop through a hash of keys and values

     1 my %shapes = ('triangle' => 3,
     2           'square'   => 4,
     3           'pentagon' => 5,
     4           'hexagon'  => 6);
     5 # go through the keys, in random (undefined) order
     6 foreach my $shape ( keys %shapes ) {
     7    print "a $shape has $shapes{$shape} sides\n";
     8 }
     9 # sort by number of sides
    10 foreach my $shape ( sort { $shapes{$a} <=> $shapes{$b} } keys %shapes ) {
    11    print "a $shape has $shapes{$shape} sides\n";
    12 }
    

Presenter Notes

Other useful functions

  • rand - returns a random number greater than or equal to 0 and less than value provided to rand (or 1 if no value is provided). my $num = rand(5)
  • int - return only the integer part of a number
  • scalar treat something like a scalar - when there is ambiguous context this can be helpful
  • localtime - return an array of current time in 9 elements (seconds to year), or my $date = localtime will return it as text string
  • ord to convert a symbol (charcter) into ASCII code and chr to convert ASCII code number into a symbol (character)

Reading and Writing (I/O or Input/Output)

  • How to get the command-line
  • Input data from files
  • Read data from a program
  • Print data into a file or print data out to a program

Presenter Notes

Command-line input

  • When you run something on the command-line all the arguments are available to perl. Try to write this simple program and call it 'args.pl'

    1 print "There are ",scalar @ARGV, " arguments\n";
    2 print "The arguments are: @ARGV\n";
    3 print "The first argument is: $ARGV[0]\n";
    
  • Try running the program with this on the command line

    $ perl args.pl arg1 arg2 these_are more args

  • The name of the program run is in the $0 variable

    1 print "The program run is $0 the arguments are @ARGV\n";
    
  • We will explain some helper modules later that make it easy to get all the command line options easy to parse (see Getopt::Long )

Presenter Notes

Implicit variable and Defaults

  • $_ is the implicit variable.
  • Perl lets you be lazy. If you forget to pass in variable to a function it is assumes you want to use the implicit variable. Sometimes there are also defaults which it will use. For example:

    1 $_ = "This is some data";
    2 my @ret = split; # assumes you want to split on white-space,
    3                  # on the implicit variable
    4 @ret = split(/\s+/,$_); # same as the line above
    
  • $_ is also used when something is returned but you do not assign it to something.

  • It is also referred to when you you do not pass in a argument to something

Presenter Notes

Input/Ouput

Open a file with the open function. It takes 2 arguments a filehandle which will be a pointer to the opened file, and a string representing the file to open.

1 open(IN, "input.txt") || die $!;
2 # read a line in
3 my $line = <IN>;
4 # read the whole file
5 while(<IN>) {
6  my ($col1, $col2) = split;
7 }
8 close(IN);

Presenter Notes

Filehandles

Filehandles can also be stored in variables

 1 my $fh;
 2 open($fh => "gene.dat") || die $!;
 3 while(<$fh>) {
 4  print $_;
 5 }
 6 #I like to use this in one line
 7 open(my $fh2 => "gene2.dat") || die $!;
 8 while(<$fh2>) {
 9  print $_;
10 }

Presenter Notes

Writing to a file

open is also used to open a file for printing out to.

1 open(OUT => ">outputfile.txt") || die $!;
2 print OUT join("\t", qw(NAME RANK TOWN)), "\n";
3 print OUT join("\t", qw(GRIFFITH SHERIFF MAYBERRY)), "\n";
4 print OUT join("\t", qw(RAWLS DEPUTY BALTIMORE)), "\n";
5 close(OUT);
6 
7 open(my $fh => ">outfile2.txt") || die $!;
8 print $fh "A data report\n";
9 print $fh "This is also a report line\n";

Presenter Notes

Data embedded in a script

1 while(<DATA>) {
2  my ($col1,$col2) = split(/\s+/,$_);
3 }
4 
5 __DATA__
6 Color  Size Model
7 red    10   Jumbo
8 yellow 8    Large
9 pink   2    Mini

Presenter Notes

Practice

Write a script that will open and print out the first 5 lines of a file. The name of the file to open should be passed in on the command line as the first argument.

 1 use strict;
 2 use warnings;
 3 
 4 # get the name of the file from the cmdline
 5 
 6 # open the file
 7 
 8 # read a line at a time of the file
 9 while( ... ) {
10 # print the line out
11 # and stop after 5 lines
12 }

Presenter Notes

Pipes for processes

You can combine operations that are on the command line with the | operator in UNIX. This can also be used in Perl when specifying an open command to actually run a program and have the output sent back to the Perl program. This can be used to obtain data from a program. For example here we run the grep command to find lines in a file that have the string 'gene'. Only those lines will be returned and thus be seen by the Perl program.

1 open(IN,"grep 'gene' gene.dat | ") || die $!;
2 while(my $line = <IN>) {
3  print "line is $line\n";
4 }

Or it can be used to send data to a program. For example this script will print out data to a program which will then compress the data. Notice how the pipe comes at the beginning because we plan to send data into the program.

1 open(my $fh => "| gzip -c > file.gz") || die $!;
2 print "hello there\n";
3 print "can you see this?\n";

Now look at the file in your directory. It is compressed - you can read it with more and see it is binary file. However you can read it with zcat or you can uncompress it with gunzip.

Presenter Notes

Pipe trick

Can use it to open a compressed file on the fly.

1 open(my $fh => "zcat file.gz |") || die $!;
2 while(<$fh>) {
3  # process data in a file that was compressed, without making a new copy of the file as uncompressed
4 }

Presenter Notes

Read data from the web with cmdline

 1 my $url = 'http://s.fungidb.org/1es5REC';
 2 # -S option will not print any statistics
 3 open(my $fh => "curl -S '$url' |") || die $!;
 4 while(<$fh>) {
 5   print $_;
 6 }
 7 
 8 open($fh => "GET '$url' |") || die $!;
 9 while(<$fh>) {
10   print $_;
11 }
  • the shortened link was to NCBI: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=163644330&retmode=text&rettype=fasta to retrieve a sequence in FASTA

Presenter Notes

Let's try this together

Login to biocluster, Download data files. Data are in this http://courses.stajich.org/public/gen220/data/ which represent some time points in growth for a fungus.

http://courses.stajich.org/public/gen220/data/Nc20H.expr.tab http://courses.stajich.org/public/gen220/data/Nc3H.expr.tab

wget http://courses.stajich.org/public/gen220/data/Nc3H.expr.tab
(or on biocluster)
/shared/gen220/data_files/expression/Nc3H.expr.tab
/shared/gen220/data_files/expression/Nc20H.expr.tab

Write a script to read in the Nc20H and Nc3H data into a hash (one hash for each datafile). Store in the hash the gene name (the 1st column) and the FPKM data. Each gene will appear once in each file.

  • Print out a new file which has the gene name, the expression in 3H and the expression in 20H.
  • Extra - print it out so that it is sorted by the 3HR timepoint

Presenter Notes

A solution

 1 use strict;
 2 use warnings;
 3 my (%expr3H,%expr20H);
 4 open(my $fh => 'Nc3H.expr.tab') || die $!;
 5 while(<$fh>) {
 6  my @row = split("\t",$_);
 7  next if $row[0] eq 'gene_id'; # skip when it is the header line
 8  $expr3H{$row[0]} = $row[5];
 9 }
10 
11 open($fh => 'Nc20H.expr.tab') || die $!;
12 while(<$fh>) {
13  my @row = split("\t",$_);
14  next if $row[0] eq 'gene_id'; # skip when it is the header line
15  $expr20H{$row[0]} = $row[5];
16 }
17 
18 open(my $outfh => ">Combined.tab") || die $!;
19 my $gene;
20 for $gene ( keys %expr3H) {
21  print $outfh join("\t", $gene, $expr3H{$gene}, $expr20H{$gene}), "\n";
22 }
23 
24 open($outfh => ">Combined_sorted.tab") || die $!;
25 for my $gene ( sort { $expr3H{$b} <=> $expr3H{$a} } keys %expr3H) {
26  print $outfh join("\t", $gene, $expr3H{$gene}, $expr20H{$gene}), "\n";
27 }

Presenter Notes