Homework 1
CS 5046 (Spring 2011)

Assigned on February 14, 2011
Due by 4pm, February 21, 2011
Submit by email to murali AT cs DOT vt DOT edu

The aim of this homework is to help you get comfortable with multi-dimensional data structures. In class, we developed the Expression.pm module that contains several methods to read in and process a two-dimensional matrix of gene expression data. In this homework, you will augment the subroutines in the module. Start from the non-object-oriented Expression.pm we wrote in class. Be sure to write documentation in POD format for the subroutines and for the script you write. See the note on how to submit your solutions below.
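As a reminder of what POD documentation for a subroutine can look like, here is a minimal sketch. It assumes the expression data is stored as a hash of hashes keyed by gene identifier and then by sample name; the module from class may use a different layout. The subroutine name sample_names is hypothetical and serves only to illustrate the POD layout.

```perl
use strict;
use warnings;

=head1 NAME

pod_sketch - hypothetical example of POD layout for one subroutine

=head1 SUBROUTINES

=head2 sample_names

  my @samples = sample_names($data);

Takes a reference to a hash of hashes keyed by gene identifier and then by
sample name. Returns the sorted list of all sample names found.

=cut

sub sample_names {
    my ($data) = @_;
    my %seen;
    for my $gene (keys %$data) {
        # record every sample name that appears for this gene
        $seen{$_} = 1 for keys %{ $data->{$gene} };
    }
    my @names = sort keys %seen;
    return @names;
}

# Small made-up data set: two genes measured in two samples.
my $example = {
    geneA => { s1 => 1.5, s2 => 0.2 },
    geneB => { s1 => 2.0, s2 => 3.1 },
};
print join(",", sample_names($example)), "\n";
```

Running perldoc on a file that contains such a block displays the documentation; the perl interpreter skips everything between the =head1 line and =cut.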
Problem 1
(25 points) Implement a subroutine called print that prints a gene expression data set in the same format as the file from which you would have read the data. It is not important that the order of the genes or of the samples be the same as in the input file. What is important is that (a) the output file contain a header line describing the column names and the sample names and (b) every other line contain the gene identifier and the gene expression values for one gene.
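The two requirements above can be sketched as follows. This is an illustration under stated assumptions, not the expected solution: it assumes a tab-delimited layout whose header line starts with the word Gene, and a hash of hashes keyed by gene identifier and then by sample name. The name print_expression and its filehandle argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: writes $data (gene => { sample => value }) to the
# filehandle $fh as tab-delimited text. The layout is an assumption.
sub print_expression {
    my ($data, $fh) = @_;
    my @genes = sort keys %$data;
    return unless @genes;

    # Take the sample names from the first gene; this assumes that every
    # gene has a value for every sample.
    my @samples = sort keys %{ $data->{ $genes[0] } };

    # (a) header line: a label for the gene column, then the sample names
    print {$fh} join("\t", 'Gene', @samples), "\n";

    # (b) one line per gene: identifier followed by its expression values
    for my $gene (@genes) {
        print {$fh} join("\t", $gene, map { $data->{$gene}{$_} } @samples), "\n";
    }
    return;
}

# Made-up data, written to standard output.
my %example = (
    geneA => { s1 => 1.5, s2 => 0.2 },
    geneB => { s1 => 2.0, s2 => 3.1 },
);
print_expression(\%example, \*STDOUT);
```

Note that print is also a Perl built-in, so a subroutine with that name generally has to be called with a package-qualified name or with the & sigil to avoid invoking the built-in.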
Problem 2
(50 points) Suppose your gene expression data is split over multiple files. Implement a subroutine called merge that takes two or more files as arguments, reads the gene expression data set in each file, and merges all these data sets into a single data structure. You can assume that the data for each sample will be in a single input file. One input file can contain data for multiple samples. The format of each input file will be exactly like the format we have parsed in Expression.pm, except that the order of the genes may be different from file to file. You can assume that all the genes will be in each file.
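The merging step can be sketched on data sets that have already been read into memory. The sketch below assumes each data set is a hash of hashes keyed by gene identifier and then by sample name, so the order of genes in each file does not matter. The name merge_datasets is hypothetical, and stopping with an error when a sample appears twice is one possible design choice, not a requirement of the assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: combines several data sets of the form
# gene => { sample => value } into one structure of the same form.
sub merge_datasets {
    my @datasets = @_;
    my %merged;
    for my $set (@datasets) {
        for my $gene (keys %$set) {
            for my $sample (keys %{ $set->{$gene} }) {
                # Each sample is expected to come from exactly one file.
                die "sample '$sample' appears in more than one data set\n"
                    if exists $merged{$gene}{$sample};
                $merged{$gene}{$sample} = $set->{$gene}{$sample};
            }
        }
    }
    return \%merged;
}

# Made-up data standing in for two parsed files with different samples.
my $first  = { geneA => { s1 => 1.5 }, geneB => { s1 => 2.0 } };
my $second = { geneB => { s2 => 3.1 }, geneA => { s2 => 0.2 } };
my $all    = merge_datasets($first, $second);
print join(",", sort keys %{ $all->{geneA} }), "\n";
```

Reading each file into such a structure is left to the parsing code from class.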
Problem 3
(25 points) Write a script called expression.pl that supports multiple options. The user can supply the -expression-file option one or more times to specify the input files containing the expression data. The user must provide this option at least once. The -merge option will cause the script to merge all the expression data sets into one data structure. The script should print an error message if the user provides the -merge option but provides the -expression-file option only once. Finally, the -print option should cause the script to print the merged expression data set to the output file specified as the argument to the -print option. Make sure your script does something reasonable if the user provides the -print option and multiple -expression-file options but does not provide the -merge option.
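One way to handle these options is the standard Getopt::Long module, sketched below. The helper name parse_options, the error messages, and the keys of the returned hash are hypothetical. The sketch covers only the checks stated above; what to do when -print is given with several files but without -merge is deliberately left open.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parses the command-line arguments and returns a
# hash reference, or dies if the combination of options is invalid.
sub parse_options {
    my @args = @_;
    my (@files, $merge, $print_file);
    GetOptionsFromArray(
        \@args,
        'expression-file=s' => \@files,       # may be repeated
        'merge'             => \$merge,       # flag without a value
        'print=s'           => \$print_file,  # name of the output file
    ) or die "could not parse options\n";

    die "provide -expression-file at least once\n" unless @files;
    die "-merge needs at least two -expression-file options\n"
        if $merge && @files < 2;

    return { files => \@files, merge => $merge ? 1 : 0, print => $print_file };
}

# Made-up command line; the file names are placeholders.
my $opts = parse_options(
    qw(-expression-file a.txt -expression-file b.txt -merge -print out.txt)
);
print "files: @{ $opts->{files} }\n";
```

In the actual script you would pass @ARGV to the helper, or call GetOptions directly.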
Submitting your Homework
  • You can submit one script named expression.pl and one Expression.pm module that together solve all the problems.
  • Create a directory called <YourName>-Homework1 and put expression.pl in that directory. Place Expression.pm in the CS5046 subdirectory of that directory.
  • Submit your homework by zipping (or tarring and gzipping) the directory and emailing the zipped file to me.

Last modified: Tue Feb 15 13:56:47 EST 2011