Assigned on February 14, 2011
Due by 4pm, February 21, 2011
Submit by email to murali AT cs DOT vt DOT edu
The aim of this homework is to help you get comfortable with
multi-dimensional data structures. In class, we developed the
Expression.pm module that contains several methods to
read in and process a two-dimensional matrix of gene expression
data. In this homework, you will augment the subroutines in the
module. Start from the non-object-oriented Expression.pm
we wrote in class. Be sure to write documentation in POD format for
the subroutines and for the script you write. See the note on how to
submit your solutions below.
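As a reminder of what POD documentation for a subroutine can look like, here is a minimal sketch. The subroutine count_genes and the layout of $data (a hash reference keyed by gene identifier) are assumptions made for illustration only; they are not part of the module we wrote in class.

```perl
use strict;
use warnings;

=head2 count_genes

  my $n = count_genes($data);

Returns the number of genes in the expression data set C<$data>,
which is assumed to be a hash reference keyed by gene identifier.

=cut

# Hypothetical helper, used here only to show where the POD block goes.
sub count_genes {
    my ($data) = @_;
    return scalar keys %$data;
}
```

Running perldoc on the file displays the text between =head2 and =cut; the Perl interpreter skips it.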
Problem 1
(25 points) Implement a subroutine called print
that prints a gene expression data set in the same format as the
input file from which the data was read. It is not
important that the order of the genes or of the samples be the
same as in the input file. What is important is that (a) the
output file contain a header line describing the column names
and the sample names and (b) every other line contain the gene
identifier and the gene expression values for one gene.
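One possible shape for the output is sketched below. It rests on several assumptions that may not match your Expression.pm: the data is stored as $data->{$gene}{$sample}, the file is tab-delimited, and the first column of the header line is labelled "Gene". The name expression_to_text is hypothetical; the sketch builds a string rather than printing so that it is easy to check.

```perl
use strict;
use warnings;

# Assumed layout (not necessarily the one from class):
#   $data->{$gene}{$sample} = expression value
sub expression_to_text {
    my ($data) = @_;
    my @genes = sort keys %$data;
    return '' unless @genes;

    # Take the sample names from the first gene; this assumes every
    # gene has a value for the same set of samples.
    my @samples = sort keys %{ $data->{ $genes[0] } };

    # Header line: column name for the gene identifier, then sample names.
    my @lines = ( join("\t", 'Gene', @samples) );

    # One line per gene: identifier followed by its expression values.
    for my $gene (@genes) {
        push @lines, join("\t", $gene, map { $data->{$gene}{$_} } @samples);
    }
    return join("\n", @lines) . "\n";
}
```

A print subroutine could write the returned string to whichever filehandle the caller supplies. Genes and samples are sorted here only to make the output reproducible.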
Problem 2
(50 points) Suppose your gene expression data is split over
multiple files. Implement a subroutine called
merge that takes two or more files as arguments,
reads the gene expression data set in each file, and merges all
these data sets into a single data structure.
You can assume that the data for each sample will be in
a single input file. One input file can contain data for multiple
samples. The format of each input file will be exactly like the
format we have parsed in Expression.pm, except that the order of
the genes may be different from file to file. You can assume
that all the genes will be in each file.
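The merging step itself can be sketched as follows, assuming each file has already been read into a structure of the form $data->{$gene}{$sample}. That layout and the name merge_datasets are assumptions for illustration; reading the files is left to whatever reader your Expression.pm already has.

```perl
use strict;
use warnings;

# Merge data sets of the assumed form $data->{$gene}{$sample} = value
# into one structure of the same form.
sub merge_datasets {
    my (@datasets) = @_;
    my %merged;
    for my $data (@datasets) {
        for my $gene (keys %$data) {
            for my $sample (keys %{ $data->{$gene} }) {
                # Each sample is assumed to appear in exactly one file,
                # so seeing it twice for a gene indicates bad input.
                die "Sample '$sample' appears in more than one data set\n"
                    if exists $merged{$gene}{$sample};
                $merged{$gene}{$sample} = $data->{$gene}{$sample};
            }
        }
    }
    return \%merged;
}
```

Because the values are looked up by gene identifier, the order of the genes in each file does not matter.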
Problem 3
(25 points) Write a script called
expression.pl that supports multiple options. The user
can supply the -expression-file option one or more times to
specify the input files containing the expression data. The user
must provide this option at least once. The -merge
option will cause the script to merge all the expression data
sets into one data structure. The script should print out an
error message if the user provides the -merge option
but provides the -expression-file option only
once. Finally, the -print option should cause the
script to print the merged expression data set to the output
file specified as the argument to the -print
option. Make sure your script does something reasonable if the
user provides the -print option and multiple
-expression-file options but does not provide the
-merge option.
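One way to handle these options is with the standard Getopt::Long module, sketched below. The subroutine name parse_options, the keys of the returned hash, and the wording of the error messages are assumptions for illustration. The sketch parses an argument list passed to it rather than @ARGV directly, so it can be tried without running a script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse the options from an argument list and return them as a hash
# reference. Dies with an error message if the options are invalid.
sub parse_options {
    my (@args) = @_;
    my %opt = (files => [], merge => 0, print => undef);
    GetOptionsFromArray(
        \@args,
        'expression-file=s' => $opt{files},     # array ref: option may repeat
        'merge'             => \$opt{merge},    # flag, takes no argument
        'print=s'           => \$opt{print},    # name of the output file
    ) or die "Could not parse the options\n";

    die "Provide -expression-file at least once\n"
        unless @{ $opt{files} };
    die "-merge requires at least two -expression-file options\n"
        if $opt{merge} && @{ $opt{files} } < 2;
    return \%opt;
}
```

In a script this might be called as my $opt = parse_options(@ARGV). What to do when -print is given with several files but without -merge is deliberately left open here.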
Submitting your Homework
You may submit a single script named expression.pl and a single
Expression.pm module that together solve all three problems.
Create a directory called
<YourName>-Homework1 and put expression.pl
in that directory. Place Expression.pm in the
CS5046 subdirectory of that directory.
Submit your homework by zipping (or tarring and
gzipping) the directory and emailing the zipped file to me.