CS 5984: Computation for the Life Sciences
Fall Semester, 2002
Programming Assignment 5
The point value (if any) of each step is given in square brackets [ ].
This programming assignment requires you to write some Perl programs and
some database commands.
The solutions (details below)
should be turned in by email to naren@cs.vt.edu, by 5pm on Dec 11, 2002.
The goal of this assignment is three fold: (i) to populate a database
using data that is given in files, (ii) to gain experience in writing queries,
and (iii) getting the answers to queries out of the database and printing them.
The following five parts to the assignment take you through all these steps.
- [10] The story begins with 384 clones (genes), information about whom
is distributed across three files:
mild.txt, severe.txt, and
members.txt.
Each of these files gives some information about clones used in
microarray experiments. Download them and save them onto your home directory.
mild.txt and severe.txt are files that describe two different experimental
conditions - mild stress and severe stress. Each file lists the expression
level of each of the 384 clones, which can be a "+" (up-expressed), "-"
(down-expressed), or "0" (not expressed or no information available).
Verify that both these files have 384 lines, and two columns delimited
by white space. Within each file, one clone can have only one of the
three possible expression levels.
members.txt is a file that assigns clones to functional
categories. This file is delimited with a "|" character; here notice
that one clone can be a member of many categories. For instance, clone 1
is a member of both "Lignin biosynthesis" and the "Phenylpropanoid pathway".
Familiarize yourself with this data; logon to the postgres system, and
create three tables, one for storing the information in each of the above
three files. You need to decide the names of the tables, the names of
the columns, and choose the data types appropriately. Make sure the PRIMARY
KEY is defined suitably for each table.
- [10] If you have successfully created the tables, they will be empty
at first. Do a "\d" and convince yourself that the tables are indeed
there! Now we need to load the data from the files into the tables. To
do this, you can write a Perl program that takes the data from each input
file and prepares "INSERT INTO ..." commands that are written onto
a new file. Then, you can load this file (and hence the commands in it)
into the database system using the
"psql" command line options (e.g., the "-f" option).
To gain full points for this step, you should write a correct Perl program,
execute it, load the prepared file into the database,
and go into the database and verify that the data is indeed
there! Notice that depending on how you defined the table columns, the
Perl program must suitably prepare the INSERT commands. If your Perl
program is behaving funnily, notice that you may have to go in and delete
any bad data and try again!
It is hence prudent to first try out "INSERT INTO" commands manually in
the database, convince yourself that they work, and only then prepare
the Perl program that spits out these "INSERT INTO" commands.
- [10] At this stage, you have successfully gotten the data from the files
into the database. We are safe for now. Login to the database
and write a database view to find the categories of clones that
responded positively in either the mild condition or the severe condition.
Then write another database view to find the categories of clones that
responded negatively in either the mild condition or the severe condition.
Write a third database view to find the categories that are in common
among the above two views.
Query these views and convince yourself that they are producing the
desired results.
- [10] The last view is what we are interested in. You will have to
somehow get this data out of the database into a file (suitably name
it yourself). We have seen
many psql command line options in class for doing this. So at the end of
this step, you should have a file that contains a list of categories.
We are back in file territory.
- [10] Write a Perl program to read this last file, sort the categories,
and print the categories in alphabetical order.
- [0] Congratulate yourself on a job well done!
How to turnin the assignment
- Collect all the work you have done (data files, files containing
database definition and data insertion
commands, output files) in
a directory under your home directory on arabidopsis. Write a README file
(plain text, no Microsoft Word please)
describing how you did each step, what assumptions you made, what troubles
you ran into, how you solved the problems, and the names of the files/tables/views that are
created in the process. Email this file to naren@cs.vt.edu so that I can use it as a guide to go into your directory
and check these files and commands out. We will also login to the database
and verify that your tables and views are working as they should.
Useful Information
Please report any problems found in these pages by email to:
Naren Ramakrishnan.