CS 5984: Programming Assignment 5

CS 5984: Computation for the Life Sciences

Fall Semester, 2002
Programming Assignment 5

The point value (if any) of each step is given in square brackets [ ]. This programming assignment requires you to write some Perl programs and some database commands. The solutions (details below) should be turned in by email to naren@cs.vt.edu, by 5pm on Dec 11, 2002.

The goal of this assignment is three fold: (i) to populate a database using data that is given in files, (ii) to gain experience in writing queries, and (iii) getting the answers to queries out of the database and printing them. The following five parts to the assignment take you through all these steps.

[10] The story begins with 384 clones (genes), information about whom is distributed across three files: mild.txt, severe.txt, and members.txt. Each of these files gives some information about clones used in microarray experiments. Download them and save them onto your home directory. mild.txt and severe.txt are files that describe two different experimental conditions - mild stress and severe stress. Each file lists the expression level of each of the 384 clones, which can be a "+" (up-expressed), "-" (down-expressed), or "0" (not expressed or no information available). Verify that both these files have 384 lines, and two columns delimited by white space. Within each file, one clone can have only one of the three possible expression levels. members.txt is a file that assigns clones to functional categories. This file is delimited with a "|" character; here notice that one clone can be a member of many categories. For instance, clone 1 is a member of both "Lignin biosynthesis" and the "Phenylpropanoid pathway".

Familiarize yourself with this data; logon to the postgres system, and create three tables, one for storing the information in each of the above three files. You need to decide the names of the tables, the names of the columns, and choose the data types appropriately. Make sure the PRIMARY KEY is defined suitably for each table.
[10] If you have successfully created the tables, they will be empty at first. Do a "\d" and convince yourself that the tables are indeed there! Now we need to load the data from the files into the tables. To do this, you can write a Perl program that takes the data from each input file and prepares "INSERT INTO ..." commands that are written onto a new file. Then, you can load this file (and hence the commands in it) into the database system using the "psql" command line options (e.g., the "-f" option).

To gain full points for this step, you should write a correct Perl program, execute it, load the prepared file into the database, and go into the database and verify that the data is indeed there! Notice that depending on how you defined the table columns, the Perl program must suitably prepare the INSERT commands. If your Perl program is behaving funnily, notice that you may have to go in and delete any bad data and try again!

It is hence prudent to first try out "INSERT INTO" commands manually in the database, convince yourself that they work, and only then prepare the Perl program that spits out these "INSERT INTO" commands.
[10] At this stage, you have successfully gotten the data from the files into the database. We are safe for now. Login to the database and write a database view to find the categories of clones that responded positively in either the mild condition or the severe condition.

Then write another database view to find the categories of clones that responded negatively in either the mild condition or the severe condition.

Write a third database view to find the categories that are in common among the above two views.

Query these views and convince yourself that they are producing the desired results.
[10] The last view is what we are interested in. You will have to somehow get this data out of the database into a file (suitably name it yourself). We have seen many psql command line options in class for doing this. So at the end of this step, you should have a file that contains a list of categories. We are back in file territory.
[10] Write a Perl program to read this last file, sort the categories, and print the categories in alphabetical order.
[0] Congratulate yourself on a job well done!

How to turnin the assignment

Collect all the work you have done (data files, files containing database definition and data insertion commands, output files) in a directory under your home directory on arabidopsis. Write a README file (plain text, no Microsoft Word please) describing how you did each step, what assumptions you made, what troubles you ran into, how you solved the problems, and the names of the files/tables/views that are created in the process. Email this file to naren@cs.vt.edu so that I can use it as a guide to go into your directory and check these files and commands out. We will also login to the database and verify that your tables and views are working as they should.

Useful Information

How to Get Started (Logging in, passwords etc.)
An interactive session with Postgres, covered in class on Dec 4, 2002.

Please report any problems found in these pages by email to: Naren Ramakrishnan.