Homework 4
CS 5046 (Spring 2007)

Assigned on March 22, 2007
Due by 11AM, March 29, 2007
Submit by email to murali AT cs DOT vt DOT edu

This homework deals with problems that you can solve using depth-first search and/or breadth-first search. We will use these algorithms to explore properties of the Gene Ontology.

Problem 1
(5 points) Add a method called get_root to the GO::Ontology class that takes a GO namespace (category) as argument and returns the instance of GO::Term that corresponds to the root function of that GO namespace. For example, if the argument is biological_process, then the method should return the instance for "GO:0008150". The only legal arguments for this method are "biological_process", "cellular_component", or "molecular_function".
Problem 2
(30 points) Since functions in GO are arranged in a directed acyclic graph (DAG), there are multiple paths from a function to the root of the namespace the function belongs to. The length of any such path from a function f to the root is the number of other functions on the path (including the root function). The depth of a function is the length of the shortest such path. (The depth of the root is 0). For example, consider the two paths from GO:0005975 to the root. The first path has the functions GO:0005975, GO:0043170, GO:0008152, and GO:0008150. The second path has the functions GO:0005975, GO:0044238, GO:0008152, and GO:0008150. Both paths have length three. Therefore, the depth of the GO:0005975 is three.

Implement a method called get_depth in the GO::Ontology class that takes a string containing a GO id as an argument and returns the depth of the corresponding function. Make sure that you return a reasonable value when no function exists that corresponds to the GO id.

Problem 3
(15 points) Augment get_depth so that the first time you invoke the method with a particular GO id as argument, the method stores the computed depth. In this manner, the next time the method is invoked with the same argument, you can simply return the cached information rather than recompute the depth from scratch. Think carefully about all the depth information that you might be able to cache in a single invocation.
Problem 4
(30 points) Write a method called get_depths in the GO::Ontology class that takes no arguments and computes the depths of all functions in GO. You should not invoke the get_depth method from the previous problems repeatedly to solve this problem. Consider how you may perform depth-first or breadth-first search from the roots of GO to solve this problem.
Problem 4
(20 points)Write a script called go.pl that uses these methods and others we have developed in class to support the following command line options:
all-depths
Computes the depths of all the functions in GO and prints GO id-depth pairs to the output file, if that option is provided.
depth
Takes a argument that is a GO id and prints out the GO id and its depth on standard output (the terminal).
input-file
The name of the file containing the description of GO (e.g., gene_ontology.obo).
output-file
The name of the file to print results to.
Note that the point of using command line options is that you should not hard code any file names or constants in your programmes. Therefore, you should add any other options that you may deem necessary. Write clear POD for your script.

Submitting your Homework

Last modified: Wed Apr 11 22:31:08 EDT 2007