Homework 4
CS 5046 (Spring 2011)

Assigned on April 13, 2011
Due by 4pm, April 20, 2011
Submit by email to murali AT cs DOT vt DOT edu

This homework deals with problems that you can solve using depth-first search and/or breadth-first search. We will use these algorithms to explore properties of the Gene Ontology.

Problem 1
(5 points) Add a method called get_root to the GO::Ontology class that takes a GO namespace (category) as argument and returns the instance of GO::Term that corresponds to the root function of that GO namespace. For example, if the argument is biological_process, then the method should return the instance for "GO:0008150". The only legal arguments for this method are "biological_process", "cellular_component", or "molecular_function".
Problem 2
(30 points) Since functions in GO are arranged in a directed acyclic graph (DAG), there are multiple paths from a GO term to the root of the namespace the term belongs to. Each edge in such a path corresponds to an is_a or part_of relationship between two terms. The length of any such path from a term f to the root is the number of other term on the path (including the root term). The depth of a term is the length of the shortest such path. (The depth of the root is 0). For example, consider the two paths from GO:0043413 to the root. The first path has the terms GO:0070085, GO:0005975, GO:0044238, GO:0008152, and GO:0008150. The second path has the terms GO:0043412, GO:0043170, GO:0008152, and GO:0008150. The first path has length five and the second path has length four. Therefore, the depth of GO:0043413 is four.

Implement a method called get_depth in the GO::Ontology class that takes a string containing a GO id as an argument and returns the depth of the corresponding term. Make sure that you return a reasonable value when no term exists that corresponds to the GO id.

Problem 3
(15 points) Augment get_depth so that the first time you invoke the method with a particular GO id as argument, the method stores the computed depth. In this manner, the next time the method is invoked with the same argument, you can simply return the cached information rather than recompute the depth from scratch. Think carefully about all the depth information that you might be able to cache in a single invocation.
Problem 4
(40 points) Write a method called get_depths in the GO::Ontology class that takes no arguments and computes the depths of all terms in GO. You should not invoke the get_depth method from the previous problems repeatedly to solve this problem. Consider how you may perform depth-first or breadth-first search from the roots of GO to solve this problem.
Problem 4
(10 points) Write a script called go.pl that uses these methods and others we have developed in class to support the following command line options:
all-depths
Computes the depths of all the terms in GO and prints GO id-depth pairs to the output file, if that option is provided.
depth
Takes a argument that is a GO id and prints out the GO id and its depth on standard output (the terminal).
input-file
The name of the file containing the description of GO (e.g., gene_ontology.obo).
output-file
The name of the file to print results to.
Note that the point of using command line options is that you should not hard code any file names or constants in your programmes. Therefore, you should add any other options that you may deem necessary. Write clear POD for your script.

Submitting your Homework

Last modified: Wed Apr 13 13:59:28 EDT 2011