CS 4804 Homework #5

Date Assigned: October 24, 2003
Date Due: October 31, 2003, in class, before class starts
  1. (60 points) For the dataset given in http://courses.cs.vt.edu/~cs4804/Fall03/assignments/dt.txt, train a decision tree, choosing the split at each node by gain (reduction in entropy). Use all the given data as training data. The first five columns are the values of the features, and the last column identifies the class (there are three classes). Note that each feature has a different range of possible values. Then, using the learned tree, classify the following data points:

    • (B, G, I, K, N)
    • (C, D, J, L, M)
    • (C, D, J, L, N)
    • (B, F, J, K, M)

    For full credit, display your decision tree and give the classification of each point. What do you learn from this exercise?
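    As a rough illustration of the gain criterion, the Python sketch below computes entropy and information gain and grows an ID3-style tree from them. It is not a prescribed solution. The function names (`entropy`, `information_gain`, `id3`, `classify`), the tuple-based tree representation, and the choice to fall back to a node's majority class when a test point has a feature value never seen at that node are all assumptions made for this example. The toy rows are invented and are not taken from dt.txt.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on column index `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a leaf label (str) or a node tuple (attr, {value: subtree}, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Greedy step: split on the attribute with the largest gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches, majority)


def classify(tree, point):
    """Follow branches; assumed fallback: node majority for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if point[attr] not in branches:
            return majority
        tree = branches[point[attr]]
    return tree


if __name__ == "__main__":
    # Invented toy data: column 0 determines the class, column 1 is noise.
    rows = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y")]
    labels = ["1", "1", "2", "2"]
    tree = id3(rows, labels, [0, 1])
    print(tree[0])                     # 0 (splits on column 0)
    print(classify(tree, ("B", "Y")))  # 2
```

    On the toy rows, column 0 has a gain of 1 bit and column 1 a gain of 0, so the root splits on column 0. How a tree should handle a feature value that never reached a given node is a design decision; the majority-class fallback here is only one option.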

  2. (20 points) Construct a dataset D such that running a decision tree algorithm on D yields a tree in which every internal node has a leaf on its "true" branch. You may assume that the root counts as an internal node. You may further assume that all attributes are boolean, that every node branches on the value of a single attribute, and that the algorithm uses gain (improvement in entropy) as its splitting criterion.

    Your example must have at least four internal nodes. What general property do you see in dataset D?
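    To make the required tree shape concrete, here is a small Python check. It assumes a hypothetical representation in which a leaf is a class label string and an internal node is a tuple `(attribute, true_subtree, false_subtree)`. The helper names are illustrative only. The check counts internal nodes (root included) and tests whether every internal node's true branch is a leaf. It says nothing about which dataset D produces such a tree.

```python
def is_leaf(tree):
    """In this hypothetical representation, leaves are class-label strings."""
    return isinstance(tree, str)


def count_internal(tree):
    """Number of internal nodes, counting the root."""
    if is_leaf(tree):
        return 0
    _, true_branch, false_branch = tree
    return 1 + count_internal(true_branch) + count_internal(false_branch)


def true_branches_are_leaves(tree):
    """True if every internal node has a leaf on its 'true' branch."""
    if is_leaf(tree):
        return True
    _, true_branch, false_branch = tree
    # The true branch must be a leaf, so only the false branch needs recursion.
    return is_leaf(true_branch) and true_branches_are_leaves(false_branch)


if __name__ == "__main__":
    # A chain of four tests; attribute names and labels are placeholders.
    chain = ("a1", "+", ("a2", "-", ("a3", "+", ("a4", "-", "+"))))
    print(count_internal(chain), true_branches_are_leaves(chain))  # 4 True
```

    A tree of the required shape therefore has one leaf hanging off every internal node plus one final leaf at the bottom of the false-branch chain.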

  3. (20 points) Problem 18.10 of your textbook.

