CS 4804 Homework #5
Date Assigned: October 24, 2003
Date Due: October 31, 2003, in class, before class starts
- (60 points) For the dataset given in
http://courses.cs.vt.edu/~cs4804/Fall03/assignments/dt.txt,
train a decision tree using information gain (the reduction in entropy) as the
splitting criterion at each level. Use all the given data as training data.
The first five columns are the values of the features and the
last column identifies the class (there are three classes). Notice that
each feature has a different range of possible values. Then, using
the learned tree, classify the following data points:
- (B, G, I, K, N)
- (C, D, J, L, M)
- (C, D, J, L, N)
- (B, F, J, K, M)
For full credit, display your decision tree and give the classifications.
What do you learn from this exercise?
- (20 points) Construct a dataset D on which, if we run a decision tree algorithm,
we will get a tree where every internal node has a leaf on its "true"
branch. You may assume that the root is counted as an internal node.
You may further assume that all attributes are boolean, that every node
branches on the value of a single attribute, and that we are using
information gain (the reduction in entropy) as the splitting criterion.
Your example must have at least four internal nodes. What general property
do you see in dataset D?
- (20 points) Problem 18.10 of your textbook.
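
Note on the gain metric: the first two problems both rely on gain, so the following is a minimal Python sketch of how entropy and gain could be computed for one candidate split. It is only an illustration under stated assumptions: the function names (entropy, information_gain, best_feature) and the four-row toy dataset are hypothetical and are not taken from dt.txt, and the sketch assumes each row is a tuple whose last element is the class label. It chooses a single split and does not build a full tree.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature_index):
    """Entropy of the class column minus the weighted entropy that
    remains after partitioning the rows on one feature.

    Assumption: each row is a tuple whose last element is the class label.
    """
    labels = [row[-1] for row in rows]
    # Group the class labels by the value taken by the chosen feature.
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows):
    """Index of the feature with the largest gain (ties go to the lowest index)."""
    n_features = len(rows[0]) - 1
    return max(range(n_features), key=lambda i: information_gain(rows, i))


# Hypothetical toy data for illustration only; NOT the contents of dt.txt.
toy_rows = [
    ("A", "X", "pos"),
    ("A", "Y", "pos"),
    ("B", "X", "neg"),
    ("B", "Y", "neg"),
]

if __name__ == "__main__":
    for i in range(2):
        print("gain for feature", i, "=", information_gain(toy_rows, i))
    print("split on feature", best_feature(toy_rows))
```

In the toy data the first feature determines the class exactly, so splitting on it removes all of the uncertainty, while the second feature leaves each branch as mixed as the original set. A full tree learner would apply the same selection recursively to the rows that reach each branch.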