Oct 20, 2003
-------------

- Learning decision trees
  - what is a good tree?
    - e.g., one that doesn't branch on "Day" in PlayTennis!
- Entropy
  - expected # of bits needed to encode something
  - H = -\sum_i p(i) log_2 p(i)
  - log_2 so you can think in bits
  - Examples (see the entropy sketch below)
    - Entropy of a coin that always falls heads
    - Entropy of a coin that always falls tails
    - Entropy of a fair coin
    - Entropy of a loaded die
- What does this have to do with decision trees?
  - think of the entropy of the "classifications"
  - encode PlayTennis="Yes" versus PlayTennis="No"
- Example decision tree for the PlayTennis dataset
  - why did we pick the attributes we did?
  - Answer: because they cause the largest reduction in entropy!
- Details, details
  - picking the first node
    - calculate the entropy improvement (information gain) of each attribute (see the gain sketch below)
    - divide up the dataset on the winning attribute
    - recurse the calculations on each subset!
    - a greedy approach (see the ID3 sketch below)
- Worked-out example
  - PlayTennis dataset
- Learning methodology
  - training set and test set
  - error curves as a function of decision tree size (see the error-curve sketch below)
  - knowing when to stop learning
  - the overfitting boundary
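
Entropy sketch. A minimal Python illustration of the entropy formula above; not lecture code, and the loaded-die probabilities are just an example choice.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """H = -sum_i p(i) * log2(p(i)), measured in bits."""
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    print(entropy(["H"] * 10))                 # coin that always falls heads: 0 bits (no uncertainty)
    print(entropy(["T"] * 10))                 # coin that always falls tails: 0 bits
    print(entropy(["H", "T"]))                 # fair coin: 1 bit
    print(entropy([6] * 5 + [1, 2, 3, 4, 5]))  # loaded die with P(6)=1/2: ~2.16 bits, vs. log2(6) ~ 2.58 for a fair die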
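Gain sketch. Picking the first node means computing the information gain (entropy reduction) of each attribute. A sketch assuming the standard 14-example PlayTennis table (9 Yes / 5 No, with Outlook splitting the labels as Sunny 2+/3-, Overcast 4+/0-, Rain 3+/2-); the helper name `information_gain` is illustrative.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(parent_labels, partitions):
        """Entropy of the parent minus the size-weighted entropy of its partitions."""
        n = len(parent_labels)
        remainder = sum(len(p) / n * entropy(p) for p in partitions)
        return entropy(parent_labels) - remainder

    # PlayTennis labels: 9 Yes, 5 No  ->  entropy ~ 0.940 bits
    labels = ["Yes"] * 9 + ["No"] * 5

    # Splitting on Outlook groups the labels as Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
    outlook = [["Yes"] * 2 + ["No"] * 3,   # Sunny
               ["Yes"] * 4,                # Overcast
               ["Yes"] * 3 + ["No"] * 2]   # Rain
    print(information_gain(labels, outlook))   # ~0.246 bits, the largest of the four attributes, so Outlook sits at the root

Note that splitting on "Day" would drive the remainder all the way to zero on the training set (every day is unique), yet it says nothing about new days; this is exactly the "what is a good tree?" point at the top of the notes.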
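ID3 sketch. The "details, details" recursion is the classic greedy top-down induction. A rough self-contained sketch; the five-row toy table is an illustrative fragment, not the full 14-example dataset.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    def id3(examples, target, attributes):
        """Greedy top-down induction: pick the attribute whose split removes the
        most entropy, partition the data on it, and recurse on each partition."""
        labels = [ex[target] for ex in examples]
        # Base cases: the node is pure, or there is nothing left to split on
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]

        # Weighted entropy left over after splitting on attr (lower = higher gain)
        def remainder(attr):
            n = len(examples)
            groups = Counter(ex[attr] for ex in examples)
            return sum(count / n *
                       entropy([ex[target] for ex in examples if ex[attr] == value])
                       for value, count in groups.items())

        best = min(attributes, key=remainder)
        tree = {best: {}}
        for value in sorted(set(ex[best] for ex in examples)):
            subset = [ex for ex in examples if ex[best] == value]
            tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
        return tree

    # Illustrative fragment of a PlayTennis-style table (not the full dataset)
    toy = [
        {"Outlook": "Sunny",    "Wind": "Weak",   "PlayTennis": "No"},
        {"Outlook": "Sunny",    "Wind": "Strong", "PlayTennis": "No"},
        {"Outlook": "Overcast", "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Wind": "Strong", "PlayTennis": "No"},
    ]
    print(id3(toy, "PlayTennis", ["Outlook", "Wind"]))
    # {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': 'No'}}

On this fragment the sketch puts Outlook at the root and only tests Wind under Rain, where the labels are still mixed; note the greedy choice at each node is never revisited.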
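Error-curve sketch. For the learning-methodology part, a sketch of training vs. test error as the tree grows, using scikit-learn and a synthetic noisy dataset (both are assumptions for illustration, not from the lecture); tree "size" is proxied by max_depth.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Noisy synthetic data so the train/test gap is visible
    X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                               flip_y=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in range(1, 16):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
        train_err = 1 - tree.score(X_train, y_train)
        test_err = 1 - tree.score(X_test, y_test)
        print(f"depth {depth:2d}: train error {train_err:.2f}, test error {test_err:.2f}")

Training error keeps falling as the tree gets bigger, while test error typically bottoms out and then creeps back up; the depth where test error stops improving is where you would stop growing the tree, i.e., the overfitting boundary.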