Oct 20, 2003
-------------

- Learning decision trees
  - what is a good tree?
    - e.g., one that doesn't branch on "Day" in PlayTennis!
- Entropy
  - expected # of bits needed to encode something
  - H = -\sum_i p(i) log_2 p(i)
  - log_2 so you can think in bits
  - Examples (see the entropy sketch below)
    - Entropy of a coin that always falls heads
    - Entropy of a coin that always falls tails
    - Entropy of a fair coin
    - Entropy of a loaded die
- What does this have to do with decision trees?
  - think of the entropy of the "classifications"
  - encode PlayTennis="Yes" versus PlayTennis="No"
- Example decision tree for the PlayTennis dataset
  - why did we pick the attributes we did?
  - Answer: because they cause the largest reduction in entropy!
- Details, details
  - picking the first node
    - calculate the entropy improvement (information gain) of each attribute (see the gain sketch below)
    - divide up the dataset on the winning attribute
    - recurse the calculations on each subset!
    - a greedy approach (see the ID3 sketch below)
- Worked-out example
  - PlayTennis dataset
- Learning methodology
  - training set and test set
  - error curves as a function of decision tree size (see the error-curve sketch below)
  - knowing when to stop learning
  - the overfitting boundary
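
Entropy sketch. A minimal Python illustration of the entropy formula above; not lecture code, and the loaded-die probabilities are just an example choice.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """H = -sum_i p(i) * log2(p(i)), measured in bits."""
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    print(entropy(["H"] * 10))                 # coin that always falls heads: 0 bits (no uncertainty)
    print(entropy(["T"] * 10))                 # coin that always falls tails: 0 bits
    print(entropy(["H", "T"]))                 # fair coin: 1 bit
    print(entropy([6] * 5 + [1, 2, 3, 4, 5]))  # loaded die with P(6)=1/2: ~2.16 bits, vs. log2(6) ~ 2.58 for a fair die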
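Gain sketch. Picking the first node means computing the information gain (entropy reduction) of each attribute. A sketch assuming the standard 14-example PlayTennis table (9 Yes / 5 No, with Outlook splitting the labels as Sunny 2+/3-, Overcast 4+/0-, Rain 3+/2-); the helper name `information_gain` is illustrative.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(parent_labels, partitions):
        """Entropy of the parent minus the size-weighted entropy of its partitions."""
        n = len(parent_labels)
        remainder = sum(len(p) / n * entropy(p) for p in partitions)
        return entropy(parent_labels) - remainder

    # PlayTennis labels: 9 Yes, 5 No  ->  entropy ~ 0.940 bits
    labels = ["Yes"] * 9 + ["No"] * 5

    # Splitting on Outlook groups the labels as Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
    outlook = [["Yes"] * 2 + ["No"] * 3,   # Sunny
               ["Yes"] * 4,                # Overcast
               ["Yes"] * 3 + ["No"] * 2]   # Rain
    print(information_gain(labels, outlook))   # ~0.246 bits, the largest of the four attributes, so Outlook sits at the root

Note that splitting on "Day" would drive the remainder all the way to zero on the training set (every day is unique), yet it says nothing about new days; this is exactly the "what is a good tree?" point at the top of the notes.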
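ID3 sketch. The "details, details" recursion is the classic greedy top-down induction. A rough self-contained sketch; the five-row toy table is an illustrative fragment, not the full 14-example dataset.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

    def id3(examples, target, attributes):
        """Greedy top-down induction: pick the attribute whose split removes the
        most entropy, partition the data on it, and recurse on each partition."""
        labels = [ex[target] for ex in examples]
        # Base cases: the node is pure, or there is nothing left to split on
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]

        # Weighted entropy left over after splitting on attr (lower = higher gain)
        def remainder(attr):
            n = len(examples)
            groups = Counter(ex[attr] for ex in examples)
            return sum(count / n *
                       entropy([ex[target] for ex in examples if ex[attr] == value])
                       for value, count in groups.items())

        best = min(attributes, key=remainder)
        tree = {best: {}}
        for value in sorted(set(ex[best] for ex in examples)):
            subset = [ex for ex in examples if ex[best] == value]
            tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
        return tree

    # Illustrative fragment of a PlayTennis-style table (not the full dataset)
    toy = [
        {"Outlook": "Sunny",    "Wind": "Weak",   "PlayTennis": "No"},
        {"Outlook": "Sunny",    "Wind": "Strong", "PlayTennis": "No"},
        {"Outlook": "Overcast", "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Wind": "Weak",   "PlayTennis": "Yes"},
        {"Outlook": "Rain",     "Wind": "Strong", "PlayTennis": "No"},
    ]
    print(id3(toy, "PlayTennis", ["Outlook", "Wind"]))
    # {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': 'No'}}

On this fragment the sketch puts Outlook at the root and only tests Wind under Rain, where the labels are still mixed; note the greedy choice at each node is never revisited.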
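Error-curve sketch. For the learning-methodology part, a sketch of training vs. test error as the tree grows, using scikit-learn and a synthetic noisy dataset (both are assumptions for illustration, not from the lecture); tree "size" is proxied by max_depth.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Noisy synthetic data so the train/test gap is visible
    X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                               flip_y=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in range(1, 16):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
        train_err = 1 - tree.score(X_train, y_train)
        test_err = 1 - tree.score(X_test, y_test)
        print(f"depth {depth:2d}: train error {train_err:.2f}, test error {test_err:.2f}")

Training error keeps falling as the tree gets bigger, while test error typically bottoms out and then creeps back up; the depth where test error stops improving is where you would stop growing the tree, i.e., the overfitting boundary.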