Nov 10, 2003
-------------
- Reinforcement learning
  - grandest topic in AI: can subsume all of AI itself
- Simple example
  - Q: "what is the value of the function when A,B = 1?"
  - A: "2" ... and you get a slap!
  - how/what do you learn? no instructive feedback is provided!
- General setting
  - states
  - actions
  - state transition table
  - rewards table
- A simple problem: gridworld1
  - three states in a row: S1, S2, S3
  - four possible actions: 1l, 2l, 1r, 2r (move one or two steps left or right)
  - bumping into a wall keeps the state the same, but gives a reward of -5 or -10 (depending on the force of impact)
  - zero reward for legal actions
- Policies
  - a facet of the agent: a mapping from states to actions
  - each policy has a value function V^pi(s): the expected cumulative sum of discounted rewards when you start from state "s" and follow policy "pi"
- Discounting: gamma, between 0 and 1
  - if gamma = 1, V^pi(s) becomes a straight (undiscounted) sum
  - if gamma = 0, the agent is short-sighted: only the immediate reward counts
  - point of gamma: it makes infinite sums converge!
- Working out V^pi(s) for a given agent (from its policy)
  - is this the best policy possible? how would you improve it?
- Different agents have different policies
- A more complicated (stochastic) policy: the agent has a non-zero probability of taking each of two actions in a given state
- Note: two different policies can have the same value(s)
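The claim that gamma "makes infinite sums converge" is just the geometric series: with a constant per-step reward r and 0 <= gamma < 1, the discounted return sums to r / (1 - gamma). A tiny sketch (the reward value and horizon here are illustrative, not from the lecture):

```python
# Discounted return of a constant per-step reward r: sum over t of gamma^t * r.
# For 0 <= gamma < 1 this converges to r / (1 - gamma); for gamma = 1 it diverges.
def discounted_return(r, gamma, horizon):
    return sum((gamma ** t) * r for t in range(horizon))

r, gamma = 1.0, 0.9
partial = discounted_return(r, gamma, 200)   # finite partial sum
limit = r / (1 - gamma)                      # closed-form infinite sum
print(partial, limit)                        # partial sum is already very close
```

Note the gamma = 0 extreme from the notes also falls out of this formula: only the t = 0 term survives, so the "short-sighted" agent's return is just the immediate reward.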
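The "working out V^pi(s)" and "how would you improve it?" steps can be sketched for the gridworld above. The wall-bump rewards (-5 for a one-step bump, -10 for a two-step bump) and the zero reward for legal moves follow the notes; the starting policy, gamma, and iteration count are hypothetical choices for illustration:

```python
# Sketch: iterative policy evaluation plus one greedy improvement step
# for the 3-state gridworld described in the notes. The starting policy
# ("always move one step right") and gamma = 0.9 are assumptions.

GAMMA = 0.9
STATES = [0, 1, 2]                                   # S1, S2, S3 in a row
ACTIONS = {"1l": -1, "2l": -2, "1r": +1, "2r": +2}   # step sizes

def step(s, a):
    """Deterministic transition: return (next_state, reward)."""
    s2 = s + ACTIONS[a]
    if 0 <= s2 <= 2:
        return s2, 0.0                               # legal move: zero reward
    # illegal move: stay put; penalty depends on the force of impact
    return s, -5.0 if abs(ACTIONS[a]) == 1 else -10.0

def evaluate(policy, gamma=GAMMA, iters=500):
    """Iterative policy evaluation: repeat V(s) <- r + gamma * V(s')."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        new_V = {}
        for s in STATES:
            s2, r = step(s, policy[s])
            new_V[s] = r + gamma * V[s2]
        V = new_V
    return V

def q_value(s, a, V, gamma=GAMMA):
    """One-step lookahead: immediate reward plus discounted successor value."""
    s2, r = step(s, a)
    return r + gamma * V[s2]

def greedy_improve(V, gamma=GAMMA):
    """Policy improvement: in each state, act greedily with respect to V."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, V, gamma))
            for s in STATES}

policy = {0: "1r", 1: "1r", 2: "1r"}   # hypothetical policy: always go right
V = evaluate(policy)                   # S3 keeps bumping the right wall
better = greedy_improve(V)             # the greedy policy avoids all bumps
print(V, better)
```

Evaluating `better` again shows its value is higher in every state, which is the answer to "how would you improve it?": alternate evaluation and greedy improvement until the policy stops changing.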