Nov 14, 2003
-------------

- Two ways of solving for V^pi, given a policy pi
  - direct solve (solve a linear system Ax = b)
  - iterative solve
- What is involved in the iterative solution
  - update the V^pi values in place
  - start with an initial guess
  - improve V^pi, and use it on the right-hand side again
  - keep doing this till convergence!
  - called "iterative policy evaluation"
- Formulating games as RL problems
  - formulating Towers of Hanoi as an RL problem
  - Another game: knight's moves (visit all squares exactly once!)
    - can you formulate it similarly?
    - Ans: NO!
    - Why? because the reward structure is not "Markov"
      - reward depends not just on the previous state but also on "how you got there"
    - these games are considerably more difficult to implement!
- General schemas and loopy diagrams
  - policy evaluation
  - iterative policy evaluation
  - policy updating, or policy improvement
  - policy iteration
- Yet another approach: value iteration
  - uses the equation for V* directly
  - without reference to pi
  - has a "max" inside it: nonlinearity!
  - but we can still build an iteration around it!
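The two ways of solving for V^pi mentioned above can be sketched in a few lines. This is a minimal illustration, not anything from the lecture itself: the 3-state MDP below (transition matrix `P`, reward vector `r`, discount `gamma`) is invented for the example. The Bellman equation V = r + gamma P V is linear in V, so it can be rearranged into an Ax = b solve; the iterative route instead applies the right-hand side repeatedly to an initial guess until convergence.

```python
import numpy as np

# Toy 3-state MDP under a fixed policy pi (all numbers invented for
# illustration): P[s, s'] = transition probability, r[s] = expected reward.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 2.0, 0.0])
gamma = 0.9

# Direct solve: V = r + gamma * P @ V  is linear, i.e.
# (I - gamma * P) V = r, a standard Ax = b system.
V_direct = np.linalg.solve(np.eye(3) - gamma * P, r)

# Iterative policy evaluation: start from a guess, plug V back into the
# right-hand side, and repeat until the values stop changing.
V = np.zeros(3)
while True:
    V_new = r + gamma * P @ V
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

Both routes agree; the iterative one matters in practice because the direct solve becomes expensive for large state spaces.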
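The policy-iteration schema in the notes (evaluate the current policy, then improve it greedily, and loop) can be sketched as follows. The 2-state, 2-action MDP is a made-up example, and the helper name `evaluate` is mine; the structure of the loop is the point.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers invented for the sketch).
# P[a, s, s'] = transition probability under action a; R[a, s] = reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def evaluate(pi):
    """Policy evaluation step: solve the linear system for V^pi exactly."""
    Ppi = P[pi, np.arange(2)]   # transition matrix induced by pi
    rpi = R[pi, np.arange(2)]   # reward vector induced by pi
    return np.linalg.solve(np.eye(2) - gamma * Ppi, rpi)

# Policy iteration: alternate evaluation and greedy improvement
# until the policy no longer changes.
pi = np.zeros(2, dtype=int)
while True:
    V = evaluate(pi)
    Q = R + gamma * P @ V            # Q[a, s] under the current V
    pi_new = np.argmax(Q, axis=0)    # policy improvement (greedy)
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new
```

The inner `evaluate` here is the direct solve; swapping in iterative policy evaluation gives the "loopy diagram" version where both loops run.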
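The value-iteration idea at the end of the notes can also be sketched: iterate the Bellman optimality equation for V* directly, with no policy in sight. The MDP numbers below are invented for illustration. Because of the max, the fixed-point equation is nonlinear and there is no single Ax = b solve, but the update still contracts to V*.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers invented for the sketch).
# P[a, s, s'] = transition probability under action a; R[a, s] = reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: V(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') V(s') ]
# The max over actions is the nonlinearity; we iterate anyway.
V = np.zeros(2)
while True:
    V_new = np.max(R + gamma * P @ V, axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

At convergence V satisfies the optimality equation, and a greedy policy can be read off from it afterwards, without ever having evaluated an intermediate pi.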