Nov 14, 2003
-------------

- Two ways of solving for V^pi, given a policy pi
  - direct solve (solve a linear system Ax = b)
  - iterative solve
- What is involved in the iterative solution
  - update the V^pi values in place
  - start with an initial guess
  - improve V^pi, and use it on the right-hand side again
  - keep doing this till convergence!
  - called "iterative policy evaluation"
- Formulating games as RL problems
  - formulating Towers of Hanoi as an RL problem
  - Another game: knight's moves (visit all squares exactly once!)
    - can you formulate it similarly?
    - Ans: NO!
    - Why? because the reward structure is not "Markov"
      - reward depends not just on the previous state but also on "how you got there"
    - these games are considerably more difficult to implement!
- General schemas and loopy diagrams
  - policy evaluation
  - iterative policy evaluation
  - policy updating, or policy improvement
  - policy iteration
- Yet another approach: value iteration
  - uses the equation for V* directly
  - without reference to pi
  - has a "max" inside it: nonlinearity!
  - but we can still build an iteration around it!
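The two ways of solving for V^pi mentioned above can be sketched in a few lines. This is a minimal illustration, not anything from the lecture itself: the 3-state MDP below (transition matrix `P`, reward vector `r`, discount `gamma`) is invented for the example. The Bellman equation V = r + gamma P V is linear in V, so it can be rearranged into an Ax = b solve; the iterative route instead applies the right-hand side repeatedly to an initial guess until convergence.

```python
import numpy as np

# Toy 3-state MDP under a fixed policy pi (all numbers invented for
# illustration): P[s, s'] = transition probability, r[s] = expected reward.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 2.0, 0.0])
gamma = 0.9

# Direct solve: V = r + gamma * P @ V  is linear, i.e.
# (I - gamma * P) V = r, a standard Ax = b system.
V_direct = np.linalg.solve(np.eye(3) - gamma * P, r)

# Iterative policy evaluation: start from a guess, plug V back into the
# right-hand side, and repeat until the values stop changing.
V = np.zeros(3)
while True:
    V_new = r + gamma * P @ V
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

Both routes agree; the iterative one matters in practice because the direct solve becomes expensive for large state spaces.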
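The policy-iteration schema in the notes (evaluate the current policy, then improve it greedily, and loop) can be sketched as follows. The 2-state, 2-action MDP is a made-up example, and the helper name `evaluate` is mine; the structure of the loop is the point.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers invented for the sketch).
# P[a, s, s'] = transition probability under action a; R[a, s] = reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def evaluate(pi):
    """Policy evaluation step: solve the linear system for V^pi exactly."""
    Ppi = P[pi, np.arange(2)]   # transition matrix induced by pi
    rpi = R[pi, np.arange(2)]   # reward vector induced by pi
    return np.linalg.solve(np.eye(2) - gamma * Ppi, rpi)

# Policy iteration: alternate evaluation and greedy improvement
# until the policy no longer changes.
pi = np.zeros(2, dtype=int)
while True:
    V = evaluate(pi)
    Q = R + gamma * P @ V            # Q[a, s] under the current V
    pi_new = np.argmax(Q, axis=0)    # policy improvement (greedy)
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new
```

The inner `evaluate` here is the direct solve; swapping in iterative policy evaluation gives the "loopy diagram" version where both loops run.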
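The value-iteration idea at the end of the notes can also be sketched: iterate the Bellman optimality equation for V* directly, with no policy in sight. The MDP numbers below are invented for illustration. Because of the max, the fixed-point equation is nonlinear and there is no single Ax = b solve, but the update still contracts to V*.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers invented for the sketch).
# P[a, s, s'] = transition probability under action a; R[a, s] = reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: V(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') V(s') ]
# The max over actions is the nonlinearity; we iterate anyway.
V = np.zeros(2)
while True:
    V_new = np.max(R + gamma * P @ V, axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

At convergence V satisfies the optimality equation, and a greedy policy can be read off from it afterwards, without ever having evaluated an intermediate pi.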