Nov 10, 2003
-------------
- Reinforcement learning
  - grandest topic in AI: can subsume all of AI itself
- Simple example
  - Q: "what is the value of the function when A,B = 1?"
  - A: "2" ... and you get a slap!
  - how/what do you learn? no instructive feedback is provided!
- General setting
  - states
  - actions
  - state transition table
  - rewards table
- A simple problem: gridworld1
  - three states in a row: S1, S2, S3
  - four possible actions: 1l, 2l, 1r, 2r (move one or two steps left or right)
  - bumping into a wall keeps the state the same, but gives a reward of -5 or -10 (depending on the force of impact)
  - zero reward for legal actions
- Policies
  - a facet of the agent: a mapping from states to actions
  - each policy has a value function V^pi(s): the expected cumulative sum of discounted rewards when you start from state "s" and follow policy "pi"
- Discounting: gamma, between 0 and 1
  - if gamma = 1, V^pi(s) becomes a straight (undiscounted) sum
  - if gamma = 0, the agent is short-sighted: only the immediate reward counts
  - point of gamma: it makes infinite sums converge!
- Working out V^pi(s) for a given agent (from its policy)
  - is this the best policy possible? how would you improve it?
- Different agents have different policies
- A more complicated (stochastic) policy: the agent has a non-zero probability of taking each of two actions in a given state
- Note: two different policies can have the same value(s)
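The claim that gamma "makes infinite sums converge" is just the geometric series: with a constant per-step reward r and 0 <= gamma < 1, the discounted return sums to r / (1 - gamma). A tiny sketch (the reward value and horizon here are illustrative, not from the lecture):

```python
# Discounted return of a constant per-step reward r: sum over t of gamma^t * r.
# For 0 <= gamma < 1 this converges to r / (1 - gamma); for gamma = 1 it diverges.
def discounted_return(r, gamma, horizon):
    return sum((gamma ** t) * r for t in range(horizon))

r, gamma = 1.0, 0.9
partial = discounted_return(r, gamma, 200)   # finite partial sum
limit = r / (1 - gamma)                      # closed-form infinite sum
print(partial, limit)                        # partial sum is already very close
```

Note the gamma = 0 extreme from the notes also falls out of this formula: only the t = 0 term survives, so the "short-sighted" agent's return is just the immediate reward.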
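The "working out V^pi(s)" and "how would you improve it?" steps can be sketched for the gridworld above. The wall-bump rewards (-5 for a one-step bump, -10 for a two-step bump) and the zero reward for legal moves follow the notes; the starting policy, gamma, and iteration count are hypothetical choices for illustration:

```python
# Sketch: iterative policy evaluation plus one greedy improvement step
# for the 3-state gridworld described in the notes. The starting policy
# ("always move one step right") and gamma = 0.9 are assumptions.

GAMMA = 0.9
STATES = [0, 1, 2]                                   # S1, S2, S3 in a row
ACTIONS = {"1l": -1, "2l": -2, "1r": +1, "2r": +2}   # step sizes

def step(s, a):
    """Deterministic transition: return (next_state, reward)."""
    s2 = s + ACTIONS[a]
    if 0 <= s2 <= 2:
        return s2, 0.0                               # legal move: zero reward
    # illegal move: stay put; penalty depends on the force of impact
    return s, -5.0 if abs(ACTIONS[a]) == 1 else -10.0

def evaluate(policy, gamma=GAMMA, iters=500):
    """Iterative policy evaluation: repeat V(s) <- r + gamma * V(s')."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        new_V = {}
        for s in STATES:
            s2, r = step(s, policy[s])
            new_V[s] = r + gamma * V[s2]
        V = new_V
    return V

def q_value(s, a, V, gamma=GAMMA):
    """One-step lookahead: immediate reward plus discounted successor value."""
    s2, r = step(s, a)
    return r + gamma * V[s2]

def greedy_improve(V, gamma=GAMMA):
    """Policy improvement: in each state, act greedily with respect to V."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, V, gamma))
            for s in STATES}

policy = {0: "1r", 1: "1r", 2: "1r"}   # hypothetical policy: always go right
V = evaluate(policy)                   # S3 keeps bumping the right wall
better = greedy_improve(V)             # the greedy policy avoids all bumps
print(V, better)
```

Evaluating `better` again shows its value is higher in every state, which is the answer to "how would you improve it?": alternate evaluation and greedy improvement until the policy stops changing.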