Math of Intelligence : Markov Decision Process
What is a Markov Decision Process?
A Markov Decision Process consists of 5 elements: $(S, A, P, R, \gamma)$
$S$: set of states
$A$: set of actions
$P$: transition probability function, $P(s', r \mid s, a)$
$R$: reward function
$\gamma$: discounting factor
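As a concrete illustration, the five elements above can be encoded as plain data structures. The two-state MDP below is hypothetical (not from the original text); the states, actions, and rewards are illustrative only:

```python
# A minimal, hypothetical MDP with two states and two actions.
# P maps (state, action) -> list of (next_state, reward, probability),
# mirroring the transition function P(s', r | s, a).
states = ["s0", "s1"]
actions = ["stay", "move"]
gamma = 0.9  # discounting factor

P = {
    ("s0", "stay"): [("s0", 0.0, 1.0)],
    ("s0", "move"): [("s1", 1.0, 0.8), ("s0", 0.0, 0.2)],
    ("s1", "stay"): [("s1", 2.0, 1.0)],
    ("s1", "move"): [("s0", 0.0, 1.0)],
}

# Sanity check: outgoing probabilities from each (s, a) must sum to 1.
for sa, transitions in P.items():
    assert abs(sum(p for _, _, p in transitions) - 1.0) < 1e-9
```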
The states of an MDP satisfy the Markov property:
$$P(S_{t+1} \mid S_t, S_{t-1}, \dots, S_0) = P(S_{t+1} \mid S_t)$$
It means that the future depends only on the current state and not on the history of all previous states.
Bellman Equations
$V(s)$ is the state value function: it describes the expected return given the current state $s$. $Q(s, a)$ is the action value function: it describes the expected return given the current state $s$ and the action $a$ that the agent takes from state $s$:
$$V(s) = \mathbb{E}[G_t \mid S_t = s] \tag{1}$$
$$Q(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a] \tag{2}$$
Here, $G_t$ is the expected return at time $t$, that is, the expected sum of rewards that we will get after time $t$. So, $G_t$ can be represented as
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},$$
where $\gamma \in [0, 1]$ is the discount factor.
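The discounted sum above is easy to compute directly. A small sketch, using a hypothetical finite reward sequence for illustration:

```python
# Discounted return G_t for a finite reward sequence:
# G_t = sum_k gamma^k * R_{t+k+1}. The rewards are hypothetical.
def discounted_return(rewards, gamma):
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# Example: R_{t+1}=1, R_{t+2}=0, R_{t+3}=2 with gamma=0.9
# gives 1 + 0.9*0 + 0.9**2 * 2 = 2.62.
g = discounted_return([1.0, 0.0, 2.0], 0.9)
```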
Now, Eq. (1) becomes:
$$V(s) = \mathbb{E}[R_{t+1} + \gamma V(S_{t+1}) \mid S_t = s]$$
Similarly, for the Q-value:
$$Q(s, a) = \mathbb{E}[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a]$$
Bellman Expectation Equations:
$$V_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} P(s', r \mid s, a)\,[r + \gamma V_\pi(s')]$$
$$Q_\pi(s, a) = \sum_{s', r} P(s', r \mid s, a)\,\Big[r + \gamma \sum_{a'} \pi(a' \mid s')\, Q_\pi(s', a')\Big]$$
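The Bellman expectation equation for $V_\pi$ can be applied repeatedly until the values stop changing, which is iterative policy evaluation. A minimal sketch, assuming a hypothetical two-state MDP and a uniform random policy (all names here are illustrative):

```python
# Iterative policy evaluation: repeatedly apply the Bellman
# expectation equation until the value function converges.
# The MDP and the uniform random policy are hypothetical.
states = ["s0", "s1"]
actions = ["stay", "move"]
gamma = 0.9

# (state, action) -> list of (next_state, reward, probability)
P = {
    ("s0", "stay"): [("s0", 0.0, 1.0)],
    ("s0", "move"): [("s1", 1.0, 1.0)],
    ("s1", "stay"): [("s1", 2.0, 1.0)],
    ("s1", "move"): [("s0", 0.0, 1.0)],
}
policy = {s: {a: 1.0 / len(actions) for a in actions} for s in states}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # V(s) = sum_a pi(a|s) sum_{s',r} P(s',r|s,a) [r + gamma V(s')]
        v = sum(
            policy[s][a] * sum(p * (r + gamma * V[s2]) for s2, r, p in P[(s, a)])
            for a in actions
        )
        delta = max(delta, abs(v - V[s]))
        V[s] = v
    if delta < 1e-8:
        break
```

For this toy MDP the iteration converges to $V_\pi(\texttt{s0}) = 7.25$ and $V_\pi(\texttt{s1}) = 7.75$, which can be checked by solving the two linear Bellman equations by hand.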
Bellman Optimality Equations
Let's find the optimal values of the state value and action value functions:
$$V_*(s) = \max_{a} \sum_{s', r} P(s', r \mid s, a)\,[r + \gamma V_*(s')]$$
$$Q_*(s, a) = \sum_{s', r} P(s', r \mid s, a)\,[r + \gamma \max_{a'} Q_*(s', a')]$$
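Turning the Bellman optimality equation for $V_*$ into a fixed-point iteration gives value iteration. A sketch on the same kind of hypothetical two-state MDP (names and numbers are illustrative, not from the original text):

```python
# Value iteration: apply the Bellman optimality update
#   V(s) <- max_a sum_{s',r} P(s',r|s,a) [r + gamma V(s')]
# until convergence, then read off the greedy policy.
states = ["s0", "s1"]
actions = ["stay", "move"]
gamma = 0.9
P = {
    ("s0", "stay"): [("s0", 0.0, 1.0)],
    ("s0", "move"): [("s1", 1.0, 1.0)],
    ("s1", "stay"): [("s1", 2.0, 1.0)],
    ("s1", "move"): [("s0", 0.0, 1.0)],
}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        v = max(
            sum(p * (r + gamma * V[s2]) for s2, r, p in P[(s, a)])
            for a in actions
        )
        delta = max(delta, abs(v - V[s]))
        V[s] = v
    if delta < 1e-8:
        break

# Greedy policy extracted from the optimal value function.
pi = {
    s: max(actions,
           key=lambda a: sum(p * (r + gamma * V[s2]) for s2, r, p in P[(s, a)]))
    for s in states
}
```

Here the optimal policy moves from `s0` to `s1` and then stays there collecting reward 2 forever, so $V_*(\texttt{s1}) = 2/(1-\gamma) = 20$ and $V_*(\texttt{s0}) = 1 + \gamma \cdot 20 = 19$.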