A far cry from robot domination

A key characteristic of an MDP is that each state must contain all the information the agent needs to make an informed decision, a requirement called the “Markov property.” The Markov property basically says that the agent can’t be expected to have any historical memory of its own, outside the state itself. For example, the current state of the chess board tells me everything I need to know to choose the best next move — I don’t need to remember the moves that were made earlier in the game. In practice, the real-world problem you are trying to model doesn’t have to be perfectly Markov for RL to be able to solve it. For example, my memory of how a particular opponent…
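The Markov property described above can be sketched in a few lines of code. This is a minimal toy example, not from the article: the state names, actions, and rewards are all invented for illustration. The key point is that both the transition function and the policy take only the current state as input — no history is ever consulted.

```python
# A toy MDP: the next state and reward depend only on the current
# (state, action) pair -- the Markov property. All names here are
# illustrative assumptions, not from the article.
TRANSITIONS = {
    ("start", "left"):  ("lose", -1),
    ("start", "right"): ("mid",   0),
    ("mid",   "left"):  ("start", 0),
    ("mid",   "right"): ("win",  +1),
}

def step(state, action):
    """Return (next_state, reward). Note the signature: no argument
    for the history of earlier moves is needed or allowed."""
    return TRANSITIONS[(state, action)]

def policy(state):
    """A Markov policy: the decision is a function of the current
    state alone (here, trivially, always move right)."""
    return "right"

def run_episode(state="start"):
    """Follow the policy until a terminal state is reached."""
    total_reward = 0
    while state not in ("win", "lose"):
        state, reward = step(state, policy(state))
        total_reward += reward
    return state, total_reward
```

If the next state depended on more than the current state (say, on the last three moves), you would restore the Markov property by folding that history into the state itself, which is the usual modeling trick when a problem isn’t perfectly Markov.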