Special Machine Learning / Statistics Seminar

State-of-the-art reinforcement learning systems cannot solve hard problems because they lack systematic exploration mechanisms, instead relying on variants of uniform exploration to collect diverse experience. Systematic exploration mechanisms exist for Markov Decision Processes (MDPs), but their sample complexity scales polynomially with the number of unique observations, making them intractable for modern reinforcement learning applications where observations come from a visual sensor. Are there reinforcement learning algorithms that can effectively handle rich (high-dimensional, infinite) observations by engaging in systematic exploration? To aid in the development of such an algorithm, I will first describe a new model for reinforcement learning with rich observations that we call the Contextual-MDP. This model generalizes both stochastic contextual bandits and MDPs, but is considerably more tractable than Partially Observable Markov Decision Processes (POMDPs). I will…
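For context, the "uniform exploration" the abstract contrasts with systematic exploration typically means strategies like ε-greedy, where the agent deviates from its current policy uniformly at random. A minimal illustrative sketch (not from the talk; the function name and parameters are chosen here for exposition):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Select an action index: with probability epsilon, explore by sampling
    uniformly at random; otherwise exploit the current value estimates."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # uniform random exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy choice
```

Such undirected exploration collects diverse experience only by chance, which is why its sample complexity degrades on hard problems that require reaching specific, rarely visited states.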
