Speaker
Description
We study the dynamics of reinforcement learning. Because these dynamics form a stochastic process, the appropriate mathematical tool is the master equation. We introduce probability distributions for the actions and the value functions, and then obtain a master equation that describes the reinforcement learning process. From this master equation we derive a Hamilton-Jacobi equation (HJE). We verify a feature that distinguishes the model from master equations for chemical reactions with few molecules or for evolution models with finite populations: the variance of the distribution vanishes at the steady state, which justifies the use of the moment closure approximation. Our method, based on recursive equations, gives accurate expressions for both the mean and the variance of the variables, whereas the HJE gives correct results only for the mean values. Using the recursive equations, we express the value function distribution through the solution of a system of ordinary differential equations.
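As a rough illustration of what recursive equations for the mean and variance can look like, the sketch below uses a toy model that is an assumption made here and is not the model of the talk: a single value estimate Q that, at each step, is updated with a fixed probability p toward a deterministic reward r using learning rate alpha, and is otherwise left unchanged. The function names and parameters are hypothetical. Because the update is linear in Q, the recursions for the first two moments close exactly in this toy case, and the variance decays to zero as Q converges to r.

```python
import random


def moment_recursion(alpha, p, r, steps):
    """Iterate exact recursions for the mean and variance of Q (toy model).

    Assumed update rule: with probability p, Q -> (1 - alpha) * Q + alpha * r;
    otherwise Q is unchanged. Q starts at 0 with certainty.
    Returns a list of (mean, variance) pairs, one per step.
    """
    m1, m2 = 0.0, 0.0  # first and second moments of Q
    history = []
    for _ in range(steps):
        # E[Q'] averages the updated and the unchanged branch.
        new_m1 = p * ((1 - alpha) * m1 + alpha * r) + (1 - p) * m1
        # E[Q'^2] expands ((1 - alpha) * Q + alpha * r)^2 in the updated branch.
        updated_sq = ((1 - alpha) ** 2 * m2
                      + 2 * alpha * (1 - alpha) * r * m1
                      + (alpha * r) ** 2)
        new_m2 = p * updated_sq + (1 - p) * m2
        m1, m2 = new_m1, new_m2
        history.append((m1, m2 - m1 * m1))
    return history


def simulate(alpha, p, r, steps, runs, seed=0):
    """Monte Carlo estimate of the mean and variance of Q after `steps` steps."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        q = 0.0
        for _ in range(steps):
            if rng.random() < p:
                q += alpha * (r - q)
        finals.append(q)
    mean = sum(finals) / runs
    var = sum((q - mean) ** 2 for q in finals) / runs
    return mean, var


if __name__ == "__main__":
    moments = moment_recursion(alpha=0.1, p=0.5, r=1.0, steps=400)
    print("step 10 (recursion): ", moments[9])
    print("step 10 (simulation):", simulate(0.1, 0.5, 1.0, steps=10, runs=2000))
    print("step 400 (recursion):", moments[-1])
```

In this toy setting the variance is nonzero during learning and shrinks toward zero at long times, which mirrors qualitatively the vanishing steady-state variance described above. Whether the moment hierarchy closes exactly in the full model is not something this sketch addresses.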