Evolution Strategies and Reinforcement Learning

Reinforcement learning is a machine learning task that involves an agent interacting with a dynamic environment to find the correct parameters (typically out of 1,000,000) in a policy function that best links the input with the output, with the key difference against supervised learning that the agent gains feedback as either punishments (for less-than-optimal behavior) and rewards (which increase based on efficacy). The output suggests a way to improve the agent, rather than a set of correct initial vector variables, which is then iterated into a new process to collect new episodes of interaction and further update the agent (Salimans et al.). According to OpenAI, the older form of what is now known as RL, called evolution strategies (ES), perhaps rivals the performance of standard RL considering there is no agent or environment, in addition to being more easily scalable in a distributed system (Salimans et al.). The ES algorithm is most closely related to a "guess and check" environment. Reinforcement learning is being applied to cases such as self-driving cars and video game programming.