Sai's AI Lab

Reinforcement Learning notes

Video 1:

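In his 1950 paper "Computing Machinery and Intelligence", Alan Turing asked whether machines can think. The video then shows a paragraph from the paper which notes three components to consider when trying to imitate an adult human mind: (a) the initial state of the mind, say at birth; (b) the education it has been subjected to; and (c) other experience, not to be described as education, that it has been subjected to.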

In the paper, Turing suggests that instead of trying to program an adult mind directly, it may be easier to start from a child's brain, which he compares to a notebook with lots of blank sheets, and then subject it to an appropriate course of education to obtain an adult brain. His reasoning is that a child's brain presumably contains very little mechanism, so it should be much easier to program.

So the question this paragraph raises is: is it easier to write a program that directly performs a task, or to write a program that learns to perform it over time?

Intelligence was then described as the ability to learn to make decisions to achieve goals.

This brings us to RL:

The interaction loop:

There are two main components: an agent and an environment. The agent takes an action, which may or may not affect the environment, and the environment then produces an observation, which the agent takes in. The goal of this interaction loop is for the agent to maximize the sum of the rewards it receives through repeated interaction.
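The loop can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the `Corridor` environment, its `step` method, and the `run_episode` helper are hypothetical names chosen here. The environment is a short 1D corridor where the agent's action changes its position, and reaching the right end gives a reward of 1.

```python
class Corridor:
    """Hypothetical toy environment: positions 0..length-1, reward 1 on reaching the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); here the action does change the environment's state
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        observation = self.position  # what the agent gets to see next
        return observation, reward, done


def run_episode(env, policy, max_steps=100):
    """Run the agent-environment interaction loop and return the sum of rewards collected."""
    observation = env.position
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                   # agent acts on its latest observation
        observation, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    return +1


def always_left(observation):
    return -1


print(run_episode(Corridor(), always_right))  # 1.0
print(run_episode(Corridor(), always_left))   # 0.0 (stuck at the left wall)
```

Comparing the two fixed policies shows why the goal is phrased as total reward: the loop itself is the same, and only the summed reward tells us which behaviour is better.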

Without a specified goal for the interaction loop, there is no way to say what the agent should learn from its repeated interaction.

RL is based on the reward hypothesis: any goal can be formalized as the outcome of maximizing a cumulative reward.
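As a small illustration of the reward hypothesis, here is one possible way (an assumption, not something stated in the video) to turn the goal "reach the exit as quickly as possible" into a reward: give -1 for every time step before reaching the exit. The helper names below are hypothetical. The cumulative reward of a trajectory is then minus its length, so maximizing cumulative reward is the same as minimizing the time taken.

```python
def cumulative_reward(rewards):
    """Sum of rewards along one trajectory (undiscounted return)."""
    return sum(rewards)


def rewards_for_path(num_steps):
    # Hypothetical reward design: every step taken before reaching the exit costs -1
    return [-1.0] * num_steps


short_path = cumulative_reward(rewards_for_path(3))   # -3.0
long_path = cumulative_reward(rewards_for_path(10))   # -10.0

# The faster route has the higher cumulative reward, so a reward maximizer prefers it
print(short_path > long_path)  # True
```

Other reward designs could express the same goal; the hypothesis only claims that some suitable cumulative reward exists for any goal.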

Formalizing RL:

At each time step t: