Reinforcement Learning: Definition, Types, Approaches, Algorithms and Applications



2021-11-07

Machine learning has several branches, such as Supervised Learning, Unsupervised Learning, Deep Learning (Neural Networks) and Reinforcement Learning.

Each branch has its own advantages, disadvantages, and applications across various industries.

Reinforcement Learning has its own characteristics, mechanism, applications and advantages, which set it apart from the other types of machine learning.

That is what we are going to discuss.

So, let's get started!

What is Reinforcement Learning?

Reinforcement learning can be understood as a feedback-based machine learning technique in which an agent learns within an environment by taking actions and observing their results. For each good action the agent gets positive feedback (a reward); for each bad action it gets negative feedback (a punishment). Let's look at some common terms used in reinforcement learning:

  • Agent: the entity that takes actions and tries to improve its performance by increasing the rewards it receives.
  • Environment: the scenario or set of situations that the agent faces.
  • State: the current situation of the agent, as returned by the environment.
  • Reward: feedback from the environment that evaluates the agent's action.
  • Policy: the method that maps the agent's states to actions.
  • Value: the future reward that an agent can expect to receive starting from a particular state.
  • Q-Value or Action Value: similar to value, but with one additional parameter, the action. It is the future reward the agent can expect after taking a given action in a given state.

Unlike Supervised Learning, reinforcement learning does not use a labelled dataset. The agent explores the environment and learns from its own experience.

In reinforcement learning, the primary goal of the agent is to improve its performance by maximizing the rewards it receives.
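The interaction between agent and environment can be sketched as a simple loop. The example below is only an illustration: the `Corridor` environment, its rewards and the random policy are made up for this article and do not come from any library.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, start in the middle."""

    def reset(self):
        self.position = 2
        return self.position  # the initial state

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.position += action
        if self.position == 4:
            return self.position, 1.0, True    # goal reached: reward
        if self.position == 0:
            return self.position, -1.0, True   # wrong end: punishment
        return self.position, 0.0, False       # no feedback yet


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the list of (state, action, reward)."""
    state = env.reset()
    history = []
    for _ in range(max_steps):
        action = policy(state)                       # the agent acts
        next_state, reward, done = env.step(action)  # the environment responds
        history.append((state, action, reward))
        state = next_state
        if done:
            break
    return history


rng = random.Random(0)


def random_policy(state):
    # An agent that has not learned anything yet: it picks actions at random.
    return rng.choice([-1, 1])


episode = run_episode(Corridor(), random_policy)
total_reward = sum(reward for _, _, reward in episode)
print(len(episode), total_reward)
```

A learning agent would use the collected rewards to change its policy so that `total_reward` grows over time.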

As we have seen, RL agents learn from their own experience. Reinforcement learning is a core part of Artificial Intelligence, and many AI agents build on its concepts.

Characteristics of Reinforcement Learning

Reinforcement learning has five main characteristics:

  1. No Supervisor: Unlike supervised learning, reinforcement learning has no supervisor. It is not given examples of the correct outcome or target variable. The agent learns only from its own experience and the rewards it receives.
  2. Sequential Decision Making: The agent makes a sequence of decisions rather than a single one. It takes the dynamics of the world into account, and parts of the problem can be deferred until they must be solved.
  3. Time as a crucial factor: Time always matters in reinforcement learning, because decisions are made one after another and their order is significant.
  4. Feedback takes time: Feedback is often delayed. The reward for an action may only arrive many steps later.
  5. Actions affect future data: The agent's actions determine the subsequent data it receives.
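Delayed feedback is commonly handled by summing future rewards with a discount, so that a reward arriving later counts a little less than one arriving now. The sketch below assumes a discount factor `gamma`, which is not defined in this article; the reward sequence is made up.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma ** (number of steps until it arrives)."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# The agent gets nothing for three steps and is rewarded only at the end.
rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.9))
```

With `gamma` close to 1 the agent values distant rewards almost as much as immediate ones; with `gamma` close to 0 it becomes short-sighted.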

Approaches of Reinforcement Machine Learning

There are three main approaches to reinforcement learning:

  1. Value Based: In the value-based approach, we try to maximize the value function V(s). Here the agent expects a long-term return from the current state under a policy (π).
  2. Policy Based: In the policy-based approach, we focus on finding the best policy directly, so that the action performed in every state maximizes the future reward.

There are two types of policy-based approaches:

  • Deterministic: For every state, the same action is taken under the policy (π).
  • Stochastic: Every action has a certain probability, which is given by the stochastic policy:

    π(a|s) = P[A = a | S = s]

  3. Model Based: In the model-based approach, we create a virtual model of the environment, and the agent learns to perform in that specific environment.
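The difference between a deterministic and a stochastic policy can be sketched in a few lines. The state names `"s0"` and `"s1"`, the actions and the probabilities below are invented for illustration only.

```python
import random

# Deterministic policy: a plain lookup from each state to one action.
deterministic_policy = {"s0": "right", "s1": "left"}

# Stochastic policy: for each state, a probability for every action,
# i.e. a table of pi(a|s). The probabilities in each row sum to 1.
stochastic_policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}


def act_deterministic(state):
    # The same state always gives the same action.
    return deterministic_policy[state]


def act_stochastic(state, rng):
    # Sample one action according to the probabilities pi(a|state).
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
samples = [act_stochastic("s0", rng) for _ in range(1000)]
print(act_deterministic("s0"))
print(samples.count("right") / len(samples))  # close to 0.8
```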

Types of Reinforcement Learning

There are two main types of reinforcement learning:

  1. Positive Reinforcement: An event that occurs because of a specific behaviour. Positive reinforcement has a positive effect on the action taken by the agent, and it increases two aspects of the behaviour:
  • Strength
  • Frequency

Positive reinforcement can sustain a change for a long time. However, too much reinforcement may lead to an overload of states, which can diminish the results.

  2. Negative Reinforcement: In terms of how it works, it is the opposite of positive reinforcement. Here the behaviour is strengthened because a negative condition is stopped or avoided. It helps to define a minimum level of performance.

Reinforcement Machine Learning Algorithms

These reinforcement learning algorithms are used in many AI applications and in gaming. Three widely used reinforcement learning algorithms are:

  • Q-Learning:
  1. Q-learning is an off-policy RL algorithm based on temporal difference learning. Temporal difference methods learn by comparing temporally successive predictions.
  2. It learns the action-value function Q(s, a), which tells how good it is to take an action "a" in a particular state "s".
  • State Action Reward State Action (SARSA)
  1. State Action Reward State Action (SARSA) is an on-policy temporal difference algorithm: it selects the action for each state, and learns, using the same policy.
  2. The aim of SARSA is to estimate the values of Qπ(s, a) for all pairs of s and a under the currently selected policy π.
  3. Unlike Q-learning, SARSA does not need the maximum Q-value of the next state to update the Q-table. It uses the Q-value of the action that is actually chosen next. This is the main difference between Q-learning and SARSA.
  4. SARSA takes its name from the quintuple (s, a, r, s', a'), in which
  1. s: original state
  2. a: original action
  3. r: reward
  4. s' and a': new state and new action (the new state-action pair)
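The difference between the two updates is easiest to see side by side. The sketch below is a minimal illustration, not a reference implementation: the corridor environment, the epsilon-greedy exploration and the values of `ALPHA` (learning rate), `GAMMA` (discount factor) and `EPSILON` (exploration rate) are all assumptions made for this example.

```python
import random
from collections import defaultdict

ACTIONS = [-1, +1]                     # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters


def step(state, action):
    """Hypothetical corridor: states 0..4, reward 1 for reaching state 4."""
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4


def epsilon_greedy(Q, state, rng):
    # Explore with probability EPSILON, otherwise act greedily (random ties).
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning_update(Q, s, a, r, s2, done):
    # Off-policy target: the best Q-value available in the next state.
    best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s2, a2, done):
    # On-policy target: the Q-value of the action a2 actually chosen next.
    next_value = 0.0 if done else Q[(s2, a2)]
    Q[(s, a)] += ALPHA * (r + GAMMA * next_value - Q[(s, a)])


def train(method, episodes=200, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # the Q-table, all entries start at 0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, rng)
            if method == "q_learning":
                q_learning_update(Q, s, a, r, s2, done)
            else:
                sarsa_update(Q, s, a, r, s2, a2, done)
            s, a = s2, a2
    return Q


q_table = train("q_learning")
greedy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(4)]
print(greedy)  # the learned greedy action for states 0..3
```

Both functions apply the same temporal difference correction. The only change is the value used for the next state: the maximum over all actions for Q-learning, and the value of the chosen action a' for SARSA.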

  • Deep Q-Network (DQN)
  1. Deep Q-Network, or DQN, is one of the RL algorithms.
  2. DQN can be used in environments with a large number of states, where defining and updating a Q-table is a challenging and complex task. So, instead of storing a Q-table, DQN uses a neural network to estimate the Q-value for each state and action.
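The core idea, computing Q-values with a parameterised function instead of looking them up in a table, can be shown without a deep learning library. The sketch below is a deliberate simplification and not a DQN: it swaps the neural network for a linear model and leaves out everything else a real DQN needs. The feature function, the learning rate `ALPHA` and the discount factor `GAMMA` are assumptions made for this example.

```python
ACTIONS = [-1, +1]
ALPHA, GAMMA = 0.05, 0.9  # assumed hyperparameters


def features(state):
    # Hypothetical feature vector: a bias term plus the scaled position.
    return [1.0, state / 4.0]


def q_value(weights, state, action):
    # One weight vector per action: Q(s, a) = w_a . features(s)
    return sum(w * x for w, x in zip(weights[action], features(state)))


def update(weights, s, a, r, s2, done):
    """One Q-learning style step on the weights; returns the TD error."""
    best_next = 0.0 if done else max(q_value(weights, s2, b) for b in ACTIONS)
    td_error = r + GAMMA * best_next - q_value(weights, s, a)
    # Move the weights of the taken action in the direction that
    # reduces the error for this transition.
    for i, x in enumerate(features(s)):
        weights[a][i] += ALPHA * td_error * x
    return td_error


# Four numbers in total, no matter how many states the environment has.
weights = {a: [0.0, 0.0] for a in ACTIONS}

# Repeatedly learn from one transition: state 3, move right, reward 1, episode ends.
for _ in range(100):
    update(weights, 3, +1, 1.0, 4, True)

print(q_value(weights, 3, +1))  # approaches the reward of 1
print(q_value(weights, 2, +1))  # a state that was never visited
```

Because the weights are shared across states, learning about state 3 also changes the estimate for state 2. A Q-table cannot generalise in this way, which is why function approximation helps when the number of states is large.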

Applications of Reinforcement Learning

  1. Robotics
  2. Business Strategy Planning
  3. Automobile Industry
  4. Finance
  5. Game-Playing
