Reinforcement learning and its uses, the Real world applications

Image for post
Image for post

In reinforcement learning, we use reward and punishment mechanism to train agents. Agents will be rewarded for correct behaviors, and they will be punished for wrong behaviors. In this case, the agent will try to minimize its own wrong behaviors and maximize its own correct behaviors.

Image for post
Image for post

Application in unmanned driving

Many papers have mentioned the application of deep reinforcement learning in the field of autonomous driving. In unmanned driving, there are many issues that need to be considered, such as: different speed limits in different places, whether it is a drivable area, and how to avoid obstacles.

Some autonomous driving tasks can be combined with reinforcement learning, such as trajectory optimization, motion planning, dynamic paths, optimal control, and situational learning strategies in highways.

For example, the automatic parking strategy can complete automatic parking. Lane changes can be implemented using q-learning. Overtaking can apply overtaking learning strategies to complete overtaking while avoiding obstacles and maintaining a stable speed thereafter.

AWS DeepRacer is a self-driving racing car designed to test the realization of reinforcement learning algorithms on actual tracks. It can use the camera to visualize the track, and it can use reinforcement learning models to control the throttle and direction.

Image for post
Image for post has successfully applied reinforcement learning to train a car how to drive during the day. They used deep reinforcement learning algorithms to deal with the problem of lane following tasks. Their network structure is a deep neural network with 4 convolutional layers and 3 fully connected layers. An example is shown in the figure. The middle image represents the driver’s perspective.

Image for post
Image for post

In industrial automation, robots based on reinforcement learning are used to perform various tasks. Not only are these robots more efficient than humans, they can also perform dangerous tasks.

Deepmind’s use of AI agents to cool the Google data center is a successful application case. In this way, 40% of energy expenditure is saved. Now, these data centers are completely controlled by artificial intelligence systems, except for a few data center experts, almost no need for other manual intervention. The system works as follows:

  • Take data snapshots from the data center every five minutes and feed them into the deep neural network
  • Predict how different combinations will affect future energy consumption
  • In compliance with safety standards, take measures with minimum power consumption
  • Send corresponding measures to the data center and implement operations

Of course, the specific measures are still implemented by the local control system.

Application of Reinforcement Learning in Financial Trade

Supervised models can be used to predict future sales, as well as stock prices. However, these models cannot determine what actions should be taken at a specific stock price. Reinforcement learning (RL) is precisely for this problem. The RL model is evaluated through market benchmark standards to ensure that the RL agent makes the correct decision to hold, buy or sell to ensure the best return.

Through reinforcement learning, analysts no longer make every decision in financial trade as before, and truly realize automatic decisions by machines. For example, a powerful reinforcement learning platform for financial transactions is constructed, which adjusts the reward function according to the loss or profit of each financial transaction.

Application of Reinforcement Learning in Natural Language Processing NLP

RL can be used for NLP tasks such as text summarization, question answering, and machine translation.

Eunsol Choi, Daniel Hewlett and Jakob Uszkoret proposed a long text question answering method based on RL in their paper. Specifically, first select a few sentences related to the question from the document, and then combine the selected sentence and the question sentence to generate an answer through RNN.

Image for post
Image for post

The paper combines supervised learning and reinforcement learning to generate abstract text summaries. The authors of the paper Romain Paulus, Caiming Xiong and Richard Socher hope to solve the problems faced by the attention-based RNN codec model in abstract generation. The paper proposes a new internal attention neural network, through which attention can focus on input and continuously generate output. Supervised learning and reinforcement learning are used for model training.

Image for post
Image for post

As for machine translation, researchers at the University of Colorado and the University of Maryland have proposed a reinforcement learning-based machine translation model that can learn to predict whether a word is credible and use RL to determine whether more information is needed to help translation.

Image for post
Image for post

Researchers from Stanford University, Ohio State University, and Microsoft Research Institute proposed Deep-RL, which can be used for dialogue generation tasks. Deep-RL uses two virtual agents to simulate dialogue and learns the modeling of future rewards in multiple rounds of dialogue. At the same time, it applies policy gradient methods to obtain higher rewards for high-quality dialogues, such as coherence, information richness and simplicity Wait.

Image for post
Image for post

Click this link to see more RL applications in NLP.

Application of Reinforcement Learning in Medical Care

In the field of healthcare, the RL system can only provide treatment strategies for patients. The system can use past experience to find the optimal strategy without the need for prior information such as mathematical models of biological systems, which makes the RL-based system more widely applicable.

RL-based dynamic treatment plans (DTRs) for healthcare include chronic diseases or intensive care, automated medical diagnosis, and other fields.

Image for post
Image for post

The input of DTRs is a set of clinical observation and evaluation data of the patient, and the output is the treatment plan for each stage. Through RL, DTRs can determine the best treatment plan for a patient at a specific time and realize time-dependent decision-making.

In healthcare, the RL method can also be used to improve long-term results based on the delayed effect of treatment.

For chronic diseases, the RL method can also be used to find and generate the best DTRs.

Through, you can study the application of RL in healthcare.

Application of Reinforcement Learning in Engineering

In the field of engineering, Facebook proposed an open source reinforcement learning platform — — , which uses reinforcement learning to optimize large-scale production systems. Within Facebook, Horizon is used to:

  • Personalized guide
  • Send more meaningful notifications to users
  • Optimize video stream quality

The main processes of Horizon include:

  • Simulated environment
  • Distributed data platform for data processing
  • Model training and output

A typical example is that reinforcement learning can optionally provide users with low-bit-rate or high-bit-rate videos based on the state of the video buffer and the estimation of other machine learning systems.

Horizon can also handle the following issues:

  • Large-scale deployment
  • Feature normalization
  • Distributed learning
  • Processing and services of ultra-large-scale data, such as data sets containing high-dimensional data and thousands of features.

In the field of news recommendation, user preferences are not static, and recommending news to users based on comments and (historical) preferences cannot be done once and for all. A system based on reinforcement learning can dynamically track reader feedback and update recommendations.

To build such a system, we need to obtain news characteristics, reader characteristics, context characteristics and news characteristics read by readers. Among them, news features include but are not limited to content, title, and publisher; reader features refer to the way readers interact with content, such as clicking and sharing; context features include news time and freshness. Then define the reward function according to user behavior and train the RL model.

The application of reinforcement learning in games

The application of RL in the game field has attracted much attention and has been extremely successful. The most typical is the well-known AlphaGoZero in previous years. Through reinforcement learning, AlphaGoZero can learn the game of Go from scratch and learn by itself. After 40 days of training, AlphaGoZero’s performance. The model only contains a neural network, and only black and white chess pieces are used as input features. Due to the singularity of the network, a simple tree search algorithm is used to evaluate position movement and sample movement without any expansion.

Real-time bidding-the application of reinforcement learning in advertising and marketing

A real-time bidding strategy based on multi-agent reinforcement learning is proposed. Cluster a large number of advertisers, and then assign a strategic bidding agent to each cluster to achieve bidding. At the same time, in order to balance the competition and cooperation between advertisers, the paper also proposed distributed collaborative multi-agent bidding (DCMAB).

In marketing, choosing the right target group can bring high returns, so accurate personal positioning is crucial. The thesis takes China’s largest e-commerce platform as the research object, showing that the above-mentioned multi-agent reinforcement learning is superior to the existing single-agent reinforcement learning methods.

Application of Reinforcement Learning in Robot Control

Through deep learning and reinforcement learning methods, it can be able to grasp various objects, even objects that have not appeared in training. Therefore, it can be used to manufacture products on assembly lines.

The above ideas are realized by combining large-scale distributed optimization and (a variant). Among them, QT-Opt supports continuous action space operations, which makes it possible to handle robot problems well. In practice, the model is trained offline first, and then deployed and fine-tuned on a real robot.

For the crawling task, Google AI spent 4 months, using 7 robots to run 800 robot hours.

Experiments show that in 700 experiments, the QT-Opt method has a 96% probability of successfully grabbing unfamiliar objects, while the previous method has only a 78% success rate.

Reinforcement learning is a very interesting field worthy of extensive research. The advancement of RL technology and its application in various fields of reality are bound to achieve greater success.

Written by

Digital Nomad

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store