Open-ended learning in robotics

Presentation

Robotics is a challenge for learning methods because it combines several difficulties: large, continuous state and action spaces, sparse rewards, and a dynamic, open and partially observable world with noisy perception and action. Applying these methods is therefore delicate and requires a thorough analysis of the tasks to be performed, which limits their potential for application. In the European DREAM project, we defined the basis of a developmental approach that combines different methods to relax these constraints and thus increase the adaptation capabilities of robots through learning.

Context

Designing a robot requires anticipating all the conditions it may face and specifying the appropriate behaviour for each. An unforeseen situation can therefore cause a malfunction, which may recur whenever the same conditions occur again. This lack of adaptation hinders many robotics applications, especially those that target an uncontrolled environment such as our daily surroundings (companion robots, for example) or, more generally, collaborative robots, i.e. those acting in contact with humans. Machine learning methods could help make robots more adaptive, provided they can overcome the multiple difficulties specific to robotics. This project aims to address these difficulties.

Objectives

The objective of the project is to help design robots interacting with an uncontrolled environment, on tasks for which the desired behaviour is partially known or even totally unknown.

In this context, learning allows the robot to explore its environment autonomously in order to extract relevant sensory, sensory-motor or purely motor representations. For example, the robot can learn to recognise objects, identify which ones are manipulable, and learn to pick them up, push them or throw them. Exploring the vast sensory-motor space in a relevant way is central here, especially as many interactions are rare (the probability of catching an object with a purely random movement is almost zero).

We are therefore interested in the construction of these representations. We rely on a modular and iterative approach that explores the robot’s capabilities and derives representations that facilitate the resolution of the tasks that arise, with either planning or learning methods.

Results

Creating state and action representations that can be reused later first requires generating behaviours that are relevant to the robot’s capabilities. A behaviour is relevant if it highlights the robot’s ability to achieve a particular effect by interacting with its environment. Because many of the robot’s movements produce no effect at all, discovering the effects that the robot can generate is difficult. The problem is compounded by a circular dependency: exploring to learn behaviours is itself hard without appropriate representations.

We therefore rely on exploration algorithms based on novelty search and Quality-Diversity algorithms to generate a large number of exploration behaviours and to deduce appropriate state and action spaces for further learning.
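To make the idea of novelty search concrete, here is a minimal sketch of the general technique, not the implementation used in DREAM. Everything in it is an illustrative assumption: the toy two-link planar arm, the choice of the end-effector position as behaviour descriptor, the function names and all parameter values. Each candidate movement is scored by how far its effect lies from the effects already observed, and the most novel ones are kept and stored in an archive.

```python
import math
import random


def end_effector(angles, link=0.5):
    """Behaviour descriptor (assumed): tip position of a toy 2-link planar arm."""
    a1, a2 = angles
    x = link * math.cos(a1) + link * math.cos(a1 + a2)
    y = link * math.sin(a1) + link * math.sin(a1 + a2)
    return (x, y)


def novelty(descriptor, others, k=5):
    """Novelty score: mean distance to the k nearest descriptors in `others`."""
    nearest = sorted(math.dist(descriptor, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else float("inf")


def mutate(angles, rng, sigma=0.2):
    """Gaussian perturbation of the movement parameters."""
    return tuple(a + rng.gauss(0.0, sigma) for a in angles)


def novelty_search(generations=30, pop_size=20, n_added=5, seed=0):
    rng = random.Random(seed)
    # Start from small random movements around the rest posture.
    population = [
        (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)) for _ in range(pop_size)
    ]
    archive = []  # descriptors of the most novel behaviours found so far
    for _ in range(generations):
        candidates = population + [mutate(p, rng) for p in population]
        descriptors = [end_effector(c) for c in candidates]
        scored = []
        for i, (cand, desc) in enumerate(zip(candidates, descriptors)):
            # Compare against every other candidate and the whole archive.
            others = descriptors[:i] + descriptors[i + 1:] + archive
            scored.append((novelty(desc, others), cand, desc))
        # Selection is driven by novelty only: no task reward is used.
        scored.sort(key=lambda item: item[0], reverse=True)
        archive.extend(desc for _, _, desc in scored[:n_added])
        population = [cand for _, cand, _ in scored[:pop_size]]
    return archive


if __name__ == "__main__":
    found = novelty_search()
    print(len(found), "archived behaviour descriptors")
```

Quality-Diversity algorithms follow the same pattern but additionally keep, for each region of the behaviour space, the solution that performs best according to a quality criterion.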

*Figure 1: The robot Baxter has learned a repertoire of joystick actions which it uses to learn to control a small wheeled robot.*

Partnerships and collaborations

The European project DREAM, coordinated by Sorbonne University (FET H2020 2015-2018), launched this research theme in the laboratory (http://dream.isir.upmc.fr/). Its partners were:

– ENSTA-ParisTech, in France,

– Sorbonne University, in France,

– the University of A Coruña, in Spain,

– the University of Edinburgh, in the United Kingdom,

– the Vrije Universiteit Amsterdam, in the Netherlands.

This was an academic project, with no industrial partner.

This work is being pursued in several projects that apply it to an industrial context. The adaptive learning capability is intended to help engineers during the design phase and when updating the behaviour of a robot. The European SoftManBot project (http://softmanbot.eu) targets applications to the manipulation of deformable objects. It has a consortium of 11 partners, including SIGMA in Clermont-Ferrand, IIT in Genoa and companies such as Decathlon and Michelin. The VeriDREAM project, in collaboration with DLR, ENSTA-ParisTech, Magazino GmbH, Synesis and GoodAI, aims to facilitate the transfer of these methods to a wider industrial context, including in particular small and medium-sized enterprises, with a focus on the logistics and video game sectors.