Ernest

Ernest is an environmentally agnostic and intrinsically motivated agent that learns autonomously through experience. This page demonstrates the principles of this learning procedure, proposed by Olivier Georgeon, through a NetLogo implementation of version 8 of Ernest.

This page was partially generated by NetLogo 4.1.

view/download model file: Ernest_V6b.nlogo

WHAT IS IT?

This model demonstrates an agent that is environmentally agnostic and intrinsically motivated. Environmentally agnostic means that the agent's decision process was implemented with no preconception of the environment. There is not even a parameter specifying that the agent operates in a two-dimensional space. Intrinsically motivated means that the agent's decision process was implemented with no preconception of a specific task or strategy to perform. Instead, the agent follows intrinsic drives.

The nuance between a predefined task and intrinsic drives is subtle. This model seeks to clarify this nuance.

HOW IT WORKS

The agent has predefined possibilities of interaction with the environment.

On each step, the agent can choose between three primitive actions: move one step forward, turn 90 degrees to the right, or turn 90 degrees to the left.

Each primitive action results in primitive feedback: the agent bumps into a wall, a target appears in the visual field, a target enlarges in the visual field (the agent got closer), or a target disappears from the visual field. Visual feedback is registered per eye, and each eye has a 90-degree visual span.

Primitive actions associated with primitive feedback are called primitive interactions. Primitive interactions have values that you can predefine.

The agent's algorithm tries to perform primitive interactions with high values and to avoid primitive interactions with negative values. You can therefore understand these values as the agent's liking (positive values) or dislike (negative values) of each primitive interaction.
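As a minimal sketch of this reading of values (not the IMOS algorithm itself, which learns from experience), primitive interactions can be modeled as action/feedback pairs that carry a value. The names `Interaction`, `VALUES`, `preferred`, and `liked`, as well as all numbers, are illustrative assumptions and are not taken from the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """A primitive action paired with the primitive feedback it produced."""
    action: str    # "step", "turn-left", or "turn-right"
    feedback: str  # e.g. "none", "bump", "appear", "closer", "disappear"


# Hypothetical values, made up for illustration: a positive value means the
# agent likes the interaction, a negative value means it dislikes it.
VALUES = {
    Interaction("step", "none"): 1,
    Interaction("step", "bump"): -10,
    Interaction("step", "closer"): 11,
    Interaction("turn-left", "none"): -1,
    Interaction("turn-right", "appear"): 4,
}


def liked(interaction):
    """True if the interaction has a positive value."""
    return VALUES[interaction] > 0


def preferred(candidates):
    """Return the candidate interaction with the highest value."""
    return max(candidates, key=lambda interaction: VALUES[interaction])
```

Note that this sketch compares interactions whose feedback is already known. The agent in the model has no such knowledge at the start and must learn through experience which interactions it can expect to enact.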

The model provides an output window that shows the interactions enacted by the agent and the satisfaction it received from enacting them in the environment.

HOW TO USE IT

Click Run to run the agent.
Click anywhere on the grid to insert a target.
Click Re-initialize to reset the agent's memory and restart the learning process from scratch.
You can use the sliders to specify the value of each aspect of interactions:
- Step: the value of moving one step forward.
- Bump: the value of bumping into a wall.
- Turn: the value of turning 90 degrees (left or right).
- Appear: the value of a target appearing in an eye's visual field.
- Closer: the value of a target enlarging in an eye's visual field.
- Disappear: the value of all targets disappearing from an eye's visual field.
The value of the movement (step, bump, or turn) is summed with the values associated with each eye to form the total value of the primitive interaction. After setting the above values, click Re-initialize to apply the new values, clear the agent's memory, and restart the learning process from scratch.
Click Reset Values to Default to restore the default values.
Batch Experiment Area: This set of controls lets you run the agent multiple times. Each run leaves a trail behind so that you can see behavioral patterns. The Timeout parameter sets the maximum time (in ticks) that the agent has to reach the target; it ends the run when the agent is caught in an infinite loop (the agent "gets bored"). Note that a batch experiment does not reinitialize the agent, so you can train the agent and then see how it behaves.
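The value summation and the batch timeout described above can be sketched as follows. This is an illustration under stated assumptions, not the model's code: the slider numbers are invented, the example assumes two eyes, and `interaction_value`, `run_batch`, and the `step_agent` callback are hypothetical names.

```python
# Hypothetical slider settings; the model's actual defaults are not shown here.
SLIDERS = {"step": 0, "bump": -8, "turn": 0,
           "appear": 15, "closer": 10, "disappear": -15}


def interaction_value(move, eye_events, sliders=SLIDERS):
    """Sum the value of the movement with the value of each eye's event.

    move:       "step", "bump", or "turn"
    eye_events: one entry per eye (two eyes assumed in the examples),
                each "appear", "closer", "disappear", or None for no event
    """
    total = sliders[move]
    for event in eye_events:
        if event is not None:
            total += sliders[event]
    return total


def run_batch(step_agent, runs, timeout):
    """Run the agent `runs` times without reinitializing it between runs.

    step_agent is an assumed callback that advances the agent by one tick
    and returns True once the target has been reached. A run ends when the
    target is reached or after `timeout` ticks, whichever comes first.
    Returns one (reached, ticks) pair per run.
    """
    results = []
    for _ in range(runs):
        ticks = 0
        reached = False
        while ticks < timeout and not reached:
            reached = step_agent()
            ticks += 1
        results.append((reached, ticks))
    return results
```

With these invented settings, stepping forward while the target enlarges in both eyes is worth 0 + 10 + 10 = 20, whereas bumping into a wall with no visual event is worth -8.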

THINGS TO NOTICE

Notice how the values of interactions impact the agent's behavior:

Give positive values to stepping and bumping, and negative values to everything else (click Re-initialize to apply your changes). Notice that the agent moves forward and keeps bumping into a wall. Of course, he loves that!
Give a positive value to turning, and negative values to everything else. Notice that the agent keeps spinning in place. Now, he loves that!

Notice that the agent is deterministic:

Use the Place Agent and Targets button to define a specific initial configuration. Click the Re-initialize button. Click Run. Repeat these operations with the same initial configuration. Notice that the agent behaves the same in each run (until you introduce variation by inserting targets in different places or at different times).

Notice that the agent learns different strategies depending only on its early experience:

Click the Diagonal Behavior button. Click Run. After the agent has eaten the first target, introduce new targets by clicking on the grid. Notice that the agent has learned a behavior that consists of reaching the target in a diagonal stair-step progression.
Click the Tangential Behavior button. Click Run. After the agent has eaten the first target, introduce new targets by clicking on the grid. Notice that the agent has learned another behavior that we call the tangential strategy. The tangential strategy results from the fact that the target disappears from the visual field only when the agent has gone one step too far. The agent thus needs to go back one step to align itself with the target.
(Notice that the Diagonal Behavior button and the Tangential Behavior button reset the values to default. This shows that "twin" agents (agents with identical inborn parameters) that have different experiences during their "youth" will behave differently when they are "grown up". This also demonstrates that the strategies are not pre-encoded in the agent but learned.)

THINGS TO TRY

Try to generate new behaviors by giving different values to interactions. For example, try to make an agent that moves away from targets.

EXTENDING THE MODEL

Define other interactions, for example: turning 45 degrees and possibly moving diagonally.
Define different kinds of targets and make an agent that alternately seeks targets of each kind.
Introduce several agents.
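
As a starting point for the first suggestion, the following sketch shows one way to represent 45-degree turns and diagonal moves on a grid. It uses NetLogo's heading convention (0 points north, angles increase clockwise). The names `OFFSETS`, `turn`, and `step` are hypothetical and do not exist in the model.

```python
# Grid offsets (dx, dy) for the eight headings, following NetLogo's
# convention that heading 0 points north and angles increase clockwise.
OFFSETS = {
    0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1),
    180: (0, -1), 225: (-1, -1), 270: (-1, 0), 315: (-1, 1),
}


def turn(heading, degrees):
    """Return the heading after turning right (positive) or left (negative)."""
    return (heading + degrees) % 360


def step(x, y, heading):
    """Return the cell reached by moving one step along the given heading."""
    dx, dy = OFFSETS[heading]
    return x + dx, y + dy
```

For example, an agent at (0, 0) facing north that turns 45 degrees to the right and steps forward ends up at (1, 1).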

NETLOGO FEATURES

This model implements the IMOS extension (Intrinsic Motivation System).

RELATED MODELS

We are not aware of any related NetLogo model.

CREDITS AND REFERENCES

This model was implemented by Ilias Sakellariou (University of Macedonia, Greece), using the IMOS NetLogo extension implemented by Olivier Georgeon (Universite de Lyon / CNRS, France).

PROCEDURES

The code is available in Ernest_V6b.nlogo. Any questions regarding the model or the IMOS extension can be sent to Olivier Georgeon (olivier.georgeon at gmail.com) or Ilias Sakellariou (iliass at uom.gr).