AI Olympics Competition for IROS 2024 – The 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024) will be held October 14

Participating Teams

Four teams made it to the real-robot stage and will participate in the IROS competition:

Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks, by Jean Seong Bjorn Choe, BumKyu Choi and Jong-kook Kim, School of Electrical Engineering, Korea University, Seoul, South Korea
Learning control of underactuated double pendulum with Model-Based Reinforcement Learning, by Niccolo Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli and Diego Romeres, University of Padova, Italy and Mitsubishi Electric Research Laboratories, Cambridge, USA
AI Olympics challenge with Evolutionary Soft Actor Critic, by Marco Cali, Alberto Sinigaglia, Niccolo Turcato, Ruggero Carli and Gian Antonio Susto, University of Padova and Human-Inspired Technology Research Center, University of Padova, Italy
Velocity-History-Based Soft Actor-Critic Tackling IROS’24 Competition “AI Olympics with RealAIGym”, by Erfan Aghadavoodi, Tim Lukas Faust, Habib Maraqten and Boris Belousov, Technical University of Darmstadt, Germany

Motivation

As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. While software such as ChatGPT has recently revolutionized certain areas of AI, athletic intelligence seems to still be elusive in the AI community. To have better robots in the future which can perform a wide variety of dynamic tasks in uncertain environments, the physical or athletic intelligence of robots must be improved. However, this is quite challenging. In particular, the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, the 2nd AI Olympics competition is being proposed to be held at IROS 2024 in Abu Dhabi following the inaugural run at IJCAI 2023 in Macau (see here for a video summary), based on the RealAIGym project.

This time, the focus is not only on achieving the best scores but also on robustness to randomized external disturbances during execution (external torques shown in the figure on the right). So, prepare your controllers for dealing with nastier dynamics!

The challenge will involve two stages: simulation and real-robot experiments where teams (and their agents) can compete to get the highest score to win some cool prizes! We invite people from all communities (AI/ML/RL, Optimal Control, Heuristics, etc.) to participate in this competition on a set of standardized dynamic tasks on well-known prototypical systems using standardized hardware.

The Challenge

For the challenge, we will use a canonical 2-link robot system with two different configurations. When the actuator in the shoulder joint is active and the elbow is passive, it functions as a Pendubot. When the shoulder actuator is passive and the elbow is active, it functions as an Acrobot (inspired by the acrobat athlete seen above). The challenge consists of the following task, which is carried out first in simulation, after which the best teams will be selected to carry out experiments on real robots: Swing-up and Stabilize an Underactuated 2-link System Acrobot and/or Pendubot. The swing-up starts from an initial position in which the robot points straight down. The participating teams can decide to work on the Acrobot swing-up, the Pendubot swing-up, or both. For scoring and prizes, Acrobot and Pendubot will be treated as 2 separate tracks, i.e., the Acrobot scores/papers will be compared only against other Acrobot teams. For each track, 2 teams will be selected from the simulation stage to participate in the real robot stage. One final winner will be selected for each track. The performance and robustness of the swing-up and stabilize controllers will be judged based on a custom scoring system. The final score is the average of the performance score and the robustness score for the acrobot/pendubot system. The final scores of the submissions will be added to the RealAIGym leaderboard.

The competition consists of two stages: a simulation stage and a real-robot stage. We provide a realistic simulation (with identified system parameters) along with several baseline controllers. The participants are asked to develop new or improve existing algorithms for this task, which may exploit state-of-the-art machine learning, optimal control, and heuristic methods or any combination of them. They should compete against the performance of the already available baseline controllers on the benchmarking criteria set by the organizing committee. The algorithms will be evaluated on those criteria and the final score will be published on a leaderboard. The participants are encouraged to submit their contributions in the form of a 2-4 page paper with a link to their GitHub code. The best performers will receive prizes and special recognition.

System Description

Our dual-purpose setup implements a double pendulum platform built using two quasi-direct drive actuators (QDDs). Due to high mechanical transparency offered by QDDs, one of the actuators can be used as a passive encoder. When the shoulder motor is passive and elbow motor is active, the system is an acrobot and when the shoulder is active and elbow is passive, the system is a pendubot.

The acrobot/pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of $dt=0.002s$ for $T=10s$ .
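As a rough illustration of what a fixed-step Runge-Kutta 4 rollout with $dt=0.002s$ and $T=10s$ looks like, here is a minimal sketch. The function names (`rk4_step`, `simulate`) and the toy harmonic-oscillator dynamics are assumptions made for this example only. They are not the integrator or plant from the competition repository, which must not be modified.

```python
import math


def rk4_step(f, x, u, t, dt):
    """Advance state x by one classical RK4 step for x_dot = f(x, u, t).

    The control input u is held constant over the step (zero-order hold).
    """
    k1 = f(x, u, t)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], u, t + 0.5 * dt)
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], u, t + 0.5 * dt)
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)], u, t + dt)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def simulate(f, x0, controller, dt=0.002, t_final=10.0):
    """Roll out the closed loop for t_final seconds with step size dt."""
    n_steps = round(t_final / dt)
    x = list(x0)
    for i in range(n_steps):
        t = i * dt
        u = controller(x, t)
        x = rk4_step(f, x, u, t, dt)
    return x, n_steps


def oscillator(x, u, t):
    """Toy stand-in dynamics (NOT the double pendulum): q_ddot = -q + u."""
    q, q_dot = x
    return [q_dot, -q + u]


def zero_controller(x, t):
    """Placeholder controller that applies no torque."""
    return 0.0
```

With these defaults the rollout takes 5000 integration steps, which matches a 500 Hz control loop running for 10 seconds.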

Detailed system description can be found in the related article and its supplementary material.

Task

Perform a swing up and upright stabilization using a single actuator either in acrobot or pendubot configuration. The initial configuration is $\mathbf{x}_{0}=(0.0,0.0,0.0,0.0)$ (hanging down) and the goal is the unstable fixpoint at the upright configuration $\mathbf{x}_{g}=(\pi,0.0,0.0,0.0)$ . The following rules apply:

Each attempt must not exceed a total time duration of 60s (swing up + stabilization).
Friction compensation on both joints is allowed in both pendubot and acrobot.
The authors are free to choose a friction compensation model of their choice but the utilized torque on the passive joint must not exceed 0.5 Nm.
The controller must inherit from the AbstractController class provided in the repository.
The following hardware restrictions must be respected by the controller:
- Control loop frequency: 500 Hz maximum
- Torque limit: 6 Nm
- Velocity limit: 15 rad/s
- Position limits: ± 360 degrees for both joints
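The limits listed above could be checked in a controller's test harness along the following lines. This is only a sketch: the constant and function names are hypothetical, the joint ordering (index 0 for the shoulder, index 1 for the elbow) is an assumption, and the actual enforcement on the hardware is done by the organizers' setup, not by this code.

```python
import math

# Limits taken from the rules listed above.
TORQUE_LIMIT = 6.0                     # Nm, actuated joint
PASSIVE_TORQUE_LIMIT = 0.5             # Nm, friction compensation on passive joint
VELOCITY_LIMIT = 15.0                  # rad/s
POSITION_LIMIT = math.radians(360.0)   # rad, both joints


def clip_torque(tau, active_joint):
    """Clip a two-joint torque command [tau_shoulder, tau_elbow].

    active_joint is 0 for the pendubot (shoulder actuated) and 1 for the
    acrobot (elbow actuated). The other joint only gets the small
    friction-compensation budget.
    """
    limits = [PASSIVE_TORQUE_LIMIT, PASSIVE_TORQUE_LIMIT]
    limits[active_joint] = TORQUE_LIMIT
    return [max(-lim, min(lim, t)) for t, lim in zip(tau, limits)]


def within_limits(x):
    """Check position and velocity limits for a state x = (q1, q2, v1, v2)."""
    q1, q2, v1, v2 = x
    positions_ok = all(abs(q) <= POSITION_LIMIT for q in (q1, q2))
    velocities_ok = all(abs(v) <= VELOCITY_LIMIT for v in (v1, v2))
    return positions_ok and velocities_ok
```

Clipping inside the controller keeps the commanded torque legal, while `within_limits` is meant as a post-hoc check on recorded trajectories.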

The upright position is considered to be reached for the performance score when above the threshold line. Note that the pendulum has to also stay in the goal region.

Performance Score

The performance score compares the performance of the controllers in simulation and on the real hardware.

For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:

Swingup Success $c_{success}$ : Whether the swing-up was successful, i.e. if the end-effector is above the threshold line at the end of the simulation.
Swingup time $c_{time}$ : The time it takes for the system to reach the goal region above the threshold line and stay there. If the end-effector enters the goal region but falls below the line before the simulation time is over, the swing-up is not considered successful! The swing-up time is the time at which the end-effector enters the goal region and does not leave it again until the end.
Energy $c_{energy}$ : The mechanical energy used during the execution.
Torque Cost $c_{\tau, cost}$ : A quadratic cost on the used torques. ( $c_{\tau, cost} = \sum \tau^T R \tau$ with $R=1$ )
Torque Smoothness $c_{\tau, smooth}$ : The standard deviation of the changes in the torque signal.
Velocity Cost $c_{vel, cost}$ : A quadratic cost on the joint velocities ( $\dot{\mathbf{q}}$ ) that were reached during the execution. ( $c_{vel, cost} = \sum \dot{\mathbf{q}}^T \mathbf{Q} \dot{\mathbf{q}}$ with $\mathbf{Q}=$ identity)

These criteria are used to calculate the overall Real AI Score with the formula:

$$S = c_{succ} \left( 1 - \frac{1}{N} \sum_{i=1}^{N} \tanh\left( \pi \frac{x_i}{k_i} \right) \right)$$

Where $x_i$ are the criteria, $k_i$ are scaling constants, $N=5$ the number of criteria and $c_{succ}$ is the success rate out of 1 trial in simulation and 10 trials on the real hardware.
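The score formula above can be written as a short function. This is a sketch only: the function name is hypothetical, and the scaling constants $k_i$ are not given on this page, so any values passed in `scales` are placeholders supplied by the caller.

```python
import math


def performance_score(success_rate, criteria, scales):
    """Real AI performance score.

    success_rate: c_succ, a value in [0, 1]
    criteria:     the N criterion values x_i (time, energy, torque cost, ...)
    scales:       the N scaling constants k_i (placeholders; not from the page)
    """
    if len(criteria) != len(scales):
        raise ValueError("criteria and scales must have the same length")
    n = len(criteria)
    penalty = sum(math.tanh(math.pi * x / k) for x, k in zip(criteria, scales)) / n
    return success_rate * (1.0 - penalty)
```

Because $\tanh$ saturates at 1, each criterion contributes at most $1/N$ to the penalty, so the score stays in $[0, 1]$ for non-negative criteria, and a failed swing-up ($c_{succ}=0$) always scores 0.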

The performance leaderboards for the acrobot and pendubot systems can be found here.

Robustness Score

The robustness leaderboard compares the performance of different control methods by perturbing the simulation e.g. with noise or delay. The task for the controller is to swing-up and balance the acrobot/pendubot even with these perturbations. For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:

Model inaccuracies $c_{model}$ : The model parameters, that have been determined with system identification, will never be perfectly accurate. To assess inaccuracies in these parameters, we vary the independent model parameters one at a time in the simulator while using the original model parameters in the controller.
Measurement noise $c_{vel, noise}$ : The controllers’ outputs depend on the measured system state. In the case of the QDDs, the online velocity measurements are noisy. Hence, it is important for the transferability that a controller can handle at least this amount of noise in the measured data. The controllers are tested with and without a low-pass noise filter.
Torque noise $c_{\tau, noise}$ : Not only the measurements are noisy, but also the torque that the controller outputs is not always exactly the desired value.
Torque response $c_{\tau, response}$ : The requested torque of the controller will in general not be constant but change during the execution. The motor, however, is sometimes not able to react immediately to large torque changes and will instead overshoot or undershoot the desired value. This behavior is modeled by applying the torque $\tau = \tau_{t-1} + k_{resp} (\tau_{des} - \tau_{t-1})$ instead of the desired torque $\tau_{des}$ . Here, $\tau_{t-1}$ is the applied motor torque from the last time step and $k_{resp}$ is the factor that scales the responsiveness. $k_{resp}=1$ means the torque response is perfect while $k_{resp}\neq 1$ means the motor is over/undershooting the desired torque.
Time delay $c_{delay}$ : When operating on a real system there will always be time delays due to communication and reaction times.
Perturbances $c_{pert}$ : During the motion, random torque perturbations are applied to the joints.
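The torque response model from the list above, $\tau = \tau_{t-1} + k_{resp} (\tau_{des} - \tau_{t-1})$, can be sketched as follows. The function names and the initial applied torque of zero are assumptions made for this example.

```python
def apply_torque_response(tau_des, tau_prev, k_resp):
    """Torque actually applied when the motor responds imperfectly."""
    return tau_prev + k_resp * (tau_des - tau_prev)


def simulate_response(tau_des_sequence, k_resp, tau_init=0.0):
    """Run the response model over a sequence of desired torques."""
    applied = []
    tau_prev = tau_init
    for tau_des in tau_des_sequence:
        tau_prev = apply_torque_response(tau_des, tau_prev, k_resp)
        applied.append(tau_prev)
    return applied
```

For a constant desired torque, $k_{resp}<1$ makes the applied torque approach the target gradually (undershoot), while $k_{resp}>1$ makes it jump past the target on the first step (overshoot).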

For each criterion, the quantities are varied in $21$ steps (for the model inaccuracies, for each independent model parameter) and the score is the percentage of successful swing-ups. For the perturbances, 50 different perturbance profiles are generated and the score is the percentage of successful swing-ups with the perturbances applied.

These criteria are used to calculate the overall Real AI Score with the formula:

$$S = \frac{1}{6} \left( c_{model} + c_{vel, noise} + c_{\tau, noise} + c_{\tau, response} + c_{delay} + c_{pert} \right)$$
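The robustness score above is a plain average of six success percentages. A minimal sketch follows. The function names are hypothetical, and the percentages are expressed here as fractions in $[0, 1]$, which is an assumption about the convention.

```python
def success_fraction(outcomes):
    """Fraction of successful swing-ups in a list of True/False outcomes."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def robustness_score(c_model, c_vel_noise, c_tau_noise,
                     c_tau_response, c_delay, c_pert):
    """Average of the six robustness criteria, each a fraction in [0, 1]."""
    total = (c_model + c_vel_noise + c_tau_noise
             + c_tau_response + c_delay + c_pert)
    return total / 6.0
```

For example, a controller that succeeds in every variation of the first five criteria but in only 25 of the 50 perturbance profiles would obtain $c_{pert}=0.5$ and an overall score of $5.5/6$.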

The robustness leaderboards for the acrobot and pendubot systems can be found here.

The Challenge (OLD)

The challenge consists of the following task, which is carried out first in simulation, after which the 4 best teams will be selected to carry out experiments on real robots: Swing-up and Stabilize an Underactuated 2-link System Acrobot and/or Pendubot. The swing-up starts from an initial position in which the robot points straight down. The participating teams can decide to work on the Acrobot swing-up, the Pendubot swing-up, or both. For scoring and prizes, Acrobot and Pendubot will be treated as 2 separate tracks, i.e., the Acrobot scores/papers will be compared only against other Acrobot teams. For each track, 2 teams will be selected from the simulation stage to participate in the real robot stage. One final winner will be selected for each track.

The performance and robustness of the swing-up and stabilize controllers will be judged based on a custom scoring system. The final score is the average of the performance score and the robustness score for the acrobot/pendubot system. The final scores of the submissions will be added to the RealAIGym leaderboard.

The acrobot/pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of $dt=0.002s$ for $T=10s$ . The initial configuration is $\mathbf{x}_{0}=(0.0,0.0,0.0,0.0)$ (hanging down) and the goal is the unstable fixpoint at the upright configuration $\mathbf{x}_{g}=(\pi,0.0,0.0,0.0)$ . The upright position is considered to be reached for the performance score when the end-effector is above the threshold line, and for the robustness score when the distance to the goal in each state coordinate is below $\mathbf{\epsilon} = (0.1, 0.1, 0.5, 0.5)$ .
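The robustness goal test described above can be sketched as a component-wise comparison against $\mathbf{\epsilon}$. The names are hypothetical, and wrapping the angle differences to $[-\pi, \pi)$ is an assumption of this example, since the page does not say how angles beyond one revolution are treated.

```python
import math

X_GOAL = (math.pi, 0.0, 0.0, 0.0)   # upright configuration x_g
EPSILON = (0.1, 0.1, 0.5, 0.5)      # per-coordinate tolerances


def wrap_angle(angle):
    """Map an angle difference to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def in_goal_region(x, x_goal=X_GOAL, eps=EPSILON):
    """Return True if state x = (q1, q2, v1, v2) is within eps of the goal."""
    position_errors = [abs(wrap_angle(x[i] - x_goal[i])) for i in range(2)]
    velocity_errors = [abs(x[i] - x_goal[i]) for i in range(2, 4)]
    return all(err < tol for err, tol in zip(position_errors + velocity_errors, eps))
```

Note that the velocity tolerances (0.5) are looser than the position tolerances (0.1), so small residual motion near the upright position still counts as success.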

Protocol

The two stages of the challenge are as follows:

Simulation Stage

For the simulation stage of the competition, we use the following repository from the RealAIGym Project: Double Pendulum (https://github.com/dfki-ric-underactuated-lab/double_pendulum). The documentation of the project for installation, double pendulum dynamics, repository structure, hardware, and controllers can be found here (https://dfki-ric-underactuated-lab.github.io/double_pendulum/index.html). Please follow the installation instructions to start developing your controllers.

You have to develop a new controller for the given simulator (plant). The controller can then be tested for the leaderboard using the instructions given for the Acrobot here: Robustness Scoring, and Performance Scoring. Similar Pendubot scoring scripts are available here (performance) and here (robustness).

To develop a new controller, you can use any of the many examples given in the repo. A good starting point is to look at the controllers given here. Your controller must inherit from the AbstractController class provided in the repository. See here for the documentation on how to write your controller using the AbstractController class.

The double pendulum model parameters for the simulation stage are those from ‘design_C.1/model_1.1’.

Once you’ve developed a new controller and are happy with the results, please follow these submission guidelines:

Create a fork of the repository.
Add a Dockerfile to your forked repository that includes all the custom libraries you’ve installed/used that are not part of the double pendulum dependencies. This allows us to use the Dockerfile to recreate your environment with the correct libraries to run the submitted controller. For a tutorial on how to make a Dockerfile, we can recommend the official Docker website.
Add your developed controllers to the forked repository. Important: Do not change the plant/dynamics/integrator (This may result in an outright disqualification of the team)!! Remember to use the AbstractController class.
Submit the URL of the fork along with a 2-4 page paper about the method developed and the results to Dennis.Mronga@dfki.de with [AI Olympics] in the email subject. Please follow these guidelines for the paper:
- Page Limit: 2-4 Pages including references
- Include the standard plots for position, velocity, and torque with respect to time in the paper. For an example, see timeseries.png here. These plots are generated after simulation if you use the provided function plot_timeseries(T, X, U).
- Include the tables for performance and robustness metrics against the baseline controllers made available on the RealAIGym leaderboards.
- Include the robustness bar chart as generated here.
- Use the following template: IEEEConfig.zip

The submitted code and papers will be reviewed and the leaderboard benchmarks will be re-run by us to compute the final scores. The scores as well as the paper reviews will be used to determine the best 4 teams which will carry out the experiments using their controllers on the real systems at IROS 2024 AI Olympics!

The results are in! The following teams are selected from the Simulation Stage to go on to the Real-Robot Stage:

Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks by Jean Seong Bjorn Choe, BumKyu Choi and Jong-kook Kim
Learning control of underactuated double pendulum with Model-Based Reinforcement Learning by Niccolo Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli and Diego Romeres
AI Olympics challenge with Evolutionary Soft Actor Critic by Marco Cali, Alberto Sinigaglia, Niccolo Turcato, Ruggero Carli and Gian Antonio Susto
Artificial Intelligence Olympics with State History-Based Soft Actor-Critic by Erfan Aghadavoodi, Tim Lukas Faust, Habib Maraqten and Boris Belousov

Real-Robot Stage

We’ve created the following protocol for the remote hardware experiments for the Real-Robot stage of the competition.

Protocol for Scheduling Experiment Slots:

The scheduling will be handled by a common Google calendar sent to the teams. The calendar can be found here
Each team is allotted a total of 15 hours for experiments. They can create 1-3 hour slots in the shared calendar and invite the following organizers for the meeting slot: Felix Wiebe (felix.wiebe@dfki.de), and Dennis Mronga (dennis.mronga@dfki.de). Once any one of the organizers confirms the meeting, the experiment slot is confirmed.
From the provided 15 hours maximum time, the last 2 hours are reserved for the final test where the controllers will be evaluated for the hardware leaderboard.
At the start of the slot, a Microsoft Teams meeting will be started for the live stream along with Q&A for debugging.
After the end of the slot, teams will be provided up to 1 hour extra for copying the data back to their computers.

Protocol For Running Experiments in the given Slot:

The Double Pendulum Acrobot/Pendubot is prepared at DFKI RIC, Bremen such that the teams can access the robot via a local control PC running Ubuntu.
The experiments on the real robot will be carried out remotely using VPN+SSH.
A video stream via Microsoft Teams call and video file post-experiment runs will be provided.
First, a VPN must be connected to enter the private network setup for the experiments. For this, each team will be provided with a VPN config file.
We use/support the wireguard VPN on Ubuntu. For installing the VPN, the teams have to install the following packages via apt: wireguard-tools, wireguard, and resolvconf. This can be done via the command: sudo apt-get install wireguard-tools wireguard resolvconf
After installing, you can go to the folder containing the provided VPN config file and run the following to start the VPN: wg-quick up wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
To exit the VPN, run: wg-quick down wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
Once you are within the VPN, you can SSH to the control computer whose IP address will be provided at the start of each experiment session.
For SSH, a username and password will be provided to each team. The following command can be used: ssh <username>@<IP Address>. (Hint: ssh -Y <username>@<IP Address> can be used to view the plots after experiments without copying the data. This can sometimes cause issues though.)
Once on the control PC via SSH, teams can execute scripts remotely and copy data to and from the PC. The data can be found … Tools such as scp/git are suggested for transferring code/data. (Hint: A tutorial on scp to copy data: https://linuxize.com/post/how-to-use-scp-command-to-securely-transfer-files/)
The double pendulum repo library along with motor drivers are installed on the control PC at the home folder. Hence, they should be available for each individual team.

Some rules and information for the hardware experiments regarding experiment duration and safety limits:

Each attempt must not exceed a total time duration of 10 seconds (swing-up + stabilization)
Friction compensation on both joints is allowed in both pendubot and acrobot configurations. The teams are free to choose a friction compensation model of their choice but the utilized torque on the passive joint must not exceed 0.5 Nm.
The controller must inherit from the AbstractController class provided in the project repository.
The following hardware restrictions must be respected by the controller:
- Control Loop Frequency: 500 Hz max. Usually around 400 Hz.
- Torque Limit: 6 Nm
- Velocity Limit: 20 rad/s
- Position Limits: ± 720 degrees for both joints
When the motors exceed these limits, the controller is (usually) automatically switched off and a damper is applied to bring the system to zero velocity. Once zero velocity is achieved, experiments can start again.
When the motors are initially enabled, they set the “zero position”. This happens every time they are enabled.
For the hardware experiments, the Acrobot and Pendubot share the same system parameters, but these differ from the ones used in the simulation. We have done a basic system identification, and the teams can re-train their controllers using the following system parameters for the hardware: https://github.com/dfki-ric-underactuated-lab/double_pendulum/blob/main/data/system_identification/identified_parameters/design_C.1/model_1.0/model_parameters.yml
A person will be watching the experiments and will have access to an Emergency Stop.

Simulation Stage

Once you’ve developed a new controller and are happy with the results, please follow these submission guidelines:

Create a fork of the repository.
Add a Dockerfile to your forked repository that includes all the custom libraries you’ve installed/used that are not part of the double pendulum dependencies. This allows us to use the Dockerfile to recreate your environment with the correct libraries to run the submitted controller. For a tutorial on how to make a Dockerfile, we can recommend the official Docker website.
Add your developed controllers to the forked repository. Important: Do not change the plant/dynamics/integrator (This may result in an outright disqualification of the team)!! Remember to use the AbstractController class.
Submit the URL of the fork along with a 2-4 page paper about the method developed and the results to ijcai-23@dfki.de with [AI Olympics] in the email subject. Please follow these guidelines for the paper:
- Page Limit: 2-4 Pages including references
- Include the standard plots for position, velocity, and torque with respect to time in the paper. For an example, see timeseries.png here. These plots are generated after simulation if you use the provided function plot_timeseries(T, X, U).
- Include the tables for performance and robustness metrics against the baseline controllers made available on the RealAIGym leaderboards.
- Include the robustness bar chart as generated here.
- Use the following template: IJCAI 2023 Formatting Guidelines.

Real-Robot Stage

We’ve created the following protocol for the remote hardware experiments for the Real-Robot stage of the competition.

Protocol for Scheduling Experiment Slots:

The scheduling will be handled by a common Google calendar sent to the teams. The calendar is also available to the public and can be seen here: https://calendar.google.com/calendar/u/1?cid=NGQxMjg0NmE3MGFlNzQ5YmU1YWE1NWI0NTM3OTI1NDViYzZiMDQ5NmMxMjY3ZDMyZTc3MGY3MTBiZWMzMTFlMEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t
Each team is allotted a total of 20 hours for experiments. They can create 1-3 hour slots in the shared calendar and invite the following organizers for the meeting slot: Shivesh Kumar, Felix Wiebe, and Shubham Vyas. Once any one of the organizers confirms the meeting, the experiment slot is confirmed.
From the provided 20 hours maximum time, the last 2 hours are reserved for the final test where the controllers will be evaluated for the hardware leaderboard.
At the start of the slot, a Microsoft Teams meeting will be started for the live stream along with Q&A for debugging.
After the end of the slot, teams will be provided up to 1 hour extra for copying the data back to their computers.

Protocol For Running Experiments in the given Slot:

The Double Pendulum Acrobot/Pendubot is prepared at DFKI RIC, Bremen such that the teams can access the robot via a local control PC running Ubuntu.
The experiments on the real robot will be carried out remotely using VPN+SSH.
A video stream via Microsoft Teams call and video file post-experiment runs will be provided.
First, a VPN must be connected to enter the private network setup for the experiments. For this, each team will be provided with a VPN config file.
We use/support the wireguard VPN on Ubuntu. For installing the VPN, the teams have to install the following packages via apt: wireguard-tools, wireguard, and resolvconf. This can be done via the command: sudo apt-get install wireguard-tools wireguard resolvconf
After installing, you can go to the folder containing the provided VPN config file and run the following to start the VPN: wg-quick up wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
To exit the VPN, run: wg-quick down wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
Once you are within the VPN, you can SSH to the control computer whose IP address will be provided at the start of each experiment session.
For SSH, a username and password will be provided to each team. The following command can be used: ssh <username>@<IP Address>. (Hint: ssh -Y <username>@<IP Address> can be used to view the plots after experiments without copying the data. This can sometimes cause issues though.)
Once on the control PC via SSH, teams can execute scripts remotely and copy data to and from the PC. The data can be found … Tools such as scp/git are suggested for transferring code/data. (Hint: A tutorial on scp to copy data: https://linuxize.com/post/how-to-use-scp-command-to-securely-transfer-files/)
The double pendulum repo library along with motor drivers are installed on the control PC at the root. Hence, they should be available for all teams/users.

Some rules and information for the hardware experiments regarding experiment duration and safety limits:

Each attempt must not exceed a total time duration of 60 seconds (swing-up + stabilization)
Friction compensation on both joints is allowed in both pendubot and acrobot configurations. The teams are free to choose a friction compensation model of their choice but the utilized torque on the passive joint must not exceed 0.5 Nm.
The controller must inherit from the AbstractController class provided in the project repository.
The following hardware restrictions must be respected by the controller:
- Control Loop Frequency: 500 Hz max. Usually around 400 Hz.
- Torque Limit: 6 Nm
- Velocity Limit: 20 rad/s
- Position Limits: ± 360 degrees for both joints
When the motors exceed these limits, the controller is (usually) automatically switched off and a damper is applied to bring the system to zero velocity. Once zero velocity is achieved, experiments can start again.
When the motors are initially enabled, they set the “zero position”. This happens every time they are enabled.
For the hardware experiments, the Acrobot and Pendubot share the same system parameters, but these differ from the ones used in the simulation. We have done a basic system identification, and the teams can re-train their controllers using the following system parameters for the hardware: https://github.com/dfki-ric-underactuated-lab/double_pendulum/blob/main/data/system_identification/identified_parameters/design_C.1/model_1.0/model_parameters.yml
A person will be watching the experiments and will have access to an Emergency Stop.

Schedule/Important Dates

The on-site competition will require 1 full day: a half-day for teams to set up and a half-day open to the public. Prior to IROS, the teams will have access to both the simulation and the real system via SSH, so they will arrive with already developed controllers, which will be showcased at IROS.

Additionally, we will organize a workshop where all participating teams are invited to present their algorithms and discuss their solutions.

Important Dates

Competition Day Schedule