AI Olympics Competition for ICRA 2025 – 3rd AI Olympics with RealAIGym at ICRA 2025.

Benchmarking Global Swing-Up Policies on CloudPendulum Hardware

The Challenges

As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. While software such as ChatGPT has recently revolutionized certain areas of AI, athletic intelligence still seems elusive in the AI community. To build robots that can perform a wide variety of dynamic tasks in uncertain environments, the physical or athletic intelligence of robots must be improved. This is quite challenging, in particular because the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, the 4th AI Olympics competition will be held at IJCAI 2026 in Bremen.

For the challenge, we will use a canonical 2-link robot system with two different configurations. When the actuator in the shoulder joint is active and the elbow is passive, it functions as a Pendubot. When the shoulder actuator is passive and the elbow is active, it functions as an Acrobot (inspired by an acrobat athlete). The challenge consists of the following task: swing up and stabilize an underactuated 2-link system (Acrobot and/or Pendubot) in its upright position. While the previous three editions of the competition involved a simulation and a real-robot stage, at IJCAI 2026 the teams will have the opportunity to develop their swing-up policies directly on real hardware, hosted on CloudPendulum, a platform for remote experimentation and standardized benchmarking of control and machine learning algorithms.

The participants are asked to develop new algorithms or improve existing state-of-the-art algorithms in model-based control, learning-based control, or their combinations. In contrast to the previous AI Olympics competitions, which involved a simulation and a real-robot stage, at IJCAI 2026 the teams train their policies on real hardware from the start. They have limited access time and no model information beforehand; they will only receive state and torque information from the remote hardware. In the qualification stage, we will evaluate the teams' approaches regarding their effectiveness in achieving a successful global swing-up policy from scratch by remotely connecting to the cloud pendulum. The best performing teams will be invited to the competition at IJCAI, where they will be confronted with a different double pendulum system (in terms of mass-inertial parameters). Thus, their approaches must adapt to the new hardware as quickly as possible! Note that random torque and state-dependent disturbances will be applied during the evaluation to quantify the robustness of the approaches. The algorithms will be evaluated on pre-defined criteria, and the final score will be published on a leaderboard. The participants are encouraged to submit their contributions in the form of a 2–4-page paper with a link to their GitHub code. The best performers will receive prizes and special recognition.

See below for the detailed rules of the competition.

System Description

We provide multiple dual-purpose double pendulum platforms, each built using two direct-drive actuators, through a cloud-based robotics platform called CloudPendulum. Due to the high mechanical transparency offered by direct drives, one of the actuators can be used as a passive encoder. When the shoulder motor is passive and the elbow motor is active, the system is an acrobot; when the shoulder is active and the elbow is passive, the system is a pendubot. The platform provides a simulation environment, just-in-time remote access, parallel training on multiple hardware systems, time-limited access, and automatic evaluation, all from the comfort of your web browser.

Task

The task is to perform a swing up and upright stabilization using a single actuator either in acrobot or pendubot configuration. The participants will connect remotely to the cloud pendulum. The following rules apply:

Each swing-up attempt must not exceed a total time duration of 60s (swing up + stabilization).

The teams are encouraged to practice in simulation using the double pendulum plant. However, no model parameters of the system will be provided to the participants; they only receive sensor feedback (position, velocity, torque) from both actuators of the double pendulum. The teams may do their own system identification or devise their own strategy for the sim2real transfer.

In contrast to previous competitions, friction compensation on the passive joint is NOT allowed during the evaluation. This significantly increases the challenge of training a successful policy. You are allowed to use friction compensation during training or controller development. If you want to use friction compensation, you should set experiment_type = "DoublePendulum" in the start_experiment() call, because experiment_type = "Acrobot" only allows you to send torques to the elbow joint and experiment_type = "Pendubot" only allows you to send torques to the shoulder joint.
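The relation between the experiment type and the joints that accept torque commands can be summarized in a short sketch. The helper below is hypothetical and is not part of the CloudPendulum client; it only restates the rule above, and it assumes that a torque sent to a non-actuated joint is simply ignored (the actual server behavior may differ).

```python
# Hypothetical helper, NOT part of the CloudPendulum client API.
# It restates which joints accept torque for each experiment_type value.
ACTUATED_JOINTS = {
    "Acrobot": (False, True),        # shoulder passive, elbow active
    "Pendubot": (True, False),       # shoulder active, elbow passive
    "DoublePendulum": (True, True),  # both joints accept torque
}


def effective_torques(experiment_type, tau_shoulder, tau_elbow):
    """Return the (shoulder, elbow) torques that the given mode would apply.

    Assumption: torque on a passive joint is dropped (modelled as 0.0).
    """
    try:
        shoulder_on, elbow_on = ACTUATED_JOINTS[experiment_type]
    except KeyError:
        raise ValueError(f"unknown experiment_type: {experiment_type!r}")
    return (
        tau_shoulder if shoulder_on else 0.0,
        tau_elbow if elbow_on else 0.0,
    )
```

Under this reading, friction compensation on the passive joint is only possible in the "DoublePendulum" mode, which is why it is available during development but not during evaluation.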

The controller design must be compatible with the CloudPendulum interface. Please check the open-source examples in the above system description for inspiration.

The following hardware restrictions must be respected by the controller:

- Control loop frequency: 500 Hz maximum

- Torque limit: 0.15 Nm

- Velocity limit: 40 rad/s

- Position limits: +/- 360 degrees for both joints
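A minimal sketch of how a client might enforce these restrictions before sending commands is given below. All names are illustrative and not part of the CloudPendulum interface; the sketch assumes that the torque and velocity limits are symmetric around zero and that the 60 s attempt limit stated above applies to the whole control loop.

```python
import math

# Limits copied from the rules above; all names here are illustrative.
MAX_LOOP_HZ = 500.0
TORQUE_LIMIT_NM = 0.15
VELOCITY_LIMIT_RAD_S = 40.0
POSITION_LIMIT_RAD = 2.0 * math.pi  # +/- 360 degrees
MAX_ATTEMPT_S = 60.0


def clamp_torque(tau):
    """Saturate a torque command to the allowed range (assumed symmetric)."""
    return max(-TORQUE_LIMIT_NM, min(TORQUE_LIMIT_NM, tau))


def within_limits(positions, velocities):
    """Check that both joints respect the position and velocity limits."""
    return (
        all(abs(q) <= POSITION_LIMIT_RAD for q in positions)
        and all(abs(v) <= VELOCITY_LIMIT_RAD_S for v in velocities)
    )


def max_steps(loop_hz):
    """Number of control steps that fit into one swing-up attempt."""
    if not 0.0 < loop_hz <= MAX_LOOP_HZ:
        raise ValueError("control loop frequency must be in (0, 500] Hz")
    return int(MAX_ATTEMPT_S * loop_hz)
```

At the maximum loop frequency of 500 Hz, a 60 s attempt corresponds to at most 30000 control steps.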

To evaluate the robustness of the policy, teams are encouraged to use the disturbance API of CloudPendulum, which lets you inject torque or state disturbances on the server side. Teams are allowed to implement their own disturbance injection logic on the client side, but only server-side disturbance injection will be used for the robustness evaluation in the final. For the final code submission, please remove any client-side disturbance injection code.
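For teams that want to experiment with client-side disturbances during development, one possible structure is a small wrapper around the torque command, as sketched below. The function name, the disturbance model (a uniformly distributed torque offset applied with a fixed probability per step), and the default values are assumptions made for illustration; they do not reflect the server-side disturbance API.

```python
import random


def make_torque_disturbance(enabled, probability=0.01, magnitude=0.05, seed=0):
    """Build a function that occasionally perturbs a torque command.

    Illustrative client-side model only: with the given per-step probability,
    a uniform offset in [-magnitude, +magnitude] Nm is added to the command.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments

    def disturb(tau):
        if enabled and rng.random() < probability:
            return tau + rng.uniform(-magnitude, magnitude)
        return tau

    return disturb
```

Keeping the disturbance in a separate wrapper like this makes it straightforward to delete before the final code submission. The perturbed command should still be saturated to the torque limit before it is sent.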

The teams are welcome to use the gym interface of CloudPendulum to collect experimental data from multiple hardware cells in parallel. Note that the default CP token does not give you access to the gym interface. However, access can be granted on a case-by-case basis if necessary for your approach.

In the qualification stage, the participants are asked to achieve swing-up and upright stabilization by learning a policy or tuning a controller on the real system from scratch. They have limited access time, which they can use for, e.g., training an RL agent directly on hardware from scratch, fine-tuning an RL policy pre-trained in simulation via sim2real transfer on hardware, or performing online system identification for a model-based controller, before attempting a swing-up. For evaluation, we will measure the average up-time of the pendulum, i.e., the time the pendulum is stabilized in the upright position. To make the challenge even greater, we will apply randomized torque and state disturbances to the motors of the pendulum to also assess the robustness of the controller. The best performing teams will be invited to the second stage. In your reports, please state how much time you need for automatic tuning of your controller, assuming evaluation on unseen hardware, and how long you can stay up under random state or torque disturbances.

The final evaluation will be done at the IJCAI conference. In the finals, the teams will be confronted with a different double pendulum, as we will change the hardware parameters in terms of mass-inertial distribution. Thus, the trained controllers will have to adapt to the new situation in a minimal amount of time and achieve the swing-up. Again, the performance score is measured as the total time the system stays in the topmost/swing-up position within the goal region.

Frequently Asked Questions (FAQ)

Do I need to contribute a new controller or policy to participate in the competition?

No, novelty is not considered as a criterion for entering the competition. Feel free to improve any of the existing model-based controllers or RL policies in the double pendulum code base.

Why combine controller tuning and evaluation under a single time metric for this competition? This seems to provide preferential treatment to model-based approaches.

We believe that the whole process can be divided into three main phases: controller development, tuning, and evaluation. We do not count how much time you need to develop your control approach (the only upper limit being the duration of the qualification phase). However, with this competition we want to push the boundaries of automatic real-world transfer of learning and control methods to robot hardware. You are welcome to pre-train controllers in simulation and even on the CP hardware provided to you in the qualification phase, but the final evaluation takes place on previously unseen DP hardware. Hence, your policy or controller must adapt to the new conditions as fast as possible, with no human intervention, and start to swing up. The performance score is measured as the total time the system stays in the topmost/swing-up position within the goal region despite any external state or torque disturbances acting on your controller.

What exactly do you mean by changing the DP hardware for the final stage?

We may change properties of the double pendulum such as link lengths or end-effector mass so that you can demonstrate that your controller adapts to the new hardware in the least amount of time. The motor properties (rotor inertia, friction, etc.) will remain the same.

Where can I find references for the previous iterations of this competition?

We highly recommend you go through the results of the previous iterations of this competition.

Felix Wiebe, Niccolò Turcato, Alberto Dalla Libera, Chi Zhang, Theo Vincent, Shubham Vyas, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres, Akhil Sathuluri, Markus Zimmermann, Boris Belousov, Jan Peters, Frank Kirchner, and Shivesh Kumar. 2024. Reinforcement learning for athletic intelligence: lessons from the 1st “AI olympics with RealAIGym” competition. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI ’24). Article 1043, 8833–8837. doi: 10.24963/ijcai.2024/1043

Felix Wiebe, Niccolò Turcato, Alberto Dalla Libera, Jean Seong Bjorn Choe, Bumkyu Choi, Tim Lukas Faust, Habib Maraqten, Erfan Aghadavoodi, Marco Calì, Alberto Sinigaglia, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres, Jong-kook Kim, Gian Antonio Susto, Shubham Vyas, Dennis Mronga, Boris Belousov, Jan Peters, Frank Kirchner, and Shivesh Kumar. 2025. Reinforcement Learning for Robust Athletic Intelligence: Lessons Learned From the Second AI Olympics With RealAIGym Competition. IEEE Robotics & Automation Magazine, pages 2–12. doi: 10.1109/MRA.2025.3631571

Franek Stark, Felix Wiebe, Blanka Burchard, Jean Seong Bjorn Choe, Bumkyu Choi, Jong-Kook Kim, Niccolò Turcato, Marco Calì, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres, Igor Alentev, Ivan Domrachev, Lev Kozlov, Jonathan Frey, Armin Nurkanović, Shubham Vyas, Dennis Mronga, Frank Kirchner, and Shivesh Kumar. 2025. Towards Global Swing-Up Policies for the Underactuated Double Pendulum: Results from the 3rd AI Olympics with RealAIGym Competition. IEEE Control Systems Magazine (under review).

Performance Score

The performance score is measured as the total time the system stays in the topmost/swing-up position within the goal region during the 300s trial. The total time includes the time for automatic controller tuning/sim2real transfer as well as achieving the swing-up. Note that once the swing-up is achieved the system will be disturbed through torque or state-based disturbances to evaluate its robustness.
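To make the metric concrete, the sketch below accumulates the time spent in a goal region over a recorded trajectory. The goal-region definition used here (shoulder angle near π and elbow angle near 0, within an angular tolerance) and the tolerance value are assumptions for illustration only; the official goal region and scoring are defined by the organizers' evaluation code.

```python
import math


def angle_error(q, target):
    """Smallest absolute angular distance between q and target (radians)."""
    return abs(math.atan2(math.sin(q - target), math.cos(q - target)))


def in_goal_region(q_shoulder, q_elbow, tol=0.1):
    """Assumed goal region: shoulder near pi and elbow near 0, within tol."""
    return (
        angle_error(q_shoulder, math.pi) <= tol
        and angle_error(q_elbow, 0.0) <= tol
    )


def uptime_seconds(samples, dt, trial_s=300.0):
    """Total time in the goal region over at most trial_s seconds.

    samples: sequence of (q_shoulder, q_elbow) pairs recorded every dt seconds.
    """
    n_max = int(round(trial_s / dt))
    return dt * sum(1 for q1, q2 in samples[:n_max] if in_goal_region(q1, q2))
```

Because tuning time counts against the same 300 s budget, every second spent on adaptation before the first successful swing-up directly reduces the achievable score.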

Protocol

Please follow the procedure below for the qualification and final stages of the competition.

Qualification Stage

To get started with CloudPendulum (CP), first register for the competition using the link here.
Sign up for a user account on CloudPendulum using the sign-up link here. In the affiliation field, please write "IJCAI 2026". For detailed sign-up restrictions, please go through the tutorial here. To ensure a fair allocation of CP server resources, only one user account will be granted per participating team, and it should correspond to the details of the primary contact. Any sign-up requests that do not follow this protocol may be rejected.
Please follow the onboarding instructions and tutorials here to understand how our platform works. It is important that you understand user privileges – how to request new tokens for more experiments and how to request extra compute resources (CPU, RAM, etc.) if necessary for your method. All token/compute extension requests should be well explained, and the admin has the right to refuse requests that are not well justified.
Please also go through all the tutorials and FAQs on the onboarding website. It is good to start with the simple pendulum system before moving to the double pendulum.
To get started with your own code base for the double pendulum, here are some good starting points:
- Forward and Inverse Dynamics of a Double Pendulum: https://github.com/cloudpendulum/dp_fwd_inv_dynamics
- System Identification: https://github.com/cloudpendulum/dp_sysid
- Acados-based MPC for Swing up: https://github.com/cloudpendulum/dp_acados_mpc
- Model-free RL for Swing up: https://github.com/cloudpendulum/dp_model_free_rl
- Note: For a plethora of methods (e.g. from previous rounds of this competition) already implemented for swing-up control on the DFKI version of the double pendulum, please refer to the repository here: https://github.com/dfki-ric-underactuated-lab/double_pendulum. These have not been ported yet to the CloudPendulum variant of the system.
Submit the URL of your GitHub code repo along with a 2–4-page paper about the method developed and the results to Dennis.Mronga@dfki.de with [AI Olympics with RealAIGym] in the email subject. Please follow these guidelines for the paper:
- Page limit: 2–4 pages including references
- Include the standard plots for position, velocity, and torque with respect to time in the paper. For an example, see timeseries.png here. These plots are generated after simulation if you use the provided function plot_timeseries(T, X, U).
- Include the tables for performance metrics against the baseline controllers made available on the RealAIGym leaderboards.
- Use the following template: IEEEConfig.zip
The submitted code and papers will be reviewed, and the leaderboard benchmarks will be re-run by us to compute the final scores. The scores as well as the paper reviews will be used to determine the best 4 teams, which will carry out the final experiments using their controllers at the IJCAI 2026 AI Olympics!

To cite CloudPendulum in your work, please use:

Kumar, S. (2025). Swinging pendulums on the cloud: Digitalization of simulation & experimental infrastructure for feedback-based active learning. Chalmers Konferens om undervisning och lärande (KUL2025).

To cite double pendulum in your work, please use:

F. Wiebe, S. Kumar, L. J. Shala, S. Vyas, M. Javadi and F. Kirchner, “Open Source Dual-Purpose Acrobot and Pendubot Platform: Benchmarking Control Algorithms for Underactuated Robotics,” in IEEE Robotics & Automation Magazine, vol. 31, no. 2, pp. 113-124, June 2024, doi: 10.1109/MRA.2023.3341257.

To cite RealAIGym, please use:

Felix Wiebe, Shubham Vyas, Lasse Jenning Maywald, Shivesh Kumar, and Frank Kirchner. RealAIGym: Education and Research Platform for Studying Athletic Intelligence. In Proceedings of Robotics Science and Systems Workshop Mind the Gap: Opportunities and Challenges in the Transition Between Research and Industry, New York, July 2022.

Final Stage

The exact protocol for the final stage of the competition at IJCAI 2026 will be announced 1 week before the conference dates.

Schedule/Important Dates

The on-site competition will require one full day. Prior to IJCAI, the teams will have access to the real system via the cloud platform, so they will arrive with trained controllers, which will be tested on a new system at IJCAI. Additionally, there will be a half-day session in which we invite all participating teams to present their algorithms and discuss the solutions.

Important Dates

Competition Day Schedule