RevengeBench

Reverse-engineering code-space policies from behavioral experiments.

Given only behavioural traces of an opaque target agent acting in a game arena, can AI agents design experiments to reconstruct a runnable program that reproduces the target's behaviour?

Leaderboard

Distance reduction across the five arenas. Higher is better.

Rank Model Avg BattleSnake Halite HuskyBench RoboCode RobotRumble

Per-arena breakdown

Pipeline

Observation

The learner watches the hidden policy play against a diverse pool of opponents.

Interaction

The learner authors probe opponents to elicit targeted behaviour and disambiguate competing hypotheses.

Evaluation

The learner submits one runnable hypothesis, scored by how often it picks the same action as the hidden policy on held-out trajectories.
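The evaluation step can be illustrated with a minimal sketch of an action-agreement score. This is not the benchmark's actual scoring code: the function name `agreement_rate`, the representation of a policy as a plain callable from state to action, and the toy states are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical types: a state and an action are any hashable values,
# and a policy is a function mapping a state to an action.
State = Hashable
Action = Hashable
Policy = Callable[[State], Action]


def agreement_rate(
    hypothesis: Policy,
    states: Sequence[State],
    target_actions: Sequence[Action],
) -> float:
    """Fraction of held-out states where the hypothesis picks the target's action."""
    if len(states) != len(target_actions):
        raise ValueError("states and target_actions must have the same length")
    if not states:
        raise ValueError("need at least one held-out state")
    matches = sum(hypothesis(s) == a for s, a in zip(states, target_actions))
    return matches / len(states)


# Toy usage: a hypothesis that always moves "up", compared against
# four recorded target actions on held-out states.
held_out_states = [1, 2, 3, 4]
recorded_target_actions = ["up", "up", "down", "up"]
score = agreement_rate(lambda s: "up", held_out_states, recorded_target_actions)
print(score)  # 0.75
```

In this toy case the hypothesis matches the target on three of four held-out states.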

Arenas

Five code-based arenas from CodeClash, spanning four programming languages and a range of game mechanics.

Protocol

RevengeBench operationalises an inverse problem in code-space: given only behavioural traces of an opaque target agent in a programming-game arena, can a learner reconstruct a runnable program that reproduces its decisions? Because targets are themselves executable, hypotheses can be scored mechanically against ground truth, a property that behavioural inverse problems normally lack.

  • Targets: the 15 strongest policies by Elo in each arena, extracted from CodeClash tournaments. 75 in total.
  • Opponent pool: 20 opponents sampled each round from the remaining pool.
  • Starter policy: arena-specific naive baseline that every learner edits from.
  • Protocol: closed loop of observation, intervention through probe opponents, hypothesis formulation, and evaluation. 5 rounds with persistent memory; best round reported for each model.
  • Probe budget: 5 probe opponents per round in the intervention regime.
  • Harness: mini-SWE-agent.
  • Metric: distance reduction $$\Delta = \frac{D_0 - D_R}{D_0}$$ where $D_0$ and $D_R$ are the mean action distances to the target of the starter policy $\hat{\pi}_0$ and the final hypothesis $\hat{\pi}_R$, respectively. Reporting $\Delta$ controls for differences in baseline difficulty across targets and arenas.
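As a worked illustration of the metric, the sketch below computes $\Delta$ from a starter distance and per-round hypothesis distances, then takes the best round. The helper names and the numeric values are hypothetical, and reading "best round" as the maximum $\Delta$ over rounds is an assumption rather than a confirmed detail of the benchmark.

```python
from typing import Sequence


def distance_reduction(d0: float, dr: float) -> float:
    """Delta = (D_0 - D_R) / D_0, the relative improvement over the starter policy."""
    if d0 <= 0:
        raise ValueError("starter distance D_0 must be positive")
    return (d0 - dr) / d0


def best_round_reduction(d0: float, round_distances: Sequence[float]) -> float:
    """Best Delta over all rounds (assumed reading of 'best round reported')."""
    if not round_distances:
        raise ValueError("need at least one round")
    return max(distance_reduction(d0, d) for d in round_distances)


# Hypothetical numbers: the starter policy has mean action distance 0.5,
# and the hypotheses from 5 rounds have the distances below.
d0 = 0.5
per_round = [0.5, 0.25, 0.125, 0.25, 0.375]

print(distance_reduction(d0, 0.125))        # 0.75
print(best_round_reduction(d0, per_round))  # 0.75
```

A value of 0 means the hypothesis is no closer to the target than the starter policy, 1 means it matches the target exactly, and negative values mean the edits moved it further away.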

Findings

Team

*Equal contribution

Citation

If you found RevengeBench useful, please cite us as:

@article{revengebench_2026,
  title  = {RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments},
  author = {Babak Rahmani and Sebastian Dziadzio and Joschka Strüber and Sergio Hernández Gutiérrez and Matthias Bethge},
  year   = {2026},
}