Quant-finance coding agents
Build a Docker agent that solves quantitative finance coding tasks under pytest and financial-invariant checks.
- Verb: solve
- Metric: pass@1 / pass@3
- Gate: pytest + invariants
Agenthon
NeurIPS 2026 Competition Track
Verifiable AI for quantitative finance.
A four-track competition testing whether AI agents can produce finance outputs that survive automated, leakage-controlled, cheat-resistant evaluation.
Core question
Agenthon extends the Alphathon program into its first NeurIPS edition. The competition keeps the four-track structure and hard finance setting, while adding sealed held-out data, automated leakage controls, reproducible reruns, and public leaderboards.
Every official submission runs offline in a sandboxed Docker container. It must pass integrity, schema, cutoff/resource, and domain-semantics gates before any leaderboard metric is computed.
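The gate-before-metric rule can be sketched in a few lines. This is a minimal illustration, not the official harness: the `Run` fields, the check functions, and the `score` helper are hypothetical names. Only the gate order (integrity, schema, cutoff/resource, domain semantics) comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Run:
    # Hypothetical per-run flags; a real harness would compute these itself.
    integrity_ok: bool
    schema_ok: bool
    cutoff_resource_ok: bool
    domain_semantics_ok: bool
    raw_metric: float


# Gates are evaluated in the order listed in the text.
GATES: List[Tuple[str, Callable[[Run], bool]]] = [
    ("integrity", lambda r: r.integrity_ok),
    ("schema", lambda r: r.schema_ok),
    ("cutoff/resource", lambda r: r.cutoff_resource_ok),
    ("domain-semantics", lambda r: r.domain_semantics_ok),
]


def first_failed_gate(run: Run) -> Optional[str]:
    """Return the name of the first gate that fails, or None if all pass."""
    for name, check in GATES:
        if not check(run):
            return name
    return None


def score(run: Run) -> Optional[float]:
    """Return the metric only for admissible runs; otherwise None."""
    return run.raw_metric if first_failed_gate(run) is None else None
```

Returning the label of the first failed gate, rather than a bare boolean, keeps enough information to aggregate failures by gate later.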
Four tracks
Build a Docker agent that solves quantitative finance coding tasks under pytest and financial-invariant checks.
Forecast future panels using time-series data plus a time-stamped text corpus, then prove information uplift over text-blind baselines.
Submit an ABIDES-compatible simulator that is faster while preserving matching-engine semantics and market stylized facts.
Predict labels, values, or rankings over tabular entities with citations from a frozen evidence corpus and confidence intervals.
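The coding track lists pass@1 / pass@3 as its metric. The page does not say which estimator is used. One common choice in code-generation benchmarks is the unbiased estimator computed from n attempts per task, c of which pass. The sketch below assumes that estimator, and `pass_at_k` is an illustrative name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: attempts sampled, c: attempts that passed, k: budget being scored.
    Assumption: this is the estimator in use; the source does not confirm it.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset of the n attempts has no pass.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With three attempts and one passing, `pass_at_k(3, 1, 1)` is one third and `pass_at_k(3, 1, 3)` is 1.0. A leaderboard figure would typically be the mean of this value across tasks.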
Protocol
The visible rule is simple. A submission is ranked only after it passes the same admissibility sequence used by every track.
A Docker image implements one stable CLI verb for its track.
Gates g0-g3 verify integrity, schema, cutoff/resource rules, and domain semantics.
Only admissible runs receive a track metric and bootstrap confidence interval.
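A bootstrap confidence interval for a track metric could look like the following percentile-bootstrap sketch. It assumes the metric is a mean of per-task scores and that tasks are the resampling unit. Neither is stated on this page, and `bootstrap_ci` with its parameters is a hypothetical helper.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap over tasks."""
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    n = len(scores)
    # Resample tasks with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(scores), lower, upper
```

Seeding the generator keeps the interval reproducible across reruns, in line with the competition's emphasis on reproducibility.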
Public/private firewall
Each track has a public practice repo and a private sealed exam repo. Public repos include practice tasks, baselines, smoke scorers, and participant docs. Private repos hold held-out tasks, oracle keys, final scoring logic, and audit material.
Leaderboard
Each team row will show its total score first. Expanding a row reveals the per-track score breakdown used to compute or explain that total.
2026 phases
Upcoming: Public repos open, teams practice, and the validation leaderboard goes live.
Public repos open. Teams practice, iterate, and use the live validation leaderboard.
One submission per team is evaluated on sealed private-test units with a hidden leaderboard.
Organizers rerun top submissions with fresh seeds and review reproducibility.
Scientific outputs
Gate failures are labeled and aggregated to show where finance agents break down.
T2 isolates whether text and reasoning beat strong text-blind forecasting baselines.
T3 measures throughput only after semantic fidelity and stylized facts survive checks.
T4 requires evidence-backed predictions that do not cite future or unsupported facts.
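The T4 requirement that predictions cite no future or unsupported facts can be pictured as a check against the frozen corpus. The sketch below is illustrative only. It assumes the corpus maps document ids to publication dates and that each prediction carries a cutoff date; `citation_violations` and the data layout are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple


def citation_violations(
    citations: Iterable[str],
    corpus: Dict[str, date],
    cutoff: date,
) -> List[Tuple[str, str]]:
    """Return (doc_id, reason) pairs for citations that fail the check."""
    problems = []
    for doc_id in citations:
        published = corpus.get(doc_id)
        if published is None:
            # Cited document is absent from the frozen evidence corpus.
            problems.append((doc_id, "unsupported"))
        elif published > cutoff:
            # Cited document was published after the prediction cutoff.
            problems.append((doc_id, "future"))
    return problems
```

An empty result means every citation is both present in the corpus and dated on or before the cutoff.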
Resources
Question Partners
Agenthon questions and tasks are directly relevant to hedge funds and asset managers, spanning alpha forecasting, portfolio optimization, computational statistics, machine learning, and AI. They are provided jointly by the SQA, CEWIT, and our Question Partners.
See Questions 2025 for last year's questions and a feel for Agenthon priorities.
Our Supporters
Several sponsorship tiers and opportunities are available, including monetary and awards sponsorship, event space, data, infrastructure, and compute.