Quant-finance coding agents
Build a Docker agent that solves quantitative finance coding tasks under pytest and financial-invariant checks.
- Verb: solve
- Metric: pass@1 / pass@3
- Gate: pytest + invariants
Agenthon
NeurIPS 2026 Competition Track
Verifiable AI for quantitative finance.
A four-track competition testing whether AI agents can produce finance outputs that survive automated, leakage-controlled, cheat-resistant evaluation.
Core question
Agenthon extends the Alphathon program into its first NeurIPS edition. The competition keeps the four-track structure and hard finance setting, while adding sealed held-out data, automated leakage controls, reproducible reruns, and public leaderboards.
Every official submission runs offline in a sandboxed Docker container. It must pass integrity, schema, cutoff/resource, and domain-semantics gates before any leaderboard metric is computed.
Four tracks
Build a Docker agent that solves quantitative finance coding tasks under pytest and financial-invariant checks.
Forecast future panels using time-series data plus a time-stamped text corpus, then prove information uplift over text-blind baselines.
Submit an ABIDES-compatible simulator that is faster while preserving matching-engine semantics and market stylized facts.
Predict labels, values, or rankings over tabular entities with citations from a frozen evidence corpus and confidence intervals.
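For the coding track, a pass@1 / pass@3 metric is usually read as the chance that at least one of k sampled attempts clears every check. The sketch below assumes the widely used unbiased estimator computed from n attempts of which c passed. This page does not state how Agenthon computes the metric, so the function name and the choice of estimator are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts of which c passed.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k),
    not a confirmed Agenthon scoring rule.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n attempts has no pass.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail
```

With three attempts and one pass, this estimator gives one third for k = 1 and one for k = 3, which is why pass@3 is always at least as large as pass@1.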
Protocol
The visible rule is simple. A submission is ranked only after it passes the same admissibility sequence used by every track.
A Docker image implements one stable CLI verb for its track.
g0-g3 verify integrity, schema, cutoff/resource rules, and domain semantics.
Only admissible runs receive a track metric and bootstrap confidence interval.
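The three steps above can be sketched as a short pipeline: run the gates in order, stop at the first failure, and compute a metric with a bootstrap interval only for admissible runs. Everything here is a hypothetical illustration. The gate callables, the submission dictionary, the percentile bootstrap, and the function names are assumptions, not the organizers' scoring code.

```python
import random
from statistics import mean
from typing import Callable, Optional

# Hypothetical gate type: returns None on success, or a short failure label.
Gate = Callable[[dict], Optional[str]]


def run_gates(submission: dict, gates: list[tuple[str, Gate]]) -> Optional[str]:
    """Run gates in order; return 'gate:reason' for the first failure, else None."""
    for name, gate in gates:
        reason = gate(submission)
        if reason is not None:
            return f"{name}:{reason}"
    return None


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-task scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the scores with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def score_submission(submission, gates, metric):
    """Return a labeled failure, or the metric plus interval for admissible runs."""
    failure = run_gates(submission, gates)
    if failure is not None:
        return {"admissible": False, "failed_gate": failure}
    scores = metric(submission)
    return {"admissible": True, "metric": mean(scores), "ci": bootstrap_ci(scores)}
```

Returning the failing gate's label instead of a bare rejection keeps the failure data that the competition says it will aggregate, and fixing the random seed makes the interval reproducible on rerun.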
Public/private firewall
Each track has a public practice repo and a private sealed exam repo. Public repos include practice tasks, baselines, smoke scorers, and participant docs. Private repos hold held-out tasks, oracle keys, final scoring logic, and audit material.
Leaderboard
Each team row will show its total score first. Expanding a row reveals the per-track score breakdown used to compute or explain that total.
2026 phases
Upcoming
Public repos open. Teams practice, iterate, and use the live validation leaderboard.
One submission per team is evaluated on sealed private-test units with a hidden leaderboard.
Organizers rerun top submissions with fresh seeds and review reproducibility.
Scientific outputs
Gate failures are labeled and aggregated to show where finance agents break down.
T2 isolates whether text and reasoning beat strong text-blind forecasting baselines.
T3 measures throughput only after semantic fidelity and stylized facts survive checks.
T4 requires evidence-backed predictions that do not cite future or unsupported facts.
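The T4 requirement that predictions not cite future or unsupported facts suggests a mechanical check against the frozen corpus. The sketch below is a minimal illustration under stated assumptions: citations are document IDs, the corpus is a mapping from ID to publication date, and the cutoff is a single date. None of these names or structures come from the competition materials, and the check covers only existence and timing, not whether a cited document actually supports the claim.

```python
from datetime import date


def citation_violations(citations, corpus_dates, cutoff):
    """Return (doc_id, reason) pairs for citations that are unknown or post-cutoff.

    citations:    iterable of document IDs cited by a prediction (hypothetical format)
    corpus_dates: dict mapping document ID to publication date in the frozen corpus
    cutoff:       last date a cited document may carry
    """
    problems = []
    for doc_id in citations:
        published = corpus_dates.get(doc_id)
        if published is None:
            # The cited document is not part of the frozen evidence corpus.
            problems.append((doc_id, "not_in_corpus"))
        elif published > cutoff:
            # The cited document postdates the information cutoff.
            problems.append((doc_id, "after_cutoff"))
    return problems
```

An empty result means every citation resolves to a corpus document dated on or before the cutoff. A non-empty result could feed a labeled gate failure.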
Resources
Question Partners
Agenthon questions and tasks are directly relevant to hedge funds and asset managers, spanning alpha forecasting, portfolio optimization, computational statistics, machine learning, and AI. Questions are provided jointly by the SQA, CEWIT, and our Question Partners.
See Questions 2025 for last year's questions and a feel for Agenthon priorities.
Our Supporters
Several sponsorship tiers and opportunities are available, including monetary and awards sponsorship, event space, data, infrastructure, and compute.
Organizing Committee
Chief Investment Officer, Atlas Ridge Capital
Adjunct Professor, NYU Courant
Executive Advisory Board, Columbia Business School, Program for Financial Studies
Assistant Professor, Department of Applied Mathematics and Statistics, Stony Brook University
Vice President, Society of Quantitative Analysts
Head of Machine Learning Strategy, CTO Office
Bloomberg, Toronto, Canada
Head of Quant Technology Strategy, Office of the CTO
Bloomberg, New York, USA
Global Head of Capital Markets Strategy
NVIDIA Corporation, USA
T1 · Coding
Zhikang Dong
Track Lead T1
Independent Researcher
T2 · Forecasting
Ruolan Sun
Track Lead T2
Ph.D. Student, Stony Brook University
T3 · Simulation
Haohan Xu
Track Lead T3
Ph.D. Student, Stony Brook University
T4 · Tabular
Mathew Thiel
Track Lead T4
Quant Research Analyst, validityBase