SQA Agenthon
More Information Coming Soon

NeurIPS 2026 Competition Track

Agenthon 2026

Verifiable AI for quantitative finance.

A four-track competition testing whether AI agents can produce finance outputs that survive automated, leakage-controlled, cheat-resistant evaluation.

Submission: Docker agent
Admissibility: g0-g3 gates
Ranking: track metrics

Core question

Can AI agents produce finance answers that can be checked by machine?

Agenthon extends the Alphathon program into its first NeurIPS edition. The competition keeps the four-track structure and hard finance setting, while adding sealed held-out data, automated leakage controls, reproducible reruns, and public leaderboards.

Every official submission runs offline in a sandboxed Docker container. It must pass integrity, schema, cutoff/resource, and domain-semantics gates before any leaderboard metric is computed.
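The gate ordering can be sketched as a short-circuiting pipeline: each gate either passes the run or returns a failure reason, and only a run that clears all four is handed to the scorer. The gate names and the toy checks inside them are hypothetical stand-ins for g0-g3. They are not the organizers' real integrity, schema, cutoff/resource, or semantics logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical gate signature: None means "pass", a string is the failure reason.
Gate = Callable[[dict], Optional[str]]


@dataclass
class GateResult:
    admissible: bool
    failed_gate: Optional[str] = None
    reason: Optional[str] = None


def g0_integrity(run: dict) -> Optional[str]:
    # Toy check: the run must record the digest of the image it executed.
    return None if run.get("image_digest") else "missing image digest"


def g1_schema(run: dict) -> Optional[str]:
    # Toy check: outputs must be a list of per-unit answers.
    return None if isinstance(run.get("outputs"), list) else "outputs must be a list"


def g2_cutoff_resource(run: dict) -> Optional[str]:
    # Toy check: wall-clock time must stay within the budget (missing fields fail).
    if run.get("wall_seconds", float("inf")) > run.get("budget_seconds", 0):
        return "exceeded time budget"
    return None


def g3_semantics(run: dict) -> Optional[str]:
    # Toy domain check: NaN != NaN, so this flags any NaN output value.
    return None if all(x == x for x in run["outputs"]) else "NaN in outputs"


GATES: list[tuple[str, Gate]] = [
    ("g0_integrity", g0_integrity),
    ("g1_schema", g1_schema),
    ("g2_cutoff_resource", g2_cutoff_resource),
    ("g3_semantics", g3_semantics),
]


def run_gates(run: dict, gates: list[tuple[str, Gate]] = GATES) -> GateResult:
    """Apply gates in order and stop at the first failure."""
    for name, gate in gates:
        reason = gate(run)
        if reason is not None:
            return GateResult(admissible=False, failed_gate=name, reason=reason)
    return GateResult(admissible=True)
```

Stopping at the first failure means every inadmissible run carries exactly one gate label. That single label is what makes failures easy to aggregate across tracks.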

Four tracks

Same competition spine, four finance problems.

T1 Coding

Quant-finance coding agents

Build a Docker agent that solves quantitative finance coding tasks under pytest and financial-invariant checks.

Verb: solve
Metric: pass@1 / pass@3
Gate: pytest + invariants

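pass@k is the probability that at least one of k attempts passes. The sketch below uses the common unbiased estimator 1 - C(n-c, k) / C(n, k), computed from n sampled attempts of which c pass the tests. Whether T1 estimates pass@1 and pass@3 exactly this way is an assumption. The function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts, c of which pass (assumed estimator)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of attempts contains at least one pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (n_attempts, n_passed)."""
    if not results:
        raise ValueError("need at least one task")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With three attempts and one pass, pass@1 is 1/3 and pass@3 is 1.0.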
T2 Forecasting

Reasoning-augmented time series

Forecast future panels using time-series data plus a time-stamped text corpus, then prove information uplift over text-blind baselines.

Verb: forecast
Metric: CRPS composite
Gate: as-of cutoff + calibration

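For a forecast given as an ensemble of samples, CRPS can be computed in its energy form: E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast and y is the realized value. Lower is better. The track's "composite" weighting across panels is not specified here. The sketch therefore stops at a plain mean over units, and that mean is an assumption.

```python
def crps_ensemble(members: list[float], y: float) -> float:
    """Empirical CRPS of an ensemble forecast against observation y (lower is better)."""
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must be non-empty")
    # E|X - y|: mean absolute error of the samples.
    spread_to_obs = sum(abs(x - y) for x in members) / m
    # 0.5 * E|X - X'|: rewards a sharp (tight) ensemble.
    internal_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return spread_to_obs - internal_spread


def mean_crps(forecasts: list[list[float]], observations: list[float]) -> float:
    """Plain average CRPS over units; the real composite weighting is unknown."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need matching, non-empty forecasts and observations")
    return sum(crps_ensemble(f, y) for f, y in zip(forecasts, observations)) / len(observations)
```

A one-member ensemble reduces CRPS to absolute error. For example, crps_ensemble([2.0], 5.0) is 3.0.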
T3 Simulation

Accelerated market simulation

Submit an ABIDES-compatible simulator that is faster while preserving matching-engine semantics and market stylized facts.

Verb: simulate
Metric: events/sec
Gate: semantic regression

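T3 reports speed only after the simulator has proven it behaves like the reference. The sketch below encodes that ordering in a deliberately simple form: the candidate's fills on a fixed order stream must match the reference fills exactly, and only then are events/sec reported. The Fill record and the exact-match rule are assumptions. The real semantic regression and stylized-fact checks are likely richer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fill:
    # Hypothetical minimal fill record emitted by a matching engine.
    order_id: int
    price: int  # integer ticks avoid floating-point comparison issues
    qty: int


def gated_throughput(reference: list[Fill], candidate: list[Fill],
                     n_events: int, elapsed_s: float) -> Optional[float]:
    """Return events/sec only if the candidate reproduces the reference fills exactly."""
    if candidate != reference:
        return None  # semantic regression failed: no speed score
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_events / elapsed_s
```

Returning None instead of a low number keeps a fast but wrong engine off the throughput ranking entirely.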
T4 Tabular

Evidence-grounded prediction

Predict labels, values, or rankings over tabular entities with citations from a frozen evidence corpus and confidence intervals.

Verb: analyze
Metric: quality + coverage
Gate: faithfulness + embargo
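The faithfulness and embargo gate can be sketched as a per-citation check against the frozen corpus. The cited document must exist, must be timestamped no later than the as-of cutoff, and must actually contain the quoted text. The corpus layout, the Citation record, and the use of verbatim substring matching as a faithfulness test are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Citation:
    # Hypothetical citation format: a corpus document id plus a supporting quote.
    doc_id: str
    quote: str


def check_citations(citations: list[Citation],
                    corpus: dict[str, tuple[datetime, str]],
                    as_of: datetime) -> list[str]:
    """Return a list of problems; an empty list means every citation passed."""
    problems: list[str] = []
    for c in citations:
        entry = corpus.get(c.doc_id)
        if entry is None:
            problems.append(f"{c.doc_id}: not in frozen corpus")
            continue
        published, text = entry
        if published > as_of:
            # Embargo: evidence from after the cutoff would leak the future.
            problems.append(f"{c.doc_id}: published after cutoff")
        if c.quote not in text:
            # Crude faithfulness proxy: the quote must appear verbatim.
            problems.append(f"{c.doc_id}: quote not found in document")
    return problems
```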

Protocol

Submit, check, score.

The rule is simple: a submission is ranked only after it passes the same admissibility sequence used by every track.

01 Submit

A Docker image implements one stable CLI verb for its track.

02 Check

Gates g0-g3 verify integrity, schema, cutoff/resource rules, and domain semantics.

03 Score

Only admissible runs receive a track metric and a bootstrap confidence interval.
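A bootstrap confidence interval can be sketched as resampling the per-unit scores of an admissible run with replacement, recomputing the mean each time, and reading off percentiles. This percentile-bootstrap form is an assumption. The number of resamples, the resampling unit, and the interval level used by Agenthon are not specified here.

```python
import random
from statistics import mean


def bootstrap_ci(scores: list[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float, float]:
    """Percentile bootstrap CI for the mean per-unit score: (mean, low, high)."""
    if not scores:
        raise ValueError("need at least one score")
    rng = random.Random(seed)  # fixed seed so the interval itself is reproducible
    n = len(scores)
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(scores), low, high
```

Fixing the seed makes a rerun of the scorer produce the same interval, which matters when results are checked for reproducibility.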

Public/private firewall

Transparent grading, sealed answers.

Each track has a public practice repo and a private sealed exam repo. Public repos include practice tasks, baselines, smoke scorers, and participant docs. Private repos hold held-out tasks, oracle keys, final scoring logic, and audit material.

Public practice | Private exam
Public-dev and validation units | Private-test held-out units
Runnable baselines and smoke scorer | Oracle solutions and final scorer
Manifest and canary safety checks | Canary registry and audit logs
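Manifest and canary checks can be sketched with two small helpers. The first verifies that each file's SHA-256 digest matches a manifest. The second scans text for registered canary strings, whose appearance would signal that private material leaked into a repo or into a model's outputs. The manifest layout, the helper names, and plaintext substring scanning are assumptions. A real canary registry may store digests or use other matching rules.

```python
import hashlib


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of UTF-8 text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_manifest(files: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Paths that are missing, unlisted, or whose digest differs from the manifest."""
    bad = [p for p in manifest if p not in files or sha256_text(files[p]) != manifest[p]]
    bad += [p for p in files if p not in manifest]
    return sorted(bad)


def scan_for_canaries(files: dict[str, str], canaries: set[str]) -> dict[str, list[str]]:
    """Map each path to the sorted canary strings found in it (hypothetical helper)."""
    hits: dict[str, list[str]] = {}
    for path, text in files.items():
        found = sorted(c for c in canaries if c in text)
        if found:
            hits[path] = found
    return hits
```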

Leaderboard

Competition scores will appear here.

Each team row will show its total score first. Expanding a row reveals the per-track score breakdown used to compute or explain that total.


2026 phases

Development, final, verification.

Upcoming

Development opens July 20

Public repos open, teams practice, and the validation leaderboard goes live.

  1. Development: Jul 20 - Sep 28

     Public repos open. Teams practice, iterate, and use the live validation leaderboard.

  2. Final: Sep 29 - Oct 12

     One submission per team is evaluated on sealed private-test units with a hidden leaderboard.

  3. Verification: Oct 13 - Oct 25

     Organizers rerun top submissions with fresh seeds and review reproducibility.
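The verification rerun can be sketched as re-scoring a submission under several fresh seeds and comparing the rerun mean with the score originally reported. The score function, the seed list, and the tolerance are hypothetical parameters. The actual reproducibility criteria are set by the organizers.

```python
from statistics import mean, pstdev
from typing import Callable


def reproducibility_check(score_fn: Callable[[int], float], submitted: float,
                          seeds: list[int], tol: float) -> dict:
    """Rerun score_fn under each seed and flag a gap larger than tol (assumed rule)."""
    if not seeds:
        raise ValueError("need at least one seed")
    reruns = [score_fn(s) for s in seeds]
    rerun_mean = mean(reruns)
    gap = abs(rerun_mean - submitted)
    return {
        "reruns": reruns,
        "mean": rerun_mean,
        "std": pstdev(reruns),  # seed-to-seed spread
        "gap": gap,
        "reproducible": gap <= tol,
    }
```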

Scientific outputs

Agenthon is designed to explain failures, not just rank winners.

Cross-track failure map

Gate failures are labeled and aggregated to show where finance agents break down.
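A failure map of this kind can be sketched as a count of runs per (track, failed gate), normalized by the number of runs in each track. The record format (track label, plus the failed gate or None for an admissible run) is an assumption for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def failure_rates(records: Iterable[tuple[str, Optional[str]]]) -> dict[tuple[str, str], float]:
    """Share of each track's runs that failed at each gate."""
    records = list(records)
    totals = Counter(track for track, _ in records)
    fails = Counter((track, gate) for track, gate in records if gate is not None)
    return {(track, gate): n / totals[track] for (track, gate), n in fails.items()}
```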

Information uplift

T2 isolates whether text and reasoning beat strong text-blind forecasting baselines.
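Information uplift can be summarized as a paired comparison of per-unit CRPS against the text-blind baseline on the same units, together with a CRPS skill score, 1 - CRPS_agent / CRPS_baseline. Positive values mean the text-aware forecast helped. Using this particular pair of summaries is an assumption. The function name is hypothetical.

```python
from statistics import mean


def information_uplift(agent_crps: list[float], baseline_crps: list[float]) -> tuple[float, float]:
    """Return (mean paired CRPS improvement, CRPS skill score); positive = agent better."""
    if len(agent_crps) != len(baseline_crps) or not agent_crps:
        raise ValueError("need matching, non-empty per-unit scores")
    baseline_mean = mean(baseline_crps)
    if baseline_mean == 0:
        raise ValueError("baseline CRPS is zero; skill score undefined")
    # CRPS is lower-is-better, so baseline - agent is the improvement per unit.
    paired = mean(b - a for a, b in zip(agent_crps, baseline_crps))
    skill = 1.0 - mean(agent_crps) / baseline_mean
    return paired, skill
```

Pairing by unit matters: it compares both forecasters on identical targets, so differences in task difficulty cancel out.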

Speed-realism frontier

T3 measures throughput only after semantic fidelity and stylized facts survive checks.
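Two classic stylized facts can be checked on a simulated price path. Returns should be heavy-tailed, meaning positive excess kurtosis, and raw returns should show little lag-1 autocorrelation. The autocorrelation of absolute returns is also reported as a volatility-clustering signal. The choice of facts and the 0.1 autocorrelation threshold are assumptions, not the track's published criteria.

```python
import math
from statistics import mean


def log_returns(prices: list[float]) -> list[float]:
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def autocorr(xs: list[float], lag: int = 1) -> float:
    """Sample autocorrelation at the given lag."""
    m = mean(xs)
    den = sum((x - m) ** 2 for x in xs)
    if den == 0:
        raise ValueError("series has zero variance")
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return num / den


def excess_kurtosis(xs: list[float]) -> float:
    """Population excess kurtosis; > 0 indicates heavier tails than a Gaussian."""
    m, n = mean(xs), len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def stylized_fact_report(prices: list[float], max_abs_ac1: float = 0.1) -> dict:
    r = log_returns(prices)
    kurt = excess_kurtosis(r)
    ac1 = autocorr(r, 1)
    return {
        "excess_kurtosis": kurt,
        "return_ac1": ac1,
        "abs_return_ac1": autocorr([abs(x) for x in r], 1),  # volatility clustering
        "passes": kurt > 0.0 and abs(ac1) < max_abs_ac1,
    }
```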

Faithfulness under embargo

T4 requires evidence-backed predictions that do not cite future or unsupported facts.

Resources

Built from the Agenthon 2026 source docs.

Question Partners

Real problems from real desks.

Agenthon Questions and Tasks are directly relevant to hedge funds and asset managers, spanning alpha forecasting, portfolio optimization, computational statistics, machine learning, and AI. Questions are provided jointly by the SQA, CEWIT, and our Question Partners.

See Questions 2025 for last year's questions and a sense of Agenthon's priorities.

Our Supporters

Sponsor Agenthon 2026.

A number of sponsorship tiers and opportunities are available, including monetary and awards sponsorship, event space, data, infrastructure, and compute.