Questions
Questions for Alphathon 2025 are designed to be relevant to the investment industry and cover many areas of quant research, including return forecasting and alpha generation; data science, machine learning and large language models (LLMs); dealing with unstructured, asynchronous, alternative data; as well as risk management, market microstructure, market impact and portfolio construction. The month-long coding competition allows for thoughtful solutions, rather than a single-weekend, brute-force data-mining approach.
Question 1
Point72 / Cubist provided the following question and will join the SQA and Stony Brook in judging the submissions.
Real-Time Streaming Application
Problem:
CSP is an open-source reactive stream processing library. Using CSP, create an innovative real-time streaming application with a strong visualization component that leverages multiple asynchronous data sources.
Expected Outcome:
A prototype of a real-time data processing application that can generate actionable insights using a variety of techniques in quantitative finance. The application should have a visualization component and should demonstrate how it handles streaming data from multiple sources as well as the usefulness and accuracy of the insights it generates.
Data:
Participants will have access to data and infrastructure provided by Alphathon 2025 sponsors.
Evaluation:
The submissions will be evaluated based on 1) the creativity demonstrated in leveraging CSP’s high-performance features, 2) the rigorous use of quantitative finance skills, 3) the attractiveness of the visualization features, and 4) the usefulness of the outputs generated by the application.
Question 2

AllianceBernstein provided the following question and will join the SQA and Stony Brook in judging the submissions.
Edgar + LLMs = Alpha
Problem:
Can we use LLMs on the Edgar database to extract signals or other alpha-relevant information to predict US stock returns (S&P 500)?
Expected Outcome:
Your study should demonstrate how to use LLMs in innovative ways to extract insights and forecast future returns in ways that are incrementally and demonstrably better than simpler techniques with the same data inputs, and additive to known investment styles.
Data:
Participants will have access to the Edgar database. The Edgar filings data should be the primary data input and area of focus, though combining it with other data would be acceptable.
Evaluation:
The submissions will be evaluated based on the creativity demonstrated in proving or disproving the value of LLMs in time series forecasting, the rigor of the methodological comparison, the signal's relation to the existing literature, including risk management of known investment styles, questions about forward-looking information in the model, and realistic market impact assumptions, as well as the practical usefulness of the output for investment management. Please also consider the impact of in-sample bias of the LLMs and consider ways to address that bias.
Question 3

Principal Financial Group provided the following question and will join the SQA and Stony Brook in judging the submissions.
Moving Targets: Detecting Shifts in Corporate Performance Narratives via LLMs
Problem:
Can we use Large Language Models (LLMs) to identify and quantify Moving Targets in corporate communications?
Expected Outcome:
Your study should develop a systematic methodology to detect shifts in the presence or prominence of specific performance metrics discussed in earnings call transcripts and the Management's Discussion and Analysis (MD&A) sections of 10-K and 10-Q filings.
Data:
Participants will have access to data provided by Orbit Financial Technology and/or other providers.
Evaluation:
The submissions will be evaluated on the creativity demonstrated in identifying key performance metrics, quantifying Moving Targets, constructing and backtesting a trading strategy, including risk management of known investment styles and realistic market impact assumptions, understanding the characteristics and nuances of these moving targets, as well as the practical usefulness of the output for investment management.
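As one possible starting point, the shift in prominence of a metric can be sketched as the change in its share of all metric mentions between two documents. The sketch below is a keyword-count baseline, not an LLM method; the metric list, the function names and the two sample texts are hypothetical and serve only as illustration. An LLM-based approach would presumably replace the fixed vocabulary and the regular-expression matching.

```python
import re
from collections import Counter

# Hypothetical metric vocabulary (an assumption, not part of the question).
METRICS = ["revenue", "ebitda", "free cash flow", "subscribers"]

def metric_shares(text, metrics=METRICS):
    """Return each metric's share of all metric mentions found in the text."""
    lowered = text.lower()
    counts = Counter({
        m: len(re.findall(r"\b" + re.escape(m) + r"\b", lowered))
        for m in metrics
    })
    total = sum(counts.values())
    if total == 0:
        return {m: 0.0 for m in metrics}
    return {m: counts[m] / total for m in metrics}

def prominence_shift(prev_text, curr_text, metrics=METRICS):
    """Change in mention share per metric from the previous to the current text."""
    prev = metric_shares(prev_text, metrics)
    curr = metric_shares(curr_text, metrics)
    return {m: curr[m] - prev[m] for m in metrics}

# Made-up snippets standing in for two consecutive earnings calls.
prev_call = "Revenue grew. Revenue per user rose and subscribers increased."
curr_call = "Adjusted EBITDA improved. EBITDA margin expanded while revenue was flat."

shift = prominence_shift(prev_call, curr_call)
for metric, delta in shift.items():
    print(f"{metric}: {delta:+.2f}")
```

A positive value marks a metric that gained prominence and a negative value one that faded, which is the kind of quantity a backtest could then relate to subsequent returns.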
Question 4

Northfield provided the following question and will join the SQA and Stony Brook in judging the submissions.
Microstructure, Market Impact and Scaling Active Equity Strategies: From Millions to Billions
Problem:
How have the profound changes in the asset management industry impacted the ability of equity managers to execute active strategies for very large funds?
Expected Outcome:
Your study should devise an active equity strategy (universe, alpha model, portfolio construction, trading procedures) that you believe will be successful in producing economically material alpha (risk-adjusted return) for a fund of $20 Million in size. The strategy may be long-only (your choice of benchmark, if any) or long-short. While positive alpha expectations are required, simulated performance metrics are a lesser consideration. You will then revise your strategy to best accommodate a fund size of $20 Billion.
Data:
Participants will have access to a market impact model, risk model, optimizer and backtesting functionality provided by Northfield Information Services, but their use is not strictly required.
Evaluation:
The submissions will be evaluated on the estimate of the relative performance of the two funds (it is preferred that the estimate of relative performance arises from an analytical solution rather than simulations), steps to mitigate liquidity limitations arising in the active management of a large fund over a multi-year time horizon, a rigorous understanding of market structure, microstructure and portfolio optimization, including the multi-period nature of trade execution, and the overall practical usefulness of the output for investment management.
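To illustrate what an analytical estimate of relative performance might look like, here is a minimal sketch that assumes a square-root market impact law (cost per dollar traded proportional to daily volatility times the square root of trade size over average daily volume). This is a widely used stylized model, not the Northfield market impact model, and every parameter value below (number of names, turnover, volatility, average daily volume, rebalance frequency, impact coefficient) is a hypothetical placeholder.

```python
import math

def impact_cost_bps(trade_dollars, adv_dollars, daily_vol=0.02, coeff=1.0):
    """Assumed square-root impact: cost per dollar traded, in basis points."""
    return coeff * daily_vol * math.sqrt(trade_dollars / adv_dollars) * 1e4

def annual_drag_bps(fund_size, n_names=50, turnover=2.0,
                    adv_dollars=50e6, rebalances_per_year=12):
    """Annual impact drag in basis points of fund value.

    Turnover is total dollars traded per year divided by fund size,
    spread evenly across names and rebalances (a simplifying assumption).
    """
    trade_per_name = turnover * fund_size / (n_names * rebalances_per_year)
    cost_bps = impact_cost_bps(trade_per_name, adv_dollars)
    # Drag on the fund = cost per dollar traded * dollars traded per dollar of fund.
    return turnover * cost_bps

small = annual_drag_bps(20e6)   # $20 Million fund
large = annual_drag_bps(20e9)   # $20 Billion fund, same strategy unchanged
ratio = large / small

print(f"small fund drag: {small:.1f} bps")
print(f"large fund drag: {large:.1f} bps")
print(f"ratio: {ratio:.1f}")
```

Under these assumptions the drag ratio depends only on the ratio of fund sizes: scaling assets by 1,000 with the strategy unchanged multiplies the impact drag by the square root of 1,000, roughly 31.6. The revised $20 Billion strategy would therefore have to change the inputs (more names, lower turnover, more liquid universe, slower multi-period execution) rather than simply scale up positions.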