Tool

The SPEC Test

Four Questions That Reveal Whether a Quant Model Is Built on Evidence or Accident

Milos Maricic

Most allocator due diligence asks whether a quant model works. Almost none asks why it works. That gap is not a minor oversight. It is the single largest undiagnosed source of risk in systematic strategy evaluation today.

The SPEC Test is a practical tool to close it. Four questions, no technical jargon, usable in your next manager meeting or investment committee memo. Each question comes with a clear guide: what a strong answer sounds like, what a standard but incomplete answer sounds like, and what should concern you.

It takes ten minutes to understand and zero technical background to apply.

The Problem

A mid-sized pension fund evaluates a quantitative manager. The manager presents five years of live track record, a Sharpe ratio above 1.2, and a detailed slide deck explaining their "proprietary machine learning infrastructure." The due diligence team reviews the performance attribution, checks the drawdown profile, and reads the risk disclosures. The manager passes.

Two years later, performance collapses. In the post-mortem, it becomes clear that the model's apparent edge came from a single feature: a crude sentiment signal that happened to work during a specific macro regime. The team never asked how the model was built. They only asked whether it had worked.

This is not an unusual story. Academic research on factor strategies documents it systematically. A landmark study by McLean and Pontiff found that the returns of documented factors are 26% lower out of sample and 58% lower after publication, with the steepest declines in strategies most likely to attract institutional capital. The explanation is not just crowding. It is that many strategies, when examined structurally, were never built on stable economic relationships in the first place.

A 2023 analysis of 26 Barra factor models found that the majority could not reliably separate genuine signal from noise in out-of-sample testing once structural assumptions, rather than performance outcomes, were interrogated. The models looked fine. Their foundations were not.

The problem is not that quantitative managers are dishonest. Most are not. The problem is that the standard framework for evaluating them creates systematic blind spots. Performance-focused due diligence is necessary but insufficient. It does not reveal whether a model's apparent success came from genuine insight, lucky configuration, or a feature that no longer exists in the market.

Why Current Due Diligence Misses This

Existing due diligence frameworks were designed for a different era. They ask excellent questions about operational infrastructure, risk controls, and performance attribution. They do not ask whether the model's structure justifies confidence in its future performance.

The gap exists for understandable reasons. Structural evaluation requires asking questions that feel technical, and allocators often assume they lack the background to interpret the answers. That assumption is wrong. The questions required to assess model structure are not technical. They are logical. They require curiosity and scepticism, not a PhD in machine learning.

The other reason is incentive misalignment. Managers who have built strong-looking track records have no commercial interest in inviting deep structural scrutiny. A performance-focused framework lets both sides maintain a comfortable distance from the harder questions.

The SPEC Test changes that dynamic. It gives allocators a vocabulary and a process for structural evaluation that does not require technical expertise and cannot be deflected by performance data.

The Framework

Four questions. Each tests a different dimension of model integrity.

S

Specification Choices

How was the model designed?

"How did you decide which variables to include in your model, and which did you deliberately exclude? What was the reasoning?"

Strong Answer

The manager explains a clear theoretical prior for each included variable and articulates why certain variables were excluded despite showing historical predictive power. They describe a formal model selection process with held-out validation data and can explain what the model would look like if built slightly differently.

Standard Answer

The manager describes a research process that tested many variables and selected the best performers. They may reference cross-validation or out-of-sample testing. This is better than nothing but does not resolve the core question of whether included variables have theoretical justification or were selected for historical fit.

Concern

The manager cannot clearly articulate why specific variables were included beyond historical performance. They struggle to explain what would change about the model under different market conditions. They conflate feature importance metrics with causal explanations.
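The difference between a variable selected for theoretical reasons and one selected for historical fit is easy to see in a toy experiment. The sketch below is illustrative only, using synthetic data and an assumed train/holdout split, not any manager's actual process: a genuine signal keeps its predictive power out of sample, while the best-looking of fifty noise signals, mined for in-sample fit, does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# One "genuine" signal with a stable relationship to next-period returns,
# plus fifty noise signals. The best-looking noise signal will fit the
# training window by chance but carry little into the holdout.
genuine = rng.normal(size=n)
noise = rng.normal(size=(n, 50))
returns = 0.3 * genuine + rng.normal(size=n)

train, holdout = slice(0, 700), slice(700, n)

def ic(signal, ret):
    """Information coefficient: simple correlation of signal and return."""
    return np.corrcoef(signal, ret)[0, 1]

# Select the noise column with the best in-sample fit (pure data mining).
best_noise = max(range(50), key=lambda j: ic(noise[train, j], returns[train]))

print(f"genuine  in-sample IC: {ic(genuine[train], returns[train]):+.3f}")
print(f"genuine   holdout  IC: {ic(genuine[holdout], returns[holdout]):+.3f}")
print(f"mined    in-sample IC: {ic(noise[train, best_noise], returns[train]):+.3f}")
print(f"mined     holdout  IC: {ic(noise[holdout, best_noise], returns[holdout]):+.3f}")
```

A strong Specification Choices answer is, in effect, a claim that every included variable behaves like the genuine signal here, and that the manager has checked.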

P

Performance Failure Diagnosis

What happens when the model is wrong?

"Walk me through a time your model was wrong. What did you learn about its assumptions, and what did you change structurally, not just parametrically?"

Strong Answer

The manager describes a specific failure episode with clarity about what structural assumption proved incorrect. They distinguish between parametric adjustments (changing a weighting or lookback period) and structural changes that altered the model's underlying logic. They can explain why the same failure is less likely to recur.

Standard Answer

The manager describes a difficult period and explains that they adjusted parameters or risk controls in response. This is common and not necessarily concerning, but it does not answer the structural question. Follow up: "Was the underlying logic of the model changed, or only its settings?"

Concern

The manager cannot recall a meaningful failure, or attributes all difficult periods to external market conditions rather than model limitations. They describe responses that are purely parametric with no structural reflection. This suggests either a lack of self-critical research culture or a model that has not been meaningfully stress-tested.

E

Explanation Test

Can the model explain itself?

"Walk me through a specific trade from last quarter. Which signals fired, what factors were present, and why did the model make that decision?"

Strong Answer

The manager can reconstruct specific decisions with attribution to named signals and factors. They can explain why those signals would logically lead to that position in economic terms, not just model output terms. The explanation connects to the theoretical prior described in the Specification Choices response.

Standard Answer

The manager can explain that certain signals contributed positively to a position but struggles to articulate why those signals should have that effect in economic terms. They describe the model's behaviour without explaining its logic. This is common in complex ensemble models but warrants follow-up.

Concern

The manager cannot reconstruct specific decisions or deflects to aggregate performance statistics. They describe the model as a black box that produces outputs without explaining the logic connecting inputs to decisions. This does not necessarily mean the model is wrong, but it means the manager cannot verify it is right.
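The record a strong Explanation Test answer produces has a recognisable shape: named signals, their readings at decision time, and each signal's contribution to the final score. The sketch below is a hypothetical illustration with invented signal names and weights, not any model's actual attribution machinery.

```python
from dataclasses import dataclass

@dataclass
class SignalReading:
    """One named signal's state at the moment of a trade decision."""
    name: str      # named signal, e.g. "12m momentum" (hypothetical)
    value: float   # standardized reading at decision time
    weight: float  # model weight on this signal

    @property
    def contribution(self) -> float:
        return self.value * self.weight

# Hypothetical attribution for a single position in a linear scoring model.
readings = [
    SignalReading("12m momentum", value=1.4, weight=0.5),
    SignalReading("earnings revision", value=0.6, weight=0.3),
    SignalReading("short-term reversal", value=-0.8, weight=0.2),
]

score = sum(r.contribution for r in readings)
for r in sorted(readings, key=lambda r: -abs(r.contribution)):
    print(f"{r.name:>20}: {r.contribution:+.2f}")
print(f"{'final score':>20}: {score:+.2f}")
```

Complex ensemble models will not decompose this cleanly, but a manager who cannot produce even an approximate version of this record for a specific trade is describing behaviour, not logic.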

C

Configuration Sensitivity

How fragile is the model?

"What happens to your model when you remove one variable? How sensitive are the results to specification changes?"

Strong Answer

The manager has conducted systematic robustness testing and can describe which variables are load-bearing and which are supplementary. They can explain performance degradation under variable removal in terms that connect to the model's theoretical structure. The model performs materially but not catastrophically worse without any single component.

Standard Answer

The manager acknowledges that certain signals are more important than others but has not conducted formal sensitivity analysis. They can offer qualitative assessments of variable importance. This is not uncommon in operational strategies but represents a gap in structural validation.

Concern

The manager is evasive about sensitivity testing, or prior testing has revealed that performance collapses when specific variables are removed. A model that is catastrophically dependent on a single feature is effectively a concentrated bet on that feature's continued relevance, regardless of how it is presented.
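The robustness testing a strong answer describes amounts to a leave-one-variable-out exercise: drop a feature, refit, and compare out-of-sample fit. The sketch below uses synthetic data and plain least squares purely to show the shape of the exercise; a real ablation would refit the production model itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic example: three features, one of which carries most of the signal.
X = rng.normal(size=(n, 3))
y = X @ np.array([0.8, 0.2, 0.1]) + rng.normal(size=n)

train, test = slice(0, 1500), slice(1500, n)

def oos_r2(cols):
    """Fit OLS on the training window using the given columns, score out of sample."""
    Xtr, Xte = X[train][:, cols], X[test][:, cols]
    beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
    resid = y[test] - Xte @ beta
    return 1 - resid.var() / y[test].var()

full = oos_r2([0, 1, 2])
for drop in range(3):
    kept = [c for c in range(3) if c != drop]
    print(f"drop feature {drop}: OOS R^2 {oos_r2(kept):.3f} (full model {full:.3f})")
```

Dropping the load-bearing feature collapses performance; dropping a supplementary one barely moves it. A manager with a strong answer can tell you, before running the test, which features sit in which camp and why.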

Reading the Pattern

The strongest signal is not any single answer. It is the pattern across all four questions.

A manager who gives strong answers on Specification Choices and Configuration Sensitivity but struggles on the Explanation Test may have a well-constructed model that is difficult to communicate. A manager who gives confident answers on all four questions but whose explanations are vague or contradictory across questions is a more serious concern.

The pattern also reveals research culture. Managers who have thought carefully about structural questions tend to answer them readily, with specificity and without defensiveness. Managers who have not tend to redirect to performance data, cite operational complexity as a reason for opacity, or provide answers that are technically accurate but empty of content.

The SPEC Test is not a pass/fail instrument. It is a signal generator. Use it to structure your follow-up questions, to calibrate how much weight to place on track record, and to identify where further investigation is warranted.

Why This Matters Now

Three forces have made structural evaluation more important than at any previous point in the history of quantitative investing.

First, the proliferation of AI claims has created an evaluation crisis. When every manager claims to use machine learning, performance-based differentiation becomes less reliable. The structural questions are what distinguish genuine integration from rebranded factor strategies.

Second, the shift toward total portfolio approaches has increased the cost of structural failures. When quantitative strategies are held alongside private markets, infrastructure, and other illiquid allocations, a structural model failure in the liquid sleeve has amplified consequences for overall portfolio management.

Third, the ADIA Lab and similar sovereign research initiatives have elevated the methodological bar for what constitutes rigorous quantitative research. Academic standards are increasingly influencing what sophisticated institutional allocators expect from managers. The SPEC Test aligns practitioner due diligence with those evolving standards.

How to Use the SPEC Test

Before a Manager Meeting

Review the four questions and prepare one follow-up for each. The questions are designed to be conversational, not confrontational. Frame them as part of your standard process.

During Due Diligence

Use the Strong/Standard/Concern framework to document responses. The documentation creates a consistent record for committee review and enables comparison across managers.
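A minimal sketch of what that documentation could look like in code, if a team wanted a machine-readable record. The `SpecRecord` structure and its method names are assumptions for illustration, not part of the SPEC Test itself.

```python
from dataclasses import dataclass, field
from enum import Enum

class Rating(Enum):
    STRONG = "strong"
    STANDARD = "standard"
    CONCERN = "concern"

@dataclass
class SpecRecord:
    """One manager's SPEC responses, recorded per question for committee review."""
    manager: str
    ratings: dict = field(default_factory=dict)  # question letter -> Rating
    notes: dict = field(default_factory=dict)    # question letter -> verbatim notes

    def rate(self, question: str, rating: Rating, note: str) -> None:
        assert question in "SPEC", "question must be one of S, P, E, C"
        self.ratings[question] = rating
        self.notes[question] = note

    def concerns(self) -> list:
        """Questions flagged for follow-up."""
        return [q for q, r in self.ratings.items() if r is Rating.CONCERN]

record = SpecRecord("Example Manager")
record.rate("S", Rating.STRONG, "Clear priors; documented exclusions.")
record.rate("C", Rating.CONCERN, "No formal sensitivity analysis on offer.")
print(record.concerns())
```

A consistent record of this kind makes cross-manager comparison and year-over-year monitoring straightforward.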

For Board and Committee Reporting

The four-letter framework provides a non-technical vocabulary for communicating structural risk. Boards and investment committees can understand SPEC scoring without requiring technical translation.

Ongoing Monitoring

Re-apply the SPEC questions annually or following significant strategy changes. A manager who scored well initially may provide different answers after a difficult period, which can itself be informative.

Sources and Further Reading

The core research on causal factor investing

  1. Lopez de Prado, Lipton & Zoonekynd (2025). "Causal Factor Analysis Is a Necessary Condition for Investment Efficiency." Journal of Portfolio Management. SSRN.
  2. Lopez de Prado & Zoonekynd (2025). "Causality and Factor Investing: A Primer." CFA Institute Research Foundation.
  3. Olivetti, Zoonekynd, Yam, Lopez de Prado, Imbens & Hernan (forthcoming 2025). "ADIA Lab Causal Discovery Challenge." Journal of Financial Data Science. SSRN.
  4. Lopez de Prado (2025). "The Factor Mirage: How Quant Models Go Wrong." Enterprising Investor, CFA Institute blog.

CFA Institute resources on quant evaluation

  1. Simonian (2023). "Quant Screening: Three Questions for Investment Managers." Enterprising Investor, CFA Institute.
  2. Simonian (2024). "Investment Model Validation: A Guide for Practitioners." CFA Institute Research Foundation.
  3. Simonian, ed. (2025). "AI in Asset Management: Tools, Applications, and Frontiers." CFA Institute Research Foundation.
  4. CFA Institute. "Standard V(A): Diligence and Reasonable Basis." Standards of Practice Handbook, 11th ed.

Factor decay and backtest overfitting

  1. McLean & Pontiff (2016). "Does Academic Research Destroy Stock Return Predictability?" Journal of Finance, 71(1), 5-32.
  2. Bailey, Borwein, Lopez de Prado & Zhu (2017). "The Probability of Backtest Overfitting." Journal of Computational Finance, 20(4).

ADIA Lab and the infrastructure thesis

  1. ADIA Lab. "Causal Discovery Challenge" (2024). adialab.ae
  2. ADIA Lab. "ADIA Lab Award for Causal Research in Investments." adialab.ae
  3. ADIA Lab. "A Protocol for Causal Factor Investing." adialab.ae

Additional context

  1. Man Group (2021). "Overfitting and Its Impact on the Investor."
  2. Narang, R. Inside the Black Box: A Simple Guide to Quantitative and Algorithmic Trading. Wiley.