A Practical Application of Data Science in Preconstruction: Part I
- Nathan Schafer, CPE, VMA
- Dec 24, 2025
- 7 min read
Modeling Quantity Risk Using Design Maturity Signals

Introduction: The Contingency Problem in Preconstruction
Contingency is a debated but poorly defended component of estimate building. At all (but especially early) stages of design, estimators are asked to price incomplete information while maintaining credibility with owners, design teams, and internal stakeholders. Industry-standard guidance (most notably AACE International’s recommended contingency ranges by estimate class) provides a useful starting point, but it is fundamentally generic. These guidelines assume that uncertainty is primarily a function of design phase, rather than of how complete or incomplete the design actually is.
In practice, two projects both labeled “35% design” (please note: 35/65/95 are the common design development stages in my market, and I believe that 30/60/90 are still the most common in the lower 48) can present meaningfully different levels of risk. One may include well-developed structural drawings and edited specifications with minimal TBDs, while another may rely heavily on placeholders/plugs, allowances, and deferred decisions. Yet both are often assigned similar contingency percentages because they occupy the same nominal design phase. This disconnect (at least, until recently) forces estimators to rely on their best judgment when adjusting contingency—judgment that has been difficult to defend quantitatively.
The interesting part to me is that modern construction companies generate large volumes of historical data that they could (but often do not) use to its full practical potential: quantity takeoffs, estimate snapshots, buyout comparisons, RFIs, labor productivity reports, and final installed quantities. Despite having all of this data, contingency decisions are rarely informed by internal empirical evidence. Instead, many estimators default to external benchmarks (AACE, cost handbooks, or market norms) that do not reflect their specific estimating practices, project mix, or historical execution capabilities.
This article proposes a practical alternative: treating quantity growth as a measurable, predictable risk that can be modeled using design and specification maturity signals extracted from drawings and project documents. By linking observable indicators of design development to historical quantity outcomes, preconstruction teams can begin calculating contingency using their own data (and I hope to make the case that doing so will augment professional judgment with evidence rather than replacing it).
This is Part I of a multi-part series. Here, the focus is on framing the problem, identifying viable signals, and outlining a modeling approach. At this time, my hope/vision is that Part II will address validation, visualization, and implementation considerations in greater detail.
Quantity Growth as a Predictable Risk Variable
Before discussing models or data science techniques, it is important to clarify what problem is being solved. This approach is not concerned with total cost growth, scope creep, or owner-driven changes (there are, of course, different types of contingencies, and that should not be overlooked). Instead, it focuses narrowly on quantity growth (the difference between quantities estimated at a given design stage and quantities ultimately installed).
Quantity growth is particularly well-suited for modeling because it is:
Measurable and auditable
Less influenced by market volatility than unit pricing
Closely tied to design completeness
Examples include: increases in concrete volume due to missing sections, additional ductwork caused by undeveloped routing, or increased reinforcing steel resulting from late structural detailing. These outcomes are not random; they are systematic (and maybe that word is too big here, but I’m leaving it for now) responses to incomplete information.
From a risk perspective, quantity growth behaves more like a distribution than a fixed value. Across a portfolio of similar projects, early estimates may understate quantities by an average of (for example) 8%, with a standard deviation that reflects design variability. Some projects experience minimal growth, while others fall into the tail of the distribution. Traditional contingency percentages implicitly acknowledge this uncertainty (by virtue of being presented as a range rather than a fixed value) but fail to quantify it in a discipline-specific or data-driven way.
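To make the distribution framing concrete, here is a minimal sketch of summarizing historical quantity growth for a single scope. All numbers are hypothetical, and a real portfolio would be segmented by design stage and project type:

```python
import statistics

# Hypothetical portfolio: (estimated qty, final installed qty) for one
# scope (e.g., concrete CY) across past projects at a similar design stage.
history = [(1000, 1100), (850, 880), (1200, 1380), (950, 990), (700, 735)]

# Percentage quantity growth per project: (final - estimated) / estimated
growth = [(final - est) / est for est, final in history]

mean_growth = statistics.mean(growth)
std_growth = statistics.stdev(growth)

print(f"mean growth: {mean_growth:.1%}, std dev: {std_growth:.1%}")
```

Even this simple pair of statistics gives an empirical anchor for the "average understatement plus variability" framing described above.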
By reframing contingency as a response to expected quantity growth and its variance (potentially even shown in the directs as a scope-specific contingency), rather than an arbitrary percentage add-on, estimators can ground early-stage risk allowances in observed historical behavior.
Design and Specification Maturity as Measurable Signals
The key premise of this approach, I think, is that design maturity can be quantified (and if you object to this premise, I’d be keen to debate it in the comments). While design quality is often treated as subjective, most construction documents contain numerous objective, countable features that correlate strongly with completeness.
Drawing-Based Maturity Metrics
One of the simplest and most powerful signals (that I have identified to date) is sheet count by discipline. Of course, there are other factors to consider here, like historical sheet counts by design team at varying levels of design maturity and building type (just to name a couple). Across a company’s historical projects, patterns emerge: schematic design sets tend to include a limited number of architectural and structural sheets, while more developed sets show expansion in details, sections, and discipline coverage.
Potential drawing-based features include:
Total sheet count
Sheet count by discipline (C, A, S, M, E, P, FP, etc.)
Ratio of detail sheets to plan sheets
Presence or absence of key drawing types (enlarged plans, sections, schedules)
When normalized against project size or compared to historical averages for similar projects, these metrics provide a strong proxy for how resolved the design actually is (regardless of what phase label appears on the cover sheet).
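As a sketch of how these drawing features might be extracted, assume the sheet index has already been parsed into (sheet number, discipline) pairs. The discipline codes and the "5-series sheets are details" heuristic are illustrative assumptions, not a standard:

```python
from collections import Counter

# Assumed parsed sheet index: (sheet_number, discipline_code) pairs.
sheet_index = [
    ("A101", "A"), ("A102", "A"), ("A501", "A"),   # A5xx treated as details
    ("S101", "S"), ("S501", "S"),
    ("M101", "M"), ("E101", "E"),
]

by_discipline = Counter(disc for _, disc in sheet_index)

# Heuristic: series digit "1" = plans, "5" = details (an assumption here).
detail_sheets = sum(1 for num, _ in sheet_index if num[1] == "5")
plan_sheets = sum(1 for num, _ in sheet_index if num[1] == "1")

features = {
    "total_sheets": len(sheet_index),
    "disciplines_present": len(by_discipline),
    "detail_to_plan_ratio": detail_sheets / max(plan_sheets, 1),
}
print(features)
```

In practice, the parsing step (reading the sheet index from a PDF or model export) is the harder part; the feature math itself stays this simple.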
Specification-Based Maturity Metrics
Specifications are often overlooked in preconstruction risk discussions, yet they contain valuable signals. Spec books that remain largely unedited from master templates frequently indicate unresolved scope decisions. Conversely, heavily edited specifications often suggest that materials, systems, and performance criteria have been meaningfully defined.
Useful spec-related metrics include:
Total page count by division
Percentage of bracketed language or TBDs
Frequency of phrases such as “to be determined” or “as selected”
Presence of allowances embedded in specifications
These features are particularly relevant for MEP and specialty scopes, where late specification decisions often drive quantity and labor growth. As a joke, I often observe out loud: “mechanical and electrical don't show up until 65% DD.”
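The spec-related metrics above can be approximated with a simple text scan. The patterns below mirror the phrases listed earlier; the spec excerpt is hypothetical:

```python
import re

# Hypothetical spec section text.
spec_text = """
2.01 MATERIALS
A. Ductwork gauge: [TO BE DETERMINED].
B. Diffuser finish: as selected by Architect.
C. Provide allowance of $25,000 for controls integration.
"""

# Ambiguity phrases drawn from the metrics listed above.
AMBIGUITY_PATTERNS = [
    r"\bto be determined\b|\bTBD\b",
    r"\bas selected\b",
    r"\ballowance\b",
    r"\[[^\]]+\]",          # bracketed placeholder language
]

hits = sum(len(re.findall(p, spec_text, re.IGNORECASE))
           for p in AMBIGUITY_PATTERNS)
words = len(spec_text.split())
print(f"{hits} ambiguity flags in {words} words "
      f"({hits / words:.1%} flag density)")
```

Normalizing flag counts by section length (the "TBD density" idea) keeps long spec books from being penalized simply for being long.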
Explicit Ambiguity Indicators
Beyond drawings and specs, estimators routinely document uncertainty explicitly through:
Allowances
Unit prices
Exclusions tied to incomplete design
Clarifications that defer responsibility
The count and value of these items provide direct evidence of unresolved scope. While allowances are often treated as cost-risk mechanisms, they could also function as maturity signals that can be incorporated into a predictive model/framework.
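Tallying these items into model features is straightforward once they live in a structured log. The record structure and values here are hypothetical:

```python
# Hypothetical clarifications log for one estimate.
items = [
    {"type": "allowance",  "value": 50_000, "note": "site lighting"},
    {"type": "allowance",  "value": 25_000, "note": "controls"},
    {"type": "exclusion",  "value": 0,      "note": "rock removal"},
    {"type": "unit_price", "value": 0,      "note": "unsuitable soils, per CY"},
]

allowances = [i for i in items if i["type"] == "allowance"]
signal = {
    "allowance_count": len(allowances),
    "allowance_value": sum(i["value"] for i in allowances),
    "open_items_total": len(items),
}
print(signal)
```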
Construction-Phase Feedback: Labor and RFIs as Ground Truth
For any predictive model to be useful, it should be grounded in outcome data. In construction, two underutilized feedback mechanisms are labor productivity reports (which we often reference when estimating at our company) and requests for information (RFIs).
Labor productivity reports capture the realized effort required to install quantities in the field. When compared to estimated productivity assumptions, deviations often correlate with design quality. Poorly coordinated or incomplete documents tend to increase rework, inefficiency, and crew interruptions (effects that are frequently attributed to means and methods but, in my opinion, are actually design-driven).
RFIs can serve as a complementary signal. High RFI volumes, particularly early in construction or concentrated in specific disciplines, indicate unresolved design intent. RFIs that result in revised quantities (e.g., additional reinforcing, modified layouts, or expanded systems) are especially valuable, as they can directly link document maturity to quantity outcomes.
By tying final installed quantities and labor hours back to early-stage design metrics, preconstruction teams can begin closing the loop between estimating assumptions and construction reality.
A Conceptual Model for Quantity Risk Prediction
The objective of the proposed model is not to predict exact quantities, but to estimate the expected growth in quantities, and the uncertainty around that growth, at a given point in time. In statistical terms, this means predicting a distribution rather than a point estimate.
Inputs
Candidate input variables may include, but are not limited to:
Drawing metrics (sheet counts, discipline coverage)
Specification metrics (page counts, TBD density)
Project attributes (size, type, delivery method)
Estimating context (estimate class, duration of estimate effort)
These features are typically available when an estimate is prepared, making them suitable for real-time decision support.
Outputs
The primary output is percentage quantity growth from estimate to final installed quantities, ideally segmented by discipline or major scope (e.g., concrete, steel, mechanical). Secondary outputs might include productivity deviation or RFI density, which can help explain why quantity growth occurred.
Modeling Approaches
From a practical standpoint, model interpretability matters more than sophistication. Linear regression or generalized linear models provide transparency and are often sufficient to capture first-order relationships. Tree-based methods, such as random forests or gradient boosting, can capture nonlinearity and interaction effects but should be used cautiously to avoid “black box” outcomes that estimators distrust.
Regardless of method, the model should be framed as decision support, not automation. Its purpose is to inform contingency discussions, not dictate them.
Translating Model Outputs into Contingency Decisions
Once the expected quantity growth is approximated, translating it into contingency becomes a straightforward exercise. Growth factors can be applied to quantified scopes, producing discipline-specific contingencies rather than a single global percentage.
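A sketch of that translation step, with hypothetical quantities, unit costs, and model-derived growth factors:

```python
# scope: (estimated qty, unit cost, expected growth from the model)
scopes = {
    "concrete":   (1200,  450.0, 0.06),   # CY
    "steel":      (300,  3200.0, 0.04),   # tons
    "mechanical": (8000,   18.0, 0.12),   # LF of duct
}

# Discipline-specific contingency dollars = qty * unit cost * growth factor.
contingency = {
    scope: qty * unit_cost * growth
    for scope, (qty, unit_cost, growth) in scopes.items()
}

for scope, dollars in contingency.items():
    print(f"{scope:>10}: ${dollars:,.0f}")
print(f"{'total':>10}: ${sum(contingency.values()):,.0f}")
```

Note how the less-developed mechanical scope carries a proportionally larger contingency than the well-defined steel scope, which is exactly the targeting behavior described below.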
This approach offers several advantages:
Contingency becomes proportional to actual risk drivers
Well-developed scopes are not over-penalized
Poorly developed scopes receive targeted attention
Importantly, this does not eliminate the use of industry guidance such as AACE ranges. Instead, those benchmarks could be understood as validation tools rather than primary inputs. If a model consistently produces results far outside accepted norms, the responsible estimator will investigate rather than blindly apply.
Communicating these results to stakeholders requires care. Precision should not be overstated; confidence intervals and ranges are more appropriate than single values. When framed correctly, data-backed contingency enhances credibility.
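One simple way to present a range instead of a point value, assuming growth outcomes are roughly normal (a simplifying assumption) and using hypothetical data:

```python
import statistics

# Hypothetical realized growth outcomes from comparable past projects.
growth_samples = [0.02, 0.05, 0.07, 0.08, 0.11, 0.04, 0.09]

mean = statistics.mean(growth_samples)
sd = statistics.stdev(growth_samples)

# An approximate 80% range (mean +/- 1.28 standard deviations) is often
# easier to defend in a stakeholder conversation than a single number.
low, high = mean - 1.28 * sd, mean + 1.28 * sd
print(f"expected growth {mean:.1%} (80% range: {low:.1%} to {high:.1%})")
```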
Implementation Considerations for Preconstruction Teams
Adopting this approach does not require a fully mature data science program (I’d take a moment and plug my friend David Hopkins’ blog here for anyone interested in getting started with data science in estimating). Most firms can start with a limited scope—one or two repeatable project types and a single major trade, such as concrete or structural steel (this seems like a good moment to note the common rule of thumb that roughly thirty (30) data points are needed before sample statistics stabilize; it is a heuristic, not a guarantee of statistical significance, and it should be taken into account when determining the scope with which to begin).
Data quality will be imperfect. Historical estimates may be inconsistent, and final quantities may require reconciliation (if possible...). However, even noisy data can reveal meaningful trends when aggregated across projects.
Equally important is cultural adoption. The responsible estimator will consider the model an extension of their expertise, not a critique of it. Transparency, auditability, and iterative refinement are (in my opinion) essential for long-term success.
Conclusion and Looking Ahead
Quantity growth is one of the most persistent sources of cost risk in early-stage estimates, yet it is rarely quantified explicitly. By treating design and specification maturity as measurable signals (and by leveraging internal historical data), preconstruction teams can move toward contingency decisions that are both defensible and company-specific.
This approach does not replace professional judgment; it strengthens it by grounding decisions in evidence. As contractors increasingly compete on preconstruction value rather than low bid alone, the ability to articulate risk using data will (again, in my humble opinion) become a differentiator.
At the time of writing, I imagine Part II of this series will explore practical case studies, model validation techniques, and visualization methods that make these insights usable in day-to-day estimating workflows.