Methodology · 27 Apr 2026 · 7 min read

How We Benchmark Company Footprint Data

Benchmark design

Private Test Data

Human-rated test sets are kept private from model training and providers.

Stratified sample

1000s

Our private datasets contain thousands of records, stratified to cover representative geographies, company sizes, and industries.

Account Impact Benchmark

5,000

Every 1 percentage point improvement in the benchmark corresponds to 5,000 better revenue-impacting account decisions (assuming 500k total CRM accounts).

Overview

Abstract

Footprint is Stride’s benchmarked data layer for go-to-market, segmentation, territory, TAM, and expansion decisions. The methodology starts from a simple principle: company data should be evaluated by the decisions it supports, not just by whether a field is filled.

Metrics such as employee count, revenue, industry, and operating footprint are deceptively difficult to define because source quality, reporting basis, and entity boundaries vary across companies. This article explains how Stride builds blind, human-rated benchmark datasets, establishes reference values, collects provider outputs, and evaluates data through two benchmarks: Footprint-Accuracy and Footprint-Account-Impact.

Together, these benchmarks separate materially incorrect data from errors that actually change account-level decisions. This allows teams to interpret data quality in terms of commercial impact, not superficial precision.

1 Introduction

Company data looks simple until it is used to make decisions.

Employee count, revenue, industry, location, and operating footprint often appear as basic firmographic fields. In practice, each one can be ambiguous. A company’s reported value may depend on the source, the reporting period, the accounting basis, the legal entity, the brand boundary, or the corporate group structure.

This matters because account data is not passive. It determines how companies size markets, assign territories, route accounts, prioritise pipeline, define segments, and decide which accounts deserve attention.

Footprint was created to benchmark whether account data is reliable enough for those decisions. The goal is not perfect data for its own sake. The goal is better commercial decisions at account level.

2 Building our Private Dataset

2.1 Defining Metrics

The first step in any benchmark is defining each firmographic metric clearly.

Take employee count. Is a contractor an employee? What about temporary staff in a seasonal tourism business? Should part-time employees count? Should the count apply to the legal entity, the operating brand, the parent company, or the whole corporate group?

Revenue has the same problem. Which year are we using? Is it the last completed financial year, current run-rate, gross revenue, net revenue, marketplace GMV, or recognised accounting revenue? For an insurance business, are we looking at total revenue or net premium? For a marketplace, are we measuring transaction volume or reported accounting revenue?

For Footprint benchmarks, we use definitions from financial reporting, accounting, and compliance wherever possible. These tend to be the most heavily vetted definitions available. They are not perfect for every commercial use case, but they give us a consistent baseline for fair comparison.

2.2 Building the Test Dataset

A benchmark is only useful if the test set reflects the companies customers actually care about.

It is easy to benchmark on large public companies. They have annual reports, audited filings, investor presentations, and a lot of available information. But that is not what most CRMs look like.

Real account universes contain private companies, subsidiaries, local businesses, franchises, marketplaces, holding companies, regional offices, public companies, government-linked entities, and businesses with limited public information.

Footprint benchmark datasets contain thousands of companies, human-rated and stratified across industry, geography, company size, ownership structure, and source availability.
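A stratified draw of this kind can be sketched in a few lines of Python. This is a minimal illustration, not Stride's actual sampling procedure: the function name `stratified_sample`, the record fields, and the equal allocation per stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(records, keys, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each combination of `keys`.

    Equal allocation per stratum is an assumption for this sketch; the
    article does not say how strata are weighted.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)

    sample = []
    for group in strata.values():
        # Small strata contribute everything they have.
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical account universe, invented for illustration only.
companies = (
    [{"name": f"sw-{i}", "industry": "software", "region": "EU"} for i in range(10)]
    + [{"name": f"rt-{i}", "industry": "retail", "region": "US"} for i in range(3)]
    + [{"name": "rt-eu", "industry": "retail", "region": "EU"}]
)

sample = stratified_sample(companies, keys=("industry", "region"), per_stratum=2)
print(len(sample))  # 5: two software/EU, two retail/US, one retail/EU
```

Sampling per stratum rather than from the pool as a whole keeps rare combinations, such as the single retail/EU record here, from being crowded out by the largest group.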

2.3 Establishing Reference Values

Once the metric is defined and the sample has been built, the hardest question remains: what is the true value?

That depends on two things: source quality and entity definition.

Source quality: Determines how much confidence we can place in the evidence. A public company’s audited filing is usually a highly reliable source for revenue. A private company’s revenue estimate may require a more careful review of filings, registries, company disclosures, industry sources, and expert judgement.
Entity definition: Determines what is actually being measured. Consider Whole Foods. It is owned by Amazon and highly integrated. What counts as “Whole Foods”? Is it just the legal grocery subsidiary? The physical store estate? Online Whole Foods orders through Amazon? Amazon Fresh overlap? Prime-linked grocery economics? Shared logistics and technology infrastructure?

Different people could reasonably draw the boundary differently depending on the question they are trying to answer.

For Footprint benchmarks, we only include cases where there is high confidence in both the source and the entity definition. If the answer is too ambiguous, it is not a good benchmark case. Benchmarks should not pretend uncertainty does not exist. They should control for it.

2.4 Collecting Provider Outputs

Once the benchmark set is defined, each company is run through the providers and models being evaluated.

AI Models: We use structured prompts and workflows based on the best practices developed through Stride. The goal is to get the best reasonable out-of-the-box result from each model, not to make other tools look weak. This means that results from tools like ChatGPT or Clay may not be exactly replicable by someone using a simpler prompt or less structured process. We are testing what these systems can produce when used carefully.
Other providers: For providers such as ZoomInfo, D&B, or similar datasets, human experts collect and review the outputs. This matters because LLMs often confuse entities, especially subsidiaries and parent companies. We do not want the benchmark to accidentally measure extraction errors instead of provider data quality.

3 Benchmarks

Footprint evaluates company data through two benchmarks. They answer different questions.

Comparison of Footprint-Accuracy and Footprint-Account-Impact benchmarks
Benchmark	Primary Question	What It Measures	Best For
Footprint-Accuracy	Is the value materially wrong?	Whether the provider output is within a practical error threshold of the human-rated reference value.	Strategy teams, operators, data teams
Footprint-Account-Impact	Would the decision change?	Whether the error moves an account across a meaningful GTM, routing, segmentation, or TAM boundary.	GTM teams, revenue operations, sales operations

3.1 Footprint-Accuracy

Footprint-Accuracy measures whether the number is meaningfully wrong.

For metrics like employee count and revenue, we use a geometric (multiplicative) threshold. A provider passes if its value is within a factor of 2 of the human-rated reference value, in either direction.

2x accuracy threshold

0.5 × Reference ≤ Provider ≤ 2.0 × Reference

A provider passes when its value sits within this practical materiality range.

If the reference value is 100 employees, then 50 to 200 employees is considered acceptable. Below 50 or above 200 fails.

This may sound generous, but it reflects how account data is usually used.

If a public company reports $1.5B in revenue and a provider returns $1.2B, that is technically wrong. But for most GTM workflows, it probably does not matter. The account is still in the same broad segment. It would likely be routed the same way, assigned to the same team, and included in the same TAM analysis.
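The 2x rule is simple enough to express directly. The sketch below is an illustration of the threshold as stated above, not Stride's implementation; the names `within_factor` and `pass_rate`, and the choice to fail non-positive provider values, are assumptions.

```python
def within_factor(provider: float, reference: float, factor: float = 2.0) -> bool:
    """Return True if reference / factor <= provider <= reference * factor."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if provider <= 0:
        # Assumption: a missing or non-positive provider value counts as a fail.
        return False
    return reference / factor <= provider <= reference * factor


def pass_rate(pairs) -> float:
    """Share of (provider, reference) pairs that pass the threshold."""
    results = [within_factor(p, r) for p, r in pairs]
    return sum(results) / len(results)


# The two examples from the text: 100 employees, and $1.5B vs $1.2B revenue.
print(within_factor(50, 100))        # True (lower edge of the range)
print(within_factor(201, 100))       # False (just above 2x)
print(within_factor(1.2e9, 1.5e9))   # True
```

Because the bounds are multiplicative, the same rule applies unchanged to a 100-person company and a $1.5B company, which is the point of using a geometric rather than an absolute tolerance.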

3.2 Footprint-Account-Impact

Footprint-Account-Impact measures whether the error would change a decision under a baseline segmentation model.

Most GTM systems do not use raw employee and revenue values directly. They use bands, thresholds, and segments. For example, a company with $125M in revenue and a company with $450M in revenue may both be treated as the same segment.

So we ask whether the provider’s error would change the decision using common segmentation rules based on revenue, location, size, and industry.

This is why providers usually score better on Footprint-Account-Impact than on Footprint-Accuracy. The benchmark is more forgiving because real-world motions are more forgiving. Most numerical errors do not change a business decision; the ones that do are usually large, approaching an order of magnitude.
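A band-based check can be sketched as follows. The band edges and names here are hypothetical, chosen only so that the $125M and $450M example above lands in one segment; they are not Stride's segmentation rules, and the sketch uses revenue alone where the text describes rules that also draw on location, size, and industry.

```python
from bisect import bisect_right

# Hypothetical revenue band edges in USD (lower bound inclusive).
BAND_EDGES = [10_000_000, 100_000_000, 1_000_000_000]
BAND_NAMES = ["small", "mid-market", "large", "enterprise"]


def revenue_band(revenue_usd: float) -> str:
    """Map a revenue figure to its band name."""
    return BAND_NAMES[bisect_right(BAND_EDGES, revenue_usd)]


def decision_changes(provider: float, reference: float) -> bool:
    """True if the provider value puts the account in a different band."""
    return revenue_band(provider) != revenue_band(reference)


print(decision_changes(450e6, 125e6))   # False: both fall in "large"
print(decision_changes(1.2e9, 1.5e9))   # False: both fall in "enterprise"
print(decision_changes(90e6, 110e6))    # True: the error crosses the $100M edge
```

The last case shows the other side of the design: under these assumed bands, a numerically small error can still change the decision when the reference value sits close to a boundary.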

4 Interpretation

4.1 Results

The value of this approach is that benchmark results translate directly into operational impact.

If an enterprise has 500,000 accounts in its CRM, then a 1 percentage point improvement in the Account Impact benchmark means 5,000 accounts would receive a different GTM motion, segmentation, routing decision, or TAM assessment.
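The arithmetic behind that figure is a single multiplication, shown here as a small helper so it can be rerun for other CRM sizes. The function name `accounts_affected` is an assumption for this sketch, and the improvement is read as percentage points of the benchmark score.

```python
def accounts_affected(total_accounts: int, improvement_pct_points: float) -> int:
    """Accounts whose decision changes for a given benchmark improvement."""
    return round(total_accounts * improvement_pct_points / 100)


print(accounts_affected(500_000, 1))  # 5000, the figure quoted in the text
```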

That is the standard we care about: how many decisions get better?

This makes benchmark results easier to interpret. Instead of asking whether one provider is abstractly more accurate, teams can ask how many accounts would be treated differently and whether those differences improve commercial execution.

4.2 Cross-Validation

Footprint findings are also cross-validated against OECD and national-level datasets where possible. This helps answer a different question: does the dataset resemble the actual composition of the economy?

In our experience, most company datasets do not come close. They often overrepresent certain geographies, industries, company sizes, and types of firms. They may look comprehensive, but when compared with national or international economic datasets, the distribution can be far from reality.

That is a strong signal that the industry as a whole still has room to improve. Account-level benchmarking tells us whether a provider is right about specific companies. Macro-level validation tells us whether the dataset reflects the real structure of the market.
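One way to put a number on the macro-level comparison is to compare category shares in a dataset against shares in a reference source. The article does not say which measure Stride uses; the sketch below swaps in total variation distance as one reasonable choice, and the function names and category labels are hypothetical.

```python
def shares(counts: dict[str, float]) -> dict[str, float]:
    """Convert raw counts per category into shares that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation_distance(
    dataset_counts: dict[str, float], reference_counts: dict[str, float]
) -> float:
    """0.0 means identical composition; 1.0 means no overlap at all."""
    p, q = shares(dataset_counts), shares(reference_counts)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Invented counts for illustration; not real provider or OECD figures.
dataset = {"services": 3, "manufacturing": 1}
reference = {"services": 1, "manufacturing": 1}
print(total_variation_distance(dataset, reference))  # 0.25
```

A value of 0.25 here means a quarter of the dataset's mass would have to move between categories to match the reference composition, which gives overrepresentation a concrete scale.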