I have been keeping a running notebook of services teams that have made a single concrete operating change and watched their numbers move. The notebook is now several years long. The pattern that emerges from it is unspectacular: the teams that pull ahead are not doing more things, and they are not doing fancier things. They are doing one specific structural thing well, and that one thing compounds.
This piece walks through three archetypes, drawn from teams I have either worked with directly or watched closely enough to confirm the operating move and the numbers. The teams are real. I describe them as archetypes because their identities are not material to the operator lesson, and the operators involved did not sign up to be public case studies. The numbers are accurate to the teams I am describing.
The point is not the headline number in any one example. The point is that the three teams converged on the same kind of move from different starting positions.
Why most teams’ delivery improvements never compound
Before the archetypes, a quick frame for what I am looking for. The teams that show up in the notebook all did one thing in common: they moved a piece of delivery judgement out of someone’s head and into the engagement object itself. Not into a wiki, not into a separate “knowledge base,” not into a PSA module that nobody opens. Into the structured object where the work actually happens.
The teams whose improvements never compounded almost always made their change in the wrong place. They wrote playbooks that nobody read. They held retrospectives whose outputs nobody looked at again. They bought a PSA that ended up being a fancier time tracker. The work changed shape briefly, then snapped back. The judgement that drove the improvement was still trapped in individual operators.
What follows is what it looks like when the change is made in the right place.
Archetype 1. The 40-person life sciences specialist
A specialist team running validation engagements for life sciences clients had a recurring problem on regulatory submissions. Roughly one in four engagements ran fifteen to twenty percent over on the validation testing phase. Every project review pinned it on either scope clarification with the client or testing complexity that emerged late. Every project review produced commitments to do better next time. Next time looked the same.
Their concrete operating move was specific and small. They stopped treating “validation testing” as a single line item in the scope. They broke it into structured complexity drivers attached to the service definition: number of integrations covered, regulatory framework in play, client team’s prior experience with the framework, data migration scope, and one more variable specific to their domain. Each driver had a default multiplier, calibrated against their last twenty engagements.
The work was unglamorous. It took them about six weeks of disciplined effort to build the initial drivers. Once the model was in place, every newly scoped engagement automatically inherited it, and every closed engagement fed its actuals back into the calibration.
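To make the mechanism concrete, here is a minimal Python sketch of a multiplier-based complexity model with a feedback loop from closed engagements. Everything in it is hypothetical: the class and field names, the driver levels, the multiplier values, and the choice to recalibrate with one global correction factor (the mean ratio of actual to modelled hours over the last twenty closed engagements). How the team actually calibrated its per-driver multipliers is not described here; fitting each driver separately would need something like a regression over closed engagements rather than a single factor.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ComplexityDriver:
    """One structured complexity driver attached to a service definition."""
    name: str
    # Multiplier for each level of this driver. Illustrative values only.
    multipliers: dict[str, float]


@dataclass
class ServiceDefinition:
    """Hypothetical service definition that carries its own estimation model."""
    base_hours: float
    drivers: list[ComplexityDriver]
    # Single correction factor learned from closed engagements (1.0 = none yet).
    calibration: float = 1.0
    # (modelled_hours, actual_hours) for each closed engagement, oldest first.
    history: list[tuple[float, float]] = field(default_factory=list)

    def modelled_hours(self, levels: dict[str, str]) -> float:
        """Base effort scaled by every driver's multiplier, before calibration."""
        hours = self.base_hours
        for driver in self.drivers:
            hours *= driver.multipliers[levels[driver.name]]
        return hours

    def estimate(self, levels: dict[str, str]) -> float:
        """Quoted effort: the modelled hours times the current calibration."""
        return self.modelled_hours(levels) * self.calibration

    def record_actual(self, levels: dict[str, str], actual_hours: float, window: int = 20) -> None:
        """Feed a closed engagement back and recalibrate on the latest `window` closures."""
        # Store the uncalibrated figure so successive corrections do not compound.
        self.history.append((self.modelled_hours(levels), actual_hours))
        recent = self.history[-window:]
        self.calibration = mean(actual / modelled for modelled, actual in recent)


validation = ServiceDefinition(
    base_hours=100.0,
    drivers=[
        ComplexityDriver("integrations", {"few": 1.0, "many": 1.5}),
        ComplexityDriver("framework_experience", {"experienced": 1.0, "new": 1.2}),
    ],
)
levels = {"integrations": "many", "framework_experience": "new"}
print(validation.estimate(levels))
validation.record_actual(levels, actual_hours=198.0)
print(validation.estimate(levels))
```

Storing the uncalibrated modelled hours, rather than the calibrated quote, keeps each recalibration anchored to the raw model, so an overrun is corrected once instead of being stacked on top of earlier corrections.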
Twelve months in, two numbers had moved. Realised margin on validation engagements rose from 23% to 31%. Estimation variance, the gap between quoted and realised effort, dropped by roughly half. Their senior estimators, who had been the bottleneck on every quote, no longer were: the complexity model was doing the work they had been doing in their heads. SPI Research’s industry benchmark puts the median professional services profit margin in the high single digits [1], which gives a sense of how meaningful an eight-point margin shift is at this scale.
The lesson was not that complexity drivers are magic. The lesson was that judgement that lived in three senior estimators’ heads now lived in the service definition. The team owned the judgement. New estimators inherited it.
Archetype 2. The 90-person legal tech implementation shop
A larger team, doing legal-tech implementation work for mid-market law firms, had the opposite shape of the same problem. Their estimation was tight. Their pricing was the issue. The same kind of engagement would go out at noticeably different prices depending on which partner had assembled the deal. Margin in the team had been bouncing between 14% and 22% quarter to quarter for two years.
Their move was to put pricing logic into the quote builder itself, not into a rate card document. A rate card is a list. The quote builder, in their case, was the place where partners actually configured a deal. They built guardrails into it: when a deal was being shaped at a margin outside the team’s normal band for that engagement type, the quote builder flagged it and required a structured justification before the deal could move forward. The partners could still exercise judgement. They just could not exercise it accidentally.
What changed was not the price level: the team did not raise rates. The variance compressed. Average realised margin rose from 17% to 23% over three quarters and stopped bouncing. The flagging mechanism was catching deals that, in retrospect, had been structurally underpriced for years. The system also flagged two engagements as overpriced; the partners adjusted both before the proposals went out, and those proposals converted at a higher rate than the team’s historical average.
The structural artifact was small. It was a margin band per engagement type, attached to the service definition, enforced at the point of quoting. The judgement of three senior partners about what a deal “should” be priced at was now legible as numbers in the system.
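A margin-band guardrail of this kind fits in a few lines. The following is a hypothetical Python illustration, not the team’s quote builder: the engagement types, the band values, and the `Quote` and `check_quote` names are invented. It computes margin as (price minus estimated cost) divided by price, lets quotes inside the band through, and blocks quotes outside it, in either direction, unless a non-empty justification is attached.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical margin bands per engagement type, as (floor, ceiling) fractions of price.
MARGIN_BANDS: dict[str, tuple[float, float]] = {
    "implementation": (0.18, 0.28),
    "data_migration": (0.20, 0.30),
}


@dataclass
class Quote:
    engagement_type: str
    price: float
    estimated_cost: float
    # Structured reason a partner records when deliberately leaving the band.
    justification: Optional[str] = None

    @property
    def margin(self) -> float:
        return (self.price - self.estimated_cost) / self.price


def check_quote(quote: Quote) -> tuple[bool, str]:
    """Return (may_proceed, message). Out-of-band quotes need a justification."""
    floor, ceiling = MARGIN_BANDS[quote.engagement_type]
    m = quote.margin
    if floor <= m <= ceiling:
        return True, f"margin {m:.1%} is inside the {floor:.0%} to {ceiling:.0%} band"
    direction = "below" if m < floor else "above"
    if quote.justification and quote.justification.strip():
        return True, f"margin {m:.1%} is {direction} the band; justification recorded"
    return False, f"margin {m:.1%} is {direction} the {floor:.0%} to {ceiling:.0%} band; justification required"


print(check_quote(Quote("implementation", price=100_000, estimated_cost=90_000)))
print(check_quote(Quote("implementation", price=100_000, estimated_cost=90_000,
                        justification="Strategic first engagement; partner-approved")))
```

The override path is the point of the design: a partner can still ship an out-of-band deal, but only with a recorded reason, which is what turns accidental underpricing into a visible, reviewable decision.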
Archetype 3. The 200-person enterprise SaaS services arm
The third archetype is a services arm inside a mid-stage enterprise SaaS company: roughly 200 people, doing implementation and integration work for their own platform’s customers. They had a problem the smaller teams did not: institutional memory had become genuinely impossible to manage informally. Engagements were similar enough that lessons from one client should have informed the next, and different enough structurally that nobody could keep them all in their head.
Their move was to embed institutional memory into the engagement layer, with vector-based similarity over past engagements. When a new engagement was being scoped, the system surfaced the closest matches automatically: same kind of integration, same client size, same product modules, same complexity drivers. The new lead saw, before they wrote the first scope draft, the three engagements most like this one and what had actually happened on them.
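The retrieval step can be illustrated with plain cosine similarity over hand-built feature vectors. This is a minimal sketch under stated assumptions: the feature encoding (integration type, scaled client size, product modules), the `Engagement` type, and the toy history are all hypothetical, and a production system would more likely use learned embeddings and a vector index rather than a linear scan.

```python
import math
from dataclasses import dataclass


@dataclass
class Engagement:
    name: str
    # Hypothetical encoding, each value in 0..1:
    # [crm_integration, erp_integration, client_size_scaled, module_billing, module_analytics]
    features: list[float]
    outcome: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest(query: list[float], past: list[Engagement], k: int = 3) -> list[Engagement]:
    """The k past engagements most similar to the one being scoped (linear scan)."""
    return sorted(past, key=lambda e: cosine(query, e.features), reverse=True)[:k]


# Toy history for illustration only.
history = [
    Engagement("A", [1, 0, 0.5, 1, 0], "Overran: CRM field mapping discovered late"),
    Engagement("B", [0, 1, 0.5, 0, 1], "On budget"),
    Engagement("C", [1, 0, 0.6, 1, 1], "Overran: analytics data quality"),
    Engagement("D", [0, 1, 0.9, 0, 0], "On budget"),
]
new_engagement = [1, 0, 0.5, 1, 0]
for match in closest(new_engagement, history):
    print(match.name, "->", match.outcome)
```

Keeping every feature on a comparable 0 to 1 scale matters here: an unscaled feature such as raw headcount would dominate the similarity and swamp the integration and module signals.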
The change was not adoption-driven. They did not have to chase consultants to use it. The information was sitting there when they opened the engagement. It was less work than not using it.
A year in, the visible numbers were these. Utilization on this team rose by about 4 points, against an industry baseline that was sliding [2]. New consultants reached their first solo engagement in six weeks rather than four months. The recurring failure mode where two parallel engagements made the same mistake against the same kind of integration, which had cost them money every quarter for years, mostly stopped.
The structural artifact was past delivery becoming queryable. The team’s history became their force multiplier. Their newest consultant in 2026 starts with three years of structured engagement memory at her fingertips. Their newest consultant in 2024 had to absorb that through tenure.
What the three teams had in common
Three different team sizes. Three different industries. Three different starting problems. The same operating shape.
Each team moved a piece of delivery judgement out of an individual’s head and into a structured object the team owns. Service definitions with complexity drivers. Margin bands at the quote builder. Engagement-level institutional memory queryable by similarity. The mechanism was different in each case. The pattern was identical.
Each team also resisted the obvious move. The 40-person team resisted writing a “validation playbook.” The 90-person team resisted publishing a rate card. The 200-person team resisted launching a “knowledge management initiative.” In every case the obvious move had been tried, by them or by a peer team, and had failed. The work had to happen in the engagement object, not next to it.
For the underlying argument on why this works, see AI Won’t Save Your Services Business and the end of tribal knowledge. For a deeper, operator-level read on the institutional memory surface, see building an institutional memory engine. The companion artifact most teams use when starting this work is the PS OS Pillar Vision worksheet.
Frequently asked questions
Why are the teams not named?
The operators involved did not sign up to be public case studies. The shapes are accurate to teams I have either worked with directly or watched closely enough to confirm the operating move and the numbers. The point is the structural pattern, not the team's identity. Naming them would not change the lesson.
What did the three teams have in common?
Each team moved a piece of delivery judgement out of an individual's head and into a structured object the team owns. Complexity drivers attached to the service definition. Margin bands enforced at the quote builder. Engagement-level institutional memory queryable by similarity. Different mechanisms, same shape: judgement becomes a property of the engagement, not a property of a tenured operator.
Can a strong senior operator deliver the same gains without encoding anything?
In principle yes, but the gains do not compound. A senior estimator who is great at scoping validation engagements cannot scale herself across 40 engagements at once, and her judgement leaves when she does. Encoding the judgement into the service definition makes it a team asset that survives senior departures and is available to less experienced operators on day one.
How long did it take for the numbers to move?
The 40-person team saw clear margin lift inside twelve months. The 90-person team saw margin variance compress inside three quarters. The 200-person team saw utilization and ramp-time movement inside a year, with the institutional memory effect compounding from there. None of these were overnight changes. They were structural shifts whose impact accumulated as more engagements ran through the new shape.
Where should a team start?
Pick the engagement type that costs you the most when it goes wrong. Sit down with the operator who is best at scoping it. Ask three questions: what do you check for during scoping that others miss, where does this engagement type usually go wrong, and what should the estimate include that it often does not. Capture their answers as structured fields in the service definition, not as a wiki page. After three engagements, you will see whether the next person scoping that type uses the structure. If they do, scale the pattern.
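One way to hold the interview answers as structured fields, and to check after three engagements whether later scopers actually use them, is sketched below in Python. The schema, the field names, the toy guidance items, and the adoption measure (the average share of captured items each scope explicitly addresses) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScopingGuidance:
    """Answers to the three interview questions, held as structured fields."""
    checks_others_miss: list[str] = field(default_factory=list)
    usual_failure_points: list[str] = field(default_factory=list)
    often_missing_from_estimate: list[str] = field(default_factory=list)

    def items(self) -> set[str]:
        """Every captured item across the three fields."""
        return (set(self.checks_others_miss)
                | set(self.usual_failure_points)
                | set(self.often_missing_from_estimate))


@dataclass
class ScopedEngagement:
    scoped_by: str
    # Guidance items the scoper explicitly addressed in this scope.
    addressed: set[str] = field(default_factory=set)


def adoption_rate(guidance: ScopingGuidance, engagements: list[ScopedEngagement]) -> float:
    """Average share of guidance items addressed per engagement (0.0 if nothing to measure)."""
    items = guidance.items()
    if not items or not engagements:
        return 0.0
    shares = [len(e.addressed & items) / len(items) for e in engagements]
    return sum(shares) / len(shares)


guidance = ScopingGuidance(
    checks_others_miss=["confirm client's validation SOP version"],
    usual_failure_points=["late data migration scope", "undocumented integrations"],
    often_missing_from_estimate=["client review cycles"],
)
first_three = [
    ScopedEngagement("new_lead_1", {"late data migration scope", "client review cycles"}),
    ScopedEngagement("new_lead_2", guidance.items()),
    ScopedEngagement("new_lead_3", set()),
]
print(f"Adoption after three engagements: {adoption_rate(guidance, first_three):.0%}")
```

What counts as “uses the structure” is a judgement call; a share-of-items measure is only one option, and a low score is as much a signal to revisit the fields as it is a signal about the scopers.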
Sources
- [1] SPI Research (2025). 2025 Professional Services Maturity Benchmark. https://spiresearch.com/professional-services-maturity-benchmark/ Accessed 2026-05-07.
- [2] SPI Research (2025). 2025 Professional Services Maturity Benchmark, billable utilization section. https://spiresearch.com/professional-services-maturity-benchmark/ Accessed 2026-05-07.