
Waterfall Enrichment Explained: How Stacking Data Providers Lifts Contact Coverage From 30% to 80%+

Aug 14, 2025 · 5 min read

You pulled a list from Apollo, ran it, and roughly a third of the rows came back with a usable email. The rest were blank, or worse, they looked fine and then bounced. If you have ever stared at a CRM export where the "email" column is half empty and the other half is decaying, you already understand the core problem: no single data provider knows everyone. That is the gap waterfall enrichment closes.

This is an operator's explanation, not a vendor definition. We have run 950K+ contacts through enrichment and built 1,800+ production Clay tables doing exactly this work, so the numbers and trade-offs below come from output, not a pricing page.

What waterfall enrichment actually is

Waterfall enrichment is a sequence. You take a contact (a name plus a company, or a LinkedIn URL), and you ask data provider one for the email. If it returns a verified result, you stop and pay only that provider. If it comes back empty, you pass the same contact to provider two, then provider three, and so on down the stack until something lands or you run out of sources.

The reason it works is simple: providers do not overlap perfectly. One vendor might cover 45% of your list well, another might cover 40% that only partly overlaps with the first, and the union of five or six sources covers far more than any one of them alone. A single source typically lands you 30 to 50% usable, verified coverage on a real B2B list. A sequenced stack pushes that to 80%+ on most ICPs. You are not finding better data; you are stitching together the non-overlapping parts of several datasets.
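To make the union argument concrete, here is a minimal sketch with invented numbers. The provider names and the sets of resolved rows are illustrative assumptions, not measured data. The point is only that each added source contributes its net-new rows, which is fewer than its standalone hit count.

```python
def marginal_coverage(sources):
    """For each provider, in stack order, report standalone hits,
    net-new hits, and cumulative coverage."""
    resolved = set()
    report = []
    for name, hits in sources:
        new_hits = hits - resolved   # rows no earlier provider resolved
        resolved |= hits             # union of everything resolved so far
        report.append((name, len(hits), len(new_hits), len(resolved)))
    return report


# Invented example: a 100-row list, contacts identified by row number.
sources = [
    ("provider_one", set(range(0, 45))),     # 45 rows on its own
    ("provider_two", set(range(30, 70))),    # 40 rows, 15 shared with provider_one
    ("provider_three", set(range(60, 82))),  # 22 rows, 10 shared with provider_two
]

for name, standalone, new, cumulative in marginal_coverage(sources):
    print(f"{name}: {standalone} alone, {new} net new, {cumulative} cumulative")
```

In this toy list no provider resolves more than 45 rows alone, yet three of them together resolve 82 of 100.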

The mechanics matter: each row only moves to the next provider when the prior one fails. That conditional logic, run across a whole list, is what separates a waterfall from just buying five tools and merging the results.
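The conditional logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production Clay setup: `run_waterfall`, `table_lookup`, the provider names, and the `verify` stub are all hypothetical, and a real stack would call each vendor's API and a real verification service instead of in-memory tables.

```python
from typing import Callable, Optional, Sequence, Tuple

Contact = dict
Lookup = Callable[[Contact], Optional[str]]


def run_waterfall(
    contact: Contact,
    providers: Sequence[Tuple[str, Lookup]],
    verify: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try providers in order; stop at the first verified email."""
    for name, lookup in providers:
        email = lookup(contact)
        if email is not None and verify(email):
            return email, name   # stop: later providers are never called
    return None, None            # unresolved: flag the row, do not guess


def table_lookup(table: dict) -> Lookup:
    """Hypothetical stand-in for a vendor API call."""
    return lambda contact: table.get((contact["name"], contact["company"]))


stack = [
    ("provider_one", table_lookup({
        ("Ada", "Acme"): "ada@acme.example",
        ("Bo", "Birch"): "bo@old-birch.example",   # stale record
    })),
    ("provider_two", table_lookup({("Bo", "Birch"): "bo@birch.example"})),
]


def verify(email: str) -> bool:
    """Stub verifier: treats the 'old-' domain as a bounce."""
    return "old-" not in email


for contact in [
    {"name": "Ada", "company": "Acme"},
    {"name": "Bo", "company": "Birch"},
    {"name": "Cy", "company": "Cedar"},
]:
    print(contact["name"], run_waterfall(contact, stack, verify))
```

Ada resolves at the first provider and never touches the second. Bo's first result fails verification, so the row rolls down. Cy stays unresolved and comes back as `(None, None)` rather than a low-confidence guess.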

Why one provider caps out at 30 to 50%

Apollo is the usual starting point because it is cheap and broad, and broad is exactly the problem. A database that tries to cover everyone is stale somewhere. People change jobs, companies rebrand, catch-all domains hide the truth, and a record that was correct eighteen months ago now bounces. When operators tell us their previous setup was full of bounces and bad data, the root cause is almost always a single-source list that was never re-verified.

That decay is not just a coverage problem, it is a deliverability problem. Sending to stale addresses drives your bounce rate up, and a high bounce rate is one of the fastest ways to wreck your sender reputation and start landing in spam. We hold bounce between 0.15 and 0.9% and average 98.5% inbox placement, and clean enrichment is a precondition for both. You cannot fix deliverability downstream if the list feeding it is rotten upstream.

How we sequence providers and control cost

The order of the stack is the whole game. Run your cheapest, highest-hit-rate provider first so the easy 40 to 50% gets resolved at the lowest cost per match. Only the rows that fail roll down to more expensive, more specialized sources. By the time a contact reaches provider five, you have already paid pennies to resolve most of the list, so the higher unit cost only applies to the hard remainder.

Sequence by cost, not by brand. Cheapest and broadest first, premium and niche last.
Verify at every step. An unverified email is a future bounce. Each provider's output gets validated before it counts as a match, so you never pay to send to a guess.
Cap the depth. If a contact is still unresolved after the stack, flag it rather than forcing a low-confidence guess. A missing email is recoverable, a damaged domain reputation is not.
Track cost per verified contact, not per credit. The real metric is dollars per usable, deliverable row, and the waterfall is what keeps that number low.
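The effect of ordering on cost per verified contact can be worked through with a small model. Everything numeric here is an invented assumption: the prices, the hit rates, billing only on a verified match, and the simplification that each provider resolves a fixed share of whatever rows reach it regardless of position. Real overlap between vendors is messier, so treat this as a sketch of the arithmetic, not a forecast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    cost_per_match: float  # assumed: billed only when a verified match is returned
    hit_rate: float        # assumed: share of the rows it receives that it resolves


def waterfall_cost(rows, stack):
    """Return (total spend, verified contacts, cost per verified contact)."""
    remaining = float(rows)
    spend = 0.0
    verified = 0.0
    for provider in stack:
        matched = remaining * provider.hit_rate
        spend += matched * provider.cost_per_match
        verified += matched
        remaining -= matched   # only the failures roll down the stack
    per_verified = spend / verified if verified else float("inf")
    return spend, verified, per_verified


providers = [
    Provider("premium_niche", 0.20, 0.35),
    Provider("cheap_broad", 0.01, 0.45),
    Provider("mid_tier", 0.05, 0.40),
]

cheap_first = sorted(providers, key=lambda p: p.cost_per_match)
premium_first = list(reversed(cheap_first))

for label, stack in [("cheap first", cheap_first), ("premium first", premium_first)]:
    spend, verified, per_verified = waterfall_cost(1000, stack)
    print(f"{label}: ${spend:.2f} for {verified:.1f} verified, ${per_verified:.3f} each")
```

Under these assumptions both orders verify the same number of contacts, but the cheap-first stack spends roughly $0.05 per verified contact against roughly $0.11 for the premium-first stack. Coverage is identical; only the order changed.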

We build and run this in Clay with self-hosted n8n orchestrating the moving parts, which is why a 950K-contact volume stays affordable: you are not paying premium rates on the rows a cheap provider already solved.

Where waterfall enrichment fits in an outbound system

Enrichment is not a standalone purchase, it is one layer of a system. The contact data feeds personalization (we use Perplexity and Claude to write from real, current facts), the verified emails feed warmed sending infrastructure, and the whole thing only pays off if the list is targeted in the first place. Stacking providers on a bad ICP just gets you 80% coverage of the wrong people.

This is also why coverage and triggers belong together. A contact resolved through the waterfall becomes far more valuable when you also know the company just raised funding, is hiring for a relevant role, or runs a specific tool, which is the work behind signal-based outbound. GearLocker is a concrete example: rather than rent a generic list, we built a proprietary 66,000-school database and enriched it, which produced 194 interested replies because the coverage and the targeting were solved at the same time. You can see the build in the GearLocker case study.

FAQ

Questions, answered.

Is waterfall enrichment just buying several data tools and merging them?

No. Merging means you pay every provider for every contact and then dedupe the pile. A waterfall is conditional: each contact only moves to the next, more expensive provider when the cheaper one fails to return a verified result. That ordering is what keeps cost per verified contact low while still reaching 80%+ coverage.

How much coverage can I realistically expect?

On a single source like Apollo, expect 30 to 50% usable, verified coverage on a real B2B list, and some of that will decay or bounce. A properly sequenced stack of five or six providers, with verification at each step, typically reaches 80%+ on most ICPs. The exact number depends on how niche your audience is and how clean your input names and companies are.

Why does enrichment quality affect email deliverability?

Sending to stale or unverified addresses drives up your bounce rate, and a high bounce rate damages sender reputation and pushes you toward the spam folder. We keep bounce between 0.15 and 0.9% and average 98.5% inbox placement, and clean, verified enrichment is the precondition for both. You cannot fix deliverability downstream from a list that is wrong upstream.

Want this built and run for you?

LongRun builds the outbound system, runs it, and hands it over at day 90. Book a strategy call to scope yours.