72% of your call data is probably noise

The surprise in every call archive

When a company shares their call recordings with us, the first thing we do isn't analysis. It's triage.

In a recent engagement, we received over two thousand call transcripts. The expectation was two thousand conversations worth of strategic intelligence. The reality was different.

After scoring every transcript for signal quality, we found that only 28% contained enough substance for meaningful analysis. The rest were voicemails, wrong numbers, sub-30-second exchanges, and calls where the only content was "Hello? Hello? I'll call back later."

This isn't unusual. It's the norm.

· · · · · · · · · · · · · · · · · · · · ·

Why this matters

If you feed two thousand transcripts into an analysis system without filtering, three things happen:

Patterns get diluted. When 72% of your dataset is noise, the real signals are drowned out. An objection that appears in 40 of 560 meaningful calls (about 7%) looks as though it appears in only 40 of 2,000 total calls (2%), so its true frequency is understated more than threefold. The frequency data lies to you.

Resources get wasted. Processing a voicemail through an intelligence extraction pipeline produces nothing useful. It's compute time and API calls spent on "Hello. Hello. Goodbye."

Confidence drops. When an analysis system reports that it found "low-confidence evidence" across a dataset, part of the reason is that most of the dataset wasn't evidence at all.
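
To make the dilution effect concrete, here is a minimal sketch of the arithmetic using the round numbers from this engagement. The function and variable names are ours, chosen for illustration; they are not part of any particular tool.

```python
def mention_rate(mentions: int, calls: int) -> float:
    """Share of calls in which a signal appears, as a percentage."""
    return 100.0 * mentions / calls


total_calls = 2000        # whole archive, noise included
meaningful_share = 0.28   # fraction that passed signal-quality scoring
objection_mentions = 40   # calls in which the objection appears

meaningful_calls = round(total_calls * meaningful_share)

diluted = mention_rate(objection_mentions, total_calls)
actual = mention_rate(objection_mentions, meaningful_calls)

print(f"meaningful calls:           {meaningful_calls}")   # 560
print(f"rate over all calls:        {diluted:.1f}%")       # 2.0%
print(f"rate over meaningful calls: {actual:.1f}%")        # 7.1%
print(f"understated by a factor of  {actual / diluted:.1f}")  # 3.6
```

The numerator never changes. Only the denominator does, which is why filtering has to happen before counting rather than after.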

· · · · · · · · · · · · · · · · · · · · ·

How we think about data quality

We classify every transcript into four tiers before any strategic analysis begins:

Rich — Three or more extractable signals (objections, triggers, competitive mentions, pricing discussions) plus direct quotes. These are the calls that drive intelligence.

Moderate — At least one extractable signal or a meaningful quote. These contribute to pattern detection but don't carry an insight alone.

Thin — A summary is possible but no structured signals can be extracted. Often these are very short calls or calls with poor audio quality where the transcription captured words but not meaning.

Empty — Voicemails, wrong numbers, automated messages, and exchanges under 30 seconds. Zero analytical value.

Only Rich and Moderate calls go into our cross-conversation analysis. Thin and Empty calls are excluded from the intelligence pipeline — though they're still indexed for search, in case a specific conversation needs to be found.
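
The tier rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not our production scoring: it assumes each transcript has already been reduced to a duration, a count of extracted signals, a count of usable quotes, and a flag for non-conversations. The `ScoredCall` fields and the `classify` function are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RICH = "rich"
    MODERATE = "moderate"
    THIN = "thin"
    EMPTY = "empty"


@dataclass
class ScoredCall:
    # All fields are assumed inputs from an upstream extraction step.
    duration_seconds: float
    signal_count: int   # objections, triggers, competitive mentions, pricing
    quote_count: int    # direct quotes worth citing
    is_non_conversation: bool = False  # voicemail, wrong number, automated message


def classify(call: ScoredCall) -> Tier:
    """Assign one of the four tiers described in the text."""
    if call.is_non_conversation or call.duration_seconds < 30:
        return Tier.EMPTY
    if call.signal_count >= 3 and call.quote_count >= 1:
        return Tier.RICH
    if call.signal_count >= 1 or call.quote_count >= 1:
        return Tier.MODERATE
    return Tier.THIN  # summarizable, but nothing structured to extract


def analysis_set(calls: list[ScoredCall]) -> list[ScoredCall]:
    """Keep only the tiers that feed cross-conversation analysis."""
    return [c for c in calls if classify(c) in (Tier.RICH, Tier.MODERATE)]


calls = [
    ScoredCall(duration_seconds=1500, signal_count=4, quote_count=2),
    ScoredCall(duration_seconds=400, signal_count=1, quote_count=0),
    ScoredCall(duration_seconds=90, signal_count=0, quote_count=0),
    ScoredCall(duration_seconds=12, signal_count=0, quote_count=0),
]
print([classify(c).value for c in calls])
print(len(analysis_set(calls)))
```

The Empty check runs first on purpose: a voicemail that happens to mention a competitor is still a voicemail.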

· · · · · · · · · · · · · · · · · · · · ·

What you can do about it

Before sharing your call archive for analysis — or before analyzing it yourself — a simple cleanup saves significant time:

Filter by duration. Calls under 60 seconds are almost never strategically useful. Remove them.

Exclude voicemails. Most CRM exports include voicemail recordings. These are noise.

Focus on call types. Discovery calls, client meetings, and negotiation calls contain the richest signals. Internal syncs and scheduling calls usually don't.

Check for transcription quality. If your recording tool produces garbled transcripts, the downstream analysis will be limited regardless of how sophisticated the methodology is.
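
The first three steps can be sketched as a pre-filter over an exported call list. This assumes each record carries `duration_seconds`, `is_voicemail`, and `call_type` fields; those names and the call-type labels are hypothetical, so map them to whatever your CRM or recording tool actually exports. Transcription quality is left out because checking it depends on the tool.

```python
MIN_DURATION_SECONDS = 60
# Assumed labels for the call types named above; adjust to your own taxonomy.
HIGH_SIGNAL_TYPES = {"discovery", "client_meeting", "negotiation"}


def keep_for_analysis(call: dict) -> bool:
    """Return True if a call record survives the basic cleanup."""
    if call["duration_seconds"] < MIN_DURATION_SECONDS:
        return False
    if call.get("is_voicemail", False):
        return False
    # Strict choice for this sketch: unknown or low-signal types are dropped.
    return call.get("call_type") in HIGH_SIGNAL_TYPES


archive = [
    {"id": 1, "duration_seconds": 1800, "is_voicemail": False, "call_type": "discovery"},
    {"id": 2, "duration_seconds": 25, "is_voicemail": False, "call_type": "discovery"},
    {"id": 3, "duration_seconds": 75, "is_voicemail": True, "call_type": "negotiation"},
    {"id": 4, "duration_seconds": 900, "is_voicemail": False, "call_type": "internal_sync"},
]

cleaned = [call for call in archive if keep_for_analysis(call)]
print([call["id"] for call in cleaned])  # [1]
```

Treating call type as a hard filter is the aggressive option. If internal syncs sometimes carry customer context in your organization, it may be safer to keep them and let the tiering step decide.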

The goal isn't to have the most calls. It's to have the most meaningful calls.

The quality of your intelligence is capped by the quality of the conversations that produced it.