Scoring 100% of sales calls with AI: what really changes

In most sales teams we meet, a manager listens to 5 to 10 calls per week, while the team makes 200 to 400. That leaves at least 95% of customer conversations unreviewed. It’s a black hole. When Heex Technologies asked us to fill that hole, we built automatic AI scoring of 100% of calls. What happened next was more interesting than the technology itself.

The problem: 5 calls listened to out of 200

Heex is a tech scale-up with 8 sales reps. Each makes 30 to 40 calls per week (demos, qualification, closing), which adds up to 240 to 320 calls across the team. The sales director, in the best case, listened to 10 calls per week. That is about 4% visibility into activity.
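The arithmetic behind that visibility figure can be checked in a few lines. This is only an illustration using the numbers quoted above; the function name `review_coverage` is ours, not part of the Heex system.

```python
def review_coverage(reps: int, calls_per_rep: int, reviewed: int) -> float:
    """Share of weekly calls a manager actually hears, as a fraction."""
    total_calls = reps * calls_per_rep
    return reviewed / total_calls


# Figures from the Heex case: 8 reps, 30 to 40 calls each, 10 calls reviewed.
# Fewer calls per rep means better coverage for the same 10 reviews.
best_case = review_coverage(reps=8, calls_per_rep=30, reviewed=10)   # 10 / 240
worst_case = review_coverage(reps=8, calls_per_rep=40, reviewed=10)  # 10 / 320

print(f"coverage between {worst_case:.1%} and {best_case:.1%}")
```

Either way, more than 95% of conversations go unheard.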

Side effect: sales coaching was based on a biased sample. The manager often heard the same two or three reps and missed the best (and worst) practices of the others.

The brief: be able to score every call automatically, and turn that data into a tool for collective improvement.

The architecture we shipped

Capture: integration with the video tools (Zoom, Google Meet) and the CRM softphone. As soon as a call ends, the audio file is uploaded automatically.

Transcription: Whisper running locally for sensitive calls, OpenAI’s API for the rest. French accuracy: > 95%.
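The routing between the two transcription backends could look roughly like the sketch below. It is a minimal illustration, not the shipped code: the `Call` record, its `sensitive` flag and the two stand-in backends are all assumptions made for the example, and the real backends would wrap a local Whisper model and a hosted speech-to-text API.

```python
from dataclasses import dataclass
from typing import Callable

# A transcriber takes an audio file path and returns the transcript text.
Transcriber = Callable[[str], str]


@dataclass(frozen=True)
class Call:
    call_id: str
    audio_path: str
    sensitive: bool  # hypothetical flag, e.g. derived from a CRM field


def transcribe(call: Call, local: Transcriber, hosted: Transcriber) -> str:
    """Send sensitive audio to the local backend, everything else to the hosted one."""
    backend = local if call.sensitive else hosted
    return backend(call.audio_path)


# Stand-in backends so the sketch runs without any model or network access.
def fake_local(path: str) -> str:
    return f"[local] {path}"


def fake_hosted(path: str) -> str:
    return f"[hosted] {path}"


print(transcribe(Call("c-1", "closing.wav", sensitive=True), fake_local, fake_hosted))
```

Passing the backends in as plain callables keeps the routing rule testable on its own, independently of which transcription engine sits behind each one.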

Scoring: an LLM workflow compares the call against the Heex sales script (discovery → qualification → demo → objections → closing). It produces a 0–100 score per stage plus a two-line summary.
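As a rough sketch of this step, the code below builds a prompt from the script and the transcript, then validates the model's reply before anything reaches the dashboard. The prompt wording, the JSON shape and the function names are assumptions for illustration; the actual model call is left out because it depends on the provider.

```python
import json

# The five stages of the Heex sales script, as listed above.
STAGES = ("discovery", "qualification", "demo", "objections", "closing")


def build_prompt(script: str, transcript: str) -> str:
    """Ask for one 0-100 score per stage and a two-line summary, as JSON."""
    stage_list = ", ".join(STAGES)
    return (
        "Compare the call transcript with the reference sales script.\n"
        f"Score each stage ({stage_list}) from 0 to 100 and add a two-line summary.\n"
        'Reply with JSON only: {"scores": {"<stage>": <int>}, "summary": "<text>"}\n\n'
        f"REFERENCE SCRIPT:\n{script}\n\nTRANSCRIPT:\n{transcript}"
    )


def parse_scores(raw_reply: str) -> dict:
    """Validate the model's reply; raise ValueError if it is malformed."""
    data = json.loads(raw_reply)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or "scores" not in data or "summary" not in data:
        raise ValueError("reply must be an object with 'scores' and 'summary'")
    scores = data["scores"]
    if not isinstance(scores, dict) or set(scores) != set(STAGES):
        raise ValueError(f"expected exactly these stages: {STAGES}")
    for stage, value in scores.items():
        # type(...) is int also rejects booleans, which are ints in Python.
        if type(value) is not int or not 0 <= value <= 100:
            raise ValueError(f"invalid score for {stage}: {value!r}")
    return {"scores": scores, "summary": str(data["summary"]).strip()}
```

Rejecting malformed replies at this point means a model hiccup shows up as a retry or an error, not as a silently wrong score on someone's dashboard.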

Delivery: scores are pushed automatically to the dashboard via an AI automation workflow. No manual re-entry, no Excel export.

The pivot: from “reporting” to “coaching”

At launch, the sales team was wary. The phrase “AI scoring” sounded like “automated surveillance”. And that’s legitimate: if scores are public, you kill morale in two weeks.

So we made a product call: by default, each rep sees their own scores in detail (with problematic passages highlighted), but the manager only sees aggregated averages. No public ranking, no direct comparison.
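That visibility rule is simple enough to express as two read paths over the same data. The sketch below is one possible reading, with hypothetical names (`CallScore`, `scores_for_rep`, `manager_view`), and it assumes the manager's aggregate is a single team-level average.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CallScore:
    rep_id: str
    call_id: str
    score: int  # 0-100


def scores_for_rep(all_scores: list[CallScore], rep_id: str) -> list[CallScore]:
    """A rep gets full detail, but only for their own calls."""
    return [s for s in all_scores if s.rep_id == rep_id]


def manager_view(all_scores: list[CallScore]) -> dict:
    """The manager gets aggregates only: no rep ids, no per-call rows, no ranking."""
    if not all_scores:
        return {"calls_scored": 0, "team_average": None}
    return {
        "calls_scored": len(all_scores),
        "team_average": round(mean(s.score for s in all_scores), 1),
    }


week = [
    CallScore("rep-a", "c-1", 80),
    CallScore("rep-a", "c-2", 60),
    CallScore("rep-b", "c-3", 70),
]
print(scores_for_rep(week, "rep-a"))
print(manager_view(week))
```

The point of the design is that the manager's view never contains a rep identifier at all, so a ranking cannot be rebuilt from it by accident.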

The result was surprising. After a few weeks, reps started listening to their own lowest-scored calls and identifying their own failure patterns (not probing the prospect’s decision process, skipping budget qualification). It became a self-improvement tool, not a surveillance one.

Four mistakes to avoid in AI call scoring

Mistake 1 — Scoring without a script as a baseline. If you ask an LLM “is this rep good?” without a frame, you get noise. You need a reference script (even imperfect) to compare against. Otherwise, the score shifts with the model’s mood.

Mistake 2 — Making all scores public. Tempting from a management standpoint, disastrous from a team one. Reps will game the score (talk longer to tick boxes) and lose the spontaneity that wins deals.

Mistake 3 — Wanting an ultra-precise numerical score. The difference between a 72 and a 78 has no statistical meaning. Better to use three levels (“on protocol”, “partial”, “off-script”) than a falsely precise 0–100 score.
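Collapsing a raw score into three levels takes only a few lines. In the sketch below the cut-offs (70 and 40) are arbitrary placeholders, not values from the Heex project; they would need tuning against calls a human has already reviewed.

```python
def to_level(score: int, on_protocol_min: int = 70, partial_min: int = 40) -> str:
    """Map a 0-100 score to one of three coarse levels.

    The default thresholds are illustrative assumptions, not calibrated values.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= on_protocol_min:
        return "on protocol"
    if score >= partial_min:
        return "partial"
    return "off-script"


# A 72 and a 78 land in the same level, which is the point of the coarser scale.
print(to_level(72), "/", to_level(78), "/", to_level(35))
```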

Mistake 4 — Forgetting GDPR on recordings. Your prospects must be informed at the start of the call that the conversation is recorded and analysed by an automated system. It’s a blocker if you skip it.

Extending beyond sales

The same architecture works far beyond sales calls. Customer support: score resolution and satisfaction. Onboarding (as in the Toshify case): score the quality of the AI qualification. HR: score interviews (with consent, obviously).

The pattern to remember: transcribe automatically, score against a frame, deliver to the right people at the right level of aggregation. It’s a reusable workflow, not a one-shot product.
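One way to picture that reusability: the three steps are wired together once, and only the frame (sales script, support checklist, interview grid) changes per use case. The sketch below uses stand-in functions throughout; `make_pipeline` and the example frame are hypothetical.

```python
from typing import Callable


def make_pipeline(
    transcribe: Callable[[str], str],
    score: Callable[[str, str], dict],
    deliver: Callable[[dict], None],
) -> Callable[[str, str], dict]:
    """Bundle the three steps; the frame is passed per run so it can change."""

    def run(audio_path: str, frame: str) -> dict:
        transcript = transcribe(audio_path)
        result = score(transcript, frame)
        deliver(result)
        return result

    return run


# Stand-ins for illustration: a fake transcriber, a fake scorer, an in-memory sink.
delivered: list[dict] = []
support_pipeline = make_pipeline(
    transcribe=lambda path: f"transcript of {path}",
    score=lambda transcript, frame: {"frame": frame, "level": "partial"},
    deliver=delivered.append,
)
result = support_pipeline("ticket-42.wav", "support resolution checklist")
print(result)
```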

What we take away

The real win at Heex is not the scoring itself. It’s moving scoring from a reporting tool to a coaching tool. That move comes down to a product detail: who sees what, at what level of aggregation. Miss that and you ship a technically correct but socially unusable tool.

Read the full Heex case