In most sales teams we meet, a manager listens to 5 to 10 calls per week while the team makes 200 to 400. That leaves at least 95% of customer conversations unreviewed. It’s a black hole. When Heex Technologies asked us to fill that hole, we built automatic AI scoring on 100% of calls. What happened next was more interesting than the technology itself.
The problem: 5 calls listened to out of 200
Heex is a tech scale-up with 8 sales reps. Each makes 30 to 40 calls per week (demos, qualification, closing), which adds up to 240 to 320 calls across the team. The sales director, in the best case, listened to 10 of them per week. That is roughly 3 to 4% visibility into activity.
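The coverage figure is simple arithmetic on the numbers above. A minimal sketch, where the function name `review_coverage` is ours and purely illustrative:

```python
def review_coverage(reps: int, calls_per_rep: int, calls_reviewed: int) -> float:
    """Share of weekly calls a manager actually hears, as a percentage."""
    total_calls = reps * calls_per_rep
    return 100 * calls_reviewed / total_calls


# Figures from the Heex case: 8 reps, 30 to 40 calls each, 10 calls reviewed at best.
low_volume_week = review_coverage(reps=8, calls_per_rep=30, calls_reviewed=10)
high_volume_week = review_coverage(reps=8, calls_per_rep=40, calls_reviewed=10)
print(f"{high_volume_week:.1f}% to {low_volume_week:.1f}% of calls reviewed")
```

Whichever end of the range you take, more than 95% of calls go unheard.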
Side effect: sales coaching was based on a biased sample. The manager often heard the same two or three reps and missed the best (and worst) practices of the others.
The brief: score every call automatically, and turn that data into a tool for collective improvement.
The architecture we shipped
Capture: integration with the video-conferencing tools (Zoom, Google Meet) and the CRM softphone. As soon as a call ends, the audio file is uploaded automatically.
Transcription: Whisper running locally for sensitive calls, OpenAI’s API for the rest. Accuracy on French calls: above 95%.
Scoring: an LLM workflow compares the call against the Heex sales script (discovery → qualification → demo → objections → closing). It produces a 0–100 score per stage plus a two-line summary.
Delivery: automatic push to the dashboard via an AI automation workflow. No manual re-entry, no Excel export.
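To make the scoring step concrete, here is a minimal sketch of how its output could be validated before delivery. It assumes the LLM is asked to reply in JSON with one score per script stage plus a summary. The JSON shape and the names `STAGES`, `CallScore` and `parse_scoring_reply` are illustrative assumptions, not the code shipped at Heex.

```python
import json
from dataclasses import dataclass

# Stages of the sales script named above; the identifiers are illustrative.
STAGES = ("discovery", "qualification", "demo", "objections", "closing")


@dataclass(frozen=True)
class CallScore:
    call_id: str
    stage_scores: dict[str, int]  # one 0 to 100 score per script stage
    summary: str                  # short summary of the call

    @property
    def overall(self) -> float:
        """Plain average of the stage scores."""
        return sum(self.stage_scores.values()) / len(self.stage_scores)


def parse_scoring_reply(call_id: str, reply: str) -> CallScore:
    """Check a JSON reply (assumed format) before it reaches the dashboard."""
    data = json.loads(reply)
    scores = data["scores"]
    missing = [stage for stage in STAGES if stage not in scores]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    for stage in STAGES:
        value = scores[stage]
        if not isinstance(value, int) or not 0 <= value <= 100:
            raise ValueError(f"invalid score for {stage}: {value!r}")
    return CallScore(call_id, {s: scores[s] for s in STAGES}, data["summary"].strip())


# Example reply, made up for illustration.
reply = json.dumps({
    "scores": {"discovery": 80, "qualification": 55, "demo": 90,
               "objections": 60, "closing": 70},
    "summary": "Strong demo. Budget was never discussed.",
})
score = parse_scoring_reply("call-001", reply)
print(score.overall)
```

Rejecting incomplete or out-of-range replies at this point keeps malformed model output from ever reaching the dashboard.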
The pivot: from “reporting” to “coaching”
At launch, the sales team was wary. The phrase “AI scoring” sounded like “automated surveillance”. That concern is legitimate: if scores are public, you kill morale in two weeks.
So we made a product decision: by default, each rep sees their own scores in detail (with problematic passages highlighted), but the manager only sees aggregated averages. No public ranking, no direct comparison.
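The visibility rule can be sketched in a few lines. This is a simplified illustration under our own assumptions: the aggregation is one team-wide average per script stage, and the names `ScoredCall` and `view_for` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ScoredCall:
    rep: str
    stage: str
    score: int


def view_for(viewer: str, role: str, calls: list[ScoredCall]):
    """Return what a viewer may see: own detail for reps, averages for managers."""
    if role == "rep":
        # A rep sees each of their own scored calls, and nobody else's.
        return [c for c in calls if c.rep == viewer]
    if role == "manager":
        # A manager sees one team-wide average per stage, with no rep names attached.
        by_stage = defaultdict(list)
        for c in calls:
            by_stage[c.stage].append(c.score)
        return {stage: mean(scores) for stage, scores in by_stage.items()}
    raise ValueError(f"unknown role: {role}")


calls = [
    ScoredCall("alice", "discovery", 80),
    ScoredCall("bob", "discovery", 60),
    ScoredCall("alice", "closing", 50),
]
print(view_for("alice", "rep", calls))
print(view_for("director", "manager", calls))
```

The point of the design is that the manager view never carries a rep identifier, so ranking individuals is not possible from that view.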
The result was surprising. After a few weeks, reps started listening to their lowest-scored calls themselves and identifying their own failure patterns (not digging into how the decision gets made, skipping budget qualification). It became a self-improvement tool, not a surveillance one.
Four mistakes to avoid in AI call scoring
Mistake 1 — Scoring without a script as a baseline. If you ask an LLM “is this rep good?” without a frame, you get noise. You need a reference script (even imperfect) to compare against. Otherwise, the score shifts with the model’s mood.
Mistake 2 — Making all scores public. Tempting from a management standpoint, disastrous from a team one. Reps will game the score (talk longer to tick boxes) and lose the spontaneity that wins deals.
Mistake 3 — Wanting an ultra-precise numerical score. The difference between a 72 and a 78 has no statistical meaning. Better to use three levels (“on protocol”, “partial”, “off-script”) than fake 0–100.
Mistake 4 — Forgetting GDPR on recordings. Your prospects must be informed at the start of the call that the conversation is recorded and analysed by an automated system. It’s a blocker if you skip it.
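For Mistake 3, collapsing a raw score into three levels takes one small function. A minimal sketch: the thresholds (70 and 40) are arbitrary placeholders to calibrate on your own calls, and the function name `level` is ours.

```python
def level(score: int, on_protocol_from: int = 70, partial_from: int = 40) -> str:
    """Map a 0 to 100 score to one of three coaching levels.

    The default thresholds are placeholders, not values from the Heex project.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= on_protocol_from:
        return "on protocol"
    if score >= partial_from:
        return "partial"
    return "off-script"


# A 72 and a 78 land in the same level, which is the point.
print(level(72), "|", level(78), "|", level(55), "|", level(10))
```

With these placeholder thresholds, the six-point gap between 72 and 78 disappears from what the rep sees.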
Extending beyond sales
The same architecture works far beyond sales calls. Customer support: score resolution and satisfaction. Onboarding (as in the Toshify case): score the quality of the AI qualification. HR: score interviews (with consent, obviously).
The pattern to remember: transcribe automatically, score against a frame, deliver to the right people at the right level of aggregation. It’s a reusable workflow, not a one-shot product.
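One way to picture that reuse is to keep the three steps as interchangeable functions and pass the frame in as data. This is a sketch under our own assumptions: `Frame`, `run_workflow` and the `fake_*` stand-ins are hypothetical, and the stand-ins replace real transcription, LLM scoring and dashboard delivery so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Frame:
    """What a conversation is scored against (hypothetical structure)."""
    name: str
    criteria: tuple[str, ...]


def run_workflow(
    audio_ref: str,
    frame: Frame,
    transcribe: Callable[[str], str],
    score: Callable[[str, Frame], dict[str, int]],
    deliver: Callable[[str, dict[str, int]], None],
) -> dict[str, int]:
    """Transcribe, score against the frame, then deliver. Every step is injected."""
    transcript = transcribe(audio_ref)
    scores = score(transcript, frame)
    if set(scores) != set(frame.criteria):
        raise ValueError(f"scores do not match frame {frame.name!r}")
    deliver(frame.name, scores)
    return scores


# Stand-ins so the sketch runs without any external service.
def fake_transcribe(audio_ref: str) -> str:
    return f"transcript of {audio_ref}"


def fake_score(transcript: str, frame: Frame) -> dict[str, int]:
    return {criterion: 50 for criterion in frame.criteria}


delivered: list[tuple[str, dict[str, int]]] = []


def fake_deliver(frame_name: str, scores: dict[str, int]) -> None:
    delivered.append((frame_name, scores))


SALES = Frame("sales", ("discovery", "qualification", "demo", "objections", "closing"))
SUPPORT = Frame("support", ("resolution", "satisfaction"))

for frame in (SALES, SUPPORT):
    run_workflow("call.wav", frame, fake_transcribe, fake_score, fake_deliver)
```

Moving from sales to support changes only the `Frame` passed in; the workflow function itself stays untouched.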
What we take away
The real win at Heex is not the scoring itself. It’s moving scoring from a reporting tool to a coaching tool. That move comes down to a product detail: who sees what, at what level of aggregation. Miss that and you ship a technically correct but socially unusable tool.