Client login

Northwind: a copilot users finally trust

An illustrative case study — a composite scenario, not a real client — built to show the shape and payoff of an engagement. Northwind is a mid-market B2B SaaS with a solid UX foundation and a new AI copilot that users were quietly ignoring.

The starting point

A 60-person software firm launched an AI copilot — and barely anyone was using it.

60
people — mid-market B2B SaaS
1
AI copilot, freshly launched
~15%
copilot adoption, and stalling
10 days
audit, end to end
Where it hurt

The model worked. Nobody trusted it.

The copilot’s suggestions were good — but users re-checked every one by hand, which defeated the point. A strong launch flattened into quiet non-adoption.

  • No way to see why the AI suggested something
  • No undo, so people were afraid to let it act
  • Support kept fielding “why did it do that?” tickets
  • Power users had gone back to doing it manually
Before and after: an opaque AI suggestion with no reasoning shown and no undo, versus the same suggestion with its reasoning revealed, a confirm gate, and an undo. The trust changed, not the model.

Same model. The trust changed.

What the ten days changed

We designed the trust, not the model.

The fix map was small and specific — all experience-layer work. No model retraining, no re-platforming.

  • Reasoning-on-demand on every suggestion — “why this?” answers inline
  • Confirm-before-act on the two consequential actions; everything else runs freely
  • A dry-run preview so users see the effect before they commit
  • The weakest flow — bulk actions — redesigned and handed to engineering
The turning point
The model never changed. The trust did. That’s the whole job.
Modelled outcome — illustrative

After the fixes: far more people using it, fewer confused support tickets — shipped in a single sprint.

copilot adoption
−40%
“why did it do that?” tickets
+22%
feature activation
1 sprint
to ship the fix map

What would an audit surface on your product?

Ten days, fixed scope. The gap between what you’d score and what we’d find is usually where the value is.