Northwind: a copilot users finally trust

An illustrative case study — a composite scenario, not a real client — built to show the shape and payoff of an engagement. Northwind is a mid-market B2B SaaS with a solid UX foundation and a new AI copilot that users were quietly ignoring.

The starting point

A 60-person software firm launched an AI copilot — and barely anyone was using it.

60

people, mid-market B2B SaaS

AI copilot, freshly launched

~15%

copilot adoption, and stalling

10 days

audit, end to end

Where it hurt

The model worked. Nobody trusted it.

The copilot’s suggestions were good — but users re-checked every one by hand, which defeated the point. A strong launch flattened into quiet non-adoption.

No way to see why the AI suggested something
No undo, so people were afraid to let it act
Support kept fielding “why did it do that?” tickets
Power users had gone back to doing it manually

Before and after: an opaque AI suggestion with no reasoning shown and no undo, versus the same suggestion with its reasoning revealed, a confirm gate, and an undo. The trust changed, not the model.

Same model. The trust changed.

What the ten days changed

We designed the trust, not the model.

The fix map was small and specific — all experience-layer work. No model retraining, no re-platforming.

Reasoning-on-demand on every suggestion — “why this?” answers inline
Confirm-before-act on the two consequential actions; everything else runs freely
A dry-run preview so users see the effect before they commit
The weakest flow — bulk actions — redesigned and handed to engineering
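
For teams who want to picture the first three patterns in code, here is a minimal sketch, not Northwind's actual implementation. It assumes a copilot whose suggestions change one value in a simple key-value state. Every name in it (`Suggestion`, `Copilot`, `CONSEQUENTIAL_ACTIONS`, and the action names) is hypothetical; the case study does not name the two consequential actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for "the two consequential actions".
CONSEQUENTIAL_ACTIONS = {"bulk_delete", "send_to_customers"}


@dataclass
class Suggestion:
    action: str      # what the copilot proposes to do
    target: str      # the key in the state that would change
    new_value: str   # the value it would be set to
    reasoning: str   # shown inline when the user asks "why this?"


@dataclass
class Copilot:
    state: Dict[str, str]
    undo_stack: List[Dict[str, str]] = field(default_factory=list)

    def why(self, s: Suggestion) -> str:
        """Reasoning-on-demand: return the explanation for a suggestion."""
        return s.reasoning

    def preview(self, s: Suggestion) -> Dict[str, str]:
        """Dry run: return the state as it would look, without committing."""
        after = dict(self.state)
        after[s.target] = s.new_value
        return after

    def apply(
        self,
        s: Suggestion,
        confirm: Callable[[Suggestion, Dict[str, str]], bool],
    ) -> bool:
        """Confirm-before-act: only consequential actions hit the gate."""
        if s.action in CONSEQUENTIAL_ACTIONS and not confirm(s, self.preview(s)):
            return False  # user declined; nothing changes
        self.undo_stack.append(dict(self.state))  # snapshot for undo
        self.state = self.preview(s)
        return True

    def undo(self) -> bool:
        """Restore the state from before the most recent applied suggestion."""
        if not self.undo_stack:
            return False
        self.state = self.undo_stack.pop()
        return True
```

In this sketch the confirm callback receives the dry-run preview, so the gate can show users the effect before they commit. Low-stakes actions skip the gate entirely but still land on the undo stack, so people can let the copilot act without fear.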

See the five lenses

The turning point

The model never changed. The trust did. That’s the whole job.

Modelled outcome — illustrative

After the fixes: far more people using the copilot and fewer support tickets from confused users, with the fix map shipped in a single sprint.

3×

copilot adoption

−40%

“why did it do that?” tickets

+22%

feature activation

1 sprint

to ship the fix map

What would an audit surface on your product?

Ten days, fixed scope. The gap between what you’d score and what we’d find is usually where the value is.

See the audit