How to audit an AI product in 10 days: the five lenses

"Is this AI product good?" is the wrong first question.

It is too broad. It invites taste, opinion, demo theatre, benchmark talk, and whatever the loudest stakeholder already believes. A narrower question is more useful:

Where does the product lose the user's ability to understand, steer, trust, or recover from the AI?

That is the question an AI product audit should answer.

The output should not be a 60-page report full of generic responsible-AI principles. It should be a ranked fix map that tells the team which parts of the experience are breaking trust, which risks matter, and what to change in the next sprint.

WFK uses five lenses for that audit.

Why five lenses?

There is no shortage of AI guidance. Microsoft's HAX Toolkit gives evidence-based guidance for AI user experiences. Google's People + AI Guidebook provides practical human-centered AI guidance. NIST's AI Risk Management Framework helps organizations incorporate trustworthiness into AI design, development, use, and evaluation.

The problem is not that teams lack principles. The problem is that principles often fail to become product decisions.

An audit method has to be small enough to use under pressure. Five lenses are enough to cover the major experience failures without turning the work into a compliance exercise.

Lens 1: Planning visibility

Can the user see what the AI intends to do before it acts?

This lens matters most when an AI feature performs multi-step work. A user needs more than a final answer. They need to understand the route.

Look for:

hidden plans
vague loading states
agents that jump straight from prompt to action
no opportunity to inspect or adjust the plan
no separation between thinking, drafting, previewing, and committing

Score this low when the user discovers the AI's behavior only after the outcome appears.
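
As a rough illustration of what a passing product looks like under this lens, here is a minimal sketch of a plan that has to be drafted, shown, and approved before anything runs. The Plan and Stage names and the three stages are assumptions made for this example, not part of any particular product or framework.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DRAFTED = "drafted"      # the AI has proposed steps; nothing has run
    APPROVED = "approved"    # the user has inspected and accepted the plan
    COMMITTED = "committed"  # the steps have been executed


@dataclass
class Plan:
    goal: str
    steps: list[str]
    stage: Stage = Stage.DRAFTED

    def preview(self) -> str:
        """Render the plan so the user sees the route before anything runs."""
        lines = [f"Goal: {self.goal}"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)

    def remove_step(self, index: int) -> None:
        """Let the user adjust the plan; only allowed while it is a draft."""
        if self.stage is not Stage.DRAFTED:
            raise RuntimeError("plan can only be edited while drafted")
        del self.steps[index]

    def approve(self) -> None:
        self.stage = Stage.APPROVED

    def commit(self) -> list[str]:
        """Run only an approved plan; in this sketch 'running' echoes each step."""
        if self.stage is not Stage.APPROVED:
            raise RuntimeError("plan must be approved before it is committed")
        self.stage = Stage.COMMITTED
        return [f"done: {step}" for step in self.steps]
```

The audit question is whether the product has any equivalent of preview and approve, or whether it goes straight from prompt to commit.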

Lens 2: Action disclosure

When the AI acts, is the action legible, attributable, and inspectable?

AI products often blur responsibility. The system updates a record, sends a message, changes a value, or rejects an item, but the product does not make clear what happened, why it happened, and under whose authority.

Look for:

missing audit trails
no record of the input or context used
unclear human versus AI ownership
no distinction between suggested and committed actions
actions that are visible only as final state, not as events

Score this low when a user or reviewer cannot reconstruct what the AI did.
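
One way to picture a reconstructable action is as an event record rather than a changed value. The sketch below is an assumption-heavy illustration: the ActionEvent fields, the "ai" and "human" actor labels, and the "suggested" and "committed" statuses are hypothetical names chosen to mirror the checklist above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActionEvent:
    actor: str               # "ai" or "human"
    authorized_by: str       # the person or policy the action ran under
    action: str              # what happened, in plain language
    status: str              # "suggested" or "committed"
    inputs: tuple[str, ...]  # the records or context the action used
    reason: str              # why the system took or proposed the action
    at: datetime             # when the event was recorded (UTC)


class ActionLog:
    def __init__(self) -> None:
        self._events: list[ActionEvent] = []

    def record(self, **fields) -> ActionEvent:
        """Append an immutable event, stamped with the current UTC time."""
        event = ActionEvent(at=datetime.now(timezone.utc), **fields)
        self._events.append(event)
        return event

    def history(self) -> list[ActionEvent]:
        """Events in the order they happened, for a reviewer to replay."""
        return list(self._events)

    def committed_by_ai(self) -> list[ActionEvent]:
        """Only the actions the AI actually carried out, not its suggestions."""
        return [e for e in self._events if e.actor == "ai" and e.status == "committed"]
```

If the product can only answer "what is the current value?" and not "which events produced it?", the lens scores low.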

Lens 3: Memory and context

Does the product surface what the AI knows, remembers, and assumes?

Memory creates power and risk. Users need to know when the AI is using previous interactions, customer data, project context, uploaded files, or inferred preferences. Hidden memory can make an AI feel smart until it feels invasive or inexplicable.

Look for:

context that cannot be inspected
stale or wrong memory with no correction path
data sources hidden behind generic "based on your information" language
no way to separate current-task context from persistent memory
uncertainty presented as knowledge

Score this low when the AI's context is invisible or uneditable.
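
To make "inspectable and editable context" concrete, here is a small sketch of a context store that names each item's source, scope, and certainty, and gives the user a correction path. ContextItem, ContextStore, and every field name are hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    key: str
    value: str
    source: str       # e.g. "uploaded file", "CRM record", "login history"
    persistent: bool  # True if it outlives the current task
    inferred: bool    # True if the AI guessed it rather than being told


class ContextStore:
    def __init__(self) -> None:
        self._items: dict[str, ContextItem] = {}

    def remember(self, item: ContextItem) -> None:
        self._items[item.key] = item

    def describe(self) -> list[str]:
        """One line per item, naming the source instead of 'your information'."""
        lines = []
        for item in self._items.values():
            scope = "persistent" if item.persistent else "this task"
            certainty = "assumed" if item.inferred else "stated"
            lines.append(
                f"{item.key} = {item.value} ({certainty}, from {item.source}, {scope})"
            )
        return lines

    def correct(self, key: str, value: str) -> None:
        """A user correction replaces the value and stops being an inference."""
        item = self._items[key]
        item.value = value
        item.source = "user correction"
        item.inferred = False

    def forget(self, key: str) -> None:
        """Remove an item entirely; forgetting an unknown key is a no-op."""
        self._items.pop(key, None)
```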

Lens 4: Recovery and reversal

When the AI is wrong, can the user catch it, correct it, undo it, and carry on?

This is the lens where many AI products fail hardest. They design for the impressive path and underdesign the wrong path.

Look for:

irreversible one-click actions
no undo window
no dry-run preview
no safe rollback
no way to correct the AI and preserve the workflow
errors that require support tickets or manual database fixes

Score this low when a plausible AI mistake becomes expensive.
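
The checklist above maps onto a familiar pattern: preview the change, record the old value when applying it, and keep enough history to roll back. The following is a minimal sketch of that pattern over an in-memory dictionary. ReversibleStore and Change are illustrative names, and a real product would also need to handle side effects that cannot be undone, such as sent messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    key: str
    old: Optional[str]  # None means the key did not exist before
    new: str


class ReversibleStore:
    def __init__(self, records: dict[str, str]) -> None:
        self.records = dict(records)
        self._history: list[Change] = []

    def dry_run(self, key: str, new: str) -> Change:
        """Show what would change without touching the records."""
        return Change(key, self.records.get(key), new)

    def apply(self, key: str, new: str) -> Change:
        """Commit the change and remember the previous value for undo."""
        change = self.dry_run(key, new)
        self.records[key] = new
        self._history.append(change)
        return change

    def undo(self) -> Optional[Change]:
        """Reverse the most recent change; return None if nothing is left."""
        if not self._history:
            return None
        change = self._history.pop()
        if change.old is None:
            del self.records[change.key]
        else:
            self.records[change.key] = change.old
        return change
```

A product that has no equivalent of dry_run or undo for its AI actions is one where a plausible mistake ends in a support ticket.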

Lens 5: Explanation depth

Does "why?" have an answer at the moment the user needs one?

Not every AI output needs a full explanation. Overexplaining can be noise. But consequential, surprising, low-confidence, or contested outputs need an accessible rationale.

Look for:

confidence without evidence
explanations that repeat the answer rather than justify it
chain-of-thought theatre instead of useful rationale
no source trail
no link between explanation and action

Score this low when users cannot challenge the system intelligently.
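
The rule in this lens (explain when the output is consequential, surprising, low-confidence, or contested) can be written down as a simple check. This is a sketch under stated assumptions: the Output fields are hypothetical, and the 0.6 confidence threshold is an arbitrary placeholder, not a recommended value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    text: str
    confidence: float            # 0.0 to 1.0, as reported by the system
    consequential: bool = False
    surprising: bool = False
    contested: bool = False
    sources: tuple[str, ...] = ()  # the source trail shown to the user


LOW_CONFIDENCE = 0.6  # assumed threshold; tune per product


def needs_rationale(output: Output) -> bool:
    """True for the four cases where 'why?' must have an answer."""
    return (
        output.consequential
        or output.surprising
        or output.contested
        or output.confidence < LOW_CONFIDENCE
    )


def explanation_gap(output: Output) -> Optional[str]:
    """Return what is missing, or None if the output passes this check."""
    if needs_rationale(output) and not output.sources:
        return "rationale required but no source trail attached"
    return None
```

Routine, high-confidence outputs pass without an explanation, which keeps the check from rewarding overexplaining.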

How to run the audit

Start with one AI workflow, not the whole product. Choose the workflow that matters most commercially, whether for activation, retention, operational throughput, customer trust, or risk.

Day 1: define the workflow and success criteria.

Days 2-3: inspect the live product and map the AI journey.

Days 4-5: score the five lenses, backing each score with screenshots and notes as evidence.

Days 6-7: identify the trust-breaking moments and map fixes.

Days 8-9: redesign the highest-risk flow or gate.

Day 10: deliver the findings, priorities, and walkthrough.

The audit should produce three things:

A five-lens scorecard.
A ranked fix map.
One redesigned flow that makes the method visible.
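
The first two deliverables are simple enough to sketch as data. The example below is one possible shape, not the WFK template: the 0 to 4 score scale, the severity and effort scales, and the rule of ranking by severity and then by effort are all assumptions made for illustration.

```python
from dataclasses import dataclass

LENSES = (
    "Planning visibility",
    "Action disclosure",
    "Memory and context",
    "Recovery and reversal",
    "Explanation depth",
)


@dataclass
class Finding:
    lens: str
    moment: str    # the trust-breaking moment observed in the product
    evidence: str  # screenshot filename or note reference
    score: int     # assumed scale: 0 (absent) to 4 (strong)
    severity: int  # assumed scale: 1 (minor) to 5 (blocks adoption)
    effort: int    # assumed scale: 1 (hours) to 5 (multi-sprint)
    fix: str


def scorecard(findings: list[Finding]) -> dict:
    """Lowest score observed per lens; None marks a lens with no evidence yet."""
    card = {lens: None for lens in LENSES}
    for f in findings:
        if f.lens not in card:
            raise ValueError(f"unknown lens: {f.lens}")
        current = card[f.lens]
        card[f.lens] = f.score if current is None else min(current, f.score)
    return card


def ranked_fix_map(findings: list[Finding]) -> list[Finding]:
    """Highest severity first; ties go to the cheaper fix."""
    return sorted(findings, key=lambda f: (-f.severity, f.effort))
```

Taking the lowest score per lens is a deliberate choice in this sketch: one irreversible action should not be averaged away by several harmless ones. Every finding carries its evidence and its fix, so the scorecard and the fix map stay tied to the same observations.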

Why this matters commercially

In July 2024, Gartner predicted that at least 30% of GenAI projects would be abandoned after proof of concept by the end of 2025 because of poor data quality, inadequate risk controls, escalating costs, or unclear business value.

An AI product audit cannot solve every one of those problems. But it can expose the experience-level version of them early: the unclear value moment, the missing control, the hidden risk, the action users do not trust enough to adopt.

That is the point. The audit is not a grade. It is a decision tool.

The WFK position

AI audits should be practical, product-shaped, and close to the work. The goal is not to admire the complexity of AI. The goal is to make the next product decision clearer.

If a score does not change a roadmap, it is decoration.