How to audit an AI product in 10 days: the five lenses

A five-lens AI product audit method for finding trust, oversight, and recoverability problems in AI features before they become adoption problems.

Sample five-lens scorecard:

  • L1 Planning visibility: 2/4
  • L2 Action disclosure: 3/4
  • L3 Memory & context: 1/4
  • L4 Recovery & reversal: 2/4
  • L5 Explanation depth: 3/4

"Is this AI product good?" is the wrong first question.

It is too broad. It invites taste, opinion, demo theatre, benchmark talk, and whatever the loudest stakeholder already believes. A narrower question is more useful:

Where does the product lose the user's ability to understand, steer, trust, or recover from the AI?

That is the question an AI product audit should answer.

The output should not be a 60-page report full of generic responsible AI principles. It should be a ranked fix map. It should tell the team which parts of the experience are breaking trust, which risks matter, and what to change in the next sprint.

WFK uses five lenses for that audit.

Why five lenses?

There is no shortage of AI guidance. Microsoft's HAX Toolkit gives evidence-based guidance for AI user experiences. Google's People + AI Guidebook provides practical human-centered AI guidance. NIST's AI Risk Management Framework helps organizations incorporate trustworthiness into AI design, development, use, and evaluation.

The problem is not that teams lack principles. The problem is that principles often fail to become product decisions.

An audit method has to be small enough to use under pressure. Five lenses are enough to cover the major experience failures without turning the work into a compliance exercise.

Lens 1: Planning visibility

Can the user see what the AI intends to do before it acts?

This lens matters most when an AI feature performs multi-step work. A user needs more than a final answer. They need to understand the route.

Look for:

  • hidden plans
  • vague loading states
  • agents that jump straight from prompt to action
  • no opportunity to inspect or adjust the plan
  • no separation between thinking, drafting, previewing, and committing

Score this low when the user discovers the AI's behavior only after the outcome appears.
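
One way to picture the separation between drafting, previewing, and committing is a plan object that refuses to act until the user has seen and approved it. The Python below is a minimal sketch under that assumption. The names `Stage`, `Plan`, `preview`, `approve`, and `commit` are hypothetical and are not part of any WFK tooling.

```python
from enum import Enum


class Stage(Enum):
    DRAFTED = "drafted"
    PREVIEWED = "previewed"
    APPROVED = "approved"
    COMMITTED = "committed"


class Plan:
    """Hypothetical AI plan that must be shown and approved before it runs."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.stage = Stage.DRAFTED

    def preview(self):
        # Show the user every intended step before anything happens.
        self.stage = Stage.PREVIEWED
        return [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]

    def edit_step(self, index, new_step):
        # Adjusting the plan is allowed until it is committed,
        # and any edit sends the plan back for a fresh preview.
        if self.stage is Stage.COMMITTED:
            raise RuntimeError("A committed plan can no longer be edited.")
        self.steps[index] = new_step
        self.stage = Stage.DRAFTED

    def approve(self):
        if self.stage is not Stage.PREVIEWED:
            raise RuntimeError("Preview the plan before approving it.")
        self.stage = Stage.APPROVED

    def commit(self):
        # The product only acts once the user has approved the visible plan.
        if self.stage is not Stage.APPROVED:
            raise RuntimeError("The plan has not been approved.")
        self.stage = Stage.COMMITTED
        return list(self.steps)
```

A product that scores well on this lens behaves as if `commit` could never be reached without passing through `preview` and `approve` first.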

Lens 2: Action disclosure

When the AI acts, is the action legible, attributable, and inspectable?

AI products often blur responsibility. The system updates a record, sends a message, changes a value, or rejects an item, but the product does not make clear what happened, why it happened, and under whose authority.

Look for:

  • missing audit trails
  • no record of the input or context used
  • unclear human versus AI ownership
  • no distinction between suggested and committed actions
  • actions that are visible only as final state, not as events

Score this low when a user or reviewer cannot reconstruct what the AI did.
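
As an illustration of what "visible as events" could mean in practice, the sketch below records each AI or human action as an event with an actor, a status, a reason, and the context used. It is a minimal example in Python, not a prescribed schema: `ActionEvent`, `ActionLog`, and every field name are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Actor(Enum):
    HUMAN = "human"
    AI = "ai"


class Status(Enum):
    SUGGESTED = "suggested"
    COMMITTED = "committed"


@dataclass(frozen=True)
class ActionEvent:
    """Hypothetical record of one action: what, why, who, and with what input."""
    actor: Actor
    status: Status
    action: str
    target: str
    reason: str
    context_used: tuple
    authorized_by: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ActionLog:
    """Append-only log, so the final state can be traced back to events."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def history(self, target):
        # Everything that happened to one record, in the order it happened.
        return [e for e in self._events if e.target == target]
```

With a log like this, a reviewer can answer the lens question directly: `history("invoice-42")` would return who suggested a change, who committed it, and which inputs were used.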

Lens 3: Memory and context

Does the product surface what the AI knows, remembers, and assumes?

Memory creates power and risk. Users need to know when the AI is using previous interactions, customer data, project context, uploaded files, or inferred preferences. Hidden memory can make an AI feel smart until it feels invasive or inexplicable.

Look for:

  • context that cannot be inspected
  • stale or wrong memory with no correction path
  • data sources hidden behind generic "based on your information" language
  • no way to separate current-task context from persistent memory
  • uncertainty presented as knowledge

Score this low when the AI's context is invisible or uneditable.
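
To make the checklist above concrete, the following Python sketch shows one possible shape for inspectable, correctable context. It assumes each piece of context carries its source, its scope, and a flag for whether it was inferred. `ContextItem`, `Scope`, `context_panel`, and `correct` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    TASK = "current task"
    PERSISTENT = "persistent memory"


@dataclass
class ContextItem:
    key: str
    value: str
    source: str             # e.g. "uploaded file", "previous conversation"
    scope: Scope
    inferred: bool = False  # True when the AI guessed rather than was told


def context_panel(items):
    """Render what the AI is using, naming the source instead of hiding it."""
    lines = []
    for item in items:
        certainty = "assumed" if item.inferred else "known"
        lines.append(
            f"[{item.scope.value}] {item.key}: {item.value} "
            f"({certainty}, from {item.source})"
        )
    return lines


def correct(items, key, new_value):
    """Let the user fix stale or wrong memory; returns False if key is absent."""
    for item in items:
        if item.key == key:
            item.value = new_value
            item.source = "user correction"
            item.inferred = False
            return True
    return False
```

The design choice worth noting is that an inferred preference is labeled as an assumption until the user confirms or corrects it, which keeps uncertainty from being presented as knowledge.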

Lens 4: Recovery and reversal

When the AI is wrong, can the user catch it, correct it, undo it, and carry on?

This is the lens where many AI products fail hardest. They design for the impressive path and underdesign the path where the AI is wrong.

Look for:

  • irreversible one-click actions
  • no undo window
  • no dry-run preview
  • no safe rollback
  • no way to correct the AI and preserve the workflow
  • errors that require support tickets or manual database fixes

Score this low when a plausible AI mistake becomes expensive.
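
A dry-run preview and an undo window can be sketched in a few lines. The Python below is an illustrative example only: it assumes every AI action is packaged with its own reversal, and the names `ReversibleChange` and `UndoableRunner`, along with the injectable `clock` parameter, are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReversibleChange:
    """Hypothetical action bundled with the step that reverses it."""
    description: str
    apply: Callable[[], None]
    undo: Callable[[], None]


class UndoableRunner:
    def __init__(self, undo_window_seconds, clock=time.monotonic):
        self._window = undo_window_seconds
        self._clock = clock  # injectable so the window can be tested
        self._last = None

    def dry_run(self, change):
        # Describe the effect without touching any data.
        return f"Would: {change.description}"

    def commit(self, change):
        change.apply()
        self._last = (change, self._clock())

    def undo_last(self):
        """Reverse the last change if the undo window is still open."""
        if self._last is None:
            return False
        change, committed_at = self._last
        if self._clock() - committed_at > self._window:
            return False
        change.undo()
        self._last = None
        return True
```

When `undo_last` returns `False`, the product has reached the expensive case this lens warns about: the mistake now needs a support ticket or a manual fix.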

Lens 5: Explanation depth

Does "why?" have an answer at the moment the user needs one?

Not every AI output needs a full explanation. Overexplaining can be noise. But consequential, surprising, low-confidence, or contested outputs need an accessible rationale.

Look for:

  • confidence without evidence
  • explanations that repeat the answer rather than justify it
  • chain-of-thought theatre instead of useful rationale
  • no source trail
  • no link between explanation and action

Score this low when users cannot challenge the system intelligently.

How to run the audit

Start with one AI workflow, not the whole product. Choose the workflow that matters most commercially, whether for activation, retention, operational throughput, customer trust, or risk.

Day 1: define the workflow and success criteria.

Days 2-3: inspect the live product and map the AI journey.

Days 4-5: score the five lenses, with screenshots and notes as evidence.

Days 6-7: identify the trust-breaking moments and map fixes.

Days 8-9: redesign the highest-risk flow or gate.

Day 10: deliver the findings, priorities, and walkthrough.

The audit should produce three things:

  1. A five-lens scorecard.
  2. A ranked fix map.
  3. One redesigned flow that makes the method visible.
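
The first two deliverables can be represented as plain data. The Python sketch below assumes scores out of 4, as in the scorecard figures at the top of this article, and ranks fixes by severity first and effort second. That ranking rule, the 1 to 3 severity and effort scales, and the names `Finding`, `rank_fixes`, and `weakest_lenses` are assumptions made for illustration, not a fixed part of the method.

```python
from dataclasses import dataclass

LENSES = (
    "Planning visibility",
    "Action disclosure",
    "Memory & context",
    "Recovery & reversal",
    "Explanation depth",
)
MAX_SCORE = 4  # assumed scale, matching the x/4 scores shown earlier


@dataclass(frozen=True)
class Finding:
    lens: str
    moment: str    # where in the workflow trust breaks
    severity: int  # hypothetical scale: 1 (minor) to 3 (blocks adoption)
    effort: int    # hypothetical scale: 1 (days) to 3 (a quarter)
    fix: str


def weakest_lenses(scorecard):
    """Return the lens or lenses with the lowest score."""
    for lens, score in scorecard.items():
        if lens not in LENSES or not 0 <= score <= MAX_SCORE:
            raise ValueError(f"Invalid scorecard entry: {lens}={score}")
    lowest = min(scorecard.values())
    return [lens for lens in LENSES if scorecard.get(lens) == lowest]


def rank_fixes(findings):
    """Most severe findings first; among equals, the cheaper fix wins."""
    return sorted(findings, key=lambda f: (-f.severity, f.effort))
```

Keeping the fix map as a sorted list of findings, each tied to a lens and a specific moment in the workflow, is what lets the scorecard feed a sprint plan rather than sit in a report.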

Why this matters commercially

In July 2024, Gartner predicted that at least 30% of generative AI (GenAI) projects would be abandoned after proof of concept by the end of 2025 because of poor data quality, inadequate risk controls, escalating costs, or unclear business value.

An AI product audit cannot solve every one of those problems. But it can expose the experience-level version of them early: the unclear value moment, the missing control, the hidden risk, the action users do not trust enough to adopt.

That is the point. The audit is not a grade. It is a decision tool.

The WFK position

AI audits should be practical, product-shaped, and close to the work. The goal is not to admire the complexity of AI. The goal is to make the next product decision clearer.

If a score does not change a roadmap, it is decoration.