
How do you measure trust in AI agent outputs?

Trust in AI agents shows up in what users do after the output, not in what they say. Here are four signals that measure it and one metric that rolls them up.

Amadin Ahmed · 9 min read

Your agent responds in under two seconds. The user reads the output. Then they pause. That pause, and what happens after it, is the closest thing to a trust signal your product has. Most teams never measure it. This is one of the core problems [product analytics for AI agents](/blog/what-is-locus) was built to solve.

Trust is the gap between what an AI agent produces and what a user is willing to act on without checking. In traditional software, trust is binary: the button works or it does not. In AI products, trust is a spectrum. The user reads the output, makes a judgment, and decides how much of their own time to invest verifying, editing, or redoing the work. That decision happens thousands of times a day across your user base.

Most teams have no way to see it. They track completion rate, which tells them the system ran. They track thumbs-up ratings, which fewer than 8% of users submit (illustrative, based on Locus snapshot data). They miss the behavioural signal that sits between the output and the action. That signal is trust. And the reason it goes unmeasured is structural: why agents pass evals but still fail users explains the broader problem. This post is about the specific measurement layer.

What is trust in the context of an AI agent?

Trust is a user's willingness to act on an AI agent's output without additional verification. A user who copies the agent's email draft into Gmail and hits send without re-reading it has high trust. A user who reads the draft, rewrites the second paragraph, changes the subject line, and then sends it has low trust. Both users completed the task. Both show up as completed runs in Datadog or OpenTelemetry. The difference between them is invisible to every tool that stops at the system layer.

Trust is not the same as satisfaction. A user can be satisfied with a mediocre output if their expectations are low. A user can distrust a good output because the agent got it wrong last time. Satisfaction is a snapshot. Trust is a trend. It accumulates over interactions and erodes the same way. A single bad output does not destroy trust. Five bad outputs in a row do. And when trust erodes, the user does not file a bug report. They just start checking every output by hand.

Why is trust hard to measure in AI products?

Trust is hard to measure because it lives in user behaviour, not in user feedback. Most AI products have some form of explicit feedback: thumbs up, thumbs down, a star rating, a regenerate button. The problem is that fewer than 8% of interactions get any explicit feedback at all (illustrative). The other 92% are silent. The user reads the output, does something with it, and moves on. The signal is in what they did, not in what they said.

Trace stores like Langfuse, Braintrust, and LangSmith record the agent's side of the interaction. They show the prompt, the response, the tokens, the latency. They do not show what the user did after the response arrived. Product analytics tools like Mixpanel and Amplitude can track downstream events, but they were built for click-based products. They count that the user opened the editor. They cannot tell you whether the user rewrote the agent's output or used it as-is.

The result is a measurement gap. The team knows the agent responded. The team does not know whether the user trusted the response enough to act on it. The data is already there in most cases. It is sitting in the conversation logs, the edit histories, and the downstream actions. It just needs a layer that reads all of it together.

What are the four signals that measure trust?

Trust in AI agent outputs can be decomposed into four behavioural signals. Each one captures a different dimension of what the user does after the agent responds. None is sufficient alone. Together, they form a composite that tracks trust over time.

1. Acceptance rate.

Acceptance rate is the percentage of agent outputs the user acts on without requesting a regeneration or abandoning the conversation. It is the simplest trust signal and the easiest to instrument. A healthy production agent typically has an acceptance rate between 65% and 80% (illustrative). A drop of 5 or more points over two weeks is a leading indicator of trust erosion. The limitation is that acceptance does not mean the user used the output as-is. They may have accepted it and then rewritten half of it somewhere else.
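
As a rough illustration, here is a minimal sketch of how acceptance rate and the week-over-week drop might be computed from run-level outcomes. The outcome labels and the 5-point alert threshold are assumptions drawn from the figures above, not a prescribed schema.

```python
from collections import Counter

def acceptance_rate(outcomes):
    """Share of runs the user acted on without regenerating or abandoning.

    `outcomes` is an iterable of hypothetical per-run labels such as
    "accepted", "regenerated", "abandoned".
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    return counts["accepted"] / total if total else 0.0

def acceptance_drop_alert(this_week, last_week, threshold_points=5.0):
    """Flag a drop of 5+ percentage points, the illustrative erosion signal."""
    delta = (acceptance_rate(last_week) - acceptance_rate(this_week)) * 100
    return delta >= threshold_points
```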

2. Edit depth.

Edit depth measures how much the user modifies the agent's output before committing it. An edit depth below 10% means the user trusted the output almost entirely. An edit depth above 50% means the user used the agent as a starting point, not as a finisher. In early Locus snapshot data, 31% of completed runs showed edit depths above 50% (illustrative). Edit depth is the signal that separates genuine acceptance from shadow rework.
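
For intuition, here is a minimal sketch of an edit-depth calculation using a character-level similarity ratio from Python's standard library; a token-level diff, as described in the instrumentation section below, works the same way. The 10% and 50% bands are the illustrative thresholds above, and the example strings are invented.

```python
from difflib import SequenceMatcher

def edit_depth(agent_output: str, committed_text: str) -> float:
    """Fraction of the output that changed between delivery and commit.

    0.0 means the user kept the draft verbatim; 1.0 means nothing survived.
    """
    similarity = SequenceMatcher(None, agent_output, committed_text).ratio()
    return 1.0 - similarity

# Illustrative bands: <10% near-verbatim trust, >50% the agent was a
# starting point, not a finisher.
depth = edit_depth(
    "Thanks for your patience. The fix ships Friday.",
    "Thanks for waiting. The fix ships this Friday.",
)
band = "high trust" if depth < 0.10 else "rework" if depth > 0.50 else "moderate"
```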

3. Time-to-trust-action.

Time-to-trust-action is the elapsed time between the agent delivering an output and the user taking the action the output was meant to enable. If the agent writes an email, time-to-trust-action is the gap between the draft appearing and the email being sent. If the agent generates code, it is the gap between the suggestion and the commit. A short time-to-trust-action means the user trusted the output enough to act quickly. A long one means the user spent time verifying, editing, or deliberating. Across production agents, a time-to-trust-action under 90 seconds correlates with high-trust interactions at roughly 0.7 (illustrative).
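
The arithmetic itself is simple. A minimal sketch, assuming the product already logs when the output was delivered and when the downstream trust-action happened; the 90-second band is the illustrative figure above.

```python
from datetime import datetime, timedelta

def time_to_trust_action(delivered_at: datetime, acted_at: datetime) -> timedelta:
    """Gap between the agent delivering an output and the user acting on it."""
    return acted_at - delivered_at

gap = time_to_trust_action(
    datetime(2024, 5, 2, 14, 3, 10),   # agent draft delivered
    datetime(2024, 5, 2, 14, 4, 25),   # email actually sent
)
is_high_trust = gap <= timedelta(seconds=90)  # illustrative 90-second band
```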

4. Shadow-rework rate.

Shadow-rework rate is the percentage of accepted outputs that the user substantially redoes in another tool. It is the hardest signal to measure and the most revealing. A user who accepts the agent's draft and then pastes it into Google Docs and rewrites the opening paragraph has shadow-reworked. The agent's metrics say success. The user's behaviour says partial failure. Shadow rework is a direct measure of distrust that is invisible to acceptance rate alone. For a deeper look at the mechanics, see what shadow rework is and how to detect it.

How do you turn these signals into a trust score?

A trust score is a per-cohort, per-week composite of the four signals above. It is not a single number for the whole product. Trust varies by behavioural cohort. Writers may trust the agent less than coders. Researchers may trust it more than support agents. The aggregate masks the cohort-level reality.

The simplest composite is a weighted average. Acceptance rate contributes positively. Edit depth and shadow-rework rate contribute negatively. Time-to-trust-action contributes positively when short, negatively when long. The score is bounded 0 to 100, and a drop of 6 points week-over-week in any cohort is the threshold worth investigating (illustrative). That 6-point drop typically arrives 2 to 4 weeks before the same cohort's retention curve reflects the problem.
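
As one possible shape for that composite, here is a sketch of a weighted average bounded 0 to 100. The weights, the 90-second normalisation for time-to-trust-action, and the clamp at zero are assumptions for illustration, not the only valid choices.

```python
def trust_score(acceptance, edit_depth, shadow_rework, median_tta_seconds,
                weights=(0.40, 0.25, 0.20, 0.15)) -> float:
    """Per-cohort, per-week composite of the four signals, bounded 0-100.

    acceptance, edit_depth, shadow_rework are rates in [0, 1];
    median_tta_seconds is the cohort's median time-to-trust-action.
    Weights are illustrative and should be tuned per product.
    """
    w_acc, w_edit, w_rework, w_tta = weights
    # Short time-to-trust-action counts fully; past the illustrative 90-second
    # pivot the contribution decays and is clamped at zero to keep the score
    # in the 0-100 range.
    tta_component = max(0.0, 1.0 - median_tta_seconds / 90.0)
    score = (
        w_acc * acceptance
        + w_edit * (1.0 - edit_depth)
        + w_rework * (1.0 - shadow_rework)
        + w_tta * tta_component
    )
    return round(100.0 * score, 1)

def worth_investigating(this_week: float, last_week: float) -> bool:
    """Illustrative threshold: a 6-point week-over-week drop in any cohort."""
    return (last_week - this_week) >= 6.0
```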

The value of the trust score is not the number itself. It is the delta. A cohort with a trust score of 62 is not necessarily in trouble. A cohort whose trust score dropped from 74 to 62 in three weeks is. The direction matters more than the level. And because the score is per-cohort, the team knows exactly which group of users to investigate.

Why does trust need to be measured per cohort?

Trust is not uniform across a user base. A coding agent might have a trust score of 78 among backend developers and 54 among data scientists. The aggregate is 68. That aggregate tells you nothing useful. The backend developers are happy. The data scientists are quietly checking every output by hand. If you build for the aggregate, you build for neither group.

Per-cohort trust also reveals which product improvements matter. If the data-science cohort's edit depth is 62% while the backend cohort's is 11%, the problem is not the model. The problem is the model's performance on data-science tasks specifically. That insight is invisible at the aggregate level. It becomes visible only when trust is decomposed by what users actually do, not by their plan tier or their company size.

How do you measure trust without relying on user feedback?

Explicit feedback (thumbs up, ratings, NPS) is useful when you get it. The problem is you rarely get it. The 8% feedback rate means 92% of interactions have no explicit signal at all. Building a trust model on the 8% and ignoring the 92% introduces a sample bias worse than judging the agent from a hand-picked sample of twenty runs.

Behavioural trust signals do not require the user to do anything extra. Acceptance rate is computed from what the user already does: accept, reject, or regenerate. Edit depth is computed from the diff between the agent's output and the committed version. Time-to-trust-action is computed from timestamps already in the event log. Shadow-rework rate requires correlating the conversation with downstream actions, which is harder but still uses data the product already generates.

The advantage of behavioural signals over explicit feedback is coverage. You get a trust signal for every interaction, not just the ones where the user clicked a button. And the signal is more honest. Users rate generously. Their behaviour does not lie. A user who gives a thumbs-up but rewrites the output is telling you two different things. The behaviour is the one to believe.

What do you need to instrument to start measuring trust?

Most production AI agents already log enough data to compute at least two of the four trust signals. Here is the minimum instrumentation for each, with a schema sketch after the list:

  • Acceptance rate. Log whether the user accepted, rejected, or regenerated each output. If your agent uses a tool-call pattern, log the tool outcome. Most agents built on OpenAI's API or Anthropic's tool use already have this in their trace store.
  • Edit depth. Log the agent's output and the final committed version. Compute the Levenshtein distance or token-level diff. If the user edits in your UI, this is straightforward. If they edit outside your UI, fall back on signals 3 and 4 as proxies.
  • Time-to-trust-action. Log the timestamp of the agent's final response and the timestamp of the user's next meaningful action. This requires a definition of what counts as a trust-action for your product: sending an email, committing code, saving a document.
  • Shadow-rework rate. Correlate the agent's output with the downstream artifact. This is the hardest to instrument. If you cannot instrument it directly, time-to-next-session on the same topic is a proxy: a retry within 5 minutes correlates with rework at roughly 0.6 (illustrative).
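
To make that instrumentation concrete, here is a sketch of the kind of per-run record that would feed all four signals. Every field and event name here is an assumption about your schema; map them onto whatever your trace store or event pipeline already records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AgentRunEvent:
    """One logged interaction, with the fields each trust signal needs.

    All names are illustrative, not a required schema.
    """
    run_id: str
    cohort: str                                   # behavioural cohort, not plan tier
    output_text: str                              # the agent's final response
    outcome: str                                  # "accepted" | "regenerated" | "abandoned"
    delivered_at: datetime                        # start of time-to-trust-action
    trust_action_at: Optional[datetime] = None    # e.g. email sent, code committed
    committed_text: Optional[str] = None          # feeds edit depth
    reworked_elsewhere: Optional[bool] = None     # shadow rework, if you can correlate it
```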

You do not need all four on day one. Start with acceptance rate and edit depth. Add time-to-trust-action when you have the event pipeline. Add shadow-rework rate when you can correlate downstream. Or use a system that reads the conversation layer and computes all four, which is what product analytics for AI agents does out of the box.

Frequently asked questions.

How do you measure trust in AI agent outputs?

Trust is measured through four behavioural signals: acceptance rate (did the user use the output?), edit depth (how much did they change it?), time-to-trust-action (how long before they acted on it?), and shadow-rework rate (did they redo the work elsewhere?). Rolled up per cohort over time, these signals produce a trust score that moves weeks before retention does.

What is time-to-trust-action for an AI agent?

Time-to-trust-action is the elapsed time between an AI agent delivering an output and the user taking the action that output was meant to enable. For a writing agent, it is the time between the draft appearing and the email being sent. For a coding agent, it is the time between the suggestion appearing and the code being committed. A short time-to-trust-action correlates with high trust. A long one indicates verification or hesitation.

How do you know if users trust an AI agent?

You know by watching what they do, not what they say. Users who trust the agent act on outputs quickly, edit them lightly, and do not redo the work in another tool. Users who distrust the agent pause, edit heavily, retry, or move the work to a manual workflow. Explicit feedback like thumbs-up covers fewer than 8% of interactions (illustrative). Behavioural signals cover all of them. Product analytics for AI agents reads both layers.

How is trust different from satisfaction in AI products?

Satisfaction is a snapshot of how a user feels about one output. Trust is a trend built over many interactions. A user can be satisfied with a mediocre output if expectations are low. A user can distrust a good output because the agent failed them last time. Satisfaction is measured by ratings. Trust is measured by behaviour. For product decisions, the behavioural trend is more predictive of retention than any single rating.

Can you measure AI agent trust without user feedback?

Yes. Behavioural trust signals (acceptance rate, edit depth, time-to-trust-action, shadow-rework rate) do not require the user to click a feedback button. They are computed from data the product already logs: conversation outcomes, edit diffs, timestamps, and downstream actions. This gives you a trust signal for every interaction, not just the 8% that leave explicit feedback. For the mechanics of catching the rework signal, see what is shadow rework.

How does trust relate to AI agent retention?

Trust is a leading indicator of retention. When a cohort's trust score drops, the same cohort's retention typically follows 2 to 4 weeks later (illustrative). The delay exists because users do not leave the moment they stop trusting. They start checking outputs, editing more, and using the agent less. By the time the retention chart moves, the trust erosion has been underway for weeks. Tracking trust per cohort gives the product team a window to intervene before retention drops.

Tagged
ai agent trust metrics, measure trust AI agent, time-to-trust-action, AI agent acceptance rate, ai agent silent failure, ai agent edit rate, production agent user value, ai product analytics, measure AI agent value, AI agent metrics, how to know if users trust AI agent, ai agent success without user feedback, agent value visibility, completed run vs user value
Done reading? Try Locus on your own runs

See what every user of your agent does.

Pick a time. We'll walk through what a snapshot would look like for your product, on your terms.