June 11, 2026

How to Earn Trust in a Synthetic World

Synthetic data is not hard to generate. It is hard to know when to trust it.

AI can now create modelled audiences, synthetic respondents, and fast outputs that look like research. The risk starts when those outputs move through the business as if they are evidence.

A synthetic number can look clean and confident before it has earned that confidence. It appears in a chart, gets copied into a deck, and is repeated in a senior meeting. By then, the caveat has often got smaller while the number has got stronger.

A modelled estimate becomes “the data.”
A directional signal becomes “the finding.”
A synthetic answer becomes part of the decision.

That happens because businesses are under pressure to move quickly. Synthetic data offers speed, breadth, and a way to keep moving when real-world research is slow, expensive, or hard to deliver in time.

But speed creates its own trust problem. If synthetic data is going to have a serious place in research, teams need to be clear about what it can support, what it can’t, and when real people still need to be in the loop.

In Season 2, Episode 14 of Research Revolutionaries, James “JT” Turner, Founder and CEO at Delineate, speaks with John-William Awbrey, Head of Brand & Campaign Insights at Sky, about the agency of tomorrow and how synthetic data is changing the relationship between evidence, risk, and decision-making.

Teams need to decide whether a synthetic audience can get them “70%, 80% of the way to an answer,” says John-William, or whether the question needs “more real-life experience” from human respondents.

The right starting point? Synthetic data should not be judged as good or bad in the abstract, but by the decision it’s being used to support.

You can watch or listen to the full podcast episode here: https://www.research-revolutionaries.com/e14-the-insight-agency-of-tomorrow-why-perspective-matters-more-than-data/  

The Problem with “The Number”

Research has never been free from uncertainty. A survey result is not “the truth” in some absolute sense; it is one view of a population, taken from a sample, at a point in time. Good researchers know that. They know where the sample is strong, where it is thinner, what the margin of error might imply, and which cuts of the data should be handled carefully.

The problem with synthetic data is that it can make uncertainty less visible.

A synthetic output can arrive looking finished. There is no obvious fieldwork process behind it, no messy respondent trail, no awkward verbatims, no visible reminder that real people are inconsistent, contradictory, distracted, emotional, and sometimes just hard to predict. The result can feel smoother than natural data, and that smoothness can create confidence before the work has earned it.

This becomes especially risky when the output is quantitative. “People tend to believe numbers way more than they do qualitative work,” says John-William. That feels true in most businesses. A piece of qualitative feedback invites debate. People ask whether it sounds believable, whether it reflects the audience, and whether it is just one person’s view. A number tends to land differently. It feels measured and comparable, as if something has already been tested.

That’s why a synthetic number needs more care, not less.

A modelled score or percentage may be useful, but it has to keep its context.

Was it generated from a synthetic audience?

Was it used to augment natural data?

Was it intended as an early directional signal, or is someone about to use it as evidence for a decision?

Those distinctions cannot sit quietly in the methodology note, because once a number travels through the business, the methodology often gets left behind.

This is how overtrust happens: through small simplifications. A modelled estimate becomes a result, a result becomes a finding, and a finding becomes the line someone repeats in the meeting.

Synthetic numbers shouldn’t be allowed to behave like natural data unless they have been tested, labelled, and explained properly. If the output is directional, call it directional. If it is modelled, call it modelled. If it is only strong enough to narrow the options, do not let it become the final answer.

The danger is not synthetic data itself – it’s misplaced confidence.

When “Close Enough” Is Enough, and When It Isn’t

This is where synthetic data needs a risk lens.

Not every business question needs the same level of certainty. Some decisions are early, reversible, and low-cost. Others are expensive, visible, hard to undo, or likely to shape the direction of a campaign, product, or brand. Treating all of those decisions as if they need the same evidence is unrealistic, but treating them all as if a synthetic answer is enough is risky.

A synthetic audience can get teams “70%, 80% of the way to an answer,” says John-William. That percentage is not a claim about the accuracy of the synthetic data. It is a practical way of talking about directional usefulness: enough to move the work forward, narrow the options, or work out what needs proper human testing next.

That can be a good role for synthetic data. If a team is exploring early message territories, for example, a synthetic audience might help identify which routes feel clearer, which claims are too similar, or which articulations are unlikely to add anything new. It might help a team move from a messy longlist to a sharper shortlist. That doesn’t mean the model has chosen the winning message, just that the team can go into the next stage with fewer, better options.

Used like that, synthetic data is not replacing evidence. It is improving what goes into the evidence stage.

The problem starts when the same level of confidence is used for a bigger decision. A synthetic output that is good enough to shortlist ideas isn’t automatically good enough to approve a major campaign. A modelled response that helps shape a discussion is not the same as real-world validation. A directional signal that helps a team move forward internally shouldn’t become the proof point for a board-level decision unless it’s been tested against natural data.

So the question is not whether we trust the synthetic data or not, but what we are trusting it to do.

Trust it to help us explore? Maybe.

Trust it to narrow a longlist? Often.

Trust it to replace human response in a high-stakes decision? Much less likely.

The decision should set the evidence standard. If the cost of being wrong is low, synthetic data may be enough to move the work forward. If the cost of being wrong is high, the business needs more than a clean modelled answer. That’s basic decision hygiene.
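As a rough illustration of letting the decision set the evidence standard, the rule above can be written down as a small lookup. This is only a sketch: the three cost buckets, the `reversible` flag, and the names `Evidence` and `evidence_standard` are hypothetical, and a real team would set its own thresholds.

```python
from enum import Enum


class Evidence(Enum):
    """Hypothetical evidence standards, from lightest to heaviest."""
    SYNTHETIC_OK = "synthetic signal may be enough to move the work forward"
    CHECK_FIRST = "check the synthetic signal against natural data first"
    HUMAN_REQUIRED = "needs evidence from real respondents"


def evidence_standard(cost_of_error: str, reversible: bool) -> Evidence:
    """Map a decision's risk profile to a minimum evidence standard.

    cost_of_error is one of "low", "medium" or "high" (assumed buckets).
    """
    if cost_of_error not in {"low", "medium", "high"}:
        raise ValueError(f"unknown cost_of_error: {cost_of_error!r}")
    if cost_of_error == "high":
        # Expensive or hard-to-undo decisions go back to real people.
        return Evidence.HUMAN_REQUIRED
    if cost_of_error == "low" and reversible:
        # Early, cheap, reversible work can run on a directional signal.
        return Evidence.SYNTHETIC_OK
    # Everything in between gets a check against natural data.
    return Evidence.CHECK_FIRST


# Shortlisting message routes versus approving a major campaign.
print(evidence_standard("low", reversible=True).value)
print(evidence_standard("high", reversible=False).value)
```

The point of writing it down is not the specific thresholds, but that the standard is agreed before the synthetic number arrives rather than after.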

So, it’s not about using synthetic data everywhere or rejecting it completely. It’s about knowing where it belongs in the workflow, what level of confidence is reasonable, and when the work needs to come back to real people. Because in the end, the risk is using the view that synthetic data gives you for the wrong kind of decision.

The Hard Part Is Validation

The hard part is not producing synthetic data, but proving when it should be trusted.

That becomes complicated because synthetic data is often most attractive in situations where validation is hardest: low-incidence audiences, niche segments, hard-to-reach buyers, or markets where fieldwork takes too long.

That creates an awkward loop. You use synthetic data because real data is difficult to get, but you still need real data to know whether the synthetic output is any good. And if you need enough real data to validate it properly, the obvious question is: why not just collect the real data in the first place?

Unfortunately, there’s no perfect answer to that.

Synthetic data doesn’t need to be perfect before it can be useful. That would stop useful experimentation. But teams do need to understand what it has been checked against, where it seems to hold up, and where confidence is weaker.

A synthetic output that looks plausible in aggregate may not hold up when it is sliced by audience, context, behavior, or market. It might mimic broad distributions while failing on the combinations that matter most. It might reflect past patterns well but struggle with something new. It might give an answer that feels sensible but misses the real-world friction that would change the decision. That’s usually where confidence needs to be earned, not assumed.
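A minimal sketch of what checking by slice might look like, assuming a team has the same metric from a natural sample and from a synthetic audience. The figures below are invented purely for illustration, and `gaps_by_slice` and the 5-point tolerance are hypothetical choices, not a recommended standard.

```python
def gaps_by_slice(natural, synthetic, tolerance):
    """Return the slices where synthetic and natural differ by more than tolerance.

    Both inputs map a slice name to a proportion between 0 and 1.
    Slices missing from the synthetic output are skipped, not flagged.
    """
    return {
        name: round(synthetic[name] - natural[name], 3)
        for name in natural
        if name in synthetic
        and abs(synthetic[name] - natural[name]) > tolerance
    }


# Invented example: share agreeing with a statement, overall and by age band.
natural = {"all": 0.42, "18-34": 0.55, "35-54": 0.40, "55+": 0.31}
synthetic = {"all": 0.43, "18-34": 0.45, "35-54": 0.42, "55+": 0.42}

flagged = gaps_by_slice(natural, synthetic, tolerance=0.05)
for name, gap in flagged.items():
    print(f"{name}: synthetic differs from natural by {gap:+.2f}")
```

In this made-up case the overall figure sits within tolerance while the youngest and oldest bands do not, which is the pattern the paragraph above describes: agreement in aggregate can hide disagreement in the cuts that matter.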

And that’s also why validation can’t be a one-off comfort check. A model that helped sort messages last quarter may not behave the same way after a campaign, a competitor move, a price change, or a shift in consumer behavior. The output may still look clean, but the world it is trying to represent may have moved on.

So validation has to stay connected to real-world change. When the market shifts, when other signals disagree, or when the output is about to support a bigger decision, the work needs to come back to natural data.

Where Synthetic Data Can Help

The safest use of synthetic data is not replacing human research, but making human research better.

Message testing is a good example. Campaign teams can end up with too many routes, claims, lines, and small variations of the same thought. Some are meaningfully different. Many are not. Testing every version with real respondents can be expensive, repetitive, and not always useful. This is where synthetic data can help, if its role is clear. It is not there to give the final answer, but to help teams think about the construct.

Which messages are actually different?

Which claims overlap?

Which versions are weaker?

Which ones deserve to go into proper human testing?

John-William describes the aim as moving from “35 messages” to “seven that we were far more confident in.” That’s a much better place to start a human test. You’re not asking real respondents to react to every possible articulation. You’re using synthetic support to reduce clutter before the real evidence stage begins.

That is a different kind of value: it helps teams spend their research time where it matters, rather than spreading it across too many weak or repetitive options.

It also challenges the idea that brands should simply “test everything” live. That may sound efficient, especially in digital environments, but it’s not cost-free. Media spend is expensive, poorly performing variants have a cost, and customer experience or even brand equity can be affected.

A disciplined synthetic step can help teams move faster without pushing every uncertainty into the real world. But the sequence matters: explore, narrow, validate, then act. Skip the validation step, and the whole thing becomes much harder to trust.

Trust Comes From Boundaries

Synthetic data will only earn trust if its boundaries stay visible. That means being clear when synthetic data has been used, what role it played, and what it should not be used for.

The risk is that the caveat travels less well than the output. The method is explained in the first meeting, but forgotten in the second. The limitation sits in the methodology note, while the number moves into the summary. The research team may know exactly what the output means, but the wider business may only remember the clean answer.

A responsible synthetic workflow should make a few things obvious:

whether the data is natural, synthetic, or a mix

whether the output is exploratory or decision-supporting

what it has been checked against

when human validation is required
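One way to keep those four things attached to a number as it travels is to store them with the number itself. The sketch below assumes a hypothetical `LabelledFigure` record; the field names, allowed values, and the example figure are all invented for illustration.

```python
from dataclasses import dataclass

SOURCES = {"natural", "synthetic", "mixed"}
ROLES = {"exploratory", "decision-supporting"}


@dataclass(frozen=True)
class LabelledFigure:
    """A figure that carries its own caveats (hypothetical structure)."""
    name: str
    value: float
    source: str                       # natural, synthetic, or mixed
    role: str                         # exploratory or decision-supporting
    checked_against: tuple = ()       # what it has been validated against
    human_validation_required: bool = True

    def __post_init__(self):
        if self.source not in SOURCES:
            raise ValueError(f"source must be one of {sorted(SOURCES)}")
        if self.role not in ROLES:
            raise ValueError(f"role must be one of {sorted(ROLES)}")

    def caption(self) -> str:
        """Render the figure with its caveats in a single line."""
        checks = ", ".join(self.checked_against) or "nothing yet"
        need = "required" if self.human_validation_required else "not required"
        return (
            f"{self.name}: {self.value} "
            f"[{self.source}, {self.role}; checked against: {checks}; "
            f"human validation: {need}]"
        )


# Invented example figure from a synthetic message screen.
figure = LabelledFigure("Message A appeal", 0.62, "synthetic", "exploratory")
print(figure.caption())
```

If charts and summaries are built from `caption()` rather than from the bare value, the label has a better chance of surviving the move from the methodology note to the deck.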

“We still need human respondents,” argues John-William. Synthetic data can help teams move faster, but it can’t replace the contact with real people that keeps research grounded. Real people bring the friction models can smooth out. They misunderstand things, contradict themselves, react emotionally, and notice details the team did not expect. That messiness is not a problem to remove, but part of what research is trying to understand.

Synthetic data can help teams explore, narrow, and sharpen. But when the stakes are high, and the business needs confidence in how people will actually respond, the work needs to come back to the real world.

That’s how synthetic data earns trust: by being used for the right job, with the right caveats, and with enough discipline to know when “close enough” is no longer enough.
