BLOG
3. Juli 2026

Herbert's World #22

We are pleased to present issue No. 22 of the Mafo.de column “Herbert’s World”!
Our CEO, Herbert Höckel, discusses the importance of primary data, the future of online panels, and why it is essential to use the term “AI in the Loop” rather than “Human in the Loop”. Enjoy the read!

Primary data in the age of AI – gold or dust? On the dwindling value of raw data in the age of synthetic data and artificial intelligence

I must confess something to you – dear readers. Whilst I was recently preparing my presentation on AI in market research for the ESOMAR Community Circle Belgium, I found myself faced with a paradox that has been on my mind ever since. It is one of those uncomfortable issues that you cannot ignore, even though you would prefer to look the other way.

But as an industry, we cannot afford that. Here is the contradiction in question, in a nutshell:

Primary data are more valuable today than ever before.

Its perceived value is currently plummeting.

Both statements are true at the same time – and that is precisely where my problem lies.

The true value of primary data is increasing

Why has primary data become more valuable? Because, by comparison, all other data has become less reliable. The more synthetic output, generated personas and AI simulations flood the cloud, the more the original stands out. If anyone can claim to provide consumer insights – without ever having asked a real consumer – then the fact that you actually did ask becomes, all of a sudden, a unique selling point..

Jon Puleston, Chief Methodologist at Ipsos, has put this more precisely in his theorem on synthetic sample boosting than I ever could: if you artificially inflate a subgroup whilst assuming that any deviations from the main body of the sample are merely statistical noise, then you may well have ironed out the most interesting insight of the entire study. The researcher’s Schrödinger’s cat dilemma, so to speak.

Real data opens the box and allows us to look inside! Is the cat still alive, or is it already dead? Synthetic data, on the other hand, merely simulates what the real data might look like, but does not open the box. This is no small difference; it is the difference between knowledge and belief..

Image credit: Herbert Höckel

And at the same time, the price perceived by the market falls

Nevertheless – and this is where it really hits home for me – the value of primary data continues to erode. Not because primary data has become any worse, but because the question customers are asking themselves has changed: Do I even need this anymore??

The example of PepsiCo and Zappi brings this home: what used to be considered a time-consuming, expensive pre-test is now delivered by ADA – an AI system trained on ten years’ worth of standardised PepsiCo data – in a fraction of the time and at a tenth of the original cost. Not because ADA can work magic, but because the company has systematically structured, standardised and validated its own data over the years.

So if automated processes drive expectations of speed and perceived value through the roof or send them plummeting – who’s still going to pay the good old price for the good new data??

Online panels: a problem or an opportunity??

Confidence in online panels is currently under immense pressure, and quite rightly so. Their very nature makes them potential targets for fraudsters. This is nothing new; it is both widely known and alarming. I know what I’m talking about – and this certainly applies just as much to all my esteemed colleagues and competitors, both at home and abroad..

Here at moweb research, we have been running our own online panels for over two decades now, and from the very beginning we have had to invest heavily in maintaining data quality: in fraud-proof verification, robust panel hygiene, consistent fraud detection and ongoing validation. However, we certainly do NOT do this because it is easy or cheap, but simply because it is the only approach that works in the long term.

Anyone who cannot prove to their clients that there are real peopl

behind the figures simply no longer has a case to make in a world of

synthetic alternatives.

This is truly alarming for all of us: a study by CASE4Quality covering eight major sample providers in the English-speaking world revealed that 30 to 40 per cent of respondents provided poor-quality data. Around ten per cent were detected by fraud detection tools – the remaining 20 to 30 per cent were only identified through manual checks of open-ended responses and trap questions. To be more specific: three per cent of devices generate 19 per cent of all completed surveys – and 40 per cent of these devices take part in over 100 surveys a day, whilst passing all standard quality checks!

Unfortunately, this quality crisis is now also creating a vicious circle, because: declining data quality is forcing researchers to discard more and more data. The more data that is discarded, the larger the sample sizes required. Larger sample sizes cost more money, so researchers look for cheaper providers. Cheaper providers generally deliver lower quality, meaning that quality declines even further. And willingness to pay now falls as well, because the question arises: is this output still worth the money? Ultimately, the worst-case scenario ensues: the good providers – those who invest in quality – are squeezed out. Not because they are worse, but because they are better. The inferior providers win out and the market succumbs to a race to the bottom.

The famous imperial lower lip is back – as a Habsburg AI

There is yet another threat to the collection of primary data, and it is particularly insidious because it appears so tempting. The vast amounts of supposedly ‘freely’ available data (e.g. from social media) make it possible to generate an initial insight into almost any question at breakneck speed. Instead of the tedious, costly and time-consuming research involving real-life subjects. I do understand the temptation, but I do not share it at all.

For, with the flood of AI-generated content on the internet, the data landscape is turning into an opaque, almost unmanageable jumble. You do get answers – but you no longer know who provided them. Are they even from real people at all, or just simulations of them? The tech world refers to this as ‘data autophagy’ or ‘data cannibalism’ – personally, I prefer the term ‘Habsburg AI’, coined by Jathan Sadowski.:

Habsburg AI – a system that has been trained so extensively on

the output of other generative AIs that it has become an incestuous mutant,

ultimately taking on exaggerated, grotesque features.

One thing is clear: synthetic models only work as long as the questions remain closely aligned with the original survey context and use the original customer data. If you stray too far from this – or fail to update the data continuously with a critical volume – the model simply loses its footing..

#GIGO - Garbage in, garbage out.

Now, straight away, anywhere – and, best of all, for free.

So what is the solution??

After more than 30 years in the profession, I am no longer naive enough to believe that ‘doing better’ is a valid answer. That is why I avoid saying that and prefer to state quite directly what I think. And I am serious about all four of the following appeals:

Firstly: Quality must be demonstrable – not merely claimed. Methodological transparency, validation standards, clear documentation. This is precisely what ADM, BVM, ASI and DGOF call for in the AI guidelines of January 2026. Anyone who can demonstrate that their data is truly clean has evidence that no synthetic system can provide.

Secondly: Panel quality is the product.
Not the survey, not the questionnaire, not the dashboard. The key question that will set companies apart over the next five years is this: Are there really real people behind this data? Those who can confirm this beyond any doubt will win. Those who cannot will lose – and deserve to lose..

Thirdly: Original research must be adequately funded.
This is not just wishful thinking. It is a market necessity. Anyone who wants genuine data – collected objectively, methodologically sound and reliable in terms of content – must pay what it costs to survey real people. Panel maintenance costs money. Fraud detection costs money. Qualified researchers who ask the right questions cost money.

Fourthly – and I’m going to be very explicit here, because this issue really gets me worked up:
We must stop talking about ‘human-in-the-loop’. Because this concept is misguided in this context. It is not only inaccurate, it is also extremely dangerous, as it positions us, the researchers, as a corrective to the algorithm – effectively as quality controllers on the sidelines. In other words, as the ones who check things over one last time before the output is released.

That’s not our role! That’s a demotion!

The correct term is ‘AI-in-the-loop’, because AI is the tool and we are the experts. We determine which questions are asked. We define what constitutes good data. We decide what constitutes a valid insight and what is merely noise. Not the algorithm, not the dashboard and not the next fancy tech tool.

We need the courage to say clearly: WE are the experts here. Let AI do the calculations. We will do the thinking!

BTW, not every client is able to specify their information requirements or research hypothesis in concrete terms in advance – not even when dealing with an LLM-based "word-cube generator".

For me, this is precisely where the future of the researcher lies: not in producing data, but in formulating questions. That is the most valuable and indispensable part of our work – and no algorithm can take that over.

So, to conclude, I’d like to ask you – dear readers – a question:

If your client bases their valuable brand, their long-term strategy or their entire financial year on your results – can you then say, with a clear conscience, that this data truly reflects the facts?

If so: well done. If not, however, I strongly advise taking a step back to check whether this wonderful new technology has been given too much control over the processes in the meantime. Because …

A fool with a tool is still a fool.

A good researcher with objective, valid and reliable data – and the ability to ask the right questions? They are irreplaceable – now more than ever!

Sources:
https://www.linkedin.com/pulse/boosting-small-samples-schr%C3%B6dinger-cat-dilemma-jon-puleston-adjff/

Steve Phillips et al „The Consumer Insights Revolution“
https://www.amazon.de/-/en/Consumer-Insights-Revolution-Transforming-competitive/dp/1781338698

Case4Quality

https://nielseniq.com/global/en/insights/education/2023/does-everybody-lie-the-resiliency-of-survey-research-done-right/

https://www.greenbook.org/insights/data-quality-privacy-and-ethics/truth-to-be-told-five-realities-about-online-sample-that-compromise-data-quality

https://makingsciencepublic.com/2026/04/10/habsburg-ai-portrait-of-a-metaphor-and-its-family/

ADM/BVM/ASI/DGOF: AI-Guide
https://www.adm-ev.de/standards-richtlinien/#ki-leitfaden

Herbert Höckel ist geschäftsführender Gesellschafter hier bei bei der moweb research GmbH. Seit mehr als 25 Jahren ist er Marktforscher. 2004 gründete er die moweb GmbH, welche er bis heute als Inhaber führt. Die moweb aus Düsseldorf ist international tätig und eines der ersten deutschen, auf digitale Verfahren spezialisierte Marktforschungsinstitute.

Gerne können Sie sein Buch "Customer Centricity Mindset ® - Kunden wirklich verstehen, Disruption erfolgreich meistern" hier erwerben.

Ihr Erfolg, Unser Ziel!

Termin Angebot

BLOG3. Juli 2026

Herbert's World #22

BLOG
3. Juli 2026