Qualitative Researchers Are Not OK: adapting to the use of AI by research respondents

This post first appeared on the MERL Tech Initiative blog, which I recommend following if you like interesting, intelligent writing at the intersection of digital development, AI, and MERL.

Back in 2024, a mere year ago (and yet light years away in AI terms), Becky Zelikson, a member of the AI Ethics and Governance Working Group (EGWG) at the NLP-CoP, flagged yet another micro-phenomenon linked to the advent of GenAI: the detection of AI-generated responses in survey data. During a subsequent live chat on Slack, members exchanged questions and theories: How widespread is this practice in the context of research in LMICs? Is this the end of remote research? Should we see it as an indicator of ‘fraudulent’ behaviour, or as a marker of cultural differences when it comes to the legitimacy of opinions vs ‘right’ answers? Or is this emerging phenomenon a positive leveller, in that it enables respondents to overcome language barriers?

These questions offered concerning but fascinating avenues for exploration – but possibly less pressing than some of the big-picture questions around AI use which we’ve explored and documented on the MTI blog.

However, there’s nothing like first-hand experience to put the fire in your belly, and during some recent qualitative research, where I hosted a WhatsApp group for four weeks with women in two African countries, I personally had to contend with AI usage by respondents. A handful of participants were given away by, yes, those pesky em dashes, but also by the length of their responses, the lack of spelling errors, the systematic use of bullets and emojis, and of course the idiosyncratic sentence structure and tone that we all love to hate (unless, like me, you’ve modded your GPT to sound like a cross between Greta Thunberg and Jennifer Coolidge). Colloquialisms were sometimes present (including the use of Pidgin or local expressions), but jarred with the rest of the text.
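For readers who like to tinker, here is a very rough Python sketch of what a ‘tell’ counter for those giveaways could look like. The signals, thresholds, and function names are my own illustrative assumptions, not a validated detector, and a human should always make the final judgement on any flagged response.

```python
import re

# Illustrative 'tells' and thresholds only: assumptions for discussion, not validated cut-offs.
EM_DASH = "\u2014"
LONG_MESSAGE_CHARS = 600  # a chat reply this long would be unusual in this kind of group

def ai_style_signals(message: str) -> list[str]:
    """Return the rough giveaways present in a single free-text response."""
    signals = []
    if EM_DASH in message:
        signals.append("em dash")
    if len(message) > LONG_MESSAGE_CHARS:
        signals.append("unusually long")
    if re.search(r"^\s*[-*\u2022]\s", message, flags=re.MULTILINE):
        signals.append("bulleted structure")
    if re.search("[\U0001F300-\U0001FAFF]", message):  # a common emoji block, not exhaustive
        signals.append("emoji")
    return signals

def flag_for_review(messages: list[str], min_signals: int = 2) -> list[tuple[int, list[str]]]:
    """Flag messages that combine several giveaways; a human still makes the final call."""
    flagged = []
    for i, message in enumerate(messages):
        signals = ai_style_signals(message)
        if len(signals) >= min_signals:
            flagged.append((i, signals))
    return flagged
```

Even a toy like this makes the limits obvious: plenty of people write long, well-structured, emoji-laden messages without any help from AI, which is exactly why these signals only ever prompted a conversation, never a verdict.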

When I briefly documented this on LinkedIn, the post generated so much traction (by my modest standards) that we realised it was high time to dedicate a conversation to this topic – and used the Ethics and Governance Working Group as a platform to bring together qualitative research practitioners in an informal event to discuss the Whats, Whys and Hows.

What is going on when it comes to AI use by research respondents?

Based on our discussions, the use of AI to answer remote researchers’ questions, delivered in text, voice, and video forms, is a global phenomenon, happening not just in the Global North but also in LMICs (in my direct experience, in both Eastern and Western Africa at least). This is backed up by growing academic literature and blogs, including a recent post in Data & Society documenting the detection of AI usage in US research into mental health chatbots (as if things weren’t meta enough!), and this article on the detection of AI usage in a New Zealand study into online dating.

In addition, participants at the event shared specific examples of how AI responses have affected their work. For example, one participant working at a foundation described the realisation that several responses to a survey on health care had been AI-generated: “There were a lot of answers that were nonsensical. There were batches of surveys coming in at a very similar time from similar or the same IP addresses; there were patterns of respondents never answering any of the open ended questions (which was always the quickest way through the survey). Our team thinks there were either bots or professional kind of survey takers.”

Retroactively trying to identify which responses might be bogus is not straightforward, and it comes with its own set of challenges and impacts on the research process: “When do you throw out something? (…) Is research going to become us guessing at what the responses should have been [without AI and bots]? And then throwing out things that seem off? How does that allow for the fact that you could actually have a surprising finding of something unexpected?” The organisation is now dealing with several dilemmas as a result: from having to consider refielding surveys due to questionable data, which inevitably comes at a significant cost to the project, to developing strategies to include even more data checks in their already sophisticated collection process.
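To make the kind of pattern checks that participant described a little more concrete, here is a minimal Python sketch: shared IP addresses, bursts of near-simultaneous submissions, and respondents who skip every open-ended question. The Submission record shape, the thresholds, and the ten-minute window are hypothetical placeholders, not the foundation’s actual pipeline, and none of these flags proves anything on its own.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Submission:
    """Hypothetical record shape; a real survey export will look different."""
    respondent_id: str
    ip_address: str
    submitted_at: datetime
    open_ended_answers: list[str] = field(default_factory=list)

def flag_suspicious(submissions: list[Submission],
                    burst_window: timedelta = timedelta(minutes=10),
                    burst_size: int = 5) -> dict[str, set[str]]:
    """Collect reasons to double-check a respondent; flags prompt review, not exclusion."""
    flags: dict[str, set[str]] = defaultdict(set)

    # 1. Batches of surveys arriving from the same IP address.
    by_ip: dict[str, list[Submission]] = defaultdict(list)
    for s in submissions:
        by_ip[s.ip_address].append(s)
    for ip, group in by_ip.items():
        if len(group) >= burst_size:
            for s in group:
                flags[s.respondent_id].add(f"shares IP {ip} with {len(group) - 1} others")

    # 2. Bursts of submissions within a short time window.
    ordered = sorted(submissions, key=lambda s: s.submitted_at)
    for i, s in enumerate(ordered):
        burst = [t for t in ordered[i:] if t.submitted_at - s.submitted_at <= burst_window]
        if len(burst) >= burst_size:
            for t in burst:
                flags[t.respondent_id].add("part of a submission burst")

    # 3. Respondents who never answer the open-ended questions (the quickest route through).
    for s in submissions:
        if not any(answer.strip() for answer in s.open_ended_answers):
            flags[s.respondent_id].add("skipped all open-ended questions")

    return dict(flags)
```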

More research is needed to understand the extent of the phenomenon, but these examples suggest that AI is increasingly being used by (some) respondents to:

  • Identify useful paid research work (note, this has been happening for a while!).

  • Complete eligibility screening forms.

  • Pre-empt the types of questions they might be asked and prepare responses.

  • In real time:

    • Make up answers entirely.

    • Create an answer based on a few loose thoughts.

    • Improve or refine pre-written answers.

  • Answer entire surveys (quant and qual).

  • Answer questions on why they were using AI within the research group (yes!).

Whilst questions of trust, authenticity, and the existence of systematically fraudulent practices are not new in qualitative research (as per this 30-year-old academic article, “Lies My Informants Told Me”), the existence of GenAI makes ‘inauthenticity’ (which, I believe, is our main worry) exponentially more likely. The use of AI in group research settings can also deepen existing inequalities across participants, making less literate and less digitally literate members even more reluctant to contribute when faced with the wisecracking and flashy responses of some of their peers. We can only expect this to grow as more and more individuals across the globe start using GenAI…

Why might it be happening? The case for fraud vs agency

…and why shouldn’t they, especially when researchers themselves are doing the same?

It’s easy to react with fear, especially as this can lead to major and unexpected increases in cost, time, and emotional stress for research teams, as a number of our participants pointed out. My stomach flipped when I realised it was happening to me, not least because I feared it wouldn’t have happened if I had been able to conduct the research in a local language.

However, as with many things, reacting with curiosity is probably more useful. Our discussion revealed many ideas around possible motivations for the use of AI by respondents. Understanding these is the key to pre-empting and responding:

  • Time constraints, and the lure of speed: using AI to craft an answer faster, or to blast through questions, speeds things up, which, let’s face it, is both useful and satisfying.

  • Boredom: some questions can feel boring or irrelevant to users – why shouldn’t they use the tools at their disposal, as we might do with a particularly wordy report (or blog…)?

  • Confusion: some questions or tasks might feel unclear, and out of concern for getting things ‘wrong’, especially in group settings, respondents might seek clarity via AI.

  • Language barriers: if the researcher doesn’t speak the preferred language, AI is a legitimately useful way to bridge that gap (and the same is true for research bringing together different people within one country, or across multiple countries).

  • Cultural expectations: if respondents grow up in a community where the opinions and thoughts of individuals are not as valued as they can be elsewhere, or where knowledge and discussion are approached more didactically, the fear of giving a ‘wrong’ answer can be higher. It’s telling that surveys are often referred to as ‘quizzes’ in many of the African countries I’ve worked in.

  • A desire to be understood: conveying complex thoughts or experiences to a researcher can be hard, especially for those not used to doing so. Having a helping hand can feel empowering, and a positive act to ensure their voice is heard.

  • Fun & showboating: It’s fun to use new technology! And maybe transparent use is the point in some settings – by using AI visibly, participants may be showing off their digital skills.

  • Presentability: Silva Ferretti recently pointed out that ‘presentability’ is also a motivating factor for many of us using AI within the development sector. At the individual level, maybe in the same way that a person might want to fix their hair and put on a clean shirt before an outsider takes a photo of them, they want to put their best foot forward when a researcher asks them questions. It could be a question of dignity. 

  • Playing the System: the System usually wins. Why shouldn’t I game it? As the authors of ‘chatbots in disguise’ pointed out, ‘deception can also be a survival strategy’.

In my own research, I probed the question of AI use and researcher/researchee inequality a bit further, only for my respondent to laugh and show me a screenshot of an email to her auntie. It had been written by ChatGPT: she told me it was simply a fast and convenient way to get her thoughts down and ensure her message got across, especially in contexts she cared about.

Whilst it’s of course important to use the emergence of AI in research to reignite bigger questions of power dynamics (in research, and in development work more broadly), putting its usage down to a binary of language barriers vs wilful deceit is probably reductive, and removes agency from the complex and evolving decisions made by respondents.

How can researchers adapt & respond?

With that being said, researchers clearly need to develop new strategies to both anticipate and respond to AI usage, and fast. As one attendee pointed out, even deeply sophisticated and experienced researchers were being caught on the back foot, and were really struggling to adapt to this new paradigm. 

Some of the ideas that emerged from our discussion and wider reading were:

  • Go back to basics! Maybe remote research is impossible now, and only face-to-face will do. In principle, I support this sentiment, but experiences in other sectors, such as education, and even in our day-to-day lives, show us that it’s not that simple: more and more people see it as socially acceptable to use GenAI right in the middle of a face-to-face conversation. 

  • Double down on local language research: not only is this the ‘right’ thing to do, but whilst local language models are so poor, the use of AI to support conversations will be lessened… maybe… for now…

  • Manage expectations and adapt research briefs: When seeking funding for research, it could be wise to include a ‘sub-theme’ relating to the use of AI within research, and what this could help us discover about the target audience. For example, the team researching mental health bots were forced to pivot, but discovered some fascinating insights into informal gig economies.

  • Plan harder and longer: Anticipate that respondents will use AI and plan for how to mitigate that – whether it’s preparing scripts to gently challenge its usage, spending more time on identifying data that is AI-generated, or developing a policy on how to treat data that was likely AI-generated.

  • Recruit smarter: Avoid online recruitment platforms that can be ‘gamed’. One participant noted that most internet traffic is generated by bots, a trend further accelerated, though not caused, by AI.

  • Get stricter: impose technical restrictions and consent wording that forbids or ringfences the use of AI. However, the risk here is of coming across as autocratic and unfair even before the research has started, which will probably undermine the participation of respondents who weren’t going to use AI.

  • Be agile: frequent spot checks, rather than waiting for all the data to come in, will allow teams to spot potential problems early.

  • Build in triangulation: validating insights via different methodologies has never been so important. After all, “If you know that what you receive has been validated at the grassroots with sound processes, the gloss of AI becomes less concerning.”

  • Open up dialogue: open, non-judgmental conversations with the group or individuals about their AI usage will help you understand the extent to which the data they generate contains ‘legitimate’ insights. For example, in my own research, it was clear that 99% of one respondent’s inputs were AI-generated, giving me little reason to prioritise their inputs in my write-up. For others, though, our other interactions gave me more confidence that what was being said under the gloss of AI was ‘authentic’, even if the way it was said was not.

  • Be creative and destructive: In our original LinkedIn thread, a couple of researchers made great suggestions on how our questioning techniques could evolve. This might include reducing emphasis on text-based responses to ‘remove the pressure on eloquence’, or giving users an AI-generated answer to respond to and critique.

One of my favourite contributions from this event was the suggestion that each ‘motivating factor’ could be paired with a series of ‘mitigating factors’ – only by unpacking and acknowledging the multifaceted reasons behind AI usage by respondents can we start to develop meaningful strategies. There is, of course, a deep irony that the very technology that promised to maximise our collective efficiency is, for the time being, also slowing us down considerably.

Looking to the future

I recently came across the optimistic take that, despite our fears about AI destroying literature, non-AI crafted written words (and the skills to produce them) might actually increase in value in the near future. 

Similarly, being able to prove that data is not AI-tainted might soon be the gold standard for qualitative research. As one contributor pointed out, “maybe in 5-10 years time, we could be less obsessed with sample size as representative, and will place more value on smaller sample sizes that have proof of life.” 

Whilst I, like many fellow qualitative researchers, will continue to wrestle with seeing AI within research data, and plan to do my best to respond by adapting my own practice, I also believe it’s important to continue to consider the question: “What is the more genuine answer? The one where the respondent didn’t use AI, but potentially struggled to express their authentic experience and feelings? Or the one where the respondent used AI to help them better express their authentic experience and feelings?”

Finding out where we draw the line, and how we do so, is at the heart of reaching a post-AI equilibrium within qualitative research. 
