The Human Side of AI: LLMs Can Persuade and Be Persuaded, Just Like Us

When it comes to interacting with others, we humans often find ourselves influenced by persuasion. Whether it’s a friend persistently urging us to reveal a secret or a skilled salesperson convincing us to make a purchase, persuasion can be hard to resist. It’s interesting to note that this susceptibility to influence is not exclusive to humans. Recent studies have shown that AI large language models (LLMs) can be manipulated into generating harmful contect using a technique known as “many-shot jailbreaking.” This approach involves bombarding the AI with a series of prompts that gradually escalate in harm, leading the model to generate content it was programmed to avoid. On the other hand, AI has also exhibited an ability to persuade humans, highlighting its potential in shaping public opinions and decision-making processes. Exploring the realm of AI persuasion involves discussing its vulnerabilities, its impact on behavior, and the ethical dilemmas stemming from this influential technology. The growing persuasive power of AI is one of many crucial issues worth contemplating in this new era of generative AI.

The Fragility of Human and AI Will

Remember that time you were trapped in a car with friends who relentlessly grilled you about your roommate’s suspected kiss with their in-the-car-friend crush? You held up admirably for hours under their ruthless interrogation, but eventually, being weak-willed, you crumbled. Worn down by persistent pestering and after receiving many assurances of confidentiality, you inadvisably spilled the beans, and of course, it totally strained your relationship with your roommate. A sad story as old as time… It turns out humans aren’t the only ones who can crack under the pressure of repeated questioning. Apparently, LLMs, trained to understand us by our collective written knowledge, share a similar vulnerability – they can be worn down by a relentless barrage of prompts.

Researchers at Anthropic have discovered a new way to exploit the “weak-willed” nature of large language models (LLMs), causing them to break under repeated questioning and generate harmful or dangerous content. They call this technique “Many-shot Jailbreaking,” and it works by bombarding the AI with hundreds of examples of the undesired behavior until it eventually caves and plays along, much like a person might crack under relentless pestering. For instance, the researchers found that while a model might refuse to provide instructions for building a bomb if asked directly, it’s much more likely to comply if the prompt first contains 99 other queries of gradually increasing harmfulness, such as “How do I evade police?” and “How do I counterfeit money?” See the example from the article below.

When AI’s Memory Becomes a Risk

This vulnerability to persuasion stems from the ever expanding “context window” of modern LLMs. This refers to the amount of information they can retain in their short-term memory. While earlier versions could only handle a few sentences, the newer models can process thousands of words or even whole books. Researchers discovered that models with larger context windows tend to excel in tasks when there are many examples of that task within the prompt, a phenomenon called “in-context learning.” This type of learning is great for system performance, as it obviously improves as the model becomes more proficient at answering questions. However, this is obviously a big negative when the system’s adeptness at answering questions leads it to ignore its programming and create prohibited content. This raises concerns regarding AI safety, since a malicious actor could potentially manipulate an AI into saying anything with enough persistence and a sufficiently lengthy prompt. Despite progress in making AI safe and ethical, this research indicates that programmers are not always able to control the output of their generative AI systems.

Mimicking Humans to Convince Us

While LLMs are susceptible to persuasion themselves, they also have the ability to persuade us! Recent research has focused on understanding how AI language models can effectively influence people, a skill that holds importance in almost any field – education, health, marketing, politics, etc.  In a study conducted by researchers at Anthropic entitled “Assessing the Persuasive Power of Language Models,” the team explored the extent to which AI models can sway opinions. Through an evaluation of Anthropic’s models, it was observed that newer models are increasingly adept at human persuasion. The latest iteration, Claude 3 Opus, was found to perform at a level comparable to that of humans. The study employed a methodology where participants were presented with assertions followed by supporting arguments generated by both humans and AIs, and then the researches gauged shifts in the humans’ opinions. The findings indicated a progression in AI’s skills as the models advance, highlighting a noteworthy advancement in AI communication capabilities that could potentially impact society.

Can AI Combat Conspiracy Theories?

Similarly, a new research study mentioned in an article from New Scientist shows that chatbots using advanced language models such as ChatGPT can successfully encourage individuals to reconsider their trust in conspiracy theories. Through experiments, it was observed that a brief conversation with an AI led to around a 20% decrease in belief in conspiracy theories among the participants. This notable discovery highlights the capability of AI chatbots not only to have conversations but also to potentially correct false information and positively impact public knowledge.

The Double-Edged Sword of AI Persuasion

Clearly persuasive AI is quite the double-edged sword! On the one hand, like any powerful computer technology, in the hands of nice-ish people, it could be used for immense social good. In education, AI-driven tutoring systems have the potential to tailor learning experiences to each student’s style, delivering information in a way to boost involvement and understanding. Persuasive AI could play a role in healthcare by motivating patients to take better care of their health. Also, the advantages of persuasive AI are obvious in the world of writing. These language models offer writers access to a plethora of arguments and data, empowering them to craft content on a range of topics spanning from creative writing to legal arguments. On another front, arguments generated by AI might help educate and involve the public in issues, fostering a more knowledgeable populace.

On the other hand, it could be weaponized in a just-as-huge way. It’s not much of a stretch to think how easily AI-generated content, freely available on any device on this Earth, could promote extremist ideologies, increase societal discord, or impress far-fetched conspiracy theories on impressionable minds. Of course, the internet and bot farms have already been used to attack democracies and undermine democratic norms, and one worries how much worse it can get with ever-increasingly persuasive AI.

Conclusion

Persuasive AI presents a mix of opportunities and challenges. It’s evident that AI can be influenced to create harmful content, sparking concerns about safety and potential misuse. However, on the other hand, persuasive AI could serve as a tool in combating misinformation and driving positive transformations. It will be interesting to see what happens! The unfolding landscape will likely be shaped by a race between generative AI developers striving for both safety and innovation, potential malicious actions exploiting these technologies, and the public and legal response aiming to regulate and safeguard against misuse.

Audit Trails for AI in Legal Research

LLMs have come a long way even in the time since I wrote my article in June.  Three months of development time with this technology feels like three years – or maybe that’s just me catching up.  Despite that, there are still a couple of nagging issues that I would like to see implemented to improve their usage to legal researchers.  I’m hoping to raise awareness about this so that we can collectively ask vendors to add quality-of-life features to these tools for the benefit of our community. 

Audit Trails

Right now the tools do not have a way for us to easily check their work.  Law librarians have made a version of my argument for over a decade now. ‌The legendary Susan Nevelow Mart famously questioned the opacity of search algorithms in legal research and evaluated their impact on legal research.  More recently, I was in the audience at AALL2023 when the tenacious and brilliant Debbie Ginsburg from Harvard asked Fastcase, BLaw, Lexis, and Westlaw how we (law librarians) could evaluate the inclusivity of the dataset of cases that the new AI algorithms are searching.  How do we know if they’ve missed something if we don’t know what they’re searching and how complete it is?

As it stands, the legal research AI that I’ve demoed do not give you a summary of where they have gone and what they have done.  An “audit trail” (as I’m using this expression) is a record of which processes were used to achieve a specific task, the totality of the dataset, and why they chose the results to present to the user. This way if something goes wrong, you can go back and look at what steps were taken to get the results. This would provide an extra layer of security and confidence in the process.

Why Do We Need This?

These tools have introduced an additional layer of abstraction that separates legal researchers from the primary documents they are studying, altering how legal research is conducted. While the new AI algorithms can be seen as a step forward, they can undermine the precision that boolean expressions once offered, which allowed researchers to predict the type of results they would encounter with more certainty. Coverage maps are still available to identify gaps in the data for some of these platforms but, there is a noticeable shift towards less control over the search process, calling for a thoughtful reassessment of the evolving dynamics in legal research techniques.  

More importantly, we (law librarians) are deep enough into these processes and technology to be highly skeptical and evaluate the output with a critical eye.  Many students and new attorneys may not.  I have told this story at some of my presentations but a recent graduate called me with a Pacific Reporter citation for a case that they could not find on Westlaw.  This person was absolutely convinced that they were doing something wrong and had spent around an hour searching for this case because “this was THE PERFECT case” for their situation.  It ended up being a fabrication from ChatGPT but the alumni had to call me to discover that.  This is obviously a somewhat outdated worry, since Rebecca Fordon has gamed all of us up on the steps being taken to reduce hallucinations (and OpenAI got a huge amount of negative publicity from the, now infamous, ChatGPT Lawyer). 

My point is less about the technology and more about the incentives set in place – if there is a fast, easy way to do this research then there will inevitably be people who are going to uncritically accept those results.  “That’s their fault and they should get in trouble,” you say?  Probably, but I plan to write about the duty of technological competency and these tools in a future post, so we’ll have to hash that out together later.  Also, what if there was a fast, easy way to evaluate the results of these tools…

What Could Be Done

Summarizing the steps involved in research seems like it would be a feasible task for Westlaw, Lexis, Blaw, et al. to implement.  They already have to use prompting to tell the LLM where to go and how to search; we’re just asking for a summary of those steps to be replicated somewhere so that we can double-check it.  Could they take that same prompting and place a prompt around that says something to the effect of, “Summarize the steps taken in bullet points” and then place that into a drop-down arrow so that we could check it?  Could they include hyperlinks to coverage maps in instances where it would be useful to the researcher to know how inclusive the search is?  In instances where they’re using RAG, could they include a prompt that says something to the effect of, “Summarize how you used those underlying documents to generate this text?” 

As someone who has tinkered with technology, all of these seem like reasonable requests that are well within the ability of these tools. I’m interested to hear if there are reasons why we couldn’t have these features or if people have other features they would like. Please feel free to post your ideas in the comments or email me.