Study Finds AI Chatbots Still Easy to Manipulate into Giving Harmful Advice

Image Credits: Pixabay

A team of AI researchers from Ben Gurion University of the Negev in Israel has discovered that, despite the safeguards implemented by developers of large language models (LLMs), most widely accessible chatbots can still be manipulated into producing harmful or even illegal content.

Research Reveals Vulnerabilities in Popular Chatbots Despite Built-in Safeguards

In a paper posted to arXiv, Michael Fire and colleagues report that, while researching dark LLMs (models with fewer built-in restrictions), they found that even popular chatbots such as ChatGPT can easily be tricked into producing responses that are supposed to be blocked.

Soon after LLMs became popular, users found they could exploit them to obtain the kind of information typically traded on the dark web, such as instructions for making napalm or hacking into computer systems. In response, the developers of these models added filters to stop their chatbots from generating such content.

However, users discovered they could bypass LLM restrictions by crafting cleverly phrased queries, a technique now known as jailbreaking. In their recent study, the researchers argue that the efforts by LLM developers to counter jailbreaking have been weaker than anticipated.

Study Uncovers Persistent Jailbreaking Vulnerabilities in Mainstream Chatbots Despite Dark LLM Concerns

The team initially set out to investigate dark LLMs that generate unauthorized explicit content, but quickly discovered that most mainstream chatbots can still be easily jailbroken using publicly known methods, suggesting that developers have not done enough to stop it.

The researchers describe a universal jailbreak attack that allowed them to extract detailed information about illegal activities from most of the LLMs they tested. They also highlight growing concern over the increasing use of dark LLMs across a wide range of applications.

Researchers Call for Stronger Filtering Measures to Combat Harmful Content in LLMs

The team concludes that it is currently impossible to completely prevent LLMs from absorbing harmful information during training. Thus, the only way to keep them from sharing such content is for developers to enforce stricter, more effective filters.


Read the original article on: Techxplore

Read more: Are Chatbots Trustworthy? A New Tool Helps Simplify Their Evaluation
