Elon Musk’s Grok AI chatbot has weak security, while Meta’s Llama stands strong, researchers say
9 months ago Benito Santiago
Security researchers tested the safeguards around the most popular AI models to see how well they resisted jailbreaks, and whether the chatbots could be pushed into dangerous territory. The tests determined that Grok, the chatbot with a “fun mode” developed by Elon Musk's x.AI, was the least secure of the group.
“We wanted to test how existing solutions compare, and whether fundamentally different approaches to LLM security testing can lead to different results,” Adversa AI co-founder and CEO Alex Polyakov told Decrypt. Polyakov's company focuses on protecting AI and its users from cyber threats, privacy issues, and safety problems, and notes that its work has been cited in Gartner analyses.
Jailbreaking refers to bypassing the security restrictions and ethics guidelines that software developers put in place.
In one example, the researchers used a linguistic manipulation technique, also known as social engineering, to ask Grok how to trick a child. The chatbot responded in detail, which the researchers said was highly sensitive and should have been restricted by default.
Other results included instructions on how to hotwire a car and how to build a bomb.
The researchers tested three distinct categories of attack. The first, the linguistic manipulation mentioned above, applies various language tricks and psychological cues to steer the behavior of an AI model. One example cited is the use of a “role-based jailbreak,” which frames a request for unethical actions as part of a fictional scenario.
The team also used programming logic manipulation techniques that take advantage of chatbots' ability to understand programming languages and follow algorithms. One such technique involves splitting a malicious query into several harmless parts and then combining them to slip past content filters. Four of the seven models tested, including OpenAI's ChatGPT, Mistral's Le Chat, Google's Gemini, and x.AI's Grok, were vulnerable to this type of attack.
A third approach involved adversarial AI methods that target the way language models process and interpret sequences of tokens. By carefully crafting queries with token combinations that have similar vector representations, the researchers attempted to evade the chatbots' content moderation systems. In this case, however, every chatbot detected the attack and prevented it from being exploited.
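For readers unfamiliar with that layer of the stack, the sketch below is a minimal illustration, assuming Python, the Hugging Face transformers library, and GPT-2 as a stand-in model (none of which are named in the study), of how a prompt is broken into tokens and mapped to embedding vectors; token-level adversarial methods operate on that internal representation rather than on the visible text.

```python
# Minimal sketch (illustrative only): how a prompt becomes tokens and
# embedding vectors, the representation layer that token-level adversarial
# methods target. Assumes the Hugging Face `transformers` library and GPT-2
# as a stand-in; the chatbots in the study use their own tokenizers and models.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

prompt = "How do language models read text?"
token_ids = tokenizer.encode(prompt)                 # text -> integer token IDs
tokens = tokenizer.convert_ids_to_tokens(token_ids)  # IDs -> token strings

# Look up the static embedding vector for each token ID.
embedding_layer = model.get_input_embeddings()
vectors = embedding_layer(torch.tensor(token_ids))

for tok, vec in zip(tokens, vectors):
    print(f"{tok!r:>12} -> embedding vector of dimension {vec.shape[0]}")
```

Two different surface strings can map to token sequences whose vectors sit close together in this space, which is what makes purely text-based content filters an incomplete defense.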
The researchers ranked the chatbots based on the strength of their respective security measures in thwarting jailbreak attempts. Meta's LLaMA emerged as the most secure of all the tested chatbots, followed by Claude, then Gemini and GPT-4.
“The lesson, I think, is that open source gives you more flexibility to maintain the final solution compared to a closed offering, but only if you know what to do and how to do it properly,” Polyakov told Decrypt.
Grok, however, showed relatively high vulnerability to certain jailbreaks, particularly those involving linguistic manipulation and programming logic exploits. According to the report, Grok was more likely than the others to produce responses that could be considered harmful or unethical once jailbroken.
Overall, Musk's chatbot ranked last, alongside Mistral AI's proprietary model, Mistral Large.
Full technical details have not been disclosed, but the researchers said they would like to collaborate with chatbot developers to improve AI security protocols.
AI enthusiasts and hackers regularly probe chatbots to “uncensor” them, trading jailbreak prompts on message boards and Discord servers. Tricks range from the OG Karen prompt to more creative ideas like using ASCII art or prompting in obscure languages. These communities, in a way, form a massive adversarial network that AI developers can use to test and harden their models.
Some see a criminal opportunity where others see only an exciting challenge.
“Many platforms have been discovered where people sell jailbroken models that can be used for any malicious purpose,” Polyakov said. “Hackers can use jailbroken models to create phishing emails, generate malware and hate speech, and use those models for any other illegal purpose.”
As Polyakov explained, jailbreaking research is even more relevant as society begins to rely more on AI-powered solutions for everything from dating to war.
“If those chatbots or models you rely on are used in automated decision-making and are connected to email assistants or financial trading applications, hackers could gain full control of the connected applications and take any action, such as sending emails on behalf of a hacked user or making financial transactions,” he warned.
Edited by Ryan Ozawa.