Generative AI tools for creating visuals are getting better and more accessible, but because their models are trained on libraries of existing art, artists are scrambling for ways to keep their work from being scraped without permission. A new tool, aptly named Nightshade, may be the answer.
The trick involves optimized, prompt-specific "data poisoning attacks": images are subtly altered so that, when they are scraped into an image generator's training data, they corrupt the model that learns from them.
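In rough terms, an "optimized" poison sample starts from an ordinary image and nudges its pixels so that a model's internal features drift toward a different concept while the picture still looks unchanged to a human. The sketch below illustrates only that general idea and is not Nightshade's code; the ResNet stand-in for a feature extractor, the file names, the perturbation budget, and the step count are all assumptions for illustration (the real tool reportedly works against the generative models' own feature representations).

```python
# Illustrative sketch only: nudge a cat photo's features toward "dog" while
# keeping the pixel changes small. Not the Nightshade implementation; the
# ResNet feature extractor, file names, budget, and step count are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

extractor = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor.fc = torch.nn.Identity()          # use penultimate features as an embedding
extractor.eval()

to_tensor = T.Compose([T.Resize((224, 224)), T.ToTensor()])
cat = to_tensor(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)   # image to poison
dog = to_tensor(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)   # target concept

with torch.no_grad():
    target_feat = extractor(dog)

delta = torch.zeros_like(cat, requires_grad=True)    # learned perturbation
opt = torch.optim.Adam([delta], lr=1e-2)
budget = 0.05                                        # keep changes visually subtle

for _ in range(200):
    opt.zero_grad()
    poisoned = (cat + delta).clamp(0, 1)
    # Pull the poisoned image's embedding toward the target concept's embedding.
    loss = torch.nn.functional.mse_loss(extractor(poisoned), target_feat)
    loss.backward()
    opt.step()
    delta.data.clamp_(-budget, budget)               # crude imperceptibility constraint

final = (cat + delta.detach()).clamp(0, 1)
T.ToPILImage()(final.squeeze(0)).save("cat_poisoned.jpg")
```

The constraint is the point of the exercise: the perturbation is bounded so the poisoned image still reads as a cat to a person, even as its learned representation moves toward "dog."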
“Poisoning has been a well-known attack vector on machine learning models for years,” Professor Ben Zhao of the University of Chicago told Decrypt. “Nightshade is interesting not because it poisons a model, but because it poisons generative AI models so large that no one thought poisoning them was possible.”
Since generative AI models broke into the mainstream this year, combating intellectual property theft and AI deepfakes has become critical. In July, a team of MIT researchers similarly proposed injecting tiny, imperceptible bits of code into images to distort how AI models interpret them.
Generative AI refers to AI models that use prompts to generate text, images, music, or video. Google, Amazon, Microsoft, and Meta have invested heavily in bringing generative AI tools to consumers.
As Zhao explained, Nightshade sidesteps the problem of attacking an AI model's massive dataset by targeting the prompts used to query it, for example, requests to generate an image of a dragon, a dog, or a horse.
“It doesn't make sense to attack the whole model,” Zhao said. “What you want to attack are individual prompts, which weakens the model and disables its ability to generate meaningful images.”
To avoid detection, the text and images in the poisoned data must be crafted to look natural, fooling both automated alignment detectors and human inspectors while still achieving the intended effect, the researchers explained.
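As a rough illustration of that "looks natural" requirement, one crude check is to measure how little the pixels of a poisoned image differ from the original; the snippet below uses PSNR as a proxy for visual similarity. The file names and the choice of PSNR are assumptions for illustration, not a test described by the researchers.

```python
# Crude "does it still look natural?" check: compare a clean image with its
# poisoned counterpart. File names and the PSNR proxy are illustrative assumptions.
import numpy as np
from PIL import Image

size = (224, 224)
clean = np.asarray(Image.open("cat.jpg").convert("RGB").resize(size), dtype=np.float32)
poisoned = np.asarray(Image.open("cat_poisoned.jpg").convert("RGB").resize(size), dtype=np.float32)

mse = float(np.mean((clean - poisoned) ** 2))
psnr = float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
print(f"PSNR between clean and poisoned image: {psnr:.1f} dB")  # higher = harder to notice
```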
While the poisoned Nightshade dataset is still a proof of concept, the easiest way to trick an AI model like Stable Diffusion into believing a cat is a dog is to feed it a few hundred images of cats mislabeled as dogs.
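To make that mislabeling idea concrete, the hypothetical snippet below assembles image-caption pairs in which cat photos carry dog captions, the kind of poisoned pairs that could skew a text-to-image model fine-tuned on them. The directory layout, file names, and JSONL format are illustrative assumptions, not part of the Nightshade tool.

```python
# Hypothetical example of the mislabeled-pairs idea: cat photos captioned as dogs.
# Paths, captions, and the JSONL output format are assumptions for illustration.
import json
from pathlib import Path

cat_images = sorted(Path("poison/cats").glob("*.jpg"))   # a few hundred cat photos

with open("poisoned_pairs.jsonl", "w") as f:
    for img in cat_images:
        record = {"image": str(img), "caption": "a photo of a dog"}
        f.write(json.dumps(record) + "\n")

print(f"Wrote {len(cat_images)} mislabeled image-caption pairs")
```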
Even without any coordination, if enough artists begin applying these poison pills en masse, the cumulative effect could cause an AI model to break down.
“Once enough attacks are active on the same model, the model becomes worthless,” Zhao said. “By worthless, I mean, you give it something like ‘give me a picture' and it comes back with something like a kaleidoscope of pixels. The model is effectively degraded into a version of itself that behaves like a random pixel generator.”
Zhao said Nightshade does not take any action against AI image generators on its own; the poison only takes effect when a model tries to use the treated images as training data.
“Unless you take those images and put them into the training data, they're useless,” he said, describing the tool less as an aggressive attack and more as self-defense, a barbed-wire fence with poisoned tips aimed at AI developers who ignore opt-out requests and do-not-scrape directives.
“This is designed to address that problem,” Zhao said. “It just happens that this barbed wire has a bit of poison on it. If you don't scrape, you won't run into the poison.”
Edited by Ryan Ozawa.