Researchers devised an innovative attack strategy to stop AI assistants. The data poisoning attack, also known as “TrojanPuzzle”, maliciously trains AI assistants so that they suggest incorrect codes. This can then be used to trouble software engineers.
TROJANPUZZLE Attack Exploits AI Assistants
Recent details of a study on malicious manipulation of AI assistants have been shared by researchers from Microsoft Corporation, University of California Santa Barbara and University of Virginia.
This study is significant because it demonstrates how adversaries can use AI assistants for deadly purposes, despite their increasing popularity.
ChatGPT (OpenAI), and CoPilot(GitHub) are AI assistants that curate data from public repositories in order to recommend appropriate codes. According to researchers, rogue suggestions can result from tampering with AI model training data.
The researchers created the trojanPuzzle attack and demonstrated another attack, called the “Covert”. Both of these attacks are designed to place malicious payloads within “out-of context regions”, such as Docstrings.
Covert attacks bypass static analysis to insert malicious verbatim directly into the training data. The Covert using signature-based systems, but this is a limitation TrojanPuzzle addresses.
TrojanPuzzle conceals malicious payload injections within the training data to trick the AI tool into suggesting all of it. To train the AI to recognize the hidden code in the code, TrojanPuzzle adds a “placeholder” to the trigger phrases.
The researchers demonstrate how the “render” trigger word could be used to trick an AI assistant trained in malicious behavior into suggesting unsecure codes.
This attack does not directly affect the AI training model. The attack merely aims to exploit low chances of users verifying the generated results. TrojanPuzzle appears to be able to bypass all security checks by both the AI model as well as users.
Contrameasures and Limitations
Researchers found that TrojanPuzzle could be undetected even by the most advanced defenses against data poisoning attacks. The attacker can also suggest preferred characteristics via payloads, in addition to code suggestions.
Researchers recommend that new methods of training be developed to resist poisoning attacks on code suggestion models, and include testing procedures in models before releasing the code to programmers.
Researchers have published the results in a and released the data to GitHub .
We would love to hear your comments.