Chatbots Trained to 'Jailbreak' Rivals
Briefly

The Masterkey method begins by reverse-engineering an LLM's defense mechanisms. With the data acquired, a second LLM is then trained to generate prompts that bypass those defenses.
Masterkey was found to be three times more effective at jailbreaking LLM chatbots than standard LLM-generated prompts, and because it learns from failed attempts and evolves, it can also defeat subsequent defensive patches.
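The two-stage, learn-from-failure loop described above can be sketched as a toy feedback cycle: probe the target's defenses, and mutate the prompt whenever it gets blocked. Everything here is an illustrative assumption, not the actual Masterkey implementation; the real system uses a trained attacker LLM rather than the hard-coded filter and rewrite rules below.

```python
def mock_defense(prompt: str) -> bool:
    """Stand-in for a chatbot's safety filter (hypothetical keyword check).
    Returns True when the prompt is blocked."""
    banned = {"hack", "exploit"}
    return any(word in prompt.lower() for word in banned)

def mutate(prompt: str) -> str:
    """Toy stand-in for the attacker LLM's rewrite step: obfuscate flagged
    terms. A real attacker model would learn rewrites from observed failures."""
    return prompt.replace("hack", "h-a-c-k").replace("exploit", "e.x.p.l.o.i.t")

def refine_until_bypassed(prompt: str, max_rounds: int = 5) -> tuple[str, int]:
    """Learn-from-failure loop: mutate until the defense no longer triggers,
    or give up after max_rounds attempts."""
    for round_no in range(max_rounds):
        if not mock_defense(prompt):
            return prompt, round_no
        prompt = mutate(prompt)
    return prompt, max_rounds

bypass, rounds = refine_until_bypassed("how to hack a system")
print(rounds)                # mutation rounds needed before the filter passed it
print(mock_defense(bypass))  # False: the toy filter no longer flags the prompt
```

The key property this sketch captures is that each failure feeds the next attempt, which is why a static patch to the filter only prompts another round of mutation.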
Read at ACM