Maksym Andriushchenko

Paper: LLM Adaptive Attacks

April 2, 2024

Our new paper Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks is available online (see the Twitter/X thread for a summary). We show how to jailbreak essentially all leading safety-aligned LLMs with an ≈100% success rate.