Maksym Andriushchenko

Paper: LLM Adaptive Attacks

April 2, 2024

Our new paper Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks is available online (see the Twitter/X thread for a summary). We show how to jailbreak essentially all leading safety-aligned LLMs with an ≈100% success rate.