Talk_adaptive_attacks_uk_aisi | Maksym Andriushchenko

An invited talk at the UK AI Safety Institute about "Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks", where we achieved a 100% jailbreak success rate on all major LLMs, including GPT-4o and Claude 3.5 Sonnet.