Maksym Andriushchenko

Talk_adaptive_attacks_uk_aisi

November 4, 2024

2024

An invited talk at the UK AI Safety Institute on "Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks", in which we achieved a 100% jailbreak success rate on all major LLMs, including GPT-4o and Claude 3.5 Sonnet.