Maksym Andriushchenko

Paper_short_circuiting

June 7, 2024

2024

Incredibly excited about our new paper Improving Alignment and Robustness with Short Circuiting (see the Twitter/X thread from Andy for a summary)! Effective defenses against jailbreaking attacks on LLMs may be much more feasible than previously thought.