Paper_short_circuiting
Incredibly excited about our new paper, "Improving Alignment and Robustness with Short Circuiting" (see the Twitter/X thread from Andy for a summary)! Effective defenses against jailbreaking attacks on LLMs may be much more feasible than previously thought.