Maksym Andriushchenko

ICLR 2025 accepted

January 30, 2025

2025

Four papers accepted at ICLR 2025: AgentHarm, past tense jailbreaks, adaptive jailbreaks, and a comparison of in-context learning and instruction fine-tuning.