Paper_agentharm | Maksym Andriushchenko

Our new benchmark AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents is available online (collaboration between Gray Swan AI and UK AI Safety Institute). We need reliable evaluations for alignment of LLM agents equipped with external tools, especially in the adversarial setting.