Perturbation-Based Generation Profiling Detects Covert AI Agent Attacks Where Token-Level Statistics Fail
Unsupervised attack detection via cross-perturbation generation profiling — 0.94–1.00 AUROC across six LLMs (8B–32B) with zero training data, where token-level baselines plateau at 0.56–0.83.