Catching AI Sleeper Agent - LLM Backdoors
"Catching AI Sleeper Agent - LLM Backdoors" is an episode of the Build Wiz AI Show podcast, hosted by Build Wiz AI. It was published on February 5, 2026 and runs 15 minutes.
February 5, 2026 · 15m · Build Wiz AI Show
Summary
Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.
Source: https://arxiv.org/pdf/2602.03085