Catching AI Sleeper Agent - LLM Backdoors
An episode of the Build Wiz AI Show podcast, hosted by Build Wiz AI, titled "Catching AI Sleeper Agent - LLM Backdoors" was published on February 5, 2026 and runs 15 minutes.
February 5, 2026 ·15m · Build Wiz AI Show
Summary
Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.Source: https://arxiv.org/pdf/2602.03085
Episode Description
Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.
Source: https://arxiv.org/pdf/2602.03085
Similar Episodes
Jan 15, 2016 ·18m
Dec 23, 2015 ·40m
Dec 18, 2015 ·9m
Dec 7, 2015 ·16m
Nov 11, 2015 ·10m