EPISODE · Jun 20, 2026 · 41 MIN
Testing the Untestable: LLM Security for Java Developers with Tiberius (#99)
from Foojay.io, the Friends Of OpenJDK! · host Foojay.io | Java and Programming Community
Your Java AI application is live in production. But have you tested whether it can be jailbroken, manipulated into revealing its system prompt, or tricked into printing content it should never output?

In this episode, Iryna Dohndorf, Software Engineer at Karakun Group and creator of Tiberius, explains how to bring security testing to LLM-powered Java applications. We cover why traditional unit tests break down with non-deterministic systems, how the Scan-Fixture-Validate workflow works, what buff mutation testing is, and why even well-trained models can be cracked with something as simple as the grandmother attack.

Topics include:
- Why LLM non-determinism breaks the classic input/output test model
- The Scan-Fixture-Validate principle and sharing test artifacts across teams
- Prompt injection, jailbreaks, and emotional manipulation attacks
- Buff mutation: testing linguistic surface coverage
- Probabilistic security contracts and multi-trial scans
- Fingerprinting and why your model choice should not be detectable
- LLM as a judge: using a second model as a guardrail
- Getting started with Tiberius in Spring Boot and LangChain4j

Guest
- Iryna Dohndorf - Software Engineer at Karakun Group
- LinkedIn

Links
- Article on Foojay
- Tiberius on GitHub
- Security Testing Guide

Timestamps
00:00 Introduction of topic and guest
01:05 The problem Tiberius wants to solve
06:39 How "traditional" unit tests don't work for LLM integrations
10:23 Scan-Fixture-Validate principle and sharing artifacts
15:15 Using different skills, for example, the grandmother skill
17:33 Testing for required versus forbidden bias
19:35 The probes across nine attack categories used by Tiberius
20:44 Buff mutation testing
26:55 Using Tiberius in your pipelines and when to fail
29:35 Using multi-trial scans
31:14 Fingerprinting: which model you use, should not be detectable
32:55 Combining multiple models, model as a judge
34:41 Sharing JSON models to improve tests
36:05 How to get started with Tiberius in Spring and with LangChain4j
36:41 Quarkus not supported yet, plans for the future
39:07 Conclusions and a call out to everyone to become a Foojay author