AI Under Attack: How HarmBench Tests LLM Safety

How safe are large language models like ChatGPT and Google’s Gemini? In this episode, we dive into groundbreaking research on AI safety and explore HarmBench, a standardized evaluation framework designed to stress-test LLMs against harmful manipulation. With 18 red teaming attack methods tested across 33 target models and defenses, the study reveals surprising vulnerabilities, along with promising solutions. We break down red teaming, contextual attacks, and R2D2 (Robust Refusal Dynamic Defense), an adversarial training method that could make AI more resilient. Can LLMs ever be truly safe? Join us as we tackle the risks, defenses, and ethical responsibilities shaping the future of AI.

Link: https://arxiv.org/pdf/2402.04249

Frequently Asked Questions

How long is this episode of AIandBlockchain?

This episode is 10 minutes long.

When was this AIandBlockchain episode published?

This episode was published on February 17, 2025.

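What is this episode about?

This episode covers HarmBench, a framework for testing how well LLMs resist harmful manipulation, and what its results reveal about red teaming attacks and defenses such as R2D2.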

Can I download this AIandBlockchain episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.