PODCAST · technology

AI testing and evaluation

A deep dive into AI quality and security, evaluation frameworks, bias detection, and building reliable and robust AI systems. Hosted by Aleksandr Meshkov, an AI evaluation architect with 13 years of experience.

  1.

    AI Evaluation. Episode 1. Practical approach to using LLM-as-a-Judge effectively

    Episode Description: In this episode, we dive into a practical, three-step approach to transform LLMs from unpredictable evaluators into reliable and transparent tools. Stop relying on vague instructions like "evaluate relevance" and learn how to implement a high-precision framework that yields consistent results.

    What we cover in this episode:

    • Step 1: The Power of Binary Criteria. Learn why you should define 5–7 concrete evaluation metrics (such as checking for fabricated facts, length limits, or specific tones) that result in a simple "yes" or "no".
    • Step 2: Structured Output for Accountability. Discover how to request JSON or other structured formats so the model provides a verdict and the specific evidence or justification supporting its decision.
    • Step 3: Continuous Improvement and Debugging. We discuss the importance of running 20–30 test examples to identify where the model makes mistakes. We explain why evaluation failures often stem from how criteria are formulated rather than the model's inherent capabilities.

    Tune in to learn how to move away from "black box" scoring and create an evaluation logic that you can continuously improve and fully understand.
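The three steps in the episode description can be sketched in Python roughly as follows. This is a minimal illustration and not the host's implementation: the criteria, the JSON field names (`verdict`, `evidence`), and the helpers `build_judge_prompt`, `parse_verdicts`, and `agreement_by_criterion` are all hypothetical. Sending the prompt to a model is left out, so the sketch assumes you pass the prompt to whatever LLM client you already use and hand its raw text reply to `parse_verdicts`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    key: str       # hypothetical identifier, reused as the JSON field name
    question: str  # phrased so the only valid answers are "yes" or "no"


# Illustrative criteria only; the episode suggests defining 5-7 concrete ones.
CRITERIA = [
    Criterion("no_fabricated_facts", "Is every fact in the answer present in the source?"),
    Criterion("within_length_limit", "Is the answer 100 words or fewer?"),
    Criterion("neutral_tone", "Is the tone neutral and professional?"),
]


def build_judge_prompt(source: str, answer: str, criteria: list[Criterion]) -> str:
    """Steps 1 and 2: ask binary questions and request a JSON verdict with evidence."""
    questions = "\n".join(f'- "{c.key}": {c.question}' for c in criteria)
    return (
        "Judge the ANSWER against the SOURCE.\n"
        f"SOURCE:\n{source}\n\nANSWER:\n{answer}\n\n"
        f"Criteria:\n{questions}\n\n"
        "Reply with JSON only. For each criterion key, give an object "
        '{"verdict": "yes" or "no", "evidence": "<quote or short justification>"}.'
    )


def parse_verdicts(raw: str, criteria: list[Criterion]) -> dict[str, tuple[bool, str]]:
    """Step 2: reject replies that lack a binary verdict or supporting evidence."""
    data = json.loads(raw)
    parsed = {}
    for c in criteria:
        item = data[c.key]  # raises KeyError if the judge skipped a criterion
        verdict = item["verdict"]
        evidence = item["evidence"]
        if verdict not in ("yes", "no"):
            raise ValueError(f"{c.key}: verdict must be 'yes' or 'no', got {verdict!r}")
        if not evidence.strip():
            raise ValueError(f"{c.key}: evidence is empty")
        parsed[c.key] = (verdict == "yes", evidence)
    return parsed


def agreement_by_criterion(
    judged: list[dict[str, tuple[bool, str]]],
    labels: list[dict[str, bool]],
) -> dict[str, float]:
    """Step 3: per criterion, the fraction of examples where judge and human label agree."""
    totals: dict[str, int] = {}
    matches: dict[str, int] = {}
    for verdicts, expected in zip(judged, labels):
        for key, want in expected.items():
            totals[key] = totals.get(key, 0) + 1
            matches[key] = matches.get(key, 0) + (verdicts[key][0] == want)
    return {key: matches[key] / totals[key] for key in totals}
```

Running `agreement_by_criterion` over the 20–30 hand-labeled examples mentioned in Step 3 shows which criterion the judge gets wrong most often. A low score for one criterion is a hint to reword that question before blaming the model, and the stored evidence strings show what the judge based each verdict on.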

HOSTED BY

Aleksandr Meshkov

Frequently Asked Questions

How many episodes does AI testing and evaluation have?

AI testing and evaluation currently has 1 episode available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is AI testing and evaluation about?

A deep dive into AI quality and security, evaluation frameworks, bias detection, and building reliable and robust AI systems. Hosted by Aleksandr Meshkov, an AI evaluation architect with 13 years of experience.

How often does AI testing and evaluation release new episodes?

AI testing and evaluation has 1 episode. Check the episode list to see recent publication dates and frequency.

Where can I listen to AI testing and evaluation?

You can listen to AI testing and evaluation on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts AI testing and evaluation?

AI testing and evaluation is created and hosted by Aleksandr Meshkov.