706: Large Language Model Leaderboards and Benchmarks
In this episode, Caterina Constantinescu dives deep into Large Language Models (LLMs), spotlighting top leaderboards, evaluation benchmarks, and real-world user perceptions. Plus, discover the challenges of dataset contamination and the intricacies of platforms like HELM and Chatbot Arena.Additional materials: www.superdatascience.com/706Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
An episode of the Super Data Science: ML & AI Podcast with Jon Krohn podcast, hosted by Jon Krohn, titled "706: Large Language Model Leaderboards and Benchmarks" was published on August 18, 2023 and runs 33 minutes.
August 18, 2023 ·33m · Super Data Science: ML & AI Podcast with Jon Krohn
Summary
In this episode, Caterina Constantinescu dives deep into Large Language Models (LLMs), spotlighting top leaderboards, evaluation benchmarks, and real-world user perceptions. Plus, discover the challenges of dataset contamination and the intricacies of platforms like HELM and Chatbot Arena.Additional materials: www.superdatascience.com/706Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
Episode Description
Similar Episodes
Mar 10, 2026 ·43m
Feb 17, 2026 ·57m