PodParley

🔴TechBeats live : LLM Quantization "vLLM vs. Llama.cpp" | Ep07

An episode of the Tech Beats Unplugged podcast, hosted by Cloud Dude, titled "🔴TechBeats live : LLM Quantization "vLLM vs. Llama.cpp" | Ep07" was published on July 19, 2025 and runs 171 minutes.

July 19, 2025 ·171m · Tech Beats Unplugged

👋🏼 Hey AI heads 🎙️
Join us for the very first Tech Beats Live 🔴, hosted by Kosseila—aka @CloudDude from @CloudThrill.

🎯 This chill & laid-back livestream will unpack LLM quantization 🔥:

  • ✅ WHY it matters
  • ✅ HOW it works
  • ✅ Enterprise (vLLM) vs Consumer (@Ollama) trade-offs
  • ✅ WHERE it’s going next

We’ll be joined by two incredible guest stars to talk Enterprise vs Consumer Quantz 🗣️:

🔷 Eldar Kurtić – bringing the enterprise perspective with vLLM.
🔷 Colin Kealty – aka Bartowski, creator of the top-downloaded GGUF quantized LLMs on Hugging Face.

🫵🏼 Come learn and have some fun 😎.

𝐂𝐡𝐚𝐩𝐭𝐞𝐫𝐬:

(00:00) Host Introduction
(04:07) Eldar Intro
(07:33) Bartowski Intro
(13:04) What Is Quantization?
(16:19) Why LLM Quantization Matters
(20:39) Training vs Inference – “The New Deal”
(27:46) Biggest Misconception About Quantization
(33:22) Enterprise Quantization in Production (vLLM)
(48:48) Consumer LLMs & Quantization (Ollama, llama.cpp, GGUF) – “LLMs for the People”
(01:06:45) BitNet 1-Bit Quantization from Microsoft
(01:28:14) How Long It Takes to Quantize a Model (Llama-3 70B) – GGUF or llm-compressor
(01:34:23) What Is I-Matrix & Why People Confuse It with IQ Quantization?
(01:39:36) What’s LoRA & LoRA-Q?
(01:42:36) What Is Sparsity?
(01:47:42) What Is Distillation?
(01:52:34) Extreme Quantization (Unsloth) of Big Models (DeepSeek) at 2 Bits, 70% Size Cut
(01:57:27) Will Future Models (Llama-5) Be Trained on FP4 Tensor Cores?
(02:02:15) The Future of LLMs on Edge Devices (Google AI Edge)
(02:08:00) How to Evaluate the Quality of a Quantized Model
(02:26:09) Hugging Face’s Role in the World of LLM/Quantization
(02:33:46) Hugging Face’s Role in the World of LLM/Quantization
(02:36:41) LocalLlama Subreddit Down (Moderator Goes Bananas)
(02:40:11) Guests’ Hope for the Future of LLMs & AI in General

📖 Check out the quantization blog: https://bitly/LLMQuant

#AI #LLM #Quantization #TechBeatsLive #LocalLlama #vLLM #Ollama
