EPISODE · May 11, 2026 · 55 MIN

514: Running Local LLMs in VS Code

from Merge Conflict · host soundbite.fm

In this episode, James and Frank dive into running AI coding models locally versus in the cloud: BYOK/OpenRouter, VS Code’s chat/agent harness, model runners (Ollama, vLLM), and the practicality of 27B models on a 3090 using 4-bit quantization. They share hands-on takeaways: how recent engineering (MT/MTPLX) boosts inference to usable token rates, when auto model selection makes sense, cost and hardware trade-offs, and why local models can liberate your workflow while still needing smarter, unified tooling.

Follow Us
Frank: Twitter, Blog, GitHub
James: Twitter, Blog, GitHub
Merge Conflict: Twitter, Facebook, Website, Chat on Discord

Music: Amethyst Seer - Citrine by Adventureface

⭐⭐ Review Us ⭐⭐

Machine transcription available on http://mergeconflict.fm
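As a rough check on the 27B-on-a-3090 point in the description, the weight-only arithmetic can be sketched as follows. This is a back-of-the-envelope estimate, not something taken from the episode: the helper `weight_memory_gb` is a hypothetical name, the 24 GB figure is the RTX 3090's VRAM, and the estimate ignores quantization metadata, the KV cache, and activations, all of which add to real usage.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


RTX_3090_VRAM_GB = 24  # the RTX 3090 ships with 24 GB of VRAM

for bits in (16, 8, 4):
    need = weight_memory_gb(27, bits)
    # Weights-only comparison; real headroom is smaller once context is loaded.
    verdict = "fits" if need < RTX_3090_VRAM_GB else "does not fit"
    print(f"27B at {bits}-bit: ~{need:.1f} GB of weights, {verdict} in {RTX_3090_VRAM_GB} GB")
```

Under these assumptions, only the 4-bit case (about 13.5 GB of weights) leaves room on a 24 GB card for context and runtime overhead.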
