636: Red Hat's James Huang episode artwork

EPISODE · Dec 19, 2025 · 20 MIN

636: Red Hat's James Huang

from Coder Radio · host The Mad Botter

Links James on LinkedIn Mike on LinkedIn Mike's Blog Show on Discord Alice Promo AI on Red Hat Enterprise Linux (RHEL) Trust and Stability: RHEL provides the mission-critical foundation needed for workloads where security and reliability cannot be compromised. Predictive vs. Generative: Acknowledging the hype of GenAI while maintaining support for traditional machine learning algorithms. Determinism: The challenge of bringing consistency and security to emerging AI technologies in production environments. Rama-Llama & Containerization Developer Simplicity: Rama-Llama helps developers run local LLMs easily without being "locked in" to specific engines; it supports Podman, Docker, and various inference engines like Llama.cpp and Whisper.cpp. Production Path: The tool is designed to "fade away" after helping package the model and stack into a container that can be deployed directly to Kubernetes. Behind the Firewall: Addressing the needs of industries (like aircraft maintenance) that require AI to stay strictly on-premises. Enterprise AI Infrastructure Red Hat AI: A commercial product offering tools for model customization, including pre-training, fine-tuning, and RAG (Retrieval-Augmented Generation). Inference Engines: James highlights the difference between Llama.cpp (for smaller/edge hardware) and vLLM, which has become the enterprise standard for multi-GPU data center inferencing.

Links James on LinkedIn Mike on LinkedIn Mike's Blog Show on Discord Alice Promo AI on Red Hat Enterprise Linux (RHEL) Trust and Stability: RHEL provides the mission-critical foundation needed for workloads where security and reliability cannot be compromised. Predictive vs. Generative: Acknowledging the hype of GenAI while maintaining support for traditional machine learning algorithms. Determinism: The challenge of bringing consistency and security to emerging AI technologies in production environments. Rama-Llama & Containerization Developer Simplicity: Rama-Llama helps developers run local LLMs easily without being "locked in" to specific engines; it supports Podman, Docker, and various inference engines like Llama.cpp and Whisper.cpp. Production Path: The tool is designed to "fade away" after helping package the model and stack into a container that can be deployed directly to Kubernetes. Behind the Firewall: Addressing the needs of industries (like aircraft maintenance) that require AI to stay strictly on-premises. Enterprise AI Infrastructure Red Hat AI: A commercial product offering tools for model customization, including pre-training, fine-tuning, and RAG (Retrieval-Augmented Generation). Inference Engines: James highlights the difference between Llama.cpp (for smaller/edge hardware) and vLLM, which has become the enterprise standard for multi-GPU data center inferencing.

NOW PLAYING

636: Red Hat's James Huang

0:00 20:53

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Frequently Asked Questions

How long is this episode of Coder Radio?

This episode is 20 minutes long.

When was this Coder Radio episode published?

This episode was published on December 19, 2025.

What is this episode about?

Links James on LinkedIn Mike on LinkedIn Mike's Blog Show on Discord Alice Promo AI on Red Hat Enterprise Linux (RHEL) Trust and Stability: RHEL provides the mission-critical foundation needed for workloads where security and reliability cannot...

Can I download this Coder Radio episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!