EPISODE · Jun 12, 2026 · 7 MIN
Optimize custom machine learning operations with Metal tensors
from Podkey WWDC 2026
A Podkey summary of Optimize custom machine learning operations with Metal tensors, from WWDC 2026.

A lot of this week’s Apple graphics and AI story really comes down to one idea: getting more model work closer to the hardware, with less fuss for developers. The big pieces are the M5’s new neural accelerator inside each shader core, broader quantized tensor support in TensorOps, and a pretty practical path for building things like FlashAttention and dropping them into real models. It’s fairly technical stuff, but the throughline is simple enough: faster inference, less wasted memory movement, and fewer awkward hand-built code paths.

The M5 adds AI help right inside the GPU cores
Quantization gets broader and a lot more usable
A scale plane keeps quantized tensors together
Cooperative tensors cut down on memory traffic
How FlashAttention fits into all this
Custom Metal kernels can plug into real model workflows
The dequantization trade-off is speed versus extra movement

This podcast was created with Podkey. Make your own at https://podkey.fm