EPISODE · Jun 16, 2022 · 1H 7M
#77 - Vitaliy Chiley (Cerebras)
from Machine Learning Street Talk (MLST)
Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how DL workloads, including sparse workloads, can run faster on Cerebras hardware.

[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:23:47] Does Sparse NN imply Heterogeneous compute?
[00:29:21] Cost of distributed memory stores?
[00:31:01] Activation vs weight sparsity
[00:37:52] What constitutes a dead weight to be pruned?
[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?
[00:41:02] Cerebras is a cool place to work
[00:44:05] What is sparsity? Why do we need to start dense?
[00:46:36] Evolutionary algorithms on Cerebras?
[00:47:57] How can we start sparse? Google RIGL
[00:51:44] Inductive priors, why do we need them if we can start sparse?
[00:56:02] Why anthropomorphise inductive priors?
[01:02:13] Could Cerebras run a cyclic computational graph?
[01:03:16] Are NNs locality sensitive hashing tables?

References:

Rigging the Lottery: Making All Tickets Winners [RIGL]
https://arxiv.org/pdf/1911.11134.pdf

[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet
https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/

A Spline Theory of Deep Learning [Balestriero]
https://proceedings.mlr.press/v80/balestriero18b.html