#22 David vs. Goliath: Open Source Takes on Generative AI Giants - DODS
Episode 2 of the Diaries of a Data Scientist podcast, hosted by Jasmin and Kate, titled "#22 David vs. Goliath: Open Source Takes on Generative AI Giants - DODS" was published on October 6, 2024 and runs 39 minutes.
October 6, 2024 · 39m · Diaries of a Data Scientist
Summary
Welcome back to Episode 21! 🎙 Have you ever wondered how much control you truly have over your Gen AI models? What about the protection of your data? 🤔 Episode #22 of "Diaries of a Data Scientist" explores a David vs. Goliath-like story: open source taking on the generative AI giants. Why should you care about models you can run locally or in your own private cloud? What if you could avoid paying a surcharge on each token processed? And how valuable is the global community that constantly improves and innovates on these models? We also cover relevant open-source datasets and models for text-to-text and text-to-image generation. If you're ready to extend your horizon beyond the standard Gen AI providers, this episode is for you! 🪽 Follow Jasmin on LinkedIn: https://www.linkedin.com/in/jasmin-weimueller-bsc2018/ 🪽 Follow Kate on LinkedIn: https://www.linkedin.com/in/kate-nazarova-data-science/ 🪽 Subscribe to our official DODS page: https://www.linkedin.com/company/diaries-of-data-scientist/ Follow us on Medium 👇 🖇 Jasmin's Medium page: https://medium.com/@JasminWhy 🖇 Kate's Medium page: https://medium.com/@Kate_in_DS Join us on other platforms: 🎧 Spotify: https://open.spotify.com/show/1DAelRe22W8vBHK7rTU361?si=4e4f3d7bc67546cc 🎧 Apple: https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&url=https://podcasts.apple.com/us/podcast/diaries-of-a-data-scientist/id1710961657&ved=2ahUKEwjhm8PdhMWIAxV6qZUCHYZsCDwQFnoECBsQAQ&usg=AOvVaw1deaPC2MF6aWM69-SKSRoH 🎧 Amazon: https://amzn.asia/d/7J3UkTE 🎧 Podimo: https://podimo.com/de/shows/diaries-of-a-data-scientist 🎧 Podscribe: https://app.podscribe.ai/series/2353052 Useful links & Resources: State of Open Source AI: https://github.blog/news-insights/research/the-state-of-open-source-and-ai/ LLaMA: https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/ LAION-5B: https://huggingface.co/datasets/danielz01/laion-5b The Pile: https://huggingface.co/datasets/EleutherAI/pile C4: https://huggingface.co/datasets/legacy-datasets/c4 GPT-Neo / GPT-J: https://huggingface.co/docs/transformers/en/model_doc/gpt_neo; https://huggingface.co/docs/transformers/en/model_doc/gptj Mixtral 8x7B: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1 BLOOM: https://bigscience.huggingface.co/blog/bloom T5: https://huggingface.co/docs/transformers/en/model_doc/t5 LLaMA: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ Stable Diffusion: https://huggingface.co/models?other=stable-diffusion DALL-E Mini: https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-Mini-Explained--Vmlldzo4NjIxODA; https://github.com/borisdayma/dalle-mini FLUX.1: https://www.bentoml.com/blog/a-guide-to-open-source-image-generation-models
Episode Description
Welcome back to Episode 21! 🎙
Have you ever wondered how much control you truly have over your Gen AI models? What about the protection of your data? 🤔
Episode #22 of "Diaries of a Data Scientist" explores a David vs. Goliath-like story: open source taking on the generative AI giants.
Why should you care about models you can run locally or in your own private cloud?
What if you could avoid paying a surcharge on each token processed?
And how valuable is the global community that constantly improves and innovates on these models?
We also cover relevant open-source datasets and models for text-to-text and text-to-image generation. If you're ready to extend your horizon beyond the standard Gen AI providers, this episode is for you!
🪽 Follow Jasmin on LinkedIn: https://www.linkedin.com/in/jasmin-weimueller-bsc2018/
🪽 Follow Kate on LinkedIn: https://www.linkedin.com/in/kate-nazarova-data-science/
🪽 Subscribe to our official DODS page: https://www.linkedin.com/company/diaries-of-data-scientist/
Follow us on Medium 👇
🖇 Jasmin's Medium page: https://medium.com/@JasminWhy
🖇 Kate's Medium page: https://medium.com/@Kate_in_DS
Join us on other platforms:
🎧 Spotify: https://open.spotify.com/show/1DAelRe22W8vBHK7rTU361?si=4e4f3d7bc67546cc
🎧 Amazon: https://amzn.asia/d/7J3UkTE
🎧 Podimo: https://podimo.com/de/shows/diaries-of-a-data-scientist
🎧 Podscribe: https://app.podscribe.ai/series/2353052
Useful links & Resources:
State of Open Source AI: https://github.blog/news-insights/research/the-state-of-open-source-and-ai/
LLaMA: https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
LAION-5B: https://huggingface.co/datasets/danielz01/laion-5b
The Pile: https://huggingface.co/datasets/EleutherAI/pile
C4: https://huggingface.co/datasets/legacy-datasets/c4
GPT-Neo / GPT-J: https://huggingface.co/docs/transformers/en/model_doc/gpt_neo; https://huggingface.co/docs/transformers/en/model_doc/gptj
Mixtral 8x7B: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
BLOOM: https://bigscience.huggingface.co/blog/bloom
T5: https://huggingface.co/docs/transformers/en/model_doc/t5
LLaMA: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
Stable Diffusion: https://huggingface.co/models?other=stable-diffusion
DALL-E Mini: https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-Mini-Explained--Vmlldzo4NjIxODA; https://github.com/borisdayma/dalle-mini
FLUX.1: https://www.bentoml.com/blog/a-guide-to-open-source-image-generation-models