Data Archives - Software Engineering Daily Podcast

100

Hyperscaling SQL with Sam Lambert

Databases underpin almost every user experience on the web, but scaling a database is one of the most fundamental infrastructure challenges in software development. PlanetScale offers a MySQL platform that is managed and highly scalable. Sam Lambert is the CEO of PlanetScale and he joins the show to talk about why he started the platform, The post Hyperscaling SQL with Sam Lambert appeared first on Software Engineering Daily.

Jul 4, 2024

99

Iceberg at Netflix and Beyond with Ryan Blue

Apache Iceberg is an open source high-performance format for huge data tables. Iceberg enables the use of SQL tables for big data, while making it possible for engines like Spark and Hive to safely work with the same tables, at the same time. Iceberg was started at Netflix by Ryan Blue and Dan Weeks, and The post Iceberg at Netflix and Beyond with Ryan Blue appeared first on Software Engineering Daily.

Mar 7, 2024

47m

98

Building a Data Lake with Adam Ferrari

Starburst is a data lake analytics platform. It’s designed to help users work with structured data at scale, and is built on the open source platform, Trino. Adam Ferrari is the SVP of Engineering at Starburst. He joins the show to talk about Starburst, data engineering, and what it takes to build a data lake. The post Building a Data Lake with Adam Ferrari appeared first on Software Engineering Daily.

Feb 6, 2024

46m

97

Rama with Nathan Marz

Building scalable software applications can be complex and typically requires dozens of different tools. The engineering often involves handling many arcane tasks that are distant from actual application logic. In addition, a lack of a cohesive model for building applications can lead to substantial engineering costs. Nathan Marz is the creator of Rama, which is The post Rama with Nathan Marz appeared first on Software Engineering Daily.

Dec 28, 2023

45m

96

Bonus Episode: SurrealDB with Tobie Morgan Hitchcock

SurrealDB is the result of a long-time collaboration between brothers Tobie and Jaime Morgan Hitchcock. The project has modest origins and started merely to support other projects the brothers were working on. However, over time the project grew and in 2021 they started working on it full-time. Since then the project has gained serious adoption. The post Bonus Episode: SurrealDB with Tobie Morgan Hitchcock appeared first on Software Engineering Daily.

Dec 25, 2023

57m

95

Tracking Drug Smugglers and Migrating Databases with Benny Keinan and Lior Resisi

Maritime logistics is the process of organizing the movement of goods across the ocean. Historically, this has been a challenging problem because of the multinational nature of shipping, as well as piracy, smuggling, and legacy technology. It’s also profoundly important for security reasons, and because 90% of what we buy travels over the oceans. Ocean vessels The post Tracking Drug Smugglers and Migrating Databases with Benny Keinan and Lior Resisi appeared first on Software Engineering Daily.

Dec 7, 2023

50m

94

The Right to Be Forgotten with Gal Ringel

Data breaches at major companies are now so common that they hardly make the news. The Wikipedia page on data breaches lists over 350 between 2004 and 2023. The Equifax breach in 2017 was especially notable because over 160 million records were leaked, and much of the data was acquired by Equifax without individuals’ knowledge The post The Right to Be Forgotten with Gal Ringel appeared first on Software Engineering Daily.

Nov 29, 2023

47m

93

Sofascore with Josip Stuhli

If you’re a sports fan and like to track sports statistics and results, you’ve probably heard of Sofascore. The website started in 2010 and ran on a modest single server. It now has 25 million monthly active users, covers 20 different sports, 11,000 leagues and tournaments, and is available in over 30 languages. Josip The post Sofascore with Josip Stuhli appeared first on Software Engineering Daily.

Nov 28, 2023

49m

92

Chronosphere with Martin Mao

Observability software helps teams to actively monitor and debug their systems, and these tools are increasingly vital in DevOps. However, it’s not uncommon for the volume of observability data to exceed the amount of actual business data. This creates two challenges – how to analyze the large stream of observability data, and how to keep The post Chronosphere with Martin Mao appeared first on Software Engineering Daily.

Nov 9, 2023

48m

91

Streamlit with Amanda Kelly

The importance of data teams is undeniable. Most companies today use data to drive decision-making on anything from software feature development to product strategy, hiring and marketing. In some companies data is the product, which can make data teams even more vital. But there’s a common problem – analyzing data is hard and time-consuming. The post Streamlit with Amanda Kelly appeared first on Software Engineering Daily.

Oct 24, 2023

47m

90

Modern Web Scraping with Erez Naveh

Today it’s estimated there are over 1 billion websites on the internet. Much of this content is optimized to be viewed by human eyes, not consumed by machines. However, creating systems to automatically parse and structure the web greatly extends its utility, and paves the way for innovative solutions and applications. The industry of web The post Modern Web Scraping with Erez Naveh appeared first on Software Engineering Daily.

Oct 18, 2023

57m

89

AI and Business Analytics with John Adams

It’s now clear that the adoption of AI will continue to increase, with nearly every industry working to rapidly incorporate it into their systems and applications to provide greater value to their users. Business analytics is a key domain that promises to be radically reshaped by AI. Alembic is an AI platform that integrates web The post AI and Business Analytics with John Adams appeared first on Software Engineering Daily.

Oct 5, 2023

30m

88

Database Caching with Ben Hagan

Database caching is a fundamental challenge in database management and there are hundreds of techniques to satisfy different caching scenarios. PolyScale is a fully automated database cache. It offers an innovative approach to database caching, leveraging AI and automated configuration to simplify the process of determining what should and should not be cached. Ben Hagan The post Database Caching with Ben Hagan appeared first on Software Engineering Daily.

Aug 8, 2023

35m

87

Data-Centric AI with Alex Ratner

Companies have high hopes for machine learning and AI to support real-time product offerings, prevent fraud and drive innovation. But there is a catch – training models requires labeled data that machines can digest. As data volumes increase, the opportunity to get great ML results rises, but so does the problem of labeling all the The post Data-Centric AI with Alex Ratner appeared first on Software Engineering Daily.

Jul 20, 2023

50m

86

Making Data-Driven Decisions with Soumyadeb Mitra

RudderStack is a warehouse-native customer data platform (CDP) that helps businesses collect, unify, and activate customer data from all their different sources. In today’s episode, we’re talking to Soumyadeb Mitra, the founder and CEO of RudderStack. We discuss the importance of activating all your data, how RudderStack can help you activate your data, the challenges The post Making Data-Driven Decisions with Soumyadeb Mitra appeared first on Software Engineering Daily.

Jul 11, 2023

50m

85

Customer-facing Analytics with Tyler Wells

The state of data inside most companies is chaotic. It takes significant time and investment to tame this chaos. When you are a platform provider, you are gathering tons of data from the developers using your platform. These developers building products on your platform need insight into that data to better understand how their application The post Customer-facing Analytics with Tyler Wells appeared first on Software Engineering Daily.

Jun 30, 2023

51m

84

Data Reliability with Barr Moses and Lior Gavish

As companies depend more on data to improve digital products and make informed decisions, it’s crucial that the data they use be accurate and reliable. Monte Carlo, the data reliability company, is the creator of the industry’s first end-to-end data observability platform. Barr Moses and Lior Gavish are the founders of Monte Carlo and they join The post Data Reliability with Barr Moses and Lior Gavish appeared first on Software Engineering Daily.

Jun 12, 2023

56m

83

Low-Code SQL on dbt Core with Raj Bains from Prophecy

In this podcast episode, we take a look at the intricacies of low-code data pipelines with Raj Bains, the founder of Prophecy.io. Raj shares valuable insights into how performant low-code data pipelines are revolutionizing industries and transforming everyday operations. Raj discusses the founding story of Prophecy.io, the company’s mission, and its approach to democratizing the creation The post Low-Code SQL on dbt Core with Raj Bains from Prophecy appeared first on Software Engineering Daily.

May 26, 2023

54m

82

Open-Source Embedding Database with Anton Troynikov

Chroma is an open source embedding database that is designed to make it easy to build large language model applications by making knowledge, facts and skills pluggable. Anton Troynikov is the co-founder of Chroma and he is our guest today. This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and The post Open-Source Embedding Database with Anton Troynikov appeared first on Software Engineering Daily.

Apr 20, 2023

32m

81

Data Activation with Tejas Manohar

Data Activation is the method of unlocking the knowledge stored within your data warehouse, and making it actionable by your business users in the end tools that they use every day. In doing so, Data Activation helps bring data people toward the center of the business, directly tying their work to business outcomes. Hightouch is The post Data Activation with Tejas Manohar appeared first on Software Engineering Daily.

Apr 13, 2023

41m

80

Self-Service Data Culture with Stemma’s Mark Grover

A data catalog provides an index into the data sets and schemas of a company. Data teams are growing in size, and more companies than ever have a data team, so the market for data catalog is larger than ever. Mark is the CEO of Stemma and the co-creator of Amundsen, a data catalog that came The post Self-Service Data Culture with Stemma’s Mark Grover appeared first on Software Engineering Daily.

Apr 7, 2023

46m

79

Streaming Analytics with Hojjat Jafarpour

Streaming analytics refers to the process of analyzing real-time data that is generated continuously and rapidly from various sources, such as sensors, applications, social media, and other internet-connected devices. Streaming analytics platforms enable organizations to extract business value from data in motion, similar to how traditional analytics tools derive insights from data at rest. DeltaStream The post Streaming Analytics with Hojjat Jafarpour appeared first on Software Engineering Daily.

Apr 6, 2023

46m

78

Observability Trends with John Hart

DataSet is a log analytics platform provided by SentinelOne that helps DevOps, IT engineering, and security teams get answers from their data across all time periods, both live streaming and historical. It’s powered by a unique architecture that uses a massively parallel query engine to provide actionable insights from the data available. John Hart The post Observability Trends with John Hart appeared first on Software Engineering Daily.

Mar 20, 2023

26m

77

Accessing Data at Scale with Justin Borgman

The Presto/Trino project makes distributed querying easier across a variety of data sources. As the need for machine learning and other high volume data applications has increased, the need for support, tooling, and cloud infrastructure for Presto/Trino has increased with it. Starburst helps your teams run fast queries on any data source. With Starburst you The post Accessing Data at Scale with Justin Borgman appeared first on Software Engineering Daily.

Nov 11, 2022

46m

76

Building on the Data Cloud with Torsten Grabs

Building and managing data-intensive applications has traditionally been costly and complex, and has placed an operational burden on developers to maintain as their organization scales. Today’s developers, data scientists, and data engineers need a streamlined, single cloud data platform for building applications, pipelines, and machine learning models — without having to move or copy their The post Building on the Data Cloud with Torsten Grabs appeared first on Software Engineering Daily.

Nov 7, 2022

40m

75

Serverless Clickhouse for Developers with Jorge Sancha

Data analytics technology and tools have seen significant improvements in the past decade. But, it can still take weeks to prototype, build and deploy new transformations and deployments, usually requiring considerable engineering resources. Plus, most data isn’t real-time. Instead, most of it is still batch-processed. Tinybird Analytics provides an easy way to ingest and query The post Serverless Clickhouse for Developers with Jorge Sancha appeared first on Software Engineering Daily.

Sep 12, 2022

35m

74

Data Infrastructure for Finance

Data is becoming a bank’s biggest asset. These complex enterprises have a huge opportunity ahead – to transform themselves to become a trusted hub of a much broader data ecosystem that goes beyond the financial industry and helps to form a new class of cross-industry experience architectures that are scalable and transparent. The data physics The post Data Infrastructure for Finance appeared first on Software Engineering Daily.

Aug 18, 2022

54m

73

Faking Data Using Tonic.ai with Ian Coe and Adam Kamor

Guests: Ian Coe, CEO, and Adam Kamor, Head of Engineering. Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn’t need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable The post Faking Data Using Tonic.ai with Ian Coe and Adam Kamor appeared first on Software Engineering Daily.

Aug 5, 2022

46m

72

Couchbase with Ravi Mayuram

Couchbase is a distributed NoSQL cloud database. Since its creation, Couchbase has expanded into edge computing, application services, and most recently, a database-as-a-service called Capella. Couchbase started as an in-memory cache and needed to be rearchitected to be a persistent storage system. In this episode, we interviewed Ravi Mayuram, SVP Products and Engineering at Couchbase. The post Couchbase with Ravi Mayuram appeared first on Software Engineering Daily.

Jul 28, 2022

30m

71

Decodable Streaming with Eric Sammer

Streaming data platforms like Kafka, Pulsar, and Kinesis are now common in mainstream enterprise architectures, providing low-latency real-time messaging for analytics and applications. However, stream processing – the act of filtering, transforming, or analyzing the data inside the messages – is still an exercise left to the receiving microservice or datastore, a custom programming exercise The post Decodable Streaming with Eric Sammer appeared first on Software Engineering Daily.

Jun 1, 2022

44m

70

Data Delivery with Naqeeb Memon

Data-as-a-service is a company category type that is not as common as API-as-a-service, software-as-a-service, or platform-as-a-service. In order to vend data, a data-as-a-service provider needs to define how that data will be priced, stored, and delivered to users: streaming over an API or served via static files. Naqeeb Memon of Safegraph joins the show The post Data Delivery with Naqeeb Memon appeared first on Software Engineering Daily.

May 14, 2022

28m

69

Data Labeling with Michael Malyuk

Data labeling allows machine learning algorithms to find patterns among the data. There are a variety of data labeling platforms that enable humans to apply labels to this data and ready it for algorithms. Heartex is a data labeling platform with an open source core. Michael Malyuk joins the show to talk through the platform The post Data Labeling with Michael Malyuk appeared first on Software Engineering Daily.

May 11, 2022

41m

68

Pinot and StarTree with Chinmay Soman

Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a The post Pinot and StarTree with Chinmay Soman appeared first on Software Engineering Daily.

May 9, 2022

44m

67

Data Loss Prevention with Yasir Ali

Data loss can occur when large data sources such as Slack or Google Drive get leaked. In order to detect and avoid leaks, a data asset graph can be built to understand the risks of a company environment. Polymer is a data loss prevention product that helps companies avoid problematic data leaks. Yasir Ali is The post Data Loss Prevention with Yasir Ali appeared first on Software Engineering Daily.

Apr 29, 2022

40m

66

Airbyte Engineering with Michel Tricot

Data integration infrastructure is not easy to build. Moving large amounts of data from one place to another has historically required developers to build ad hoc integration points to move data between SaaS services, data lakes, and data warehouses. Today, there are dedicated systems and services for moving these large batches of data. Airbyte builds The post Airbyte Engineering with Michel Tricot appeared first on Software Engineering Daily.

Apr 27, 2022

42m

65

Select Star with Shinji Kim

Modern organizations eventually face data governance challenges. Keeping track of where data came from, what systems update it, and in what ways updates can be made are just some of the issues to be tackled. Large organizations face additional challenges like training, onboarding, and capturing the institutional knowledge that leaves with the departure of key team The post Select Star with Shinji Kim appeared first on Software Engineering Daily.

Apr 25, 2022

42m

64

Time Series IoT on InfluxDB with Brian Gilmore

The solution many turn to for capturing their streaming data is InfluxDB. In this episode, I interview Brian Gilmore, Director of Product Management at InfluxData, about how real time applications achieve success built on top of InfluxDB. When most people hear the phrase Internet of Things, it typically evokes an image of connected devices we The post Time Series IoT on InfluxDB with Brian Gilmore appeared first on Software Engineering Daily.

Apr 14, 2022

48m

63

Data Engineering Trends with Lior Gavish and James Densmore

Guests: Lior Gavish and James Densmore. Data infrastructure is a fast-moving sector of the software market. As the volume of data has increased, so too has the quality of tooling to support data management and data engineering. In today’s show, we have a guest from a data intensive company as well as a company that builds a The post Data Engineering Trends with Lior Gavish and James Densmore appeared first on Software Engineering Daily.

Apr 5, 2022

43m

62

PlanetScale Management with Sam Lambert

Running a database company requires expertise in both technical and managerial skills. There are deeply technical engineering questions around query paths, scalability, and distributed systems. And there are complex managerial questions around developer productivity and task allocation. Sam Lambert is the CEO of PlanetScale, which is building modern relational database infrastructure. Before PlanetScale he spent The post PlanetScale Management with Sam Lambert appeared first on Software Engineering Daily.

Mar 31, 2022

49m

61

DuckDB with Hannes Mühleisen

DuckDB is a relational database management system with no external dependencies, with a simple system for deployment and integration into build processes. It enables complex queries in SQL with a large function library, and provides transactional guarantees through multi-version concurrency control. Hannes Mühleisen works on DuckDB and joins the show to talk about query engines The post DuckDB with Hannes Mühleisen appeared first on Software Engineering Daily.

Mar 19, 2022

49m

60

RudderStack Engineering with Soumyadeb Mitra

Customer data pipelines power the backend of many successful web platforms. In a customer data pipeline, data is collected from sources such as mobile apps and cloud SaaS tools, transformed and munged using data engineering, stored in data warehouses, and piped to analytics, advertising platforms, and data infrastructure. RudderStack is an open source customer data The post RudderStack Engineering with Soumyadeb Mitra appeared first on Software Engineering Daily.

Mar 16, 2022

46m

59

Apache Hudi with Vinoth Chandar

The data lake architecture has become broadly adopted in a relatively short period of time. In a nutshell, that means data in its raw format stored in cloud object storage. Modern software and data engineers have no shortage of options for accessing their data lake, but that list shrinks quickly if you care about features The post Apache Hudi with Vinoth Chandar appeared first on Software Engineering Daily.

Mar 9, 2022

43m

58

Data Catalog in Practice with Mark Grover

A data catalog provides an index into the data sets and schemas of a company. Data teams are growing in size, and more companies than ever have a data team, so the market for data catalog is larger than ever. Mark is the CEO of Stemma and the co-creator of Amundsen, a data catalog that came out of The post Data Catalog in Practice with Mark Grover appeared first on Software Engineering Daily.

Feb 25, 2022

51m

57

Splunk Platform with Spiros Xanthos

Splunk is a monitoring and logging platform that has evolved over its 18 years of existence. In its modern focus on observability it is focused on open source and AIOps. Observability has evolved with the growth of Kubernetes, and Splunk’s work around OpenTelemetry has kept parity with the open source community of Kubernetes. Spiros Xanthos The post Splunk Platform with Spiros Xanthos appeared first on Software Engineering Daily.

Feb 23, 2022

43m

56

Hex Collaborative Data Workspace with Barry McCardel and Caitlin Colgrove

Guests: Barry McCardel, Co-Founder and CEO at Hex, and Caitlin Colgrove, Co-Founder and CTO at Hex. In contrast to other IDEs, the notebook interface offers software developers a unique environment idealized for data professionals. Despite the growth in popularity, a surprising learning curve still exists for setup and configuration. A siloed notebook offers no native collaboration tools. The post Hex Collaborative Data Workspace with Barry McCardel and Caitlin Colgrove appeared first on Software Engineering Daily.

Feb 18, 2022

45m

55

Data Quality Using Anomalo with Jeremy Stanley

When writing code, test-driven development is a commonly accepted methodology to ensure the development of high-quality software. Your organization’s data, on the other hand, is an entirely different challenge. Data can be missing due to human error, a failure with a third-party provider, a botched release, or dozens of other issues. When The post Data Quality Using Anomalo with Jeremy Stanley appeared first on Software Engineering Daily.

Feb 17, 2022

46m

54

Couchbase Architecture with Ravi Mayuram

Couchbase is a distributed NoSQL cloud database. Since its creation, Couchbase has expanded into edge computing, application services, and most recently a database-as-a-service called Capella. Couchbase started as an in-memory cache and needed to be rearchitected to be a persistent storage system. In this episode, I interview Ravi Mayuram, SVP Products and Engineering at Couchbase The post Couchbase Architecture with Ravi Mayuram appeared first on Software Engineering Daily.

Jan 28, 2022

58m

53

Trifacta with Joe Hellerstein

If you haven’t encountered a data quality problem, then you haven’t yet worked on a large enough project. Invariably, a gap exists between the state of raw data and what an analyst or machine learning engineer needs to solve their problem. Many organizations needing to automate data preparation workflows look to Trifacta as a solution. The post Trifacta with Joe Hellerstein appeared first on Software Engineering Daily.

Dec 21, 2021

41m

52

MemGraph with Dominik Tomicevic

Relational databases have been a fixture of software applications for decades. They are highly tuned for performance and typically offer explicit guarantees like transactional consistency. More recently, there’s been a figurative Cambrian explosion of other-than-relational databases. Simple key-value stores or counters were an early win in this space. Managing a graph data structure is The post MemGraph with Dominik Tomicevic appeared first on Software Engineering Daily.

Dec 10, 2021

42m

51

Amplemarket with João Batalha

The lifeblood of most companies is their sales departments. When you’re selling something other than a commodity, it’s typically necessary to carefully groom the onboarding experience for inbound future customers. Historically, companies approached this in a one-size-fits-all manner, giving all customers a common experience. In today’s data-driven age, a better experience can be provided that The post Amplemarket with João Batalha appeared first on Software Engineering Daily.

Dec 9, 2021

38m