Open-Source Data Issues and Developments: Summit Insights

Open-source technologies are reshaping how organizations manage and use data. At the recent Open Source Data Summit, experts addressed the key challenges and emerging trends in the open-source data ecosystem.

Summit speakers discussed the open-source data ecosystem and its role in modern businesses.

Data infrastructure grows more critical as data volumes explode and businesses try to get more value and insight out of their data. Open-source technologies continue to play an essential role. These and other themes were the focus of the recent Open Source Data Summit.

According to Onehouse, one of the summit’s sponsors, the live virtual event attracted thousands of registrants from around the world and included more than 30 speakers.

Onehouse Founder and CEO Vinoth Chandar kicked off the day with a keynote address on the role of open source in data infrastructure. Chandar discussed the history of open source and surveyed the tools and technologies in the open data ecosystem, including databases, data lakes, data warehouses, stream processing, and more. He emphasized the need for a thoughtful strategy when adopting open-source data solutions and highlighted the challenges and considerations involved. The talk concluded with a blueprint for an open data architecture that offers flexibility, interoperability, and control.

Industry leaders talk open-source data

The one-day summit included speakers from Netflix, Uber, Walmart, LinkedIn, Tesla, Wayfair, Google, Microsoft, and more.

One particularly interesting session covered OneTable, a new open-source project that “unlocks omni-directional interoperability between the popular lakehouse projects Apache Hudi, Delta Lake, and Apache Iceberg.” Speakers included Ashvin Agrawal, Senior Researcher at Microsoft; Tim Brown, Engineering at Onehouse; and Anoop Johnson, Senior Staff Software Engineer at Google.

According to the speakers, OneTable offers lightweight conversion mechanisms that can take a source metadata format and sync it into one or more target metadata formats. The session featured a live demo, and participants described how to build open data foundations that could accelerate workloads on a variety of open-source query engines, including Spark, Presto, Trino, Flink, and more. The session is available on demand.
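The idea of syncing one source metadata format into several targets can be illustrated with a small sketch. This is a toy model written for this article, not OneTable's API: the names `TableState`, `sync_metadata`, `format_a`, and `format_b` are hypothetical, and the two output layouts are made up rather than taken from Hudi, Delta Lake, or Iceberg. The assumption it illustrates is that only metadata is rewritten, while the underlying data files are referenced in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableState:
    """Format-neutral view of a table: column types plus the data files it owns."""
    columns: dict
    data_files: tuple


# Hypothetical writers: each renders the same state in a different, made-up layout.
# These do NOT reproduce the real Hudi, Delta Lake, or Iceberg metadata layouts.
def write_format_a(state):
    return {
        "fields": [{"name": n, "type": t} for n, t in state.columns.items()],
        "files": list(state.data_files),
    }


def write_format_b(state):
    return {
        "schema": dict(state.columns),
        "manifest": [{"path": p} for p in state.data_files],
    }


WRITERS = {"format_a": write_format_a, "format_b": write_format_b}


def sync_metadata(state, targets):
    """Render one source state into each requested target layout.

    Only metadata is produced; data file paths are referenced, not copied.
    """
    unknown = [t for t in targets if t not in WRITERS]
    if unknown:
        raise ValueError(f"unsupported target formats: {unknown}")
    return {t: WRITERS[t](state) for t in targets}


state = TableState(
    columns={"id": "long", "name": "string"},
    data_files=("part-0001.parquet", "part-0002.parquet"),
)
synced = sync_metadata(state, ["format_a", "format_b"])
```

In this sketch both targets point at the same two Parquet paths, which is the property that would let different query engines read one copy of the data through whichever metadata layout they understand.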

Other sessions included talks by:

Jordan West, Staff Software Engineer at Netflix, on the practicalities of deploying open-source databases.
Patrick McFadin, VP of Developer Relations at DataStax, on a petabyte-scale vector store for generative AI.
Ankur Ranjan, Data Engineer III, and Ayush Bijawat, Senior Data Engineer, both from Walmart, on enabling Walmart’s data lakehouse with Apache Hudi.
Tun Shwe, VP of Data at Quix, and Jay Clifford, Developer Advocate at InfluxData, on data plumbing basics: Build, deploy, and scale ML models for your time series data.
Nishith Agarwal, Head of Data & ML Platforms at Lyra Health, on making decisions that are right for your data platform.
Siddharth Jain, Senior Engineering Manager at Wayfair, on options for real-time data pipelines.

In addition to these and other sessions, several panel discussions took place throughout the day. One focused on batch, streaming, and real-time data processing for ML, with speakers from Eastern Bank, Intuit, and Tecton. Another examined the growing role of open-source technology in today’s data architectures, with panelists from Onehouse, Microsoft, Confluent, LinkedIn, Starburst, Uber, and Google.

A complete list of the sessions and panels, all of which are available on demand, can be found on the summit’s website.

Silicon Media Network
