Aug 25, 2023

Get Ready to Talk Composable Data Systems at VLDB 2023

Maura Hollebeek


The Very Large Data Bases (VLDB) conference is one of the most influential events in database research. It bridges academia and industry, bringing together researchers, practitioners, startups, and more to share knowledge and best practices.

This week, the 49th annual conference kicks off in Vancouver (8/28 - 9/1). Topics range from quantum data science to LLMs, GenAI, and more. While these are prevailing trends in the data ecosystem, we suspect the main draw will be innovation in data management, storage, processing, and infrastructure.

Paving the Way for Composable Data Systems

At the heart of research events like VLDB are the technical workshops, where attendees rally around a central focus to share knowledge and openly discuss trends, challenges, and opportunities.

When it comes to VLDB and database innovation, we look forward to the International Workshop on Composable Data Management Systems, or CDMS, where the focus is on building composable and reusable data management systems.

The workshop will feature keynotes from Meta, Firebolt, MotherDuck, and Neon Database, with talks and contributions from institutions including Databricks, Meta, Snowflake, Microsoft, Oracle, the University of Melbourne, IT University of Copenhagen, and the University of California, Irvine.

Given this lineup of keynotes, talks, and accepted papers, is it safe to call this workshop an incubator for the future of data system architecture?

We think so. That is why Voltron Data will be on the ground covering composability during VLDB and CDMS. We will share our vision for composable data systems and dive into our work on hardware acceleration and language interoperability.

Find Voltron Data at VLDB & CDMS

The Composable Data Management System Manifesto (VLDB Paper Session)

  • Tuesday, 8/29 from 10:30am - 12:30pm PT
  • Authors: Pedro Pedreira, Orri Erling, Konstantinos Karanasos, and Scott Schneider from Meta; Wes McKinney from Voltron Data; Satya Valluri and Mohamed Zait from Databricks; and Jacques Nadeau from Sundeck
  • Read the paper
“We foresee that composability is soon to cause another major disruption to how data management systems are designed. We foresee that monolithic systems will become obsolete, and give space to a new composable era for data management.”

Learn about the trends leading to the next disruption in data system design. The session lays out a vision for composable data systems and how we see them accelerating innovation in the space. We'll present the components of the reference architecture and their APIs, and discuss how open source is paving the way. The paper was created in partnership with Meta, Databricks, Intel, Sundeck, and our team at Voltron Data.

Open Source Modular Data Stack Outline

Source: The Composable Data Management System Manifesto
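To make the idea of swappable components concrete, here is a minimal, hypothetical sketch in plain Python. It assumes a pipeline of a frontend, an optimizer, and an execution engine that talk to each other only through a shared plan representation. The class names and the tiny `PlanNode` IR are illustrative stand-ins, not the manifesto's actual APIs; in a real stack, a standard such as Substrait would play the IR role.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass, field
from typing import Protocol


# Hypothetical shared IR: a linear chain of relational operators.
# In a real composable stack, a standard such as Substrait fills this role.
@dataclass
class PlanNode:
    op: str                                   # "scan", "filter" or "project"
    args: dict = field(default_factory=dict)  # operator-specific parameters
    child: PlanNode | None = None             # input operator (None for scans)


# Each component is defined only by the interface it exposes.
class Frontend(Protocol):
    def to_plan(self, spec: dict) -> PlanNode: ...

class Optimizer(Protocol):
    def optimize(self, plan: PlanNode) -> PlanNode: ...

class Engine(Protocol):
    def execute(self, plan: PlanNode, tables: dict[str, list[dict]]) -> list[dict]: ...


_COMPARISONS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

class DictFrontend:
    """Hypothetical toy frontend: turns a dict-based query spec into the shared IR."""
    def to_plan(self, spec: dict) -> PlanNode:
        plan = PlanNode("scan", {"table": spec["from"]})
        for predicate in spec.get("where", []):
            plan = PlanNode("filter", {"predicate": predicate}, plan)
        return PlanNode("project", {"columns": spec["select"]}, plan)

class NoOpOptimizer:
    """Placeholder optimizer: returns the plan unchanged."""
    def optimize(self, plan: PlanNode) -> PlanNode:
        return plan

class RowEngine:
    """Hypothetical toy engine: interprets the IR row by row in pure Python."""
    def execute(self, plan: PlanNode, tables: dict[str, list[dict]]) -> list[dict]:
        if plan.op == "scan":
            return list(tables[plan.args["table"]])
        rows = self.execute(plan.child, tables)
        if plan.op == "filter":
            column, symbol, value = plan.args["predicate"]
            compare = _COMPARISONS[symbol]
            return [row for row in rows if compare(row[column], value)]
        if plan.op == "project":
            return [{c: row[c] for c in plan.args["columns"]} for row in rows]
        raise ValueError(f"unsupported operator: {plan.op}")


def run_query(spec: dict, tables: dict[str, list[dict]],
              frontend: Frontend, optimizer: Optimizer, engine: Engine) -> list[dict]:
    # Components only exchange PlanNode objects, so any one of them
    # (for example, the engine) can be replaced without touching the others.
    plan = optimizer.optimize(frontend.to_plan(spec))
    return engine.execute(plan, tables)


tables = {"trips": [{"id": 1, "miles": 3.2}, {"id": 2, "miles": 0.4}]}
spec = {"from": "trips", "where": [("miles", ">", 1.0)], "select": ["id"]}
print(run_query(spec, tables, DictFrontend(), NoOpOptimizer(), RowEngine()))
# [{'id': 1}]
```

The design choice worth noticing is that `run_query` depends only on the three interfaces, so swapping `RowEngine` for a faster engine, or `NoOpOptimizer` for a real one, requires no change to the other components.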


Using Multiple Composable, Hardware-Accelerated Executors (CDMS Talk)

  • Monday, 8/28 from 5:20pm - 5:25pm PT in the Port McNeil Room
  • Presenter: Felipe Aramburu, Co-Founder and VP of Engineering, Voltron Data

The FLOP/s available per dollar is steadily rising, and the cost of memory is falling at a fast rate. With a variety of hardware acceleration technologies available and high-speed interconnects evolving, teams can now build data analytics systems that adapt to new hardware and execution capabilities.

The talk will cover how to combine composable, open source executors for GPUs (RAPIDS cuDF) and CPUs (Velox) to put hardware accelerators to work in data systems.

Empirical GPU FLOP/s per dollar
Source: Marius Hobbhahn and Tamay Besiroglu (2022), “Trends in GPU price-performance.”
Retrieved from: https://epochai.org/blog/trends-in-gpu-price-performance
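One way to picture several executors working together is per-operator dispatch: send each operation to a GPU executor when a GPU is available and that executor supports the operation, and fall back to a CPU executor otherwise. The sketch below assumes that policy. The `Executor` record, `choose_executor`, and the two executors are hypothetical; the executors are pure-Python stand-ins for cuDF-backed and Velox-backed executors and do not call either library.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Executor:
    name: str
    device: str                               # "gpu" or "cpu"
    supported_ops: frozenset[str]
    run: Callable[[str, list[float]], float]  # (operation, values) -> result


def _reference_kernel(op: str, values: list[float]) -> float:
    # Pure-Python stand-in for a real kernel library.
    kernels = {"sum": sum, "max": max, "median": statistics.median}
    return kernels[op](values)


# Hypothetical stand-ins: a real system would wrap cuDF and Velox here.
GPU_EXECUTOR = Executor("gpu-stand-in", "gpu", frozenset({"sum", "max"}), _reference_kernel)
CPU_EXECUTOR = Executor("cpu-stand-in", "cpu", frozenset({"sum", "max", "median"}), _reference_kernel)


def choose_executor(op: str, executors: list[Executor], available_devices: set[str]) -> Executor:
    """Prefer a GPU executor that supports `op`; otherwise fall back to any capable executor."""
    # sorted() is stable: GPU executors move to the front, other order is preserved.
    ranked = sorted(executors, key=lambda ex: ex.device != "gpu")
    for ex in ranked:
        if ex.device in available_devices and op in ex.supported_ops:
            return ex
    raise LookupError(f"no available executor supports {op!r}")


def run(op: str, values: list[float], executors: list[Executor],
        available_devices: set[str]) -> tuple[str, float]:
    ex = choose_executor(op, executors, available_devices)
    return ex.name, ex.run(op, values)


executors = [CPU_EXECUTOR, GPU_EXECUTOR]
print(run("sum", [1.0, 2.0, 3.0], executors, {"cpu", "gpu"}))     # ('gpu-stand-in', 6.0)
print(run("median", [1.0, 2.0, 3.0], executors, {"cpu", "gpu"}))  # ('cpu-stand-in', 2.0)
print(run("sum", [1.0, 2.0, 3.0], executors, {"cpu"}))            # ('cpu-stand-in', 6.0)
```

This policy is deliberately simple. A production scheduler would also weigh the cost of moving data between host and device memory, which this sketch ignores.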


Polygraph - An Arrow Format for Large Apache Calcite Query Plans and Multi-Language Data Systems (CDMS Talk)

  • Monday, 8/28 from 11:55am - 12pm PT in the Port McNeil Room
  • Presenter: Adam Kennedy, Director of Engineering at Voltron Data

Newer query engines are moving away from Java, yet Apache Calcite, a Java framework, remains the only mature choice for query planning across a wide range of workloads. That makes it difficult to use Calcite for dynamic query planning in a diverse, multi-language tool stack. This is where Arrow comes into the equation.

The talk will present Polygraph, an experimental format built on the Arrow format that lets query plans be exchanged efficiently between tools written in different languages with minimal serialization overhead. It responds to the growing size and complexity of query graphs, especially when they are integrated with machine learning workloads.
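Polygraph's actual schema is not described here, so the sketch below only illustrates the general idea behind carrying a plan graph in a columnar format: flatten the tree into parallel columns (operator, argument, parent index) that could map onto Arrow arrays, then rebuild it on the receiving side. The `Node` class, the column names, and the layout are hypothetical, and plain Python lists stand in for Arrow arrays.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    arg: str = ""
    children: list[Node] = field(default_factory=list)


def to_columns(root: Node) -> dict[str, list]:
    """Flatten a plan tree into parallel columns in pre-order (struct-of-arrays layout)."""
    cols: dict[str, list] = {"op": [], "arg": [], "parent": []}
    stack = [(root, -1)]  # (node, row index of its parent; -1 marks the root)
    while stack:
        node, parent = stack.pop()
        row = len(cols["op"])
        cols["op"].append(node.op)
        cols["arg"].append(node.arg)
        cols["parent"].append(parent)
        # Push children in reverse so they are emitted in their original order.
        for child in reversed(node.children):
            stack.append((child, row))
    return cols


def from_columns(cols: dict[str, list]) -> Node:
    """Rebuild the tree; rows are visited in order, so sibling order is preserved."""
    nodes = [Node(op, arg) for op, arg in zip(cols["op"], cols["arg"])]
    root = None
    for row, parent in enumerate(cols["parent"]):
        if parent == -1:
            root = nodes[row]
        else:
            nodes[parent].children.append(nodes[row])
    return root


plan = Node("join", "id", [
    Node("scan", "orders"),
    Node("filter", "amount > 100", [Node("scan", "payments")]),
])
cols = to_columns(plan)
print(cols["op"])      # ['join', 'scan', 'filter', 'scan']
print(cols["parent"])  # [-1, 0, 0, 2]
assert from_columns(cols) == plan
```

Because each column holds values of a single type, the same layout could be written as Arrow arrays and read from another language without walking per-node objects, which is the kind of low-overhead exchange the talk describes.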

Composable Data Systems Beyond VLDB ’23

It is an exciting time to be defining, designing, and building composable data systems. As always, open source needs your help. Jump in and help shape the future of composability by contributing to leading projects like Velox, Arrow, Ibis, Calcite, Substrait, and others.