Join Us for The Data Thread

May 24, 2022
The Data Thread conference for open source data science best practice

Voltron Data has launched The Data Thread, a free, virtual learning event highlighting Apache Arrow and members of the Arrow community. This inaugural event will be held on June 23rd with a mix of live and pre-recorded sessions. The Data Thread is intended as a space for analysts, developers and data leaders to learn and connect.

Register for The Data Thread Today!

Connecting Data, Scaling Communities

Apache Arrow aims to establish methods and standards for fast, efficient data interoperability through a growing community of open source contributors. That is why this year’s Data Thread theme is “Connecting Data, Scaling Communities”. The live portion of The Data Thread will kick off with a keynote from Wes McKinney and Jacques Nadeau, co-creators of Apache Arrow, hosted by our very own developer advocate Marlene Mhangami.

Participants will have access to 25+ live and pre-recorded talks on Arrow and the broader ecosystem. The program covers data solutions built and deployed using Apache Arrow, introductions to Arrow, Arrow Flight, Ibis Framework and Substrait, and highlights of recent developments across the Arrow ecosystem.

Highlights from The Data Thread speaker line-up include:

Paige Bailey (Product Lead at Anyscale) will explain how Ray Datasets are the standard way to load and exchange data in Ray libraries and applications. They provide basic, efficient distributed data transformations such as map, filter, and repartition, and are compatible with a variety of file formats, data sources, and distributed frameworks, including Arrow.
Pedro Pedreira (Software Engineer at Meta) will talk about Velox, a novel open source C++ database acceleration library. Velox provides extensible and high-performance data processing components that can be reused to build, enhance, and unify execution engines focused on different workloads. Velox is integrated with more than a dozen data systems at Meta, from analytical query engines to machine learning and more.
Randy Zwitch (Head of Developer Relations, Streamlit) will talk about the replacement of the homegrown Streamlit data serialization with Apache Arrow. Moving to Apache Arrow not only improved performance but also dramatically streamlined the Streamlit codebase. Come listen to a high-level discussion of the product and engineering tradeoffs of moving to Apache Arrow and why it made perfect sense for Streamlit.

Check out the website for the full list of speakers.

We are excited to host The Data Thread. There are still a few slots open, so please consider joining the growing list of inaugural speakers and sharing your experience with Apache Arrow. Talk submissions are open until May 31st, 2022. We look forward to seeing you on June 23rd.