Aug 05, 2021

Joining Forces for an Arrow-Native Future

Josh Patterson, Wes McKinney

Time-lapse Photo Of Cars In Asphalt Road by Ruiyang

This post was adapted from individual blogs of Josh and Wes.

Too often people say “let’s do something together” in passing, and then they don’t follow through. There’s the occasional inter-project collaboration, but rarely will people take that next step. There are countless reasons why this happens, and aligning goals is challenging to say the least. But after spending the last several years working separately on related problems in the data ecosystem, we realized our best hope to make lasting progress was to build a stronger, unified foundation. We needed to do something radically different.

A Brief History

Wes helped start the Apache Arrow project in 2015, and since then has continued to build a developer community to achieve Arrow’s dual goals. The first goal is to be an efficient, language-independent open standard for columnar data interchange. The second goal is to be a portable, high-performance computing foundation for doing analytics on that columnar data. To pursue these goals, Wes formed Ursa Labs in 2018 and Ursa Computing in 2020.

In parallel, Josh and colleagues at NVIDIA foresaw the potential of GPUs to accelerate analytics workloads. In 2017, they created the [GPU Open Analytics Initiative](https://developer.nvidia.com/blog/goai-open-gpu-accelerated-data-analytics/) and later RAPIDS, which has demonstrated the potential of accelerated high-performance columnar analytics. Josh and the cuDF developers collaborated extensively with BlazingSQL to bring GPU-accelerated Arrow analytics not only to the Python community, but to modern SQL workloads as well.

Over the last 5 years, Arrow has been rapidly adopted as the gold standard for tabular data interchange across the data warehousing and data science ecosystems, bringing massive performance and efficiency improvements to many use cases. Arrow is also taking Flight (pun intended) as a replacement for slow database access protocols like ODBC and JDBC. These frameworks helped numerous projects achieve their goals, but individually, each only addressed some of the community’s needs.

United Foundation

Thus, as a natural progression, the next stage of growth is to see Arrow adopted not only as the standard for fast data movement but also as the native format for cost-effective analytical computing. We envision a ubiquitous, hardware-optimized foundation that simplifies and accelerates data analytics workloads across programming languages.

With this unified vision in mind, we launched a new company. The Ursa Computing and BlazingSQL teams joined forces with pioneers of RAPIDS and other open-source projects to form Voltron Data. In the process, Ursa Labs became Voltron Data Labs, and it continues to work for the benefit of the open-source ecosystem around Apache Arrow. Josh and Wes became Voltron Data’s CEO and CTO, respectively.

We are as committed as ever to working with the Arrow community, and you will see us doing even more work with it than we have in the past. We look forward to increasing Arrow’s footprint in the world. Together we are unifying our collective expertise in performance, portability, and programmability to build more bridges across the data ecosystem and improve the tools you know and love.

We look forward to sharing more about Voltron Data in the coming months. In the meantime, we have many open roles and are looking for talented software engineers around the globe to further our mission.

Join us!



Source: https://www.pexels.com/photo/time-lapse-photo-of-cars-in-asphalt-road-3717291/