Jun 21, 2022

Explore Practical Use Cases Within Apache Arrow at The Data Thread

Fernanda Foertter


One of the best things about working in the open source world is the community. But sometimes an open source tool is adopted so quickly that the community barely has time to organize. This is the case with Apache Arrow. Initially announced by the Apache Software Foundation just six years ago, Apache Arrow has been adopted by dozens of projects as the de facto standard for in-memory columnar data and data exchange.

With interest growing rapidly, it was time to bring together Apache Arrow users and developers. We are proud to make this a reality by hosting The Data Thread on June 23rd. The lineup includes several introductory talks for those who are curious about how to integrate Apache Arrow into projects across a variety of sectors using their language of choice.

Speeding up DNA analysis

Two speakers will cover how Apache Arrow is being used to speed up DNA analysis from sequencer to diagnostics. Chris Seymour will speak about Oxford Nanopore’s new sequencing format POD5, which allows data to be streamed out of the instrument and lowers the file size considerably. Zaid Al-Ars will show a DNA analysis pipeline based on Apache Arrow.

Pivoting Quant and Analytics

Apache Arrow loves tabular data, and we have several talks highlighting Tableau, PyArrow, and other tools and APIs. Learn how some of your favorite analytics tools can be integrated with Arrow. Will Ayd’s lightning talk on Tableau will illustrate this point by drawing on his experience creating tools to manage workflows.

Mapping Geospatial Data

If you are curious about searching and parsing geospatial data, we have two excellent speakers covering tools built specifically for this community. Dewey Dunnington will cover the GeoParquet and GeoArrow formats, and John Murray will show how to use convolutional neural networks (CNNs) for multispectral analysis of satellite images.

Overall, we hope The Data Thread will show the versatility and performance of Apache Arrow. We also hope to build a community of practitioners around the Apache Arrow ecosystem.

Join us on Thursday!