How Voltron Data Supports Business-Critical Applications of Apache Arrow
Ian Cook · Mar 02, 2022
Announcing an enterprise subscription service for companies building with Apache Arrow. Sign up.
An Arrow-native future
At Voltron Data, we envision an Arrow-native future. This is a future in which widely adopted open standards enable frictionless interoperability across the data analytics ecosystem. A future in which complex data architectures can be simplified as technologies converge. A future in which data access is fast and efficient. A future in which vendor lock-in is a thing of the past.
From the next big thing to a real big thing
Since its inception in 2016, the Apache Arrow project has made remarkable progress toward realizing this vision. The growing team of Apache Arrow developers has built the project into an indispensable collection of open source software, with implementations in a dozen languages and tens of millions of monthly downloads. Arrow has become the de facto standard for efficient in-memory analytics and fast data transport, providing an expansive toolbox of modular libraries that form a strong foundation for the next generation of analytical computing systems.
Getting down to business
As more and more developers and data practitioners have recognized the value of Arrow, its use in business-critical applications has grown. Innovative companies including Dremio, InfluxData, Snowflake, Streamlit, and Tellius have led the way. But we believe that the opportunity is much larger. The benefits of Arrow—better interoperability, simpler data architectures, greater speed and efficiency, more choice of tools, freedom from vendor lock-in—are broadly compelling. We want to help more businesses realize these benefits.
At Voltron Data, we’re in a unique position to achieve this: we’re the largest corporate contributor to Arrow, with co-creator Wes McKinney as our CTO and many of the core developers on our engineering staff. From the outset, we were certain that helping other businesses succeed with Arrow would be central to our mission—but we were less certain about which specific types of help were most needed. So we set out to discover what’s blocking broader adoption of Arrow in business today, and to find out what services we could offer to eliminate these blockers. We spoke with leaders and engineers across a spectrum of industries. Some of the stories we heard made intuitive sense, but others surprised us. Here’s what we learned.
Starting with awareness
Unsurprisingly, the adage “change starts with awareness” applies here—in spades. From its beginnings as little more than a specification for storing columnar data in memory, Arrow has grown to span multiple large communities and subprojects. Today, Arrow’s dozen language implementations offer a veritable Swiss Army knife of capabilities, including two full-fledged query execution engines. All this rapid development has sometimes outpaced our efforts to spread the word. If getting started with Arrow feels like an exercise in demystification, we need to do better.
In recent months, we’ve stepped up our efforts to increase awareness. The Arrow community recently introduced the Arrow Cookbook. Voltron Data is investing in building a talented team of Arrow-focused developer advocates. But we learned that we need to do more to spread the word within business communities. We need to help more platform architects identify the highest-impact applications of Arrow in enterprise data workflows. We need to help more corporate technology leaders recognize the value gained by replacing legacy solutions with Arrow-native equivalents. We need to help more professional developers understand that today Arrow is relevant in many more technology categories than it was in 2016—and that in the years ahead, we expect it to be relevant in many more.
Small fixes, outsize impact
Despite some obstacles to awareness, Arrow has gained a solid foothold in business applications. But when we spoke with teams using Arrow today, we learned that plans to expand its use were often blocked by shortcomings in Arrow, such as bugs, incompatibilities, and unimplemented features. Surprisingly, many of these shortcomings—which had profoundly blocked progress—turned out to be straightforward for the Arrow developers to fix. However, most of them had been completely off the radar of the Arrow development team. In some cases, the issues had never been reported in the Arrow project’s issue tracker. In other cases, the issues were reported, but without enough context for the developers to recognize their significance. Once equipped with the details and context about these issues, we were able to appropriately prioritize them, and in many cases to resolve them within days.
In retrospect, it’s not so surprising that teams applying Arrow in business can be inconsistent about reporting issues. These teams are often under pressure to complete projects. If they encounter an obstacle when using Arrow, they work around it, finding an alternative solution. Some generous users find the time to dig into the problem, open a detailed ticket, and sometimes contribute a solution—and for this we are immensely grateful—but we recognize that many users are not in a position to do this. We learned that if we want to receive actionable issue reports from a broader range of users, we should offer a more business-friendly option for issue reporting.
Moving slow and fast
Companies running Arrow in production are often slow to upgrade to new versions. Unless there is a specific reason to upgrade, some will continue running older versions of Arrow, sensibly preferring to minimize the disruption and costs that upgrades can entail. However, if an urgent need to upgrade emerges—for example, to fix a newly discovered bug or vulnerability—companies can be ready to move with impressive speed.
Unfortunately, this approach is at odds with the development and release practices of the Arrow project. Arrow has quarterly major releases plus occasional maintenance releases to fix critical bugs. Bugs are fixed only in the newest version of Arrow, so a company that encounters a bug when running an older major version of Arrow cannot fix it except by upgrading to a new major version or developing a custom patch, both of which can be risky. And a company that encounters a bug when running the newest version of Arrow might get it fixed within days, but then wait months for the fix to be shipped in the next release.
Some teams that we spoke with described how these limitations slowed their efforts to expand the use of Arrow. We learned that if we want more businesses to confidently deploy Arrow, we should help companies streamline Arrow upgrades, making it easier for them to stay current. And we should offer more options to companies whose needs are not adequately addressed by the official Arrow release process—options such as backports and fully verified rapid hotfixes.
Sharing the vision
For a casual developer, the decision to use Arrow is often a simple one: it’s in their toolbox and it fulfills an immediate need. But for a professional developer working on a business-critical application, the decision to depend on an open source library like Arrow is more complex. For companies, software development projects can be large upfront investments that slowly pay for themselves in the ensuing years. Companies are eager to future-proof these investments. Before adding a hard dependency on Arrow, companies want reassurances that the project will continue to be healthy in the future, that its releases will be stable, that its maintainers will be reliable, and that its roadmap will accommodate future needs.
The Arrow community welcomes questions about the future of the project. Guided by values of openness and transparency, we provide a range of public venues in which anyone can ask questions like these. But many business leaders we spoke with were interested in engaging with us through private conversations. In business, transparent communication sometimes requires a non-disclosure agreement.
In confidential meetings with companies, we gained remarkable insights into fascinating real-world applications of Arrow that we would otherwise not have been privy to. We were able to directly answer questions, quickly brainstorm solutions to problems, and build trust with leaders and engineers. We learned that private meetings can be highly productive and mutually beneficial. Some of the insights gained in these meetings led to important improvements in Arrow, demonstrating that public benefits could arise from private meetings like these.
Another important difference between casual Arrow users and those who are applying Arrow in business-critical applications is that the latter often have budgets to spend. When companies see demonstrable value in a commercial service, they aren’t shy about allocating funds. Speaking with these companies, we learned that the value of unblocking important projects that depend on Arrow is often substantial.
But we recognize that business applications of Arrow often start small—as experiments driven by determined engineers with no funding and no executive support. If successful, these projects move from exploration to development and into production. By providing targeted resources at no cost to engineers working on early-stage projects, we can help to incubate valuable new applications of Arrow.
The subscription for success
Today we’re thrilled to announce an enterprise subscription service for companies building with Apache Arrow. Based on what we learned by speaking with leaders and engineers across a range of industries, we put together a focused set of services designed to accelerate business success with Arrow, available now as an annual subscription. We’re offering three editions tailored to the needs of companies at different stages of the journey with Arrow, including a free edition.
We’re honored to announce KNIME and G-Research as launch customers. We’re delivering value to these two leading companies that are innovating with Arrow to meet different needs in different industries, demonstrating the versatility of this subscription. We’re excited to bring our world-class team of Arrow engineers and visionaries to help meet the challenges of your business. For more information and to sign up, visit our enterprise subscription page.