Mar 14, 2023

Shopping for a Data Warehouse? Put Workloads to the Test with Ibis

Kae Suarez

photo of two skyscrapers connected by bridge
TL;DR: Ibis lets you try most of the leading data lake and data warehouse backend engines with minimal code changes, and we think that’s a lot more exciting than it may initially seem.

At one point or another, you will have the opportunity to pick the underlying tools that handle your data and push your business forward. There are plenty of high-quality options, each with its own advantages over the others. Between marketing material and anecdotes from peers, you get a general idea of which solutions are popular, and most of the solutions out there offer enough functionality for general use.

However, advertised features alone won’t tell you which option best fits a specific use case. You may be left wondering how a certain data warehouse will behave with your data, given your specific access patterns and analytical needs. With the recent explosion of data lake and data warehouse options, making the best decision means testing each candidate, and each one comes with a steep learning curve that may take months to climb. Engaging with vendors, consultants, and staff takes time and effort that may run past necessary business deadlines.

Luckily, we already have the technology that allows for testing many of these systems while lowering the learning curve. And what if the interface to all of these backends could be standard? What if we were able to run the same code on all these backends and get to see the experience for each instead of having to be locked in immediately? Ibis offers this.

Let’s look at how the Ibis Project can save you time and money by letting you test more than 15 backends with a very small change in code.

Ibis: The Portable, Performant Python API

Ibis provides a standard interface to several powerful execution backends. Even better, Ibis is a Pythonic API rather than yet another attempt to standardize SQL. This opens up a world of possibilities: you can write all your code in Python and largely set aside the hand-written query strings and vendor-specific interfaces that each backend would otherwise require. Ibis handles the translation and communication for you.

This frees developers from needing to learn a vendor-specific interface, which matters all the more when you are shopping around, since the backend you are testing isn’t necessarily the one you will end up choosing. Because Ibis generates the backend-specific code for you, developers don’t have to climb each of these learning curves to get the work done effectively.

Explore the Data Warehouse Marketplace with Ibis

Instead of writing separate code for each backend you want to test, you can now use Ibis to develop a single codebase that covers the basic functionality of the workload being tested. With that same codebase, you can test more than 15 (and counting!) backends to find the one that fits your needs.

Ibis gives you the freedom to see how your workload would perform and scale on some of the most popular backends today such as Snowflake and PySpark. Now the marketplace turns from hypothetical to practical, because you can test your analytics on cheaply-initialized platforms, whether they are local or on a remote trial basis. Without spending engineering-weeks for each individual platform, you go to each with the same code and can perform relevant tests and benchmarking concurrently.
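One way to picture such a comparison is a small timing harness that runs the same workload function against each candidate connection. The sketch below is an assumption about how you might structure it, not part of Ibis: `benchmark`, `workload`, and `connect_fns` are hypothetical names, and the harness uses only the Python standard library and runs the backends one after another.

```python
import time
from typing import Any, Callable, Dict


def benchmark(
    workload: Callable[[Any], Any],
    connect_fns: Dict[str, Callable[[], Any]],
    repeats: int = 3,
) -> Dict[str, float]:
    """Run the same workload on every backend and record the best wall-clock time.

    `connect_fns` maps a label to a zero-argument function that returns a
    connection object; `workload` receives that connection and runs the query.
    """
    results: Dict[str, float] = {}
    for name, connect in connect_fns.items():
        con = connect()  # connection setup is deliberately excluded from the timing
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload(con)
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)  # best of `repeats` runs, in seconds
    return results
```

With Ibis, each entry in `connect_fns` would typically be a backend connection call such as `ibis.duckdb.connect`, and `workload` would build and execute an Ibis expression on the connection it receives. Treat the numbers as a rough first signal only, since real evaluations also depend on data volume, caching, and cluster sizing.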

Avoid Vendor Lock-in

If you’re familiar with the state of the field, you know that the interface needed to accomplish the same goal can differ widely between these products. A fair comparison would normally mean hand-writing roughly equivalent code for each execution engine. The beauty of Ibis is that it translates the same expression into the chosen backend’s own operations.

Once you decide on the best technology to use, you can continue to use Ibis as your interface. Because it’s a Python API and easy to maintain, if you need to migrate data to a new backend, things will move smoothly. Ibis gives you peace of mind that your application remains robust regardless of where your data lives. When a new technology appears, just move on over to the next best thing.

In a future article, we will show this scenario in practice, testing a real workflow across a couple of the backends.

If you want to get started with Ibis right away, check out the Ibis Project’s GitHub — and consider getting started with their community there, too! It’s an exciting piece of open source software that we love to talk about. If you’re already working with the project and want to go further, see how Voltron Data Enterprise Support can help accelerate your success with Ibis.

Photo by: John Towner