Mar 09, 2023

From Laptop to Cloud: Ibis Connects With Your Data at Any Scale

Kae Suarez

photo of skyscraper captured from ground looking to sky

We’ve hammered time and time again that a massive part of Ibis’s power is its flexibility. Its interface is good on its own, but the fact that it works with 15+ backends is what makes it truly exciting. However, what does this actually look like? What’s a possible application other than just deploying Ibis as an interface on your data center?

What about deploying it everywhere, at every step from idle experimentation to deployment?

Act 1: Developing Expressions with Ibis

Let’s say that you have a dataset that’s constantly growing but keeps the same set of features. For example, a database of products that share the same underlying traits, where each product is a row. Executions on the server are powerful and costly, and they build on months, if not years, of expert knowledge and careful usage. However, you have an idea that you want to try. Usually, you’d set something up locally and run an analysis of some sort. Then, if it succeeded, you’d pick up your work and translate it for the server. With Ibis, there’s no need for translation, which actually opens up more opportunities for experimentation! Instead of trying to set up an environment that matches production closely enough to save yourself the translation effort, you could even start with an empty table in a backend like pandas or Polars, especially since you already know your schema.

Ibis supports developing expressions on empty tables and will make sure your commands work with your schema. You could also generate fake data and pipe that into your table. Here, we’ll focus on the empty table method.

import ibis

con = ibis.pandas.connect()
# Alternatively, you could use con.from_dataframe() and Pandas to make
# fake_data
con.create_table("fake_data", schema=ibis.schema(<your schema here>))
t = con.table("fake_data")
# From here, you can do whatever you want -- we'll call your analytics code
# a1 and a2
a1 = t.<your analytics here>
a2 = t.<your analytics here>
t.group_by(<your group here>).aggregate([a1, a2])

Locally, we can confirm that the code actually works, and see whether the idea seems interesting. If it does, we can step up a bit.

Act 2: Moving to Small Data with Ibis

Let’s say our curiosity has yielded something interesting — but we don’t know yet if we should put it on our big, expensive resources. Luckily, we can just step up a bit, and use, say, a local instance of DuckDB using a subset of the data. How does that code look? We’ll assume we got the subset on our local disk already — maybe you keep some around!

import ibis
ibis.set_backend("duckdb")
t = ibis.read_parquet("subset.parquet")

# From here, you can do whatever you want -- we'll call your analytics code
# a1 and a2
a1 = t.<your analytics here>
a2 = t.<your analytics here>
t.group_by(<your group here>).aggregate([a1, a2])

Well, other than the initial load, it looks the same. This is the beauty of Ibis — and now, let’s say that this really seems to be worth it on the subset, and you show it to someone who can authorize further exploration. The proof of concept works, and you won’t be wasting any work hours moving further, so we’re ready for the big time.

Act 3: Flying from Laptop to Cloud with Ibis

Now, you’ve shown off a proof of concept, and have your Ibis code from the last two steps. Can you guess what will come next, on your proper data center?

import ibis
con = ibis.<your platform>.connect(<your URI>)
t = con.table(<your table>)

# From here, you can do whatever you want -- we'll call your analytics code
# a1 and a2
a1 = t.<your analytics here>
a2 = t.<your analytics here>
t.group_by(<your group here>).aggregate([a1, a2])

That’s right, the same code. You already know it works from your smaller tests, so you can run it here with confidence. You’ve now worked all the way up while writing the code only once, testing safely at every step from your laptop to the data center and producing real, meaningful artifacts and results along the way.

Conclusion

That’s the same code, all the way down, saving precious time for actually targeting your goals, rather than spending time coding and recoding. Let’s take a look at alternatives:

Standing up identical development resources (Ibis handles the translation regardless of backend).
Writing code for each backend from dev to prod (something Ibis helps avoid).
Using a testbed version of the production environment (not easy to set up locally and requires coordination with IT, but could easily be done on the cloud for cheap).

By using Ibis, you can use what you already have locally, quickly spin up cloud test environments and then bring your code up to production seamlessly. This way, you write code once, and spend fewer resources, freeing you from needing IT-related approval to stand up a test environment. Visit the Ibis project page for more resources and to install it.

If you’re working with Ibis and want to accelerate your success, learn how Voltron Data Enterprise Support can help you.

Photo by: Jonathan Meyer