From Laptop to Cloud: Ibis Connects With Your Data at Any Scale

Kae Suarez

March 9, 2023

We’ve hammered home time and time again that a massive part of Ibis’s power is its flexibility. Its interface is good on its own, but the fact that it works with 15+ backends is what makes it truly exciting. However, what does this actually look like in practice? What’s a possible application beyond simply deploying Ibis as an interface to your data center?

What about deploying it everywhere, at every step from idle experimentation to deployment?

Act 1: Developing Expressions with Ibis

Let’s say you have a dataset that’s constantly growing but keeps the same feature set: for example, a database of products that share the same underlying traits, with each product as a row. Queries on the production server are powerful and costly, built on months, if not years, of expert knowledge and careful usage. However, you have an idea that you want to try. Usually, you’d set up something locally and do an analysis of some sort. Then, if it succeeded, you’d pick up your work and translate it. With Ibis, though, there’s no need for translation, which actually opens up more opportunities for experimentation! Instead of trying to set up a local environment that matches production closely enough to be worth the effort, you could even start from an empty table in a backend like pandas or Polars, especially since you already know your schema.

Ibis supports developing expressions on empty tables and will make sure your commands work with your schema. You could also generate fake data and pipe that into your table. Here, we’ll focus on the empty table method.

python
import ibis

con = ibis.pandas.connect()
# Alternatively, you could use con.from_dataframe() and Pandas to make
# fake_data
con.create_table("fake_data", schema=ibis.schema(<your schema here>))
t = con.table("fake_data")
# From here, you can do whatever you want -- we'll call your analytics code
# a1 and a2
a1 = t.<your analytics here>
a2 = t.<your analytics here>
t.groupby(<your group here>).aggregate([a1, a2])

Locally, we can confirm that the code actually works and see whether the result looks interesting. If it is interesting, we can upgrade a bit.
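
To make that template concrete, here is a minimal filled-in sketch. The schema, column names, and aggregations below are hypothetical stand-ins rather than anything from the original workflow; the point is that Ibis validates your expressions against the schema even though the table holds no rows.

python
import ibis

# Hypothetical schema standing in for the real product table
schema = ibis.schema(
    {
        "category": "string",
        "price": "float64",
        "units_sold": "int64",
    }
)

con = ibis.pandas.connect()
con.create_table("fake_data", schema=schema)
t = con.table("fake_data")

# Example analytics: average price and total units sold per category
avg_price = t.price.mean().name("avg_price")
total_units = t.units_sold.sum().name("total_units")

# Just building this expression checks that the column names and types exist
# in the schema; a typo like t.pirce would raise immediately. Executing it
# should simply return an empty result.
expr = t.group_by("category").aggregate([avg_price, total_units])

If you would rather eyeball a few rows while developing, you could instead build a small pandas DataFrame of fake data and pass that to create_table in place of the bare schema; the expressions stay the same either way.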

Act 2: Moving to Small Data with Ibis

Let’s say our curiosity has yielded something interesting, but we don’t know yet whether it deserves our big, expensive resources. Luckily, we can just step up a bit and use, say, a local instance of DuckDB on a subset of the data. How does that code look? We’ll assume the subset is already on our local disk; maybe you keep some around!

python
import ibis
ibis.set_backend("duckdb")
t = ibis.read_parquet("subset.parquet")

# From here, you can do whatever you want -- we'll call your analytics code
# a1 and a2
a1 = t.<your analytics here>
a2 = t.<your analytics here>
t.groupby(<your group here>).aggregate([a1, a2])

Well, other than the initial load, it looks the same. That’s the beauty of Ibis. Now, let’s say the results on the subset really do seem worthwhile, and you show them to someone who can authorize further exploration. The proof of concept works and no work hours will be wasted by moving forward, so we’re ready for the big time.
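
Filled in with the same hypothetical columns as before (subset.parquet is the placeholder file name from the template above), the local DuckDB step might look like this:

python
import ibis

ibis.set_backend("duckdb")

# A local Parquet extract of the production table; the file and its columns
# are hypothetical stand-ins
t = ibis.read_parquet("subset.parquet")

avg_price = t.price.mean().name("avg_price")
total_units = t.units_sold.sum().name("total_units")

expr = t.group_by("category").aggregate([avg_price, total_units])
print(expr.execute())  # runs locally on DuckDB and returns a pandas DataFrame

Only the backend selection and the load changed; the expression-building lines carry over from the pandas sketch untouched.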

Act 3: Flying from Laptop to Cloud with Ibis

Now you’ve shown off a proof of concept and have your Ibis code from the last two steps. Can you guess what comes next, on your actual data center?

python
import ibis
con = ibis.<your platform>.connect(<your URI>)
t = con.table(<your table>)

# From here, you can do whatever you want -- we'll call your analytics code
# a1 and a2
a1 = t.<your analytics here>
a2 = t.<your analytics here>
t.groupby(<your group here>).aggregate([a1, a2])

That’s right, the same code, and you already know it works from your smaller tests, so you can run it here with confidence. Now you’ve worked all the way up while writing the code only once, testing safely from your laptop to the data center and producing real, meaningful artifacts and results along the way.
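
For completeness, here is one way that production step might look against a hypothetical Postgres warehouse. The backend, connection details, and table name are placeholders, and the exact connect arguments vary by backend:

python
import ibis

# Hypothetical connection details; swap in your own backend, credentials,
# and table name
con = ibis.postgres.connect(
    host="warehouse.internal",
    user="analyst",
    password="...",
    database="products_db",
)
t = con.table("products")

avg_price = t.price.mean().name("avg_price")
total_units = t.units_sold.sum().name("total_units")

expr = t.group_by("category").aggregate([avg_price, total_units])
print(expr.execute())  # Ibis compiles the expression to SQL and runs it server-side

Once again the analytics expressions are untouched; only the connection knows it is now talking to production.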

Conclusion

That’s the same code all the way through, saving precious time for actually pursuing your goals rather than coding and recoding. Let’s take a look at the alternatives:

  • Standing up identical development resources (Ibis handles the translation regardless of backend)
  • Writing code for each backend from dev to prod (something Ibis helps you avoid)
  • Using a testbed version of the production environment (not easy to set up locally and requires coordination with IT, though it could be done cheaply in the cloud)

With Ibis, you can use what you already have locally, quickly spin up cloud test environments, and then bring your code up to production seamlessly. This way, you write the code once and spend fewer resources, without needing IT approval just to stand up a test environment. Visit the Ibis project page for more resources and to install it.

If you’re working with Ibis and want to accelerate your success, learn how Voltron Data Enterprise Support can help you.

Photo by: Jonathan Meyer
