Feb 28, 2023

Ibis: Easy, Performant, and Portable Python API for Data Analytics

Fernanda Foertter


Imagine this: some VPs at Big Mart Corp get together and come up with an idea that could save millions of dollars. What if we used data from Purchasing, Sales, and Warehouse to move goods more efficiently and better time the purchase of raw materials? Sounds great, but there’s just one problem: analyzing this complicated dance between space, inventory, and sales cycles means connecting to multiple data sources. And you’re the person they asked to get it done.

What if you could write pandas-like analytics and run it at the source?

With Ibis, you can!

Today, a team of experts from each department would be assembled to decide what data is needed and how to get it. But each source connects in a different way, and each department may even house data in multiple locations. To limit the impact on the business and on each team, you’ll likely do the work in phases. In the end, the solution would involve some data duplication, monolithic downloads, and loads of CSV files. Not to mention maintaining code written specifically for each Hadoop, SQL, or Spark data source, maybe even in different languages. As complexity grows, it all becomes harder to maintain. Then a key team member, the one who wrote everything in Rust, wins the lottery and leaves, and no one left behind knows how to keep it going. Sound familiar? Well, there’s another way with Ibis.

All You Need is One

Because Ibis compiles and generates code for many backends, the only language needed is Python. This makes maintaining the codebase very easy. Ibis currently supports 15 backends and counting!

Diagram showing the backends with which Ibis interacts. Ibis is in the center and connects with 15 different backends on the periphery including among others DuckDB, Impala, Snowflake, SQLite, Polars

Reduce Technical Debt

Using a single language simplifies things, and Ibis keeps growing, with new backends added regularly. Instead of learning a different interface for each backend or making sure packages don’t conflict with one another, you work with one API, which makes application development and maintenance a breeze.

Diagram comparing side-by-side a situation without Ibis where the development is siloed: each backend requires a different language to interact with the data (SQL for PostgreSQL, PySpark for a Spark Cluster, and Snowflake to access a Data Lake). With Ibis the same code can be re-used against multiple backends.

Fly Like a Bird and Leave the Bear Behind

Queries built with Ibis run at the source. Typical analytics execute where the data lives, however far away that is, and only the results are shipped back to the application, in memory. There is no need to bring everything into pandas, yet you still enjoy pandas-like syntax. And because Ibis compiles expressions into code for the backend itself, performance is comparable to queries written by hand for the native database engine.

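# pandas: select a column
df['col1']
# ibis: same syntax on a table expression
table_expr['col1']

# pandas: sum 'val' within each 'col' group
df[['col', 'val']].groupby(['col']).sum()
# ibis
from ibis import _
# the underscore (_) API lets us reference the parent table expression
# (recent Ibis releases spell the method group_by; groupby is the older name)
table_expr.group_by(['col']).aggregate(_['val'].sum())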

From Prototype, to Dev, to Prod with a Simple Switch

Start coding right away with pandas DataFrames as the backend, filled with dummy data, to prototype the work. When ready, move to a dev environment, where perhaps a few simple PostgreSQL databases were created for testing. Then deploy to prod and connect to the big SQL database. Each of these steps requires only a backend change; the rest of your code remains the same. Yes, it’s that incredible.

Illustration showing that with Ibis, going from Prototyping to Dev, to Prod, only requires changing one line of code. The rest of the codebase can stay the same

Expressive and Extensible

Ibis exposes an API that lets users define their own operations, for example a new reduction or an operation that translates string dates to Julian days. With Ibis, you can add new operations, optimizations, and custom APIs.

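import ibis.expr.datatypes as dt
import ibis.expr.rules as rlz
from ibis.expr.operations import ValueOp

# A new elementwise operation: takes a string date, returns its Julian day.
# These internal names have changed across Ibis releases, so check the
# documentation for the version you have installed.
class JulianDay(ValueOp):
    # Input: a single string-typed argument
    arg = rlz.string

    # Output: a float32 with the same shape (scalar or column) as the input
    output_dtype = dt.float32
    output_shape = rlz.shape_like('arg')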
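
The class above only declares the operation's input and output types; it does not say how a Julian day is computed. As a rough reference for what such an operation would return per element, here is a plain-Python sketch. The helper name `julian_day` is hypothetical, and it assumes ISO-formatted dates (`YYYY-MM-DD`) taken at midnight.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 is day 1)
# and the Julian day number at midnight of the same date.
_JD_OFFSET = 1721424.5


def julian_day(iso_date: str) -> float:
    """Return the Julian day at 00:00 for an ISO date string."""
    return date.fromisoformat(iso_date).toordinal() + _JD_OFFSET


print(julian_day("2000-01-01"))  # 2451544.5
print(julian_day("1970-01-01"))  # 2440587.5
```

A backend would do this work with its own date functions; the Python version is only a yardstick for checking results.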

Join the Community

Voltron Data is a major contributor to Ibis, which was developed by one of our co-founders and CTO, Wes McKinney. Join us to help make Ibis the future of easy, performant, and portable data analytics.