Mar 13, 2023
Ibis 5.0 Preview: Three Features to Get Excited About
Kae Suarez, Phillip Cloud, and Jim Crist-Harif
Ibis is an open source project gaining adoption among developer and enterprise communities (and, by this point, you likely know we’re big fans). It helps users interface with data analytics stacks of various sizes and shapes, and we see it as a critical component for building modern data systems. If you haven’t already heard of it, Ibis is a Python interface for data analytics that provides access to 15+ compute backends, including Snowflake and Trino. By unifying all of these backends under one dataframe interface, Ibis makes analytics easier and more portable. We’ve talked about it a lot before, so we’ll let those articles do any further heavy lifting:
- Engine Agnostic Analytics with Ibis
- Ibis: Easy, Performant, and Portable Python API for Data Analytics
- Ibis Explained: Making DataFrames, Big and Small, More Delightful
The Ibis 5.0 release is coming up, and the team over at the Ibis Project is eager to show off its features — and we can see why. They recently released a handful of articles showcasing three upcoming features, which we’ll cover in this post.
Their first big feature is an expansion of the recently added selectors, which now let you apply an operation across whole sets of columns (up to and including whole tables). Get their take on it here. We like this because it supports very common tasks, such as normalization across columns. It also makes subsetting possible in fewer lines, and brings Ibis’s interface closer to pandas without sacrificing the power of Ibis’s backends.
Author’s Note: I was one of the voices that requested this, and am very happy to see it featured in Ibis 5.0. With this, the interface is as good as I could ask for — but I’m sure there will be more.
Ibis’s interface offered file reading functionality in 4.0, and the backends could be addressed directly to handle file output. It would’ve been nicer, though, if it could all be done in Ibis’s interface…and now it can! As explained here, the Ibis interface will now have to_parquet() and to_csv(). This is a good decision: as powerful as it is to just use the database, files are portable. CSVs can power dashboards and are the de facto format for simple downloads, and Parquet can be served through powerful tools such as Arrow Flight SQL.
For any tool to work with data, there’s an onboarding phase; a time when the user just needs to get started. Often, this looks like finding some toy data to play with, figuring out how to get it into the tool space, then actually trying out the tool’s functionality. With the examples module coming in 5.0 (which the Ibis blog showcased here), learning goes faster! Soon, users will be able to issue a single command to download any of a set of small datasets from the cloud and immediately have it pulled into Ibis. This means anyone can get to their first Ibis expression faster than ever before.
Ibis 5.0 for Datacenters
At Voltron Data, we have a strong focus on enterprise and datacenter-scale applications. There’s plenty that can be accomplished with these new features, and we want to point out a couple of use cases we love:
- Selectors can enable easy feature engineering and selection for AI applications.
- File-output methods such as to_csv() enable dashboards to easily serve data subsets.
Examples don’t exactly enable new analytics — but they can assist in onboarding new scientists!
To get started with Ibis and find out what else Ibis can do, visit the Ibis Project website. Ibis is just one high-impact open source initiative that we at Voltron Data support, so we’d love to show other ways you can revolutionize your stack, such as with Apache Arrow and Substrait. If you want help getting started or even pushing new horizons, check out our enterprise support options.
Photo by Vincent van Zalinge