Jan 06, 2023
Inside Ibis: Contributors Weigh In Ahead of the 4.0 Release
It is no secret that we’re big supporters of the Ibis project. We believe in the value Ibis brings to the ecosystem and actively support the open source maintainers who work on it. Over the past two years, Ibis has grown significantly in popularity. From 2016 to 2022, Ibis had 8,931,985 downloads in total, and over 8.2 million of those happened in the last two years (not counting downloads through Conda).
Downloads table, 2016-2022 (data source: PyPI BigQuery dataset)
We caught up with key Ibis maintainers and contributors to learn about the progress made last year and what’s in store for the Ibis 4.0 release slated for January 2023. During our conversations, one sentiment was abundantly clear: the team wants to grow the community and see more individual contributions over the next year. Read on to get an inside look at what it’s like to work on Ibis.
First, how would you describe Ibis to someone who hasn’t heard of it?
Ibis is an engine-agnostic Python API for accessing, transforming, and writing tabular data.
It depends on their existing context. For R folks, it's similar to dplyr / dbplyr, but for Python. For Python people, I would describe it as a Pythonic interface for data analysis that executes against a variety of modern SQL engines.
Ibis to me is like PySpark without the Spark. It lets you use a DSL in Python for analytics while abstracting away the backend. This helps future-proof your analytics workflow for when (not if) you migrate database backends: teams will not have to change code for future data platform migrations.
Tell us about a rewarding contribution you made last year.
The removal of intermediate expressions; it required several supplemental changes even before the actual refactor could begin.
Removing the materialize API was probably the most challenging thing I've done this year. Automating releases and reducing CI time while increasing the number of backends are other notable things.
I added the DuckDB backend — that was my first major contribution.
I opened a PR that, by default, instantiates the testing backends once per pytest invocation. I loved the experience of being reviewed by the Ibis team; the feedback was detailed and specific.
Phillip Cloud is the lead maintainer of Ibis, and his work is particularly helpful for contributors. Automating releases and reducing CI times not only makes contributing easier but also makes it easier to support future technologies and backends. From simple documentation changes to adding a new backend to massive, complicated refactors, an improved CI and helpful feedback from the team make contributing to Ibis a breeze.
What’s in store for the next Ibis release?
A revamped Ibis core to enable more advanced features as well as multiple new backends (like Snowflake or Polars).
There's a new Polars backend, an mssql backend and an ibis.read API for getting up and running quickly. Also, the BigQuery backend is now in the main repo again. We’ll support decompiling Ibis expressions to equivalent Python code, and a fancy, new rich table repr for interactive mode.
I see very clear pipelines for free and open source software users — we have Apache Arrow for file storage, the Arrow execution engine (Acero) to read those files, and Ibis to execute queries on them using the Substrait backend. Students and low-budget projects can set these pipelines up to get modern processing power and vast community support for free.
As Ibis’ core continues to be revamped, we are already seeing changes such as the addition of the Polars and Snowflake backends that Phillip and Krisztian mentioned. Additionally, as ibis-substrait matures, it will become easier than ever to use Substrait consumers like Acero through Python, creating free and open source data pipelines that can be immensely valuable for research and development.
Where do you hope Ibis will be a year from today?
I hope that recursive CTEs will be coming soon, so that Ibis can tackle hierarchical aggregation with a pure Ibis solution. I also hope that more teams will replace their PySpark, Scala, and SQL analytical workflows with Ibis workflows to future-proof their data solutions.
I hope to see more community contributions. The Ibis team is very friendly, and contributing always feels welcome and rewarding. The maintainers openly share helpful feedback that not only pushes the project forward but has also helped me grow in my understanding of the Python language.
I think the documentation could use more love. I hope to see the team invest in building resources that the community can use to learn about or contribute to Ibis.
Get involved with Ibis
Ibis is implementing open source standards so people can develop and contribute to the advancement of the technology without worrying about learning new APIs. If you’re interested in contributing, check out the “Contribution” docs on the Ibis page and be sure to review the Code of Conduct. We think you’ll discover that this is an exciting project and community with serious growth potential.