Mar 23, 2023

How to Use Snowflake and Ibis for Better Analytics

Marlene Mhangami

Blog post description image

Today, data teams are growing faster than ever as businesses increasingly look to data analytics to help them make informed decisions. However, with the high number of data sources and formats, managing and processing large amounts of information can feel overwhelming. One solution to this problem is Snowflake.

Snowflake is a cloud-based data warehousing and analytics platform that offers a scalable solution for storing, processing, and analyzing data. It provides a convenient platform for data warehousing, data lakes, data engineering, and data application development. One of the key benefits of Snowflake is its cloud-native architecture, which allows it to seamlessly integrate with other cloud-based services and data sources. This makes it easy for businesses to centralize their data, while also reducing infrastructure and maintenance costs.

Why Use Ibis with Snowflake?

Snowflake is an excellent enterprise data storage solution that relies primarily on SQL for data analytics workflows. Some data engineers prefer using Python for data analysis, and SQL can be a barrier for them, particularly if they are not as experienced with writing good/performant SQL. These users might also feel more productive working from their local environment, in a Jupyter notebook for example.

Fortunately, by combining Snowflake and Ibis, engineers and data scientists get the readability of Python analytics with the scale and performance of SQL. So what is Ibis? Ibis is a pandas-like, Python library that provides a lightweight, universal interface for data wrangling. It helps Python users explore and transform data of any size, stored anywhere. With Ibis, data analysts and engineers can write Python code that generates SQL queries and pipelines. For Snowflake users, Ibis generates the SQL code necessary to query data stored in their Snowflake database and the results are returned natively in their local Python environment. Ibis also allows users to access the generated SQL code by using the show_sql command

Creating A Pricing Summary Report

As an example of what using Ibis and Snowflake together looks like, we’ll prepare a short pricing summary report. We’ll work with Ibis in a Jupyter notebook in our local IDE to connect to Snowflake and grab a table that has information about a business’s shipping, sales, and returns data. We’ll generate a pricing summary report that helps us understand how the business performed in a certain time interval. Let’s start by installing the Ibis-Snowflake backend using the code below.

install ibis-snowflake backend

!pip install 'ibis-framework[snowflake]'

Next, we’ll import all the necessary dependencies to get going and connect to the Snowflake database using ibis.connect.

The ibis.connect function takes in a URL that looks like this
"snowflake://user:pass@account/database/schema".

The parts of this url are made up of:

  • Snowflake username (listed under your profile tab)
  • Snowflake password (hopefully you know this ;))
  • Snowflake account:
    • You’ll find this under the Admin→Accounts tab in the menu. Clicking on the link next to the letters under the ‘Account’ section on this page will give you a URL that looks something like this https://abcdefg-rr01234.snowflakecomputing.com. Your account consists of all the letters and numbers before snowflakecomputing.com so in this case, it would be abcdefg-rr01234
  • Database name
  • Schema name

Once you have your URL, you can plug it in and run the following code to connect to the Snowflake database. This gives you access to all the tables available in your chosen database.

import dependencies and connect to the Snowflake backend

import ibis 
from ibis import _
ibis.options.interactive = True

con = ibis.connect(
    "snowflake://user:pass@account/database/schema"
)

For this example, we’ll use the Snowflake-provided ‘snowflake_sample_data’ database. From this database, we’ll use the lineitem table. We can connect to the table by running this line of code.

connect to the lineitem table

lineitem = con.tables.lineitem

We’ll now use Ibis to generate a short report that shows the amount of business that was billed, shipped, and returned as of a given date. Our query will list the totals for extended price, discounted extended price, price with tax, average quantity, average price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS and listed in ascending order of the same. A count of the number of line items in each group is included.

create the report

report = lineitem.aggregate(
    by=['L_RETURNFLAG','L_LINESTATUS'],
    sum_qty = _.L_QUANTITY .sum(),
    sum_base_price = _.L_EXTENDEDPRICE.sum(),
    sum_disc_price = (_.L_EXTENDEDPRICE * (1-_.L_DISCOUNT)).sum(),
    sum_charge = (_.L_EXTENDEDPRICE * (1-_.L_DISCOUNT) * (1+_.L_TAX)).sum(),
    avg_qty = _.L_QUANTITY .mean(), 
    avg_price = _.L_EXTENDEDPRICE.mean(),
    avg_disc = _.L_DISCOUNT.mean(),
    count_order = _.count()
).filter(lineitem.L_SHIPDATE <'1998-12-01').order_by(['L_RETURNFLAG','L_LINESTATUS'])

report
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ L_RETURNFLAG  L_LINESTATUS  sum_qty         sum_base_price   sum_disc_price     sum_charge           avg_qty         avg_price       avg_disc        count_order ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ !string!stringdecimal(38, 2)decimal(38, 2)decimal(38, 4)decimal(38, 6)decimal(12, 2)decimal(12, 2)decimal(12, 2)int64       │
├──────────────┼──────────────┼────────────────┼─────────────────┼───────────────────┼─────────────────────┼────────────────┼────────────────┼────────────────┼─────────────┤
│ A            │ F            │    37734107.0056586554400.7353758257134.870055909065222.82769025.5238273.130.051478493 │
│ N            │ F            │      991417.001487504710.381413082168.05411469649223.19437525.5238284.470.0538854 │
│ N            │ O            │    76633518.00114935210409.19109189591897.4720113561024263.01377925.5038248.020.053004998 │
│ R            │ F            │    37719753.0056568041380.9053741292684.604055889619119.83193225.5138250.850.051478870 │
└──────────────┴──────────────┴────────────────┴─────────────────┴───────────────────┴─────────────────────┴────────────────┴────────────────┴────────────────┴─────────────┘

Ibis uses Rich under the hood so tables are presented in a clear way. For more perspective, here’s a side-by-side comparison of the SQL query needed to generate the same results and our Ibis query. For Python users, Ibis should feel more intuitive and lightweight. This entire workflow was also completed in a Jupyter notebook in a local IDE.

Ibis and SQL code side by side comparasion

Conclusion

In summary, Snowflake is a powerful data warehousing and analytics platform, and using it with Ibis provides software engineers and data scientists more flexibility. By using Ibis, data teams can unlock the full potential of their analytics workflows, while still benefiting from the scalability, reliability, and security of Snowflake’s cloud-based infrastructure.

If you’re interested in building a data analytics system that uses the power of Snowflake and Ibis, don’t hesitate to reach out to our team at hello@voltrondata.com. We specialize in helping data teams leverage these technologies to gain valuable insights from their data and make informed decisions.

Photo by Kelly