Feb 21, 2023
Running an Arrow Flight SQL Server and Querying Data with JDBC and ADBC
Philip Moore, Tom Drabas, David Li
In our previous blog post, we discussed how Apache Arrow Flight SQL allows any compliant database to share data in the Apache Arrow columnar format while avoiding the speed penalty of serialization/deserialization bottlenecks.
In this post, we will show you how to run a Flight SQL server using a DuckDB backend. We will also query it with JDBC (for tool compatibility) and the brand-new, performance-optimized ADBC (Arrow Database Connectivity) drivers.
Running the Arrow Flight SQL Server
To get started, you can run the secured Flight SQL server (all the code is stored here) by running our pre-built Docker container image. By default, the container mounts a very small (scale factor 0.01) sample TPC-H database that you can query from a JDBC or ADBC client.
If you have Docker installed (and running) - you can run the server by opening a terminal and running the following command:
docker run --name flight-sql \
--detach \
--rm \
--tty \
--init \
--publish 31337:31337 \
--env FLIGHT_PASSWORD="testing123" \
--pull missing \
voltrondata/flight-sql:latest
Interacting with the Flight SQL Server via JDBC
Next, we will connect from our host computer to the locally running Docker Flight SQL server. We will use the Apache Arrow Flight SQL JDBC driver, which you can download here. To connect, use the following JDBC string; if you changed the password value when starting the server you will need to match the value specified for the FLIGHT_PASSWORD environment variable:
jdbc:arrow-flight-sql://localhost:31337?useEncryption=true&user=flight_username&password=testing123&disableCertificateVerification=true
Many SQL IDE tools support loading 3rd party JDBC drivers so feel free to use your tool of choice.
For instructions on how to use DBeaver Community Edition with the Arrow Flight SQL JDBC driver - see this repo.
Note: while the data is transferred between the Flight SQL server and JDBC client is in Apache Arrow format, JDBC of course uses Java types. JDBC requires conversion to/from the native Arrow columnar format to a row-based format JDBC is familiar with. Flight SQL JDBC can still be faster than a non-Arrow-based protocol/JDBC driver, because it avoids converting data on the server and takes advantage of Arrow Flight RPC for fast data transfer, optimizing 2 of 3 performance opportunities.
But, performance can be even better with ADBC.
Interacting with the Arrow Flight SQL Server via the New Python ADBC Flight SQL Driver
Unlike JDBC, ADBC uses Arrow data everywhere, avoiding data conversions. For more details, you can see our intro to ADBC blog post. To learn more about the advantages of using ADBC with Flight SQL, you can read our blog post here.
To connect to your running Flight SQL server via ADBC in Python follow these steps:
1- Ensure you have Python 3.9 or above installed
2- Setup a Python virtual environment, activate it, and install the driver:
# Create a Python virtual environment
python3 -m venv ./venv
# Activate the virtual environment
. ./venv/bin/activate
# Install the requirements including the new Arrow ADBC Flight SQL driver
pip install --upgrade pip
pip install pandas pyarrow adbc_driver_flightsql
# Start the python interactive shell
python
3- The following code queries the Flight SQL server (specifically the “nation” table). You can run it interactively by pasting each line into the console:
import adbc_driver_flightsql.dbapi as flight_sql
flight_password = "testing123" # Use an env var in production code!
with flight_sql.connect(uri="grpc+tls://localhost:31337",
db_kwargs={"username": "flight_username",
"password": flight_password,
"adbc.flight.sql.client_option.tls_skip_verify": "true" # Not needed if you use a trusted CA-signed TLS cert
}
) as conn:
with conn.cursor() as cur:
cur.execute("SELECT n_nationkey, n_name FROM nation WHERE n_nationkey = ?",
parameters=[24]
)
x = cur.fetch_arrow_table()
print(x)
You should see the following output:
pyarrow.Table
n_nationkey: int32
n_name: string
----
n_nationkey: [[24]]
n_name: [["UNITED STATES"]]
Congratulations! You just queried your Arrow Flight SQL server using the Python ADBC Flight SQL driver!
Tear down
After you are done testing, you can stop the Flight SQL docker container with the following terminal command:
docker stop flight-sql
The ADBC Flight SQL driver is DB-API compliant, so it can be used by Python frameworks such as Ibis and SQLAlchemy for reading from and writing data to a Flight SQL server.
Just like JDBC drivers, ADBC will soon be integrated by front-end tools, giving you the benefits of a GUI with the performance of Arrow. We are excited about the future ADBC will bring.
Summary
In this article, we showed how to run an Arrow Flight SQL server and connect via two driver standards: JDBC and the new Python ADBC Flight SQL driver.
- For more information about Apache Arrow - see the Arrow homepage.
- For more information about Arrow Flight SQL server, see this page.
- For more information about ADBC - see the ADBC homepage.
A Voltron Data Enterprise Support subscription can help accelerate your success with Apache Arrow (including Flight SQL and ADBC) even further!
Photo by Jerry Zhang