Zero-Copy Sharing using Apache Arrow and Golang

Matthew Topol

July 20, 2023

Voltron Data is building open source standards that support a multi-language future. Central to this vision is the Apache Arrow project.

To demonstrate Arrow’s cross-language capabilities, Matt Topol, Staff Software Engineer and author of "In-Memory Analytics with Apache Arrow", wrote a series of blogs covering the Apache Arrow Golang (Go) module. The series will help you get started with Arrow and Go and show how you can use both to build effective data science workflows.

This is the final post in our four-part series. Access the full series below:

  • Part 1: Use Apache Arrow and Go for Your Data Workflows
  • Part 2: Make Data Files Easier to Work With Using Golang and Apache Arrow
  • Part 3: Data Transfer with Apache Arrow and Golang

It’s time for the final post in our series to get you started with Apache Arrow and Golang. Our previous post covered how to efficiently send your data across the network using Arrow IPC and Arrow Flight RPC. In this post, we’re covering a different situation: sending data within the same process by sharing the memory directly without copying. Let’s hop down this rabbit hole!

Caring is Sharing… Your Local Memory

Okay, picture this: you have an awesome data utility that handles a useful task, but it can’t be called directly from your environment of choice (in our case Go, but this could be any language or environment). One way to bridge the gap is to write your data out to a file and have the utility read that file, but if your data is large enough, that can be very costly in disk space, memory, and CPU time. Alternatively, the utility could run as a service you call, but then you pay the cost of sending the data across the network. What if we could hand the utility a pointer to the data and have it use the data “as-is,” without any copying? With the Arrow C data interface, you can!

You can read more about the rationale and goals behind the C data interface in the Arrow docs, but the point I’m getting at is that the Go package provides utilities to both import and export data via this interface. The drawback is that it requires CGO, which has a few caveats that I won’t get into here.
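
To make the import and export utilities a bit more concrete, here’s a minimal sketch (not from the original post; the schema, builder code, and use of ExportArrowRecordBatch/ImportCRecordBatch reflect my reading of the v12 cdata package) that round-trips a record batch through the C data interface within one process, sharing buffers instead of copying them:

go
// minimal sketch: round-trip a record batch through the Arrow C data
// interface in-process; helper usage here is my reading of the cdata package
package main

import (
    "fmt"

    "github.com/apache/arrow/go/v12/arrow"
    "github.com/apache/arrow/go/v12/arrow/array"
    "github.com/apache/arrow/go/v12/arrow/cdata"
    "github.com/apache/arrow/go/v12/arrow/memory"
)

func main() {
    // build a small record batch to hand across the interface
    schema := arrow.NewSchema([]arrow.Field{
        {Name: "vals", Type: arrow.PrimitiveTypes.Int64},
    }, nil)

    bldr := array.NewRecordBuilder(memory.DefaultAllocator, schema)
    defer bldr.Release()
    bldr.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
    rec := bldr.NewRecord()
    defer rec.Release()

    // export into the C ABI structs; anything in this process that speaks
    // the C data interface (C++, Python, Rust, ...) could consume these
    var carr cdata.CArrowArray
    var cschema cdata.CArrowSchema
    cdata.ExportArrowRecordBatch(rec, &carr, &cschema)

    // import it back: the underlying buffers are shared, not copied
    imported, err := cdata.ImportCRecordBatch(&carr, &cschema)
    if err != nil {
        panic(err)
    }
    defer imported.Release()

    fmt.Println(imported)
}

Since the cdata package wraps the C ABI structs, building this still requires CGO, as noted above.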

Let’s walk through a simple example: suppose you want to utilize DuckDB. Well, DuckDB exposes an Apache Arrow-compatible interface built on the C data interface, so you can avoid extra copies of the results. One way you could utilize this is as follows:

First, we set up the necessary C flags to link against libduckdb.so and include the header:

go
// #cgo LDFLAGS: -lduckdb
// #include <stdlib.h>
// #include <duckdb.h>
import "C"

Then, we’ll have a function that accepts a query string and returns the results or an error. In a real situation, we’d want to store the pointers to the DuckDB connection and database, but for our purposes here we’ll just close them at the end of the function with defer.

go
import (
    ...
    "github.com/apache/arrow/go/v12/arrow/cdata"
    ...
)

func queryDuckDB(query string) (arrow.Array, error) {
    var (
        db     C.duckdb_database
        cnxn   C.duckdb_connection
        result C.duckdb_arrow
        dbpath string = ...
    )
    cpath := C.CString(dbpath)
    defer C.free(unsafe.Pointer(cpath))

    // in a real scenario, you'd keep the db and connection open longer
    // than just for the length of this function call, but this serves the
    // example fine
    if state := C.duckdb_open(cpath, &db); state == C.DuckDBError {
        return nil, errors.New("open error")
    }
    defer C.duckdb_close(&db)

    if state := C.duckdb_connect(db, &cnxn); state == C.DuckDBError {
        return nil, errors.New("connect error")
    }
    defer C.duckdb_disconnect(&cnxn)

    // now we can query the database!
    ...
}

Finally, we can send the query and import the result data, without copying it, by using the pointers directly.

go
cquery := C.CString(query)
defer C.free(unsafe.Pointer(cquery))

state := C.duckdb_query_arrow(cnxn, cquery, &result)
if state == C.DuckDBError {
    return nil, errors.New("query error")
}
defer C.duckdb_destroy_arrow(&result)

// okay, now we can actually fetch the data!
var schema cdata.CArrowSchema
var carr cdata.CArrowArray

state = C.duckdb_query_arrow_schema(result,
    (*C.duckdb_arrow_schema)(unsafe.Pointer(&schema)))
if state == C.DuckDBError {
    return nil, errors.New("schema error")
}

state = C.duckdb_query_arrow_array(result,
    (*C.duckdb_arrow_array)(unsafe.Pointer(&carr)))
if state == C.DuckDBError {
    cdata.ReleaseCArrowSchema(&schema)
    return nil, errors.New("array error")
}

_, arr, err := cdata.ImportCArray(&carr, &schema)
if err != nil {
    return nil, err
}

// the arrow.Array now owns the C allocated memory and ArrowArray's
// release callback will be called when the internal ArrayData's
// refcount goes to 0 and it is cleaned up
return arr, nil
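
As a quick usage sketch (the caller, query string, and logging here are my own illustrative assumptions, not part of the original post), whoever calls queryDuckDB just needs to release the returned array when finished, which fires the release callback and frees the C-allocated buffers:

go
// hypothetical caller; assumes "fmt" and "log" are imported
func example() {
    arr, err := queryDuckDB("SELECT 42 AS answer")
    if err != nil {
        log.Fatal(err)
    }
    // releasing the array invokes the ArrowArray release callback,
    // freeing the memory DuckDB allocated for the result
    defer arr.Release()

    fmt.Println(arr)
}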

Next Steps…

Hopefully at this point, I’ve presented a compelling case for utilizing Apache Arrow and Go to write useful utilities and services for manipulating data and/or building workflows! If you want to learn more, you’ve got a few options:

  • As mentioned, you can read the documentation on pkg.go.dev
  • Check out my book, “In-Memory Analytics with Apache Arrow” for many more examples and in-depth descriptions of the Arrow format and use cases. (Note: It also has a corresponding GitHub repository with all the code samples from the book, released under the MIT license.)
  • Check out ADBC if you want to be able to query various databases (like DuckDB) easily, with all the low-level work done for you already!

It has been a pleasure to share these Arrow and Golang examples with you. If you’re interested in learning more about how Voltron Data helps enterprises design and build data systems using projects like Arrow, you can learn about our approach here.
