Apr 07, 2022

Apache Arrow New Contributor’s Guide

Alenka Frim


Contributing to open source projects, like Apache Arrow, is a unique experience that can help grow your knowledge and even create social connections. Open source depends on the participation of the greater community. We are excited that you are here! In this post we will provide a high-level overview of how you can get involved as a contributor to the Apache Arrow community.

Before Getting Started

First, a little bit of background. Voltron Data is a major contributor to Apache Arrow. In fact, one of our founders is an original developer of Arrow. We believe in the framework so much that it’s at the core of the tools we are building! This is why we want to support anyone and everyone who wants to be involved in this project.

The New Contributor’s Guide, published with Arrow version 7.0.0, is a new resource that includes step-by-step instructions on how to start contributing to Arrow. From setting up the tools and finding good first issues to addressing a simple documentation correction or a harder PyArrow bug fix that touches the C++ library, the guide can help!

Now, let’s break down the contribution process found in the guide. The graphic below provides a preview of what you can expect and illustrates the steps a bit more clearly.

Set Up and Build

The first step to contributing is to set up all the necessary tools needed to build and work with the codebase. You will be working directly with the source code, so you need to set up Git, fork the Arrow repository and build the library.

The Apache Arrow project is quite large and includes many different languages. The build process may seem daunting, but don’t be discouraged: the community is here to help you. Any difficulty you come across is an opportunity to improve the documentation. Reporting a build problem or a missing part in the documentation is important as it helps enhance the content so future contributors can have a smoother experience.

With this, you have accomplished the most important step, developing a stable environment, which will help with productivity later on. Congratulations! Now that the setup is complete and your development version of the library is working, you can begin your search for an issue to work on.

Using JIRA and Finding a Good First Issue

Finding a good first issue to address can be a bit like creating art, but it’s also part of the fun. The Apache Arrow project uses JIRA to track the work, so you may need to create an account to sign in. The guide will help you navigate the dashboard. There is also information about using labels and components of the project to avoid redundancy and make finding issues easier for other contributors.

Once you become familiar with how the project uses JIRA, you should be able to find a good issue to tackle. If you’re unsure of where to begin, add a comment to the tickets asking for more information about the issues that seem interesting to you. You may already have a bug or a feature you’d like to tackle in mind, so turn to the guide for details about how to create a new JIRA issue.

Making the Contribution

Once an issue has been identified and your environment is set up, you can start the really fun part! You may need to explore the codebase and then find your way around as you go. Reading the codebase will show you how the code is written, commented and connected, and will give you a sense of the overall structure. Go ahead, poke around a bit to get your bearings.

Once you’ve made your code changes and you have verified that the changes work in your local environment, you will need to create tests that cover the new code to make sure anything added doesn’t break Arrow. You may need to create several of these just to be sure. After the tests are complete and you have tested with a working branch, you will want to double check that your code matches the style guide before creating your first contribution.🎉
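To make the testing step a bit more concrete, here is a minimal sketch of what a small set of tests can look like in Python. PyArrow's test suite is written with pytest, which collects functions whose names start with test_. The helper normalize_column_names below is hypothetical and is not part of Arrow; it simply stands in for whatever code your change touches.

```python
# Hypothetical helper standing in for the code your contribution changes.
# It is NOT part of Arrow; it only illustrates the shape of a test.
def normalize_column_names(names):
    """Strip surrounding whitespace and lowercase each column name."""
    return [name.strip().lower() for name in names]


def test_normalize_column_names_basic():
    # Covers the new behavior on a typical input.
    assert normalize_column_names([" A", "b "]) == ["a", "b"]


def test_normalize_column_names_empty():
    # Edge case: an empty input should give an empty result, not an error.
    assert normalize_column_names([]) == []


if __name__ == "__main__":
    # pytest would discover the test_* functions automatically;
    # calling them by hand keeps this sketch runnable without pytest.
    test_normalize_column_names_basic()
    test_normalize_column_names_empty()
    print("all tests passed")
```

Writing one test for the typical case and one for an edge case is a common starting point; reviewers may suggest additional cases during the pull request discussion.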

The Arrow project uses Git for version control and a workflow based on pull requests. That means that you contribute the changes to the code by creating a pull request against the official Arrow repository. Arrow maintainers will be notified when a pull request is created and they will get to it as soon as possible.

After the pull request is submitted, you will discuss your changes with Arrow’s maintainers. The discussion occurs in the form of comments and suggestions in GitHub.

One Final Note & Additional Resources

Hopefully, this summary will make getting started feel a bit more manageable. We also like to encourage all new contributors to read the Arrow community’s Code of Conduct. The Apache Arrow Foundation strives to create a welcoming community where communication is respectful toward other contributors. Following The Apache Way, the Arrow community has been working hard to make contributing to the project easier by improving the clarity of the documentation and updating it regularly.

Arrow has also started offering tutorials that can provide a clearer picture of the contribution process by language. For now, the tutorials cover adding functionality to R and Python bindings for the Arrow C++ library. We hope to add more tutorials that highlight work on the C++ portions of the library in the future.

Finally, check out Arrow’s Cookbook with recipes for C++, R and Python (and Java coming soon). These recipes for Arrow libraries can aid your use of the codebase for different projects or research. For additional information on the contribution process, see: Contributing Documentation, Continuous Integration, and Reviewing Contributions.

And, don’t forget, if you’re working within the Apache Arrow ecosystem, we’re here to support you. Check out Voltron Data Enterprise Support subscription options today.

Have fun, and good luck!