Buf + Pants = ❤️
What are Protocol Buffers?
I’m glad you asked! Protocol Buffers, or protobufs for short, allow you to structure
and serialize data in a forward-compatible and backward-compatible way. Think of
it like JSON, but smaller and faster. Your protobufs (stored in .proto files) can
then be compiled to generate code for your language of choice, allowing you to
interact with your data with the same ease as any other object in your codebase,
regardless of whether it’s Python, Go, or something else.
Imagine we want to create a protobuf to represent a user in our system. Here’s what
a simple project/user.proto would look like, ignoring the rather horrible formatting
for now:
syntax = "proto3";
message User
{
int32 id = 1;
string username = 2;
string email = 3;
bool active = 4;
enum Rank {
ADMIN = 0;
MOD = 1;
REGULAR = 2;
}
Rank rank = 5;
}
Normally, at this stage, we’d have to invoke protoc and turn this protobuf into
code for our language(s). Luckily for us, Pants handles that automatically,
so we can jump straight into coding! Here are two Python examples, one for writing
and one for reading:
# Writer: build a User message and serialize it to bytes.
from project.user_pb2 import User

user = User(
    id=1,
    username="foobar",
    email="[email protected]",
    active=True,
    rank=User.Rank.REGULAR,
)
print(user.SerializeToString())  # b'\x08\x01\x12\x06foobar\x1a\[email protected] \x01(\x02'

# Reader: parse a serialized User from the file passed as the first argument.
import sys

from project.user_pb2 import User

user = User()
with open(sys.argv[1], "rb") as f:
    user.ParseFromString(f.read())
print(user.username)  # foobar
Pretty neat, huh? Since protobufs are forward-compatible and backward-compatible,
we could extend our protobuf with a string password_hash = 6; field and the
code samples above would continue to work, even if we end up deploying an updated
version of only one of them. If only the writer is updated, the reader will simply
ignore the unknown password_hash field, and if only the reader is updated, it will
simply use the default value for password_hash’s type (an empty string in this case).
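To make that compatibility claim concrete, here is a minimal, hand-rolled sketch of the protobuf wire format in plain Python. It is not how the protobuf library is implemented, and it only handles the two wire types the User message needs (varint and length-delimited). The names OLD_SCHEMA, NEW_SCHEMA, encode and decode, as well as the password hash value, are made up for this illustration.

```python
from typing import Any

# Field number -> (name, default value), mirroring project/user.proto.
OLD_SCHEMA = {
    1: ("id", 0),
    2: ("username", ""),
    3: ("email", ""),
    4: ("active", False),
    5: ("rank", 0),
}
# The extended schema adds `string password_hash = 6;`.
NEW_SCHEMA = {**OLD_SCHEMA, 6: ("password_hash", "")}


def write_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at `pos`; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(fields: dict[int, Any]) -> bytes:
    """Serialize {field number: value}; str is length-delimited, the rest varint."""
    out = b""
    for number, value in sorted(fields.items()):
        if isinstance(value, str):
            payload = value.encode("utf-8")
            out += write_varint(number << 3 | 2) + write_varint(len(payload)) + payload
        else:
            out += write_varint(number << 3 | 0) + write_varint(int(value))
    return out


def decode(data: bytes, schema: dict[int, tuple[str, Any]]) -> dict[str, Any]:
    """Parse `data`, starting from defaults and skipping unknown field numbers."""
    message = {name: default for name, default in schema.values()}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in schema:
            name, default = schema[number]
            message[name] = type(default)(value)
        # Unknown field numbers fall through here and are simply ignored.
    return message


new_writer_bytes = encode({1: 1, 2: "foobar", 6: "hypothetical-hash"})
old_writer_bytes = encode({1: 1, 2: "foobar"})

# Old reader, new writer: field 6 is skipped.
print(decode(new_writer_bytes, OLD_SCHEMA))
# New reader, old writer: password_hash falls back to its default.
print(decode(old_writer_bytes, NEW_SCHEMA))
```

Because every field on the wire is tagged with its number and wire type, a reader can step over fields it has never heard of, and a field that never shows up just keeps its default value.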
All good so far? Sweet, but let’s buf it up a notch!
Introducing Buf
As of 2.11, Pants now has support for buf, a CLI tool for working with Protocol
Buffers. More specifically, Pants supports buf’s formatting and linting
capabilities. To enable them, simply add "pants.backend.codegen.protobuf.lint.buf"
to backend_packages in your pants.toml.
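As a sketch, a pants.toml that enables Buf next to the Python protobuf code generation used earlier might look like the fragment below. The exact set of backends depends on your repository: the first two entries are assumptions about this example project, not something Buf itself requires.

```toml
[GLOBAL]
backend_packages = [
  # Assumed for this example project: Python support and protobuf codegen for Python.
  "pants.backend.python",
  "pants.backend.codegen.protobuf.python",
  # Enables buf's formatter and linter for `./pants fmt` and `./pants lint`.
  "pants.backend.codegen.protobuf.lint.buf",
]
```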
Remember the not-so-pretty protobuf we defined above? After running
./pants fmt project/user.proto, it looks like this:
syntax = "proto3";
message User {
  int32 id = 1;
  string username = 2;
  string email = 3;
  bool active = 4;
  enum Rank {
    ADMIN = 0;
    MOD = 1;
    REGULAR = 2;
  }
  Rank rank = 5;
}
Much better! With automated formatting we don’t have to waste time and energy
deciding how .proto files should be formatted. Did I mention there’s support for
linting as well? Let’s run ./pants lint project/user.proto!
project/user.proto:1:1:Files must have a package defined.
project/user.proto:10:5:Enum value name "ADMIN" should be prefixed with "RANK_".
project/user.proto:10:5:Enum zero value name "ADMIN" should be suffixed with "_UNSPECIFIED".
project/user.proto:11:5:Enum value name "MOD" should be prefixed with "RANK_".
project/user.proto:12:5:Enum value name "REGULAR" should be prefixed with "RANK_".
Yikes! Buf’s linter enforces consistency and keeps our protobufs in line with best
practices. One of the things it wants us to do in the example above is to prefix our
enum value names with the name of the enum (e.g. RANK_ADMIN instead of just ADMIN).
This is because protobuf enums use C++ scoping rules: enum values are scoped to
whatever contains the enum rather than to the enum itself, making it impossible for
two enums in the same scope to share a value name. There are a couple of other
errors in there as well. If you’re interested in the explanation for them (and
others), check out Buf’s well-written documentation on the subject.
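The two enum rules in that output are simple enough to mimic in a few lines of Python. The sketch below is not Buf's implementation: check_enum and to_upper_snake are hypothetical helpers that only reproduce the message wording shown above (without file locations) for a single enum.

```python
import re


def to_upper_snake(name: str) -> str:
    """Convert a CamelCase enum name such as "UserRank" to "USER_RANK"."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).upper()


def check_enum(enum_name: str, values: dict[str, int]) -> list[str]:
    """Return lint-style messages for an enum given as {value name: number}."""
    prefix = to_upper_snake(enum_name) + "_"
    problems = []
    for value_name, number in values.items():
        # Every value should carry the enum's name as a prefix.
        if not value_name.startswith(prefix):
            problems.append(
                f'Enum value name "{value_name}" should be prefixed with "{prefix}".'
            )
        # The zero value should be an explicit "unspecified" placeholder.
        if number == 0 and not value_name.endswith("_UNSPECIFIED"):
            problems.append(
                f'Enum zero value name "{value_name}" should be suffixed with "_UNSPECIFIED".'
            )
    return problems


before = check_enum("Rank", {"ADMIN": 0, "MOD": 1, "REGULAR": 2})
after = check_enum(
    "Rank",
    {"RANK_UNSPECIFIED": 0, "RANK_ADMIN": 1, "RANK_MOD": 2, "RANK_REGULAR": 3},
)

for message in before:
    print(message)
print(after)
```

Renaming the values along the lines of the second call (a RANK_UNSPECIFIED zero value plus RANK_-prefixed names) is one way to satisfy both rules. Keep in mind that renumbering values is itself a breaking change for data that has already been serialized.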
As mentioned before, protobufs are forward-compatible and backward-compatible, but it’s up to the developer to avoid making breaking changes. Using Buf’s linter helps make sure that your protobufs are structured and written in a consistent, well-thought-out way, making them much easier to extend and work with, both now and in the future.
My Pants experience
During my almost ten years at Snowfall, one of my proudest achievements has been taking the company’s CI/CD from barely existing to what others have called state of the art. About a year ago, we improved that setup further by throwing out a bunch of custom tools and scripts that required both energy and time to maintain and replacing them with Pants.
There were some things we had and did that Pants wasn’t capable of yet though,
and one of them was buf’s linting. Now, if Pants were proprietary, closed-source
software, we’d send in a feature request and hope for the best, but luckily it’s
not! Fine. Okay, you got me. I did open a feature request at first anyway, but you
have to start somewhere.
I eventually took the time to look into it further though. Writing a new linter for Pants, how hard can it be? Turns out, not hard at all! Since I had already been dabbling with Pants for a while, I knew the core concepts, and reading through the plugin documentation as well as looking at existing linters gave me enough knowledge to get a proof of concept working. I had some issues with transitive dependencies as well as source roots, but the Pants maintainers and the rest of the community were very welcoming and willing to help out (extra shout-out to Eric Arellano!), so it didn’t take long before those issues were sorted.
All in all, support for buf lint took a few days to complete, while support for
buf format took an evening. It was a very pleasant experience and whetted my
appetite for contributing more code to Pants, so who knows what the future will bring!