"IT is like a shark, if you stop moving you will die!"

Vladimir Dejanović

Protocol Buffers basic stuff you need to know

11 Jul 2018     8 min read

Protocol Buffers is a binary protocol that was developed at Google and later made publicly available. The first publicly available version was Protocol Buffers version 2. The most recent version at the time of writing this article is Protocol Buffers version 3. Version 1 was never publicly available.

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Protocol buffers currently support generated code in Java, Python, Objective-C, C++, C#, JS and more. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Reasons to look at Protocol Buffers

You might ask yourself why you would even look at or consider Protocol Buffers, so let us take a look at a few reasons.

Size of Data

The fact that Protobuf is a binary protocol gives it some very nice characteristics. Messages are much smaller than the same data in a textual format such as JSON or XML, so by default its throughput is much higher. This shouldn’t come as a surprise. However, the difference in throughput between Protocol Buffers and JSON, for example, can be made smaller by compressing textual payloads before sending them, along with a few other tricks. Even then Protocol Buffers will usually keep the edge in throughput, just by a smaller margin, so my suggestion is to always compress when you transfer data in a textual format.
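To get a feeling for how much compression helps a textual format, here is a minimal sketch that gzips a JSON payload using only the JDK. The class name, the helper methods and the sample records are made up for illustration; real payloads will compress better or worse depending on how repetitive they are.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any protobuf or JSON library.
class JsonCompressionSketch {

    // Builds a JSON array with `count` small, made-up records.
    static String sampleJson(int count) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i)
              .append(",\"first_name\":\"First\",\"last_name\":\"Last\"}");
        }
        return sb.append(']').toString();
    }

    // Compresses the given bytes with the JDK's built-in gzip stream.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleJson(100).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.println("raw JSON bytes:     " + raw.length);
        System.out.println("gzipped JSON bytes: " + compressed.length);
    }
}
```

Keep in mind that compression is not free: it trades network bytes for extra CPU work on both sides of the connection.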

CPU load

Another important thing that needs to be taken into account is how CPU intensive the marshaling and unmarshaling of data is. Textual protocols are much more CPU intensive by default, and unlike the difference in throughput, this isn't easily addressed.

If you are thinking that maybe this isn't such a big thing, think for a second about your use case. How much data do you transfer, and who is producing and consuming it? If you are hosting your servers in some cloud and are paying by the CPU, then this might have an impact on your bill. Mobile devices don't have CPUs as powerful as our computers, and any CPU intensive workload will drain their batteries. In both of these cases a binary protocol such as Protobuf is a good choice for better performance and user experience.

Usage

Now that we know a few reasons why you should learn more about Protocol Buffers, let us take a look at what it looks like and how we can use it.

With Protocol Buffers everything starts with a schema, or to be more precise, a proto file. If this proto file is used as part of gRPC, then there are Maven and Gradle plugins that automatically generate classes and similar for us. If Protocol Buffers is used stand-alone, then we can generate all classes and similar manually using the Protocol Buffers compiler.

Init proto file

Let us look at a simple proto file

syntax = "proto3";

package xyz.itshark.blog.protobuf;

option java_package = "xyz.itshark.blog.protobuf.generated";
option java_multiple_files = true;

Here we are defining the syntax as proto3 - Protocol Buffers version 3.

With the package line we are saying that all generated classes and the rest should be part of the package “xyz.itshark.blog.protobuf”.

The next two lines add some specific behaviour for the case when we generate Java code. For Java code we will use a different package, “xyz.itshark.blog.protobuf.generated”, and we also say that we want multiple files generated instead of a single class containing all classes defined in the proto file.

First message

Let us add this to our proto file

message Example {
     int32 id = 1;
     string first_name = 2;
     string last_name = 3;
}

This is a simple example of a message in protobuf. Its name is Example and it consists of three fields: id, first_name and last_name. If you know C or C++ this might look very similar to a struct.

The first field has type int32 and name id. There is also something strange at the end: the sign “=” followed by the number 1. All fields have this, each with a different number. These numbers are “tags”; they identify each field inside the binary message, which makes serialization and deserialization of data faster and easier. It is very important to remember that once a number has been used in a message, it can never be changed or reused for something else. If, for example, we decide after some time that we don’t want to have id in our message any more, we can change the message definition to look like this

message Example {
     string first_name = 2;
     string last_name = 3;
     int32 new_id = 4;
}

However, we can’t do something like this

message Example {
     string not_valid = 1;
     string first_name = 2;
     string last_name = 3;
}

This would produce a different message with the same tags, and it would break any application which is still using the old proto file.
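To make the role of tags more concrete, here is a minimal hand-rolled sketch of how a single field is laid out in the binary message, following the encoding described in the official Protocol Buffers documentation: every field starts with a key computed as (tag << 3) | wire type, followed by the value. The class and method names are made up for illustration and are not part of the protobuf library, and the sketch only handles non-negative numbers. In real code the generated classes do all of this for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the wire format, not the protobuf library itself.
class TagEncodingSketch {

    static final int WIRE_VARINT = 0;            // used by int32, enum, bool, ...
    static final int WIRE_LENGTH_DELIMITED = 2;  // used by string, bytes, nested messages

    // Writes a non-negative int as a base-128 varint: 7 bits per byte, lowest group first.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes one int32 field (non-negative values only in this sketch).
    static byte[] encodeInt32(int tag, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (tag << 3) | WIRE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }

    // Encodes one string field: key, byte length, then the UTF-8 bytes.
    static byte[] encodeString(int tag, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (tag << 3) | WIRE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex, e.g. "08 01".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // int32 id = 1, with value 1
        System.out.println(hex(encodeInt32(1, 1)));    // prints "08 01"
        // string not_valid = 1, with value "x"
        System.out.println(hex(encodeString(1, "x"))); // prints "0a 01 78"
    }
}
```

Notice that the field name never appears in the bytes, only the tag does. A reader built from the old proto file sees tag 1 and expects an int32 there, while a writer built from the changed file puts a string under the same tag, so the two sides no longer agree on what tag 1 means.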

The second field in our example message is of type string and has the name first_name. If you are from the Java world, you will notice that this isn’t camel case, the proper way of naming things in Java. This is the standard way of naming fields in Protocol Buffers. The Protocol Buffers compiler will parse the proto file and generate code with the appropriate syntax for the language at hand. Don’t forget that protobuf is language agnostic and is used in different languages, which have different conventions for naming classes, attributes and similar.

All fields in message Example are optional. To be more precise, all fields in Protocol Buffers version 3 are optional. In version 2 it was necessary to explicitly mark fields as optional or as required. The problem with required fields was backward compatibility and the fact that they could never be removed. After analyzing a lot of use cases, it was decided that required fields cause more harm than good in the long run, and in version 3 all fields were made optional by default.
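As a rough illustration of that naming conversion, the sketch below turns a snake_case field name into the kind of Java accessor names that the generated code exposes (for example first_name becomes setFirstName). This is only an approximation written for this article with made-up class and method names; it is not the compiler's actual implementation, and the real rules cover edge cases that are ignored here.

```java
// Hypothetical illustration of the naming convention, not protoc's real algorithm.
class AccessorNameSketch {

    // Converts a snake_case proto field name to UpperCamelCase: first_name -> FirstName.
    static String toUpperCamel(String protoFieldName) {
        StringBuilder sb = new StringBuilder();
        for (String part : protoFieldName.split("_")) {
            if (part.isEmpty()) {
                continue; // skip empty parts produced by doubled underscores
            }
            sb.append(Character.toUpperCase(part.charAt(0)))
              .append(part.substring(1));
        }
        return sb.toString();
    }

    // Name of the getter for a singular field.
    static String getter(String protoFieldName) {
        return "get" + toUpperCamel(protoFieldName);
    }

    // Name of the builder setter for a singular field.
    static String setter(String protoFieldName) {
        return "set" + toUpperCamel(protoFieldName);
    }

    public static void main(String[] args) {
        System.out.println(setter("first_name")); // prints "setFirstName"
        System.out.println(getter("last_name"));  // prints "getLastName"
    }
}
```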

Generate initial code

You need to download the protobuf compiler for your system from this location https://github.com/google/protobuf/releases . Once you install it on your system, just run this command and you will get generated Java code ready to be used in your project

$ protoc example.proto --java_out="$HOME/code/blog/"

Advanced Message

Let us look at a slightly more advanced message now

List/Arrays

One thing that you might ask yourself is how to define a list or array of elements inside a message. In order to do so you just need to add repeated in front of the field definition

message Advanced {
     repeated string text = 4;
}

Enum

A lot of programming languages have enums, and you can also define them in Protocol Buffers. To define an enum you just need to add something like this

enum Status {
     SUCCESS = 0;
     FAIL = 1;
     RANDOM = 2;
}

After this you can use it like any other type

message Advanced {
     repeated string text = 4;
     Status my_status = 3;
}

Message in Message

You can also use messages as field types inside other messages. For example, we already defined message Example, so we can add a field of type Example to message Advanced

message Advanced {
     repeated string text = 4;
     Status my_status = 3;
     Example message_example = 5;
}

You can also define a message inside another message. Its name is then scoped to the enclosing message, a bit like a “private” type that is meant to be used only there.

message Example2 {
     message Internal {
          string text = 1;
     }

     Internal valid_message = 1;
}

message Invalid {
     Internal can_not_be_used_here = 1;
}

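The first message from this example (Example2) is a valid message, while the second message (Invalid) is not valid, as stated by its name, since the unqualified name Internal is only visible inside Example2. Outside of Example2 the type would have to be referenced by its qualified name, Example2.Internal.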

Full proto file

syntax = "proto3";

package xyz.itshark.blog.protobuf;

option java_package = "xyz.itshark.blog.protobuf.generated";
option java_multiple_files = true;

enum Status {
    SUCCESS = 0;
    FAIL = 1;
    RANDOM = 2;
}

message Example {
    int32 id = 1;
    string first_name = 2;
    string last_name = 3;
}

message Advanced {
    repeated string text = 4;
    Status my_status = 3;
    Example message_example = 5;
}

message Example2 {
    message Internal {
        string text = 1;
    }

    Internal valid_message = 1;
}

Usage of Protocol Buffers in java code

Once we have our proto file ready and have generated the Java code, the next step is to use it in combination with our own code. If you used protoc to generate the code, you can just copy it to the appropriate place in your project. Once this is done you can write some code like this to create an instance of the Example object

        Example example = Example.newBuilder()
                .setId(1)
                .setFirstName("First")
                .setLastName("Last")
                .build();

As you can see, we are using builders to create instances of Protocol Buffers messages. To create an Advanced message we can use some code like this.

        Advanced advanced = Advanced.newBuilder()
                .setMyStatus(Status.RANDOM)
                .addText("some text")
                .addText("other text")
                .setMessageExample(example)
                .build();

And this is all there is to it.

Resources