scala - How to encode string in protobuf using UnknownFieldSet

I am writing a raw protobuf message with the com.google.protobuf library, using UnknownFieldSet, and I have run into a problem: some strings appear to break the encoded result.

I want to encode:

1 -> ["stuff", "stuff"]
2 -> ["stuff","android.microphone","stuff"]

which I figured can be done using the following code:

import com.google.protobuf.{ByteString, UnknownFieldSet}

// ....
// Field 1 holds two length-delimited values, field 2 holds three.
def doEncoding(): UnknownFieldSet = {
    UnknownFieldSet.newBuilder()
        .addField(1, UnknownFieldSet.Field.newBuilder()
          .addLengthDelimited(ByteString.copyFromUtf8("stuff"))
          .addLengthDelimited(ByteString.copyFromUtf8("stuff"))
          .build())
        .addField(2, UnknownFieldSet.Field.newBuilder()
          .addLengthDelimited(ByteString.copyFromUtf8("stuff"))
          .addLengthDelimited(ByteString.copyFromUtf8("android.microphone"))
          .addLengthDelimited(ByteString.copyFromUtf8("stuff"))
          .build())
        .build()
}

However, when I dump the resulting bytes to a file (calling .toByteArray on the UnknownFieldSet) and read them back with protod, I get an unexpected data structure:

[0a] 1 string: (5) stuff (73 74 75 66 66)
[0a] 1 string: (5) stuff (73 74 75 66 66)
[12] 2 string: (5) stuff (73 74 75 66 66)
[12] 2 string: (18) android.microphone

    [61] 12 fixed64/double: 7867336003066946670 (0x6d2e64696f72646e) (8.381649661287266e+217)
    [69] 13 fixed64/double: 7308901739622527587 (0x656e6f68706f7263) (3.9466026192472086e+180)
[12] 2 string: (5) stuff (73 74 75 66 66)

The first array is fine, but the second looks broken and contains values I never entered.

What am I doing wrong when adding the string to the raw protobuf?

asked Feb 15 at 14:07 by Sim

1 Answer

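This happens because an encoded Protobuf message is ambiguous on its own: a length-delimited field can hold a string, raw bytes, or an embedded message, and a decoder needs the schema (a .proto file) to tell which. Corollary: multiple Protobuf schemas may produce the same encoded message. Take this schema: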

message X {
  string s = 1;
}

Using your preferred Protobuf SDK, the following message:

X{
  S: "android.microphone",
}

Marshals to (hex-encoded):

0a12616e64726f69642e6d6963726f70686f6e65
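
To see where those bytes come from, here is a minimal sketch that builds the same field by hand, without the Protobuf library. The names `EncodeSketch`, `encodeLengthDelimited` and `toHex` are hypothetical helpers written for this illustration, and the sketch assumes a field number below 16 and a payload shorter than 128 bytes so that the tag and the length each fit in one byte.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object EncodeSketch {
  // Encode one length-delimited field (wire type 2) as: tag, length, payload.
  // Assumption: fieldNumber < 16 and payload < 128 bytes, so tag and length
  // are single-byte varints. Larger values would need real varint encoding.
  def encodeLengthDelimited(fieldNumber: Int, value: String): Array[Byte] = {
    val payload = value.getBytes(UTF_8)
    require(fieldNumber > 0 && fieldNumber < 16, "single-byte tag only")
    require(payload.length < 128, "single-byte length only")
    val tag = ((fieldNumber << 3) | 2).toByte // field number plus wire type 2
    Array(tag, payload.length.toByte) ++ payload
  }

  // Render bytes as lowercase hex, two digits per byte.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString
}
```

Calling `EncodeSketch.toHex(EncodeSketch.encodeLengthDelimited(1, "android.microphone"))` yields the hex string above: `0a` is the tag for field 1 with wire type 2, `12` is the length (18), and the rest is the UTF-8 text. Nothing in those bytes marks the payload as a string.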

Decoding that message with protoc without a schema gives:

printf "0a12616e64726f69642e6d6963726f70686f6e65" \
| xxd -r -p \
| protoc --decode_raw

1 {
  12: 0x6d2e64696f72646e
  13: 0x656e6f68706f7263
}

These values match your fixed64/double values: without a schema, the decoder guesses that the 18 bytes of the string are an embedded message.
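
The guess is plausible because the string's bytes happen to form a well-formed message. The sketch below reinterprets the payload the way a schema-less decoder might; `RawDecodeSketch` and `readFixed64Fields` are hypothetical names for this illustration, and the code assumes one-byte tags and only handles wire type 1 (fixed64), which is all this particular string needs.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object RawDecodeSketch {
  // Read a payload as consecutive fixed64 fields and return
  // (fieldNumber, value) pairs. Assumption: every tag is one byte with
  // wire type 1; trailing bytes that do not fill a whole field are ignored.
  def readFixed64Fields(payload: Array[Byte]): List[(Int, Long)] = {
    // Protobuf stores fixed64 values in little-endian byte order.
    val buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN)
    val out = List.newBuilder[(Int, Long)]
    while (buf.remaining() >= 9) {
      val tag = buf.get() & 0xff
      require((tag & 7) == 1, s"tag $tag is not a fixed64 field")
      out += ((tag >>> 3, buf.getLong())) // field number, 8-byte value
    }
    out.result()
  }
}
```

Applied to the UTF-8 bytes of "android.microphone", this returns field 12 with value 0x6d2e64696f72646e and field 13 with value 0x656e6f68706f7263. The letter 'a' (0x61) reads as the tag for field 12 and 'i' (0x69) as the tag for field 13, each followed by exactly 8 bytes of text. A string like "stuff" does not parse as a complete message this way, so the decoder shows it as plain text.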

Piping the same bytes into protoc together with the schema (saved as x.proto) decodes the string correctly:

protoc --decode=X x.proto

s: "android.microphone"

You can corroborate this with Protobuf Decoder too using the hex-encoded output above.

This is unavoidable without a schema.
