Ten Cache Misses

Crushing Haskell like a Tin Can

Generics and Protocol Buffers

We use Protocol Buffers extensively, and after talking to some folks at BayHac'12, I think it may be time to revisit the state of protobuf support in Haskell.

To be fair, the protocol-buffers package is great. It’s extremely full-featured and well tested, and I can’t complain about its performance. But when most of the parties involved are running Haskell, maintaining separate .proto files is more than just a chore. Properly integrating the hprotoc preprocessor into a build system has also proven challenging, primarily because of the n:m mapping between .proto source files and generated Haskell modules.

After spending a little time this evening hacking around, I’ve come up with an alternative solution that looks promising and doesn’t require external files or additional build tools. Though it’s far from a production effort, the type-level version of the code is available on GitHub for all your forking needs.

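Note: GHC 7.2 or later is required for Generic support, and the type-level field numbers used below (DataKinds with GHC.TypeLits naturals) need GHC 7.6 or later.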

So what does it look like?

By defining a set of types that allow tagging a record field with a field number…

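{-# LANGUAGE DataKinds, KindSignatures #-}
import GHC.TypeLits (Nat)  -- kind of the type-level field numbers
newtype Required (n :: Nat) t = Required t
newtype Optional (n :: Nat) t = Optional t
newtype Packed   (n :: Nat) t = Packed t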

and a few more to override the default base-128 varint encoding

newtype Fixed t  = Fixed t
newtype Signed t = Signed t

… should give you enough rope to write regular Haskell records that are efficiently (de)serialized with very little fuss. Create an annotated record, derive a Generic instance and you’re done.
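The post does not show the encoder itself, so here is a minimal standalone sketch of what these wrappers imply on the wire, assuming the standard protobuf encoding rules: a field key of (field number << 3) | wire type, base-128 varints by default, ZigZag for signed values (presumably what Signed selects) and little-endian fixed-width bytes (presumably what Fixed selects). The Optional newtype here only mirrors the one above, and encodeVarint, zigzag, encodeFixed64, fieldKey and encodeOptionalVarint are hypothetical names, not part of the post’s code.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
module Main where

import Data.Bits (shiftL, shiftR, xor, (.&.), (.|.))
import Data.Int (Int64)
import Data.Proxy (Proxy (..))
import Data.Word (Word64, Word8)
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical mirror of the Optional wrapper: the field number exists only
-- at the type level.
newtype Optional (n :: Nat) t = Optional t

-- Base-128 varint: 7 payload bits per byte, least significant group first,
-- with the high bit set on every byte except the last.
encodeVarint :: Word64 -> [Word8]
encodeVarint v
  | v < 0x80  = [fromIntegral v]
  | otherwise = (fromIntegral (v .&. 0x7f) .|. 0x80) : encodeVarint (v `shiftR` 7)

-- ZigZag mapping for signed values: 0, -1, 1, -2, ... becomes 0, 1, 2, 3, ...
-- so small negative numbers still encode as short varints.
zigzag :: Int64 -> Word64
zigzag n = fromIntegral ((n `shiftL` 1) `xor` (n `shiftR` 63))

-- Fixed-width 64-bit encoding: eight little-endian bytes.
encodeFixed64 :: Word64 -> [Word8]
encodeFixed64 v = [fromIntegral (v `shiftR` (8 * i)) | i <- [0 .. 7]]

-- Field key: field number shifted left by 3, OR'd with the wire type
-- (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit).
fieldKey :: forall n t. KnownNat n => Optional n t -> Word64 -> Word64
fieldKey _ wireType = (fromInteger (natVal (Proxy :: Proxy n)) `shiftL` 3) .|. wireType

-- An Optional varint field: Nothing is simply omitted from the output.
encodeOptionalVarint :: KnownNat n => Optional n (Maybe Word64) -> [Word8]
encodeOptionalVarint f@(Optional mv) = case mv of
  Nothing -> []
  Just v  -> encodeVarint (fieldKey f 0) ++ encodeVarint v

main :: IO ()
main = do
  print (encodeOptionalVarint (Optional (Just 150) :: Optional 1 (Maybe Word64)))
  print (zigzag (-1), zigzag 1)
```

Running main prints [8,150,1], the same 08 96 01 bytes that open the hex strings in the sessions below, followed by (1,2) for the ZigZag pair.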

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}

import Data.Hex (unhex)
import Data.Int (Int64)
import Data.Monoid (Last)
import Data.Serialize (runGet)
import Data.Text (Text)
import GHC.Generics

data TestRec = TestRec
  { field1 :: Required 1 (Last Int64)
  , field2 :: Optional 2 (Last Text)
  , field3 :: Optional 3 (Last Int64)
  } deriving (Generic, Show)

*Pb> print $ runGet decodeMessage =<< unhex "089601120774657374696e67"
TestRec
  { field1 = Required (Last {getLast = Just 150})
  , field2 = Optional (Last {getLast = Just "testing"})
  , field3 = Optional (Last {getLast = Nothing})
  }

*Pb> print $ runGet decodeMessage =<< unhex "089601189701"
TestRec
  { field1 = Required (Last {getLast = Just 150})
  , field2 = Optional (Last {getLast = Nothing})
  , field3 = Optional (Last {getLast = Just 151})
  }

*Pb> print $ runGet decodeMessage =<< unhex "089601"
TestRec
  { field1 = Required (Last {getLast = Just 150})
  , field2 = Optional (Last {getLast = Nothing})
  , field3 = Optional (Last {getLast = Nothing})
  }
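To make the hex strings in the session above less opaque, here is a standalone sketch that splits those bytes into (field number, value) pairs using the standard protobuf key and varint rules. It does not reuse the post’s decodeMessage; Value, fromHex, getVarint and decodeFields are hypothetical helpers, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```haskell
module Main where

import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, digitToInt)
import Data.Word (Word64, Word8)

-- Hypothetical decoded value: only the two wire types used in the examples.
data Value = VarintV Word64 | BytesV [Word8]
  deriving (Eq, Show)

-- Turn a hex string such as "089601" into raw bytes.
fromHex :: String -> [Word8]
fromHex (a : b : rest) = fromIntegral (digitToInt a * 16 + digitToInt b) : fromHex rest
fromHex _ = []

-- Read one base-128 varint, returning the value and the remaining bytes.
getVarint :: [Word8] -> Maybe (Word64, [Word8])
getVarint = go 0 0
  where
    go shift acc (b : bs)
      | b .&. 0x80 == 0 = Just (acc', bs)        -- high bit clear: last byte
      | otherwise       = go (shift + 7) acc' bs -- more bytes follow
      where acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
    go _ _ [] = Nothing

-- Split a message into (field number, value) pairs.
decodeFields :: [Word8] -> Maybe [(Word64, Value)]
decodeFields [] = Just []
decodeFields bytes = do
  (key, rest) <- getVarint bytes
  let fieldNo = key `shiftR` 3        -- upper bits: field number
  (val, rest') <- case key .&. 7 of   -- low 3 bits: wire type
    0 -> do
      (v, r) <- getVarint rest
      Just (VarintV v, r)
    2 -> do
      (len, r) <- getVarint rest
      let n = fromIntegral len
      if length r < n then Nothing else Just (BytesV (take n r), drop n r)
    _ -> Nothing                      -- other wire types omitted in this sketch
  ((fieldNo, val) :) <$> decodeFields rest'

main :: IO ()
main = do
  print (decodeFields (fromHex "089601"))
  print (decodeFields (fromHex "089601189701"))
  print (fmap (map showText) (decodeFields (fromHex "089601120774657374696e67")))
  where
    showText (n, BytesV bs) = (n, map (chr . fromIntegral) bs)
    showText (n, VarintV v) = (n, show v)
```

For "089601" this yields only field 1 with value 150: 0x96 contributes 0x16 (22) and 0x01 contributes 1 << 7 (128). Fields absent from the byte stream produce no pair at all, which is what surfaces as Last Nothing in the records above.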

As you’d expect in Haskell, changing a field to an unsupported type such as Int will reward you with a nice (if somewhat misleading) build break:

data TestRec = TestRec
  { field3 :: Optional 3 (Last Int)
  } deriving (Generic, Show)

Pb.hs:272:27:
    No instance for (Wire Int)
      arising from a use of `decodeMessage'
    Possible fix: add an instance declaration for (Wire Int)
    In the first argument of `runGet', namely `decodeMessage'
    In the first argument of `(=<<)', namely `runGet decodeMessage'
    In the expression: runGet decodeMessage =<< unhex "089601"
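The error points at decodeMessage rather than at field3 because, in a Generic-based design, the per-field constraints are only collected when some function walks the record’s Rep. The toy sketch below illustrates that mechanism under stated assumptions: the Wire class with a single wireType method, the GFields walker and the fields function are hypothetical and much simpler than the post’s real code, which also has to encode and decode values.

```haskell
{-# LANGUAGE DataKinds, DeriveGeneric, FlexibleContexts, FlexibleInstances,
             KindSignatures, ScopedTypeVariables, TypeOperators #-}
module Main where

import Data.Int (Int64)
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import Data.Text (Text)
import GHC.Generics
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical stand-ins for the post's Optional wrapper and Wire class.
newtype Optional (n :: Nat) t = Optional t

-- Maps a payload type to its protobuf wire type. There is deliberately no
-- instance for Int, whose width depends on the platform.
class Wire a where
  wireType :: Proxy a -> Int

instance Wire Int64 where wireType _ = 0 -- varint
instance Wire Text  where wireType _ = 2 -- length-delimited

-- Walks a generic representation, listing (field number, wire type) pairs.
class GFields (f :: Type -> Type) where
  gfields :: Proxy f -> [(Integer, Int)]

-- Metadata wrappers (datatype, constructor, selector) are transparent.
instance GFields f => GFields (M1 i c f) where
  gfields _ = gfields (Proxy :: Proxy f)

-- Products visit the left fields, then the right ones.
instance (GFields f, GFields g) => GFields (f :*: g) where
  gfields _ = gfields (Proxy :: Proxy f) ++ gfields (Proxy :: Proxy g)

-- A single field: this is where the Wire constraint is demanded.
instance (KnownNat n, Wire t) => GFields (K1 i (Optional n t)) where
  gfields _ = [(natVal (Proxy :: Proxy n), wireType (Proxy :: Proxy t))]

fields :: forall a. (Generic a, GFields (Rep a)) => Proxy a -> [(Integer, Int)]
fields _ = gfields (Proxy :: Proxy (Rep a))

data Rec = Rec
  { name  :: Optional 2 Text
  , count :: Optional 3 Int64
  } deriving (Generic)

main :: IO ()
main = print (fields (Proxy :: Proxy Rec))
```

Here main prints [(2,2),(3,0)]. Changing count to Optional 3 Int should make the program fail to compile with No instance for (Wire Int), reported at the use of fields rather than at the record field, just as the post’s error is reported at the use of decodeMessage.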

Update (2/8/2013):

Steve and I are working on completing this work; check out our progress on GitHub.
