Ten Cache Misses

Crushing Haskell like a Tin Can

The Great GHC Primop Shootout

I’ve hackily glued together a few microbenchmarks comparing the LLVM primop madness against a more traditional FFI binding and a native Haskell deserializer. The FFI “parser” used here is a bit contrived: it primarily measures Storable and FFI overhead and does no actual parsing, but it’s interesting all the same. Send pull requests with a more fully featured implementation and I’ll update this post accordingly.

As always, the code for this post is available on Github.

  • primop: The Clang-mangled primop parser; multiple return values are passed in STG registers
  • lotsa: Each return value (out parameter) is allocated and marshalled individually
  • justOne: A Storable instance is used to marshal the return results all at once
  • cereal: A native Haskell deserializer using the cereal package
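
To make the difference between the lotsa and justOne strategies concrete, here is a minimal sketch using only base. It is not the benchmarked code. It assumes a hypothetical three-field result (Header, with version, flags and length fields), and the real C parser is replaced by plain Haskell stand-ins (fakeParseLotsa, fakeParseJustOne, also hypothetical) that write through pointers the way a C function with out parameters would. In a real binding those would be foreign imports. The offsets in the Storable instance assume a C struct of two uint16_t fields followed by a uint32_t.

```haskell
module Main where

import Data.Word (Word16, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical parse result with three fields.
data Header = Header
  { hdrVersion :: !Word16
  , hdrFlags   :: !Word16
  , hdrLength  :: !Word32
  } deriving (Eq, Show)

-- Assumed C layout: struct { uint16_t version; uint16_t flags; uint32_t len; }
instance Storable Header where
  sizeOf _    = 8
  alignment _ = 4
  peek p = Header <$> peekByteOff p 0 <*> peekByteOff p 2 <*> peekByteOff p 4
  poke p (Header v f l) = do
    pokeByteOff p 0 v
    pokeByteOff p 2 f
    pokeByteOff p 4 l

-- Stand-in for a C function with one out parameter per result
-- (hypothetical; a real binding would be a foreign import).
fakeParseLotsa :: Ptr Word16 -> Ptr Word16 -> Ptr Word32 -> IO ()
fakeParseLotsa pVer pFlags pLen = do
  poke pVer 1
  poke pFlags 0x8000
  poke pLen 1024

-- "lotsa": one allocation and one peek per out parameter.
decodeLotsa :: IO Header
decodeLotsa =
  alloca $ \pVer ->
    alloca $ \pFlags ->
      alloca $ \pLen -> do
        fakeParseLotsa pVer pFlags pLen
        Header <$> peek pVer <*> peek pFlags <*> peek pLen

-- Stand-in for a C function that fills in a single struct
-- (hypothetical; writes the fields at the assumed C offsets).
fakeParseJustOne :: Ptr Header -> IO ()
fakeParseJustOne p = do
  pokeByteOff p 0 (1 :: Word16)
  pokeByteOff p 2 (0x8000 :: Word16)
  pokeByteOff p 4 (1024 :: Word32)

-- "justOne": a single allocation, and the Storable instance
-- marshals every field back in one peek.
decodeJustOne :: IO Header
decodeJustOne = alloca $ \p -> do
  fakeParseJustOne p
  peek p

main :: IO ()
main = do
  a <- decodeLotsa
  b <- decodeJustOne
  print a
  print b
  print (a == b)
```

In this sketch the lotsa path pays for three allocations and three peeks per call, while justOne pays for one allocation and a single peek. The price is a Storable instance whose offsets must be kept in sync with the C struct layout by hand.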

The chart was lifted from a much larger report, courtesy of Criterion.