A varint implementation for PG? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | A varint implementation for PG? |
Msg-id | 20191210015054.5otdfuftxrqb5gum@alap3.anarazel.de |
Responses | Re: A varint implementation for PG? |
| Re: A varint implementation for PG? |
List | pgsql-hackers |
Hi,

I have several times, most recently for the record format in the undo patchset, wished for a fast variable-width integer implementation for postgres. Using very narrow integers for space efficiency solves the space usage problem, but leads to extensibility / generality problems. Other cases where I wanted a variable-width integer encoding are, non-exhaustively: variable-width TIDs, the on-disk representation of a smaller SQL-level integer, the WAL format, the wire protocol, ...

I spent a few days experimenting with different variable-length encodings for unsigned and signed integers. My goal is/was to find an encoding and a prototype that has reasonably low space overhead and that is very quick to encode/decode.

The probably most common encoding is often called LEB128. It basically uses 7 bits in each byte for data, and one bit as a "continuation" bit. While not bad space-wise, it's not great for decoding (but also encoding) speed. The problem is a) that there's a branch for each byte, which results in poorly predictable code, and b) that the 7 bit base needs to be converted to an 8 bit base (not as expensive as the branching issue, but quite noticeable).

I think a better encoding is something similar to UTF-8. That is, encoding the byte length in unary in the leading byte, with the data following that prefix (plus the separator bit ending the prefix). That allows avoiding per-byte branching during decoding. In contrast to UTF-8, in my prototype I decided to encode the prefix in unary as the number of leading 0 bits - the main advantage is that that makes it possible to use instructions like "count leading zeroes" to determine the length (otherwise one needs to invert the data first).

To encode negative numbers, I chose what's often called "zigzag" encoding. One cannot just naively use the above encoding for signed integers, as otherwise two's complement negative numbers would always be of the maximum length (due to the leading 1 bit(s)).
Instead, the to-be-encoded number is multiplied by two (i.e. <<1) and the sign is stored in the lowest bit; negative numbers are additionally stored with all bits inverted. The only disadvantage of that encoding that I am aware of is that encoded signed varints cannot simply be memcmp'd (which they could with e.g. a single sign bit).

Alternatively one could store the sign directly after the "separator bit" (the one ending the unary length indicator). But then there's the problem that the number has leading data after the indicator. It's also a bit harder to write the code for that encoding. I wrote the code for that first, and I think it's very likely worth going for the simpler approach.

In my benchmarks, encoding + decoding a varint this way costs approximately 23 cycles on my Intel Skylake CPU. I think pipelining is hiding a bit of the latency, but not too much. As the benchmark currently is stupid and just encodes all 32bit integers, it's overly friendly to the branch predictor, however. Even with those caveats, I think that's a pretty good result. Other encodings were more expensive. And I think there's definitely some room for optimization left.

I've pushed a repo containing my initial implementation of the above to https://github.com/anarazel/varint_experiment and attached a tarball containing it. The header file contains a more detailed description of the format. So far I've only implemented uint64_t and int64_t encoding/decoding functions. It'd definitely make sense to implement a 4 byte variant, and potentially also one supporting larger lengths. I think it'd make sense to have a variant supporting a unary length indicator of up to 64 bits (i.e. 64 bytes of data (potentially minus one bit)).
If data lengths longer than 64 bytes are required for a use case, it is probably better to either a) use the maximum-representable 8 byte integer as an indicator that the length is stored separately, or b) sacrifice another bit to indicate whether the integer is the data itself or the length. Not sure if this is worthwhile.

Do others see use in this? If so, does the encoding outlined above sound reasonable? I'm using a variant of this in a proposal for a different undo record format, but I really hope this would be interesting for others too.

Greetings,

Andres Freund