Re: pglz performance - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: pglz performance |
Date | |
Msg-id | 20190805072625.pzebnrs3fvk5wjen@alap3.anarazel.de Whole thread Raw |
In response to | Re: pglz performance (Petr Jelinek <petr@2ndquadrant.com>) |
Responses |
Re: pglz performance
|
List | pgsql-hackers |
Hi, On 2019-08-05 00:19:28 +0200, Petr Jelinek wrote: > It carries that information inside the compressed value, like I said in the > other reply, that's why the extra byte. I'm not convinced that that is a good plan - imo the reference to the compressed data should carry that information. I.e. in the case of toast, at least toast pointers should hold enough information to determine the compression algorithm. And in the case of WAL, the WAL record should contain that. Consider e.g. adding support for slice fetching of datums compressed with some algorithm - we should be able to determine whether that's possible without fetching the datum (so we can take a non-exceptional path for datums compressed otherwise). Similarly, for WAL, we should be able to detect whether an incompatible compression format is used, without having to invoke a generic compression routine that then fails in some way. Or adding compression reporting for WAL to xlogdump. I also don't particularly like baking in the assumption that we don't support tuples larger than 1GB in further places. To me it seems likely that we're going to have to fix that, and it's hard enough already... I know that my patch did that too... For external datums I suggest encoding the compression method as a distinct VARTAG_ONDISK_COMPRESSED, and then have that include the compression method as a field. For in-line compressed values (so VARATT_IS_4B_C), doing something roughly like you did, indicating the type of metadata following using the high bit sounds reasonable. But I think I'd make it so that if the highbit is set, the struct is instead entirely different, keeping a full 4 byte byte length, and including the compression type header inside the struct. Perhaps I'd combine the compression type with the high-bit-set part? So when the high bit is set, it'd be something like { int32 vl_len_; /* varlena header (do not touch directly!) */ /* * Actually only 7 bit, the high bit determines whether this * is the old compression header (if unset), or this type of header * (if set). */ uint8 type; /* * Stored as uint8[4], to avoid unnecessary alignment padding. */ uint8[4] length; char va_data[FLEXIBLE_ARRAY_MEMBER]; } I think it's worth spending some effort trying to get this right - we'll be bound by design choices for a while. It's kinda annoying that toast datums aren't better designed. Creating them from scratch, I'd make it something like: 1) variable-width integer describing the "physical length", so that tuple deforming can quickly determine length - all the ifs necessary to determine lengths are a bottleneck. I'd probably just use a 127bit encoding, if the high bit is set, there's a following length byte. 2) type of toasted datum, probably also in a variable width encoding, starting at 1byte. Not that I think it's likely we'd overrun 256 types - but it's cheap enough to just declare the high bit as an length extension bit. These are always stored unaligned. So there's no need to deal with padding bytes having to be zero to determine whether we're dealing with a 1byte datum etc. Then, type dependant: For In-line uncompressed datums 3a) alignment padding, amount determined by 2) above, i.e. we'd just have different types for different amounts of alignment. Probably using some heuristic to use unaligned when either dealing with data that doesn't need alignment, or when the datum is fairly small, so copying to get the data as unaligned won't be a significant penalty. 4a) data For in-line compressed datums 3b) compression metadata {varint rawsize, varint compression algorithm} 4b) unaligned compressed data - there's no benefit in keeping it aligned For External toast for uncompressed data: 3d) {toastrelid, valueid, varint rawsize} For External toast for compressed data: 3e) {valueid, toastrelid, varint compression_algorithm, varint rawsize, varint extsize} That'd make it a lot more extensible, easier to understand, faster to decode in a lot of cases, remove a lot of arbitrary limits. Yes, it'd increase the header size for small datums to two bytes, but I think that'd be more than bought back by the other improvements. Greetings, Andres Freund
pgsql-hackers by date: