Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | 20200623200042.gkzftoz5n6kn6lgh@alap3.anarazel.de |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
Hi,

On 2020-06-23 14:27:47 -0400, Robert Haas wrote:
> On Mon, Jun 22, 2020 at 4:53 PM Andres Freund <andres@anarazel.de> wrote:
> > > Or maybe we add 1 or 2 "privileged" built-in compressors that get
> > > dedicated bit-patterns in the upper 2 bits of the size field, with the
> > > last bit pattern being reserved for future algorithms. (e.g. 0x00 =
> > > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for
> > > details).
> >
> > Agreed. I favor an approach roughly like I'd implemented below
> > https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de
> > I.e. leave the vartag etc as-is, but utilize the fact that pglz
> > compressed datums starts with a 4 byte length header, and that due to
> > the 1GB limit, the first two bits currently have to be 0. That allows to
> > indicate 2 compression methods without any space overhead, and
> > additional compression methods are supported by using an additional byte
> > (or some variable length encoded larger amount) if both bits are 1.

https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a
thread with more information / patches further along.

> I think there's essentially no difference between these two ideas,
> unless the two bits we're talking about stealing are not the same in
> the two cases. Am I missing something?

I confused this patch with the approach in
https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com
sorry for that. It obviously still differs by not having lower space
overhead (by virtue of not having a 4 byte 'va_cmid', but no additional
space for two methods, and then 1 byte overhead for 256 more), but
that's not that fundamental a difference.

I do think it's nicer to hide the details of the compression inside
toast specific code as the version in the "further along" thread above
did.

The varlena stuff feels so archaic, it's hard to keep it all in my
head...
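The two-bit scheme discussed above can be illustrated with a minimal sketch. It is not PostgreSQL's actual on-disk layout and is not taken from any of the linked patches: the names `encode_header`, `decode_header`, `RAWSIZE_BITS` and `METHOD_EXTENDED` are hypothetical, the header is serialized big-endian purely to keep the example portable, and it assumes three method ids fit inline while the bit pattern 11 means "method id follows in one extra byte".

```c
#include <assert.h>   /* only needed if you add sanity checks around these helpers */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not PostgreSQL definitions. */
#define RAWSIZE_BITS    30
#define RAWSIZE_MASK    ((UINT32_C(1) << RAWSIZE_BITS) - 1)  /* low 30 bits */
#define METHOD_EXTENDED 3u  /* both top bits set: method id is in the next byte */

/*
 * Write a compressed-datum header into buf (at least 5 bytes).
 * Methods 0..2 live in the two top bits of the 4 byte size word, so they
 * cost no extra space.  Methods 3..258 set both bits and add one byte.
 * Returns the header length (4 or 5), or 0 if the input does not fit.
 */
static size_t
encode_header(uint8_t *buf, uint32_t rawsize, unsigned method)
{
    unsigned tag = method < METHOD_EXTENDED ? method : METHOD_EXTENDED;
    uint32_t word;

    /* The 1GB limit is what guarantees the two top bits are free. */
    if (rawsize > RAWSIZE_MASK || method > METHOD_EXTENDED + 255u)
        return 0;

    word = ((uint32_t) tag << RAWSIZE_BITS) | rawsize;
    buf[0] = (uint8_t) (word >> 24);
    buf[1] = (uint8_t) (word >> 16);
    buf[2] = (uint8_t) (word >> 8);
    buf[3] = (uint8_t) word;
    if (tag != METHOD_EXTENDED)
        return 4;
    buf[4] = (uint8_t) (method - METHOD_EXTENDED);
    return 5;
}

/*
 * Read a header written by encode_header().  Stores the raw size and the
 * method id, and returns the number of header bytes consumed (4 or 5).
 */
static size_t
decode_header(const uint8_t *buf, uint32_t *rawsize, unsigned *method)
{
    uint32_t word = ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
                    ((uint32_t) buf[2] << 8) | (uint32_t) buf[3];
    unsigned tag = (unsigned) (word >> RAWSIZE_BITS);

    *rawsize = word & RAWSIZE_MASK;
    if (tag != METHOD_EXTENDED)
    {
        *method = tag;
        return 4;
    }
    *method = METHOD_EXTENDED + buf[4];
    return 5;
}
```

With this layout an existing datum whose top bits are 00 decodes as method 0 with an unchanged size, which is the property that lets the current format stay readable.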

I think I've pondered that elsewhere before (but perhaps just on IM with
you?), but I do think we'll need a better toast pointer format at some
point. It's pretty fundamentally based on having the 1GB limit, which I
don't think we can justify for that much longer.

Using something like
https://postgr.es/m/20191210015054.5otdfuftxrqb5gum%40alap3.anarazel.de
I'd probably make it something roughly like:

1) signed varint indicating "in-place" length
1a) if positive, it's "plain" "in-place" data
1b) if negative, data type indicator follows. abs(length) includes size of
    metadata.
2) optional: unsigned varint metadata type indicator
3) data

Because 1) is the size of the data, toast datums can be skipped with a
relatively low amount of instructions during tuple deforming. Instead of
needing a fair number of branches, as the case right now.

So a small in-place uncompressed varlena2 would have an overhead of 1
byte up to 63 bytes, and 2 bytes otherwise (with 8 kb pages at least).

An in-place compressed datum could have an overhead as low as 3 bytes (1
byte length, 1 byte indicator for type of compression, 1 byte raw size),
although I suspect it's rarely going to be useful at that small sizes.

Anyway. I think it's probably reasonable to utilize those two bits
before going to a new toast format. But if somebody were more interested
in working on toastv2 I'd not push back either.

Regards,

Andres
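The signed-varint header outlined in the message can be sketched as follows, mainly to show why skipping a datum needs only the length. This is an illustration under stated assumptions, not the encoding from the linked thread: it swaps in a zigzag plus LEB128-style varint, which happens to give the same 1 byte overhead up to 63 bytes and 2 bytes up to 8191. The function names are hypothetical and the decoder does no validation of malformed input.

```c
#include <assert.h>   /* only needed if you add sanity checks around these helpers */
#include <stddef.h>
#include <stdint.h>

/* Map a signed value to unsigned so that small magnitudes stay small. */
static uint64_t
zigzag(int64_t v)
{
    return v < 0 ? (((uint64_t) -(v + 1)) << 1) | 1 : (uint64_t) v << 1;
}

/* Inverse of zigzag(). */
static int64_t
unzigzag(uint64_t z)
{
    return (z & 1) ? -(int64_t) (z >> 1) - 1 : (int64_t) (z >> 1);
}

/* Encode v as 7 bit groups, low group first; returns bytes written (1..10). */
static size_t
encode_svarint(uint8_t *buf, int64_t v)
{
    uint64_t z = zigzag(v);
    size_t n = 0;

    while (z >= 0x80)
    {
        buf[n++] = (uint8_t) (z | 0x80);   /* low 7 bits plus continuation bit */
        z >>= 7;
    }
    buf[n++] = (uint8_t) z;
    return n;
}

/* Decode a value written by encode_svarint(); returns bytes consumed. */
static size_t
decode_svarint(const uint8_t *buf, int64_t *v)
{
    uint64_t z = 0;
    size_t n = 0;
    unsigned shift = 0;

    do
    {
        z |= (uint64_t) (buf[n] & 0x7F) << shift;
        shift += 7;
    } while (buf[n++] & 0x80);

    *v = unzigzag(z);
    return n;
}

/*
 * Skip one datum.  Positive length: plain in-place data.  Negative length:
 * a metadata type indicator follows, and abs(length) already includes it.
 * Either way the next datum starts at header size plus abs(length).
 */
static const uint8_t *
skip_datum(const uint8_t *p)
{
    int64_t len;
    size_t hdr = decode_svarint(p, &len);

    return p + hdr + (size_t) (len < 0 ? -len : len);
}
```

Because `skip_datum` never looks at the metadata type, the kind of datum (plain, compressed, external) does not add branches to the skip path, which is the point made about tuple deforming.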