Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-v3soZKaYtR2ig43t4haJJx3FZXMd2hDaj3E1mtSjwJPg@mail.gmail.com |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Wed, Jun 24, 2020 at 5:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 23, 2020 at 4:00 PM Andres Freund <andres@anarazel.de> wrote:
> > https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a
> > thread with more information / patches further along.
> >
> > I confused this patch with the approach in
> > https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com
> > sorry for that. It obviously still differs by not having lower space
> > overhead (by virtue of not having a 4 byte 'va_cmid', but no additional
> > space for two methods, and then 1 byte overhead for 256 more), but
> > that's not that fundamental a difference.
>
> Wait a minute. Are we saying there are three (3) dueling patches for
> adding an alternate TOAST algorithm? It seems like there is:
>
> This "custom compression methods" thread - vintage 2017 - Original
> code by Nikita Glukhov, later work by Ildus Kurbangaliev
> The "pluggable compression support" thread - vintage 2013 - Andres Freund
> The "plgz performance" thread - vintage 2019 - Petr Jelinek
>
> Anyone want to point to a FOURTH implementation of this feature?
>
> I guess the next thing to do is figure out which one is the best basis
> for further work.

I have gone through these 3 threads and here is a summary of what I
understand from them. Feel free to correct me if I have missed
something.

#1. Custom compression methods: Provides a mechanism to create/drop
compression methods by using external libraries, and also provides a
way to set the compression method for columns/types. There are a few
complexities with this approach, listed below:

a. We need to maintain the dependencies between the column and the
compression method. The bigger issue is that, even if the compression
method is changed, we need to maintain the dependencies on the older
compression methods, as we might have some older tuples that were
compressed with the older methods.
b. Inside the compressed attribute, we need to store the compression
method so that we know how to decompress it. For this, we use 2 bits
from the raw_size of the compressed varlena header.

#2. pglz performance: Along with pglz, this patch provides an
additional compression method using lz4. The new compression method
can be enabled/disabled at configure time or using SIGHUP. We use 1
bit from the raw_size of the compressed varlena header to identify the
compression method (pglz or lz4).

#3. pluggable compression: This proposal is to replace the existing
pglz algorithm with snappy or lz4, whichever is better. As per the
performance data[1], it appeared that lz4 is the winner in most of
the cases.
- This also provides an additional patch to plug in any compression method.
- This will also use 2 bits from the raw_size of the compressed
attribute for identifying the compression method.
- It provides an option to select the compression method using a GUC,
but the comments in the patch suggest removing the GUC. So it seems
that the GUC was used only for the POC.
- Honestly, I could not clearly understand from this patch set whether
it proposes to replace the existing compression method with the best
method (with the plugin provided just for performance testing) or
whether it actually proposes an option to have pluggable compression
methods.

IMHO, we can provide a solution based on #1 and #2, i.e. we can
provide a few of the best compression methods in core, and on top of
that, we can also provide a mechanism to create/drop external
compression methods.

[1] https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
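To make the header-bit idea from #1 and #3 concrete, here is a minimal
sketch in C of how 2 bits could be taken from a 32-bit raw_size field to
carry a compression method ID. It assumes the raw size always fits in 30
bits (varlena values are limited to 1 GB). All names below (RAWSIZE_BITS,
CompressionMethodId, pack_info, and so on) are hypothetical and are not
taken from any of the three patch sets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 30 bits = raw (uncompressed) size,
 * high 2 bits = compression method ID. Not the actual patch code. */
#define RAWSIZE_BITS 30
#define RAWSIZE_MASK ((UINT32_C(1) << RAWSIZE_BITS) - 1)

/* Hypothetical method IDs; 2 bits allow at most 4 distinct values. */
typedef enum
{
    CM_PGLZ = 0,
    CM_LZ4 = 1,
    CM_CUSTOM = 2
} CompressionMethodId;

/* Combine the raw size and the method ID into one 32-bit word. */
static uint32_t
pack_info(uint32_t raw_size, CompressionMethodId cm)
{
    assert(raw_size <= RAWSIZE_MASK);   /* must fit in 30 bits */
    return raw_size | ((uint32_t) cm << RAWSIZE_BITS);
}

/* Extract the raw size by masking off the two method bits. */
static uint32_t
get_raw_size(uint32_t info)
{
    return info & RAWSIZE_MASK;
}

/* Extract the method ID from the two high bits. */
static CompressionMethodId
get_method(uint32_t info)
{
    return (CompressionMethodId) (info >> RAWSIZE_BITS);
}
```

With this layout, the decompression path would read the method ID first
and dispatch to the matching routine. The single-bit variant described in
#2 would work the same way with a 31-bit size and one flag bit.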