Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-sySfmVMiiNEGo8QOirmzXw3P4n8+DUXnt85=eNNfpD4w@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Tue, Oct 27, 2020 at 10:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 22, 2020 at 5:56 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra > > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote: > > > > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > >> > > > > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > >> > > > > > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > > > > >> > <tomas.vondra@2ndquadrant.com> wrote: > > > > >> > > > > > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > > > >> > > > > > > > >> > > >> ... > > > > >> > > > > > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the > > > > >> > > >preserved compression methods using dependency. Still PRESERVE ALL > > > > >> > > >syntax is not supported, I will work on that part. > > > > >> > > > > > > > >> > > > > > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > > > > >> > > think it's cleaner / more elegant, or is it something contrary to what > > > > >> > > the dependencies are meant to do? > > > > >> > > > > > >> > I think this looks much cleaner. Moreover, I feel that once we start > > > > >> > supporting the custom compression methods then we anyway have to > > > > >> > maintain the dependency so using that for finding the preserved > > > > >> > compression method is good option. > > > > >> > > > > >> I have also implemented the next set of patches. > > > > >> 0004 -> Provide a way to create custom compression methods > > > > >> 0005 -> Extention to implement lz4 as a custom compression method. > > > > > > > > > >In the updated version I have worked on some of the listed items > > > > >> A pending list of items: > > > > >> 1. Provide support for handling the compression option > > > > >> - As discussed up thread I will store the compression option of the > > > > >> latest compression method in a new field in pg_atrribute table > > > > >> 2. As of now I have kept zlib as the second built-in option and lz4 as > > > > >> a custom compression extension. In Offlist discussion with Robert, he > > > > >> suggested that we should keep lz4 as the built-in method and we can > > > > >> move zlib as an extension because lz4 is faster than zlib so better to > > > > >> keep that as the built-in method. So in the next version, I will > > > > >> change that. Any different opinion on this? > > > > > > > > > >Done > > > > > > > > > >> 3. Improve the documentation, especially for create_compression_method. > > > > >> 4. By default support table compression method for the index. > > > > > > > > > >Done > > > > > > > > > >> 5. Support the PRESERVE ALL option so that we can preserve all > > > > >> existing lists of compression methods without providing the whole > > > > >> list. > > > > > > > > > >1,3,5 points are still pending. > > > > > > > > > > > > > Thanks. I took a quick look at the patches and I think it seems fine. I > > > > have one question, though - toast_compress_datum contains this code: > > > > > > > > > > > > /* Call the actual compression function */ > > > > tmp = cmroutine->cmcompress((const struct varlena *) value); > > > > if (!tmp) > > > > return PointerGetDatum(NULL); > > > > > > > > > > > > Shouldn't this really throw an error instead? I mean, if the compression > > > > library returns NULL, isn't that an error? > > > > > > I don't think that we can throw an error here because pglz_compress > > > might return -1 if it finds that it can not reduce the size of the > > > data and we consider such data as "incompressible data" and return > > > NULL. In such a case the caller will try to compress another > > > attribute of the tuple. I think we can handle such cases in the > > > specific handler functions. > > > > I have added the compression failure error in lz4.c, please refer > > lz4_cmcompress in v9-0001 patch. Apart from that, I have also > > supported the PRESERVE ALL syntax to preserve all the existing > > compression methods. I have also rebased the patch on the current > > head. > > I have added the next patch to support the compression options. I am > storing the compression options only for the latest compression > method. Basically, based on this design we would be able to support > the options which are used only for compressions. As of now, the > compression option infrastructure is added and the compression options > for inbuilt method pglz and the external method zlib are added. Next, > I will work on adding the options for the lz4 method. In the attached patch set I have also included the compression option support for lz4. As of now, I have only supported the acceleration for LZ4_compress_fast. There is also support for the dictionary-based compression but if we try to support that then we will need the dictionary for decompression also. Since we are only keeping the options for the current compression methods, we can not support dictionary-based options as of now. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: