Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-vp=or=1x=DjttV=WfLPsSWnKUVecL2PFGJHOqrrXTsrg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Mon, Oct 5, 2020 at 11:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>
> Thanks, Tomas for your feedback.
>
> > 9) attcompression ...
> >
> > The main issue I see is what the patch does with attcompression. Instead
> > of just using it to store a the compression method, it's also used to
> > store the preserved compression methods. And using NameData to store
> > this seems wrong too - if we really want to store this info, the correct
> > way is either using text[] or inventing charvector or similar.
>
> The reason for using the NameData is the get it in the fixed part of
> the data structure.
>
> > But to me this seems very much like a misuse of attcompression to track
> > dependencies on compression methods, necessary because we don't have a
> > separate catalog listing compression methods. If we had that, I think we
> > could simply add dependencies between attributes and that catalog.
>
> Basically, up to this patch, we are having only built-in compression
> methods and those can not be dropped so we don't need any dependency
> at all.  We just want to know what is the current compression method
> and what is the preserve compression methods supported for this
> attribute.  Maybe we can do it better instead of using the NameData
> but I don't think it makes sense to add a separate catalog?
>
> > Moreover, having the catalog would allow adding compression methods
> > (from extensions etc) instead of just having a list of hard-coded
> > compression methods. Which seems like a strange limitation, considering
> > this thread is called "custom compression methods".
>
> I think I forgot to mention while submitting the previous patch that
> the next patch I am planning to submit is, Support creating the custom
> compression methods wherein we can use pg_am catalog to insert the new
> compression method.  And for dependency handling, we can create an
> attribute dependency on the pg_am row.  Basically, we will create the
> attribute dependency on the current compression method AM as well as
> on the preserved compression methods AM.   As part of this, we will
> add two build-in AMs for zlib and pglz, and the attcompression field
> will be converted to the oid_vector (first OID will be of the current
> compression method, followed by the preserved compression method's
> oids).
>
> > 10) compression parameters?
> >
> > I wonder if we could/should allow parameters, like compression level
> > (and maybe other stuff, depending on the compression method). PG13
> > allowed that for opclasses, so perhaps we should allow it here too.
>
> Yes, that is also in the plan.   For doing this we are planning to add
> an extra column in the pg_attribute which will store the compression
> options for the current compression method. The original patch was
> creating an extra catalog pg_column_compression, therein it maintains
> the oid of the compression method as well as the compression options.
> The advantage of creating an extra catalog is that we can keep the
> compression options for the preserved compression methods also so that
> we can support the options which can be used for decompressing the
> data as well.  Whereas if we want to avoid this extra catalog then we
> can not use that compression option for decompressing.  But most of
> the options e.g. compression level are just for the compressing so it
> is enough to store for the current compression method only.  What's
> your thoughts?
>
> Other comments look fine to me so I will work on them and post the
> updated patch set.

I have fixed the other comments except this,

> 2) I see index_form_tuple does this:
>
>     Datum  cvalue = toast_compress_datum(untoasted_values[i],
>                                          DefaultCompressionMethod);

> which seems wrong - why shouldn't the indexes use the same compression
> method as the underlying table?

I will fix this in the next version of the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: [PATCH] Automatic HASH and LIST partition creation
Next
From: Pavel Borisov
Date:
Subject: Re: [PATCH] Automatic HASH and LIST partition creation