Home > mailing lists

Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From	Dilip Kumar
Subject	Re: [HACKERS] Custom compression methods
Date	October 5, 2020 08:47:28
Msg-id	CAFiTN-t9nb2rRdL3uauRPgv5rAP-yBuAKQKfY=jiRApSmDC4MQ@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Custom compression methods (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: [HACKERS] Custom compression methods Re: [HACKERS] Custom compression methods
List	pgsql-hackers

Tree view

On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Thanks, Tomas for your feedback.

> 9) attcompression ...
>
> The main issue I see is what the patch does with attcompression. Instead
> of just using it to store a the compression method, it's also used to
> store the preserved compression methods. And using NameData to store
> this seems wrong too - if we really want to store this info, the correct
> way is either using text[] or inventing charvector or similar.

The reason for using the NameData is the get it in the fixed part of
the data structure.

> But to me this seems very much like a misuse of attcompression to track
> dependencies on compression methods, necessary because we don't have a
> separate catalog listing compression methods. If we had that, I think we
> could simply add dependencies between attributes and that catalog.

Basically, up to this patch, we are having only built-in compression
methods and those can not be dropped so we don't need any dependency
at all.  We just want to know what is the current compression method
and what is the preserve compression methods supported for this
attribute.  Maybe we can do it better instead of using the NameData
but I don't think it makes sense to add a separate catalog?

> Moreover, having the catalog would allow adding compression methods
> (from extensions etc) instead of just having a list of hard-coded
> compression methods. Which seems like a strange limitation, considering
> this thread is called "custom compression methods".

I think I forgot to mention while submitting the previous patch that
the next patch I am planning to submit is, Support creating the custom
compression methods wherein we can use pg_am catalog to insert the new
compression method.  And for dependency handling, we can create an
attribute dependency on the pg_am row.  Basically, we will create the
attribute dependency on the current compression method AM as well as
on the preserved compression methods AM.   As part of this, we will
add two build-in AMs for zlib and pglz, and the attcompression field
will be converted to the oid_vector (first OID will be of the current
compression method, followed by the preserved compression method's
oids).

> 10) compression parameters?
>
> I wonder if we could/should allow parameters, like compression level
> (and maybe other stuff, depending on the compression method). PG13
> allowed that for opclasses, so perhaps we should allow it here too.

Yes, that is also in the plan.   For doing this we are planning to add
an extra column in the pg_attribute which will store the compression
options for the current compression method. The original patch was
creating an extra catalog pg_column_compression, therein it maintains
the oid of the compression method as well as the compression options.
The advantage of creating an extra catalog is that we can keep the
compression options for the preserved compression methods also so that
we can support the options which can be used for decompressing the
data as well.  Whereas if we want to avoid this extra catalog then we
can not use that compression option for decompressing.  But most of
the options e.g. compression level are just for the compressing so it
is enough to store for the current compression method only.  What's
your thoughts?

Other comments look fine to me so I will work on them and post the
updated patch set.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Masahiko Sawada
Date: 05 October 2020, 08:36:43
Subject: Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From: Amit Kapila
Date: 05 October 2020, 08:52:45
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist

Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

Previous

Next