Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CA+Tgmoa3qzewoh9JvXVm5BEA_qKewK9hfJJhyAZFXYhsx9P-1g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, Feb 10, 2021 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> [ new patches ]

I think that in both varattrib_4b and toast_internals.h it would be
better to pick a less generic field name. In toast_internals.h it's
just info; in postgres.h it's va_info. But:

[rhaas pgsql]$ git grep info | wc -l
   24552

There are no references in the current source tree to va_info, so at
least that one is greppable, but it's still not very descriptive. I
suggest info -> tcinfo and va_info -> va_tcinfo, where "tc" stands for
"TOAST compression". Looking through 24552 references to info to find
the ones that pertain to this feature might take longer than searching
the somewhat shorter list of references to tcinfo, which prepatch is
just:

[rhaas pgsql]$ git grep tcinfo | wc -l
       0

I don't see why we should allow for datum_decompress to be optional,
as toast_decompress_datum_slice does. Likely every serious compression
method will support that anyway. If not, the compression AM can deal
with the problem, rather than having the core code do it. That will
save some tiny amount of performance, too.

src/backend/access/compression/Makefile is missing a copyright header.

It's really sad that lz4_cmdecompress_slice allocates
VARRAWSIZE_4B_C(value) + VARHDRSZ rather than slicelength + VARHDRSZ
as pglz_cmdecompress_slice() does. Is that a mistake, or is that
necessary for some reason? If it's a mistake, let's fix it. If it's
necessary, let's add a comment about why, probably starting with
"Unfortunately, ....".

I think you have a fairly big problem with row types. Consider this example:

create table t1 (a int, b text compression pglz);
create table t2 (a int, b text compression lz4);
create table t3 (x t1);
insert into t1 values (1, repeat('foo', 1000));
insert into t2 values (1, repeat('foo', 1000));
insert into t3 select t1 from t1;
insert into t3 select row(a, b)::t1 from t2;

rhaas=# select pg_column_compression((t3.x).b) from t3;
 pg_column_compression
-----------------------
 pglz
 lz4
(2 rows)

That's not good, because now

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Jonah H. Harris"
Date:
Subject: Re: Extensibility of the PostgreSQL wire protocol
Next
From: Jan Wieck
Date:
Subject: Re: Extensibility of the PostgreSQL wire protocol