
From: Dilip Kumar
Subject: Re: [HACKERS] Custom compression methods
Msg-id: CAFiTN-uHnBhkVBDYznSS-gNGtaLzXCOLVxXQ1SxhnkzYAKRU0A@mail.gmail.com
In response to: Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Feb 11, 2021 at 3:26 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 3:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I think you have a fairly big problem with row types. Consider this example:
> >
> > create table t1 (a int, b text compression pglz);
> > create table t2 (a int, b text compression lz4);
> > create table t3 (x t1);
> > insert into t1 values (1, repeat('foo', 1000));
> > insert into t2 values (1, repeat('foo', 1000));
> > insert into t3 select t1 from t1;
> > insert into t3 select row(a, b)::t1 from t2;
> >
> > rhaas=# select pg_column_compression((t3.x).b) from t3;
> >  pg_column_compression
> > -----------------------
> >  pglz
> >  lz4
> > (2 rows)
> >
> > That's not good, because now

Yeah, that's really bad.

> ...because now I hit send too soon. Also, because now column b has
> implicit dependencies on both compression AMs and the rest of the
> system has no idea. I think we probably should have a rule that
> nothing except pglz is allowed inside of a record, just to keep it
> simple. The record overall can be toasted so it's not clear why we
> should also be toasting the original columns at all, but I think
> precedent probably argues for continuing to allow PGLZ, as it can
> already be like that on disk and pg_upgrade is a thing. The same kind
> of issue probably exists for arrays and range types.

While constructing a row type from a tuple we already flatten the external
data, and I think that would be the place to decompress any datum that is
not compressed with PGLZ.  For the array type we already
detoast/decompress the source attribute (see construct_md_array), so
arrays don't have this problem.  I haven't checked the range type yet.
From my analysis it appears that the different data types are constructed
along different paths, so we either need to find some centralized place or
add a function call in all such places, so that we can decompress the
attribute if required before forming the tuple for the composite type.
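To illustrate the idea, here is only a rough sketch (not the attached POC
patch); the helper name is made up, and the macro/enum names for reading
the compression method are assumptions based on the in-progress patch and
may not match the real code:

/*
 * Rough sketch only.  Decompress a varlena datum before it is stored
 * inside a composite value, unless it is plain or compressed with PGLZ.
 */
#include "postgres.h"
#include "access/detoast.h"
#include "access/toast_compression.h"

static Datum
flatten_non_pglz_datum(Datum value)
{
    struct varlena *attr = (struct varlena *) DatumGetPointer(value);

    /*
     * External values are flattened by the existing code; here we only
     * handle inline-compressed data whose method is not PGLZ.
     */
    if (!VARATT_IS_EXTERNAL(attr) &&
        VARATT_IS_COMPRESSED(attr) &&
        VARDATA_COMPRESSED_GET_COMPRESS_METHOD(attr) != TOAST_PGLZ_COMPRESSION_ID)
        return PointerGetDatum(detoast_attr(attr));

    return value;
}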

I have quickly hacked up the code, and with that your test case works fine.

postgres[55841]=# select pg_column_compression((t3.x).b) from t3;
 pg_column_compression
-----------------------
 pglz

(2 rows)

--> Now the attribute 'b' inside the second tuple has been decompressed
(because it was not compressed with PGLZ), so its compression method
shows as NULL.

postgres[55841]=# select pg_column_compression((t3.x)) from t3;
 pg_column_compression
-----------------------

 pglz
(2 rows)

--> But the second row's composite value itself is compressed back using
the compression method of the column in t3.

W.r.t. the attached patch: in HeapTupleHeaderGetDatum, we don't even
attempt to detoast if there is no external field in the tuple.  In the
POC I have gotten rid of that check, but I think we need to do better.
Maybe we can add a flag in the infomask to detect whether the tuple has
any compressed data, as we already have for detecting external data
(HEAP_HASEXTERNAL).
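
To make that concrete, the relevant check would look roughly like the
sketch below.  This is only an illustration of the idea, not what the
attached POC does: HEAP_HASCOMPRESSED / HeapTupleHeaderHasCompressed()
are hypothetical names, and choosing an actual free infomask bit (they
are scarce) is part of the open question.

Datum
HeapTupleHeaderGetDatum(HeapTupleHeader tuple)
{
    Datum       result;
    TupleDesc   tupDesc;

    /*
     * Today we skip flattening when the tuple has no external fields;
     * with non-default compression methods we would also need to
     * flatten when any attribute is inline-compressed.
     */
    if (!HeapTupleHeaderHasExternal(tuple) &&
        !HeapTupleHeaderHasCompressed(tuple))   /* hypothetical */
        return PointerGetDatum(tuple);

    /* Look up the rowtype and flatten, as the existing code does */
    tupDesc = lookup_rowtype_tupdesc(HeapTupleHeaderGetTypeId(tuple),
                                     HeapTupleHeaderGetTypMod(tuple));
    result = toast_flatten_tuple_to_datum(tuple,
                                          HeapTupleHeaderGetDatumLength(tuple),
                                          tupDesc);
    ReleaseTupleDesc(tupDesc);

    return result;
}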

So I will do some more analysis in this area and try to come up with a
clean solution.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

