Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-viudxb0kGwtwsXhSCAaNZ4OUYcP-7PM+tr-rEbgGeHQw@mail.gmail.com
Whole thread Raw
In response to Re: Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Re: [HACKERS] Custom compression methods
List pgsql-hackers
On Thu, Aug 13, 2020 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>

There was some question which Robert has asked in this mail, please
find my answer inline.  Also, I have a few questions regarding further
splitting up this patch.

> On Fri, Jun 19, 2020 at 10:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >

> >
> > - One thing I really don't like about the patch is that it consumes a
> > bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask
> > bits are at a premium, and there's been no real progress in the decade
> > plus that I've been hanging around here in clawing back any bit-space.
> > I think we really need to avoid burning our remaining bits for
> > anything other than a really critical need, and I don't think I
> > understand what the need is in this case.

IIUC,  the main reason for using this flag is for taking the decision
whether we need any detoasting for this tuple.  For example, if we are
rewriting the table because the compression method is changed then if
HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple
length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to
call heap_toast_insert_or_update function for this tuple.  Whereas if
this flag is set then we need to because we might need to uncompress
and compress back using a different compression method.  The same is
the case with INSERT into SELECT * FROM.

 I might be missing
> > something, but I'd really strongly suggest looking for a way to get
> > rid of this. It also invents the concept of a TupleDesc flag, and the
> > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
> > need that, either.

This is also used in a similar way as the above but for the target
table, i.e. if the target table has the custom compressed attribute
then maybe we can not directly insert the tuple because it might have
compressed data which are compressed using the default compression
methods.

> > - It seems like this kind of approach has a sort of built-in
> > circularity problem. It means that every place that might need to
> > detoast a datum needs to be able to access the pg_am catalog. I wonder
> > if that's actually true. For instance, consider logical decoding. I
> > guess that can do catalog lookups in general, but can it do them from
> > the places where detoasting is happening? Moreover, can it do them
> > with the right snapshot? Suppose we rewrite a table to change the
> > compression method, then drop the old compression method, then try to
> > decode a transaction that modified that table before those operations
> > were performed. As an even more extreme example, suppose we need to
> > open pg_am, and to do that we have to build a relcache entry for it,
> > and suppose the relevant pg_class entry had a relacl or reloptions
> > field that happened to be custom-compressed. Or equally suppose that
> > any of the various other tables we use when building a relcache entry
> > had the same kind of problem, especially those that have TOAST tables.
> > We could just disallow the use of non-default compressors in the
> > system catalogs, but the benefits mentioned in
> > http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to
> > ignore.
> >
> > - I think it would be awfully appealing if we could find some way of
> > dividing this great big patch into some somewhat smaller patches. For
> > example:
> >
> > Patch #1. Add syntax allowing a compression method to be specified,
> > but the only possible choice is pglz, and the PRESERVE stuff isn't
> > supported, and changing the value associated with an existing column
> > isn't supported, but we can add tab-completion support and stuff.
> >
> > Patch #2. Add a second built-in method, like gzip or lz4.

I have already extracted these 2 patches from the main patch set.
But, in these patches, I am still storing the am_oid in the toast
header.  I am not sure can we get rid of that at least for these 2
patches?  But, then wherever we try to uncompress the tuple we need to
know the tuple descriptor to get the am_oid but I think that is not
possible in all the cases.  Am I missing something here?

> > Patch #3. Add support for changing the compression method associated
> > with a column, forcing a table rewrite.
> >
> > Patch #4. Add support for PRESERVE, so that you can change the
> > compression method associated with a column without forcing a table
> > rewrite, by including the old method in the PRESERVE list, or with a
> > rewrite, by not including it in the PRESERVE list.

Does this make sense to have Patch #3 and Patch #4, without having
Patch #5? I mean why do we need to support rewrite or preserve unless
we have the customer compression methods right? because the build-in
compression method can not be dropped so why do we need to preserve?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: list of extended statistics on psql
Next
From: Amit Kapila
Date:
Subject: Re: ReplicationSlotsComputeRequiredXmin seems pretty questionable