Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Re: [HACKERS] Custom compression methods
Date	August 25, 2020 17:49:51
Msg-id	CA+TgmoYYeMv=eJqf2JF8VWDPe=S6rYfH78DOxzE-Sa2kQHhiPw@mail.gmail.com
In response to	Re: Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>)
Responses	Re: Re: [HACKERS] Custom compression methods
List	pgsql-hackers

On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> IIUC, the main reason for using this flag is to decide whether we
> need any detoasting for this tuple.  For example, if we are
> rewriting the table because the compression method has changed, then if
> the HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and the
> tuple length does not exceed the threshold (i.e. tup->t_len >
> TOAST_TUPLE_THRESHOLD is false), we don't need to call the
> heap_toast_insert_or_update function for this tuple.  Whereas if
> this flag is set, we do need to call it, because we might need to
> uncompress the data and compress it back using a different compression
> method.  The same is the case with INSERT INTO ... SELECT * FROM.

This doesn't really seem worth it to me. I don't see how we can
justify burning an on-disk bit just to save a little bit of overhead
during a rare maintenance operation. If there's a performance problem
here we need to look for another way of mitigating it. Slowing CLUSTER
and/or VACUUM FULL down by a large amount for this feature would be
unacceptable, but is that really a problem? And if so, can we solve it
without requiring this bit?

> > > something, but I'd really strongly suggest looking for a way to get
> > > rid of this. It also invents the concept of a TupleDesc flag, and the
> > > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we
> > > need that, either.
>
> This is also used in a similar way to the above, but for the target
> table, i.e. if the target table has a custom-compressed attribute
> then maybe we cannot directly insert the tuple, because it might
> contain data that was compressed using the default compression
> method.

I think this is just an in-memory flag, which is much less precious
than an on-disk bit. However, I still wonder whether it's really the
right design. I think that if we offer lz4 we may well want to make it
the default eventually, or perhaps even right away. If that ends up
causing this flag to get set on every tuple all the time, then it
won't really optimize anything.

> I have already extracted these 2 patches from the main patch set.
> But, in these patches, I am still storing the am_oid in the toast
> header.  I am not sure whether we can get rid of that, at least for
> these 2 patches.  But then, wherever we try to uncompress the tuple, we
> need to know the tuple descriptor to get the am_oid, and I think that
> is not possible in all cases.  Am I missing something here?

I think we should instead use the high bits of the toast size word for
patches #1-#4, as discussed upthread.
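
As a rough sketch of what that could look like: all of the names below are
hypothetical, and the 30/2 bit split is an assumption that relies on a datum
never exceeding 1GB. None of it is taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout of the 32-bit size word: the low 30 bits hold the
 * raw (uncompressed) size, the high 2 bits hold a compression method ID.
 */
#define SKETCH_SIZE_BITS 30
#define SKETCH_SIZE_MASK ((UINT32_C(1) << SKETCH_SIZE_BITS) - 1)

/* Hypothetical method IDs; two bits leave room for four built-in methods. */
enum sketch_cmid
{
    SKETCH_CM_PGLZ = 0,
    SKETCH_CM_LZ4 = 1
};

/* Combine a raw size and a method ID into one size word. */
static inline uint32_t
sketch_pack(uint32_t rawsize, enum sketch_cmid cmid)
{
    assert(rawsize <= SKETCH_SIZE_MASK);    /* size must fit in 30 bits */
    assert((uint32_t) cmid < 4);            /* ID must fit in 2 bits */
    return ((uint32_t) cmid << SKETCH_SIZE_BITS) | rawsize;
}

/* Extract the raw size, ignoring the method bits. */
static inline uint32_t
sketch_rawsize(uint32_t word)
{
    return word & SKETCH_SIZE_MASK;
}

/* Extract the method ID from the high bits. */
static inline enum sketch_cmid
sketch_method(uint32_t word)
{
    return (enum sketch_cmid) (word >> SKETCH_SIZE_BITS);
}
```

With method ID 0 assigned to pglz in this sketch, a size word written before
the change decodes the same way as before, and decompression needs no tuple
descriptor to find the method.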

> > > Patch #3. Add support for changing the compression method associated
> > > with a column, forcing a table rewrite.
> > >
> > > Patch #4. Add support for PRESERVE, so that you can change the
> > > compression method associated with a column without forcing a table
> > > rewrite, by including the old method in the PRESERVE list, or with a
> > > rewrite, by not including it in the PRESERVE list.
>
> Does it make sense to have Patch #3 and Patch #4 without having
> Patch #5? I mean, why do we need to support rewrite or preserve unless
> we have the custom compression methods? The built-in compression
> methods cannot be dropped, so why do we need to preserve?

I think that patch #3 makes sense because somebody might have a table
that is currently compressed with pglz and they want to switch to lz4,
and I think patch #4 also makes sense because they might want to start
using lz4 for future data but not force a rewrite to get rid of all
the pglz data they've already got. Those options are valuable as soon
as there is more than one possible compression algorithm, even if
they're all built in. Now, as I said upthread, it's also true that you
could do #5 before #3 and #4. I don't think that's insane. But I
prefer it in the other order, because I think having #5 without #3 and
#4 wouldn't be too much fun for users.
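
To make the rewrite-versus-PRESERVE distinction concrete, here is a
deliberately simplified sketch. The function name, the integer method IDs,
and the single old_method argument are all made up for illustration; a real
column could already contain data in several preserved methods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether switching a column from old_method
 * to new_method forces a table rewrite, given the PRESERVE list.
 */
static bool
sketch_needs_rewrite(int old_method, int new_method,
                     const int *preserve, size_t npreserve)
{
    assert(preserve != NULL || npreserve == 0);

    if (old_method == new_method)
        return false;           /* nothing changes */

    for (size_t i = 0; i < npreserve; i++)
    {
        if (preserve[i] == old_method)
            return false;       /* existing data may stay as it is */
    }

    return true;                /* old method not preserved: recompress */
}
```

In this sketch, switching from pglz to lz4 with pglz in the PRESERVE list
returns false, so only future data uses lz4; with an empty list it returns
true and the existing pglz data has to be rewritten.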

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
