Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-uPbdsBYtOojKaeYVeoGKZ6UU1dZG-nVy+=OA7w7eJf8A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers
On Thu, Feb 11, 2021 at 8:17 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even
> > attempt to detoast if there is no external field in the tuple, in POC
> > I have got rid of that check, but I think we might need to do better.
> > Maybe we can add a flag in infomask to detect whether the tuple has
> > any compressed data or not as we have for detecting the external data
> > (HEAP_HASEXTERNAL).
>
> No. This feature isn't close to being important enough to justify
> consuming an infomask bit.

Okay,

> I don't really see why we need it anyway. If array construction
> already categorically detoasts, why can't we do the same thing here?
> Would it really cost that much? In what case? Having compressed values
> in a record we're going to store on disk actually seems like a pretty
> dumb idea. We might end up trying to recompress something parts of
> which have already been compressed.
>

If we refer the comments atop function "toast_flatten_tuple_to_datum"

---------------
*    We have a general rule that Datums of container types (rows, arrays,
 *    ranges, etc) must not contain any external TOAST pointers.  Without
 *    this rule, we'd have to look inside each Datum when preparing a tuple
 *    for storage, which would be expensive and would fail to extend cleanly
 *    to new sorts of container types.
 *
 *    However, we don't want to say that tuples represented as HeapTuples
 *    can't contain toasted fields, so instead this routine should be called
 *    when such a HeapTuple is being converted into a Datum.
 *
 *    While we're at it, we decompress any compressed fields too.  This is not
 *    necessary for correctness, but reflects an expectation that compression
 *    will be more effective if applied to the whole tuple not individual
 *    fields.  We are not so concerned about that that we want to deconstruct
 *    and reconstruct tuples just to get rid of compressed fields, however.
 *    So callers typically won't call this unless they see that the tuple has
 *    at least one external field.
----------------

It appears that the general rule we want to follow is that while
creating the composite type we want to flatten any external pointer,
but while doing that we also decompress any compressed field  with the
assumption that compressing the whole row/array will be a better idea
instead of keeping them compressed individually.  However, if there
are no external toast pointers then we don't want to make an effort to
just decompress the compressed data.

Having said that I don't think this rule is followed throughout the
code for example

1. "ExecEvalRow" is calling HeapTupleHeaderGetDatum only if there is
any external field and which is calling "toast_flatten_tuple_to_datum"
so this is following the rule.
2. "ExecEvalWholeRowVar" is calling "toast_build_flattened_tuple", but
this is just flattening the external toast pointer but not doing
anything to the compressed data.
3. "ExecEvalArrayExpr" is calling "construct_md_array", which will
detoast the attribute if attlen is -1, so this will decompress any
compressed data even though there is no external toast pointer.

So in 1 we are following the rule but in 2 and 3 we are not.

IMHO, for the composite data types we should make common a rule and we
should follow that everywhere.   As you said it will be good if we can
always detoast any external/compressed data, that will help in getting
better compression as well as fetching the data will be faster because
we can avoid multi level detoasting/decompression.  I will analyse
this further and post a patch for the same.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: public schema default ACL
Next
From: "Joel Jacobson"
Date:
Subject: Re: [HACKERS] GSoC 2017: Foreign Key Arrays