Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-uPbdsBYtOojKaeYVeoGKZ6UU1dZG-nVy+=OA7w7eJf8A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Thu, Feb 11, 2021 at 8:17 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even > > attempt to detoast if there is no external field in the tuple, in POC > > I have got rid of that check, but I think we might need to do better. > > Maybe we can add a flag in infomask to detect whether the tuple has > > any compressed data or not as we have for detecting the external data > > (HEAP_HASEXTERNAL). > > No. This feature isn't close to being important enough to justify > consuming an infomask bit. Okay, > I don't really see why we need it anyway. If array construction > already categorically detoasts, why can't we do the same thing here? > Would it really cost that much? In what case? Having compressed values > in a record we're going to store on disk actually seems like a pretty > dumb idea. We might end up trying to recompress something parts of > which have already been compressed. > If we refer the comments atop function "toast_flatten_tuple_to_datum" --------------- * We have a general rule that Datums of container types (rows, arrays, * ranges, etc) must not contain any external TOAST pointers. Without * this rule, we'd have to look inside each Datum when preparing a tuple * for storage, which would be expensive and would fail to extend cleanly * to new sorts of container types. * * However, we don't want to say that tuples represented as HeapTuples * can't contain toasted fields, so instead this routine should be called * when such a HeapTuple is being converted into a Datum. * * While we're at it, we decompress any compressed fields too. This is not * necessary for correctness, but reflects an expectation that compression * will be more effective if applied to the whole tuple not individual * fields. We are not so concerned about that that we want to deconstruct * and reconstruct tuples just to get rid of compressed fields, however. * So callers typically won't call this unless they see that the tuple has * at least one external field. ---------------- It appears that the general rule we want to follow is that while creating the composite type we want to flatten any external pointer, but while doing that we also decompress any compressed field with the assumption that compressing the whole row/array will be a better idea instead of keeping them compressed individually. However, if there are no external toast pointers then we don't want to make an effort to just decompress the compressed data. Having said that I don't think this rule is followed throughout the code for example 1. "ExecEvalRow" is calling HeapTupleHeaderGetDatum only if there is any external field and which is calling "toast_flatten_tuple_to_datum" so this is following the rule. 2. "ExecEvalWholeRowVar" is calling "toast_build_flattened_tuple", but this is just flattening the external toast pointer but not doing anything to the compressed data. 3. "ExecEvalArrayExpr" is calling "construct_md_array", which will detoast the attribute if attlen is -1, so this will decompress any compressed data even though there is no external toast pointer. So in 1 we are following the rule but in 2 and 3 we are not. IMHO, for the composite data types we should make common a rule and we should follow that everywhere. As you said it will be good if we can always detoast any external/compressed data, that will help in getting better compression as well as fetching the data will be faster because we can avoid multi level detoasting/decompression. I will analyse this further and post a patch for the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: