Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-twikXFB-TiL+_XZp+ZPcb_mjXaSahqVG5_VDthonTg6g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers
On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote:

While going through your comments, I need some suggestions about the
one except this all other comments looks fine to me.
>
> It puzzles me that CompareCompressionMethodAndDecompress() calls
> slot_getallattrs() just before clearing the slot. It seems like this
> ought to happen before we loop over the attributes, so that we don't
> need to call slot_getattr() every time.

Yeah, actually, I thought I would avoid calling slot_getallattrs if
none of the attributes got decompress.  I agree if we call this before
we can avoid calling slot_getattr but slot_getattr
is only called for the attribute which has attlen -1. I agree that if
we call slot_getattr for attnum n then it will deform all the
attributes before that.  But then slot_getallattrs only need to deform
the remaining attributes not all.  But maybe we can call the
slot_getallattrs as soon as we see the first attribute with attlen -1
and then avoid calling subsequent slot_getattr, maybe that is better
than compared to what I have because we will avoid calling
slot_getattr for many attributes, especially when there are many
verlena.

 See the comment for that
> function. But even if we didn't do that for some reason, why would we
> do it here? If it's already been done, it shouldn't do anything, and
> if it hasn't been done, it might overwrite some of the values we just
> poked into tts_values.

It will not overwrite those values because slot_getallattrs will only
fetch the values for "attnum > slot->tts_nvalid" so whatever we
already fetched will not be overwritten.  Just did that at the end to
optimize the normal cases where we are not doing "insert into select *
from" so that those can get away without calling slot_getallattrs at
all.  However, maybe calling slot_getattr for each varlena might cost
us extra so I am okay to call slot_getallattrs this early.

 It also seems suspicious that we can get away
> with clearing the slot and then again marking it valid. I'm not sure
> it really works like that. Like, can't clearing the slot invalidate
> pointers stored in tts_values[]? For instance, if they are pointers
> into an in-memory heap tuple, tts_heap_clear() is going to free the
> tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is
> going to unpin it.

Yeah, that's completely wrong.  I think I missed that part.  One
solution can be that we can just detach the tuple from the slot and
then materialize so it will form the tuple with new values and then we
can clear the old tuple.  But that seems a bit hacky.

 I think the supported procedure for this sort of
> thing is to have a second slot, set tts_values, tts_isnull etc. and
> then materialize the slot. After materializing the new slot, it's
> independent of the old slot, which can then be cleared. See for
> example tts_virtual_materialize().

Okay, so if we take a new slot then we need to set this slot reference
in the ScanState also otherwise that might point to the old slot.  I
haven't yet analyzed where all we might be keeping the reference to
that old slot.  Or I am missing something.

Anyway, I will get a better idea once I try to implement this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Faulty HEAP_XMAX_LOCK_ONLY & HEAP_KEYS_UPDATED hintbit combination
Next
From: "Drouvot, Bertrand"
Date:
Subject: Re: Minimal logical decoding on standbys