Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-twikXFB-TiL+_XZp+ZPcb_mjXaSahqVG5_VDthonTg6g@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: While going through your comments, I need some suggestions about the one except this all other comments looks fine to me. > > It puzzles me that CompareCompressionMethodAndDecompress() calls > slot_getallattrs() just before clearing the slot. It seems like this > ought to happen before we loop over the attributes, so that we don't > need to call slot_getattr() every time. Yeah, actually, I thought I would avoid calling slot_getallattrs if none of the attributes got decompress. I agree if we call this before we can avoid calling slot_getattr but slot_getattr is only called for the attribute which has attlen -1. I agree that if we call slot_getattr for attnum n then it will deform all the attributes before that. But then slot_getallattrs only need to deform the remaining attributes not all. But maybe we can call the slot_getallattrs as soon as we see the first attribute with attlen -1 and then avoid calling subsequent slot_getattr, maybe that is better than compared to what I have because we will avoid calling slot_getattr for many attributes, especially when there are many verlena. See the comment for that > function. But even if we didn't do that for some reason, why would we > do it here? If it's already been done, it shouldn't do anything, and > if it hasn't been done, it might overwrite some of the values we just > poked into tts_values. It will not overwrite those values because slot_getallattrs will only fetch the values for "attnum > slot->tts_nvalid" so whatever we already fetched will not be overwritten. Just did that at the end to optimize the normal cases where we are not doing "insert into select * from" so that those can get away without calling slot_getallattrs at all. However, maybe calling slot_getattr for each varlena might cost us extra so I am okay to call slot_getallattrs this early. It also seems suspicious that we can get away > with clearing the slot and then again marking it valid. I'm not sure > it really works like that. Like, can't clearing the slot invalidate > pointers stored in tts_values[]? For instance, if they are pointers > into an in-memory heap tuple, tts_heap_clear() is going to free the > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > going to unpin it. Yeah, that's completely wrong. I think I missed that part. One solution can be that we can just detach the tuple from the slot and then materialize so it will form the tuple with new values and then we can clear the old tuple. But that seems a bit hacky. I think the supported procedure for this sort of > thing is to have a second slot, set tts_values, tts_isnull etc. and > then materialize the slot. After materializing the new slot, it's > independent of the old slot, which can then be cleared. See for > example tts_virtual_materialize(). Okay, so if we take a new slot then we need to set this slot reference in the ScanState also otherwise that might point to the old slot. I haven't yet analyzed where all we might be keeping the reference to that old slot. Or I am missing something. Anyway, I will get a better idea once I try to implement this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: