Re: [PATCH] Covering SPGiST index - Mailing list pgsql-hackers

From Pavel Borisov
Subject Re: [PATCH] Covering SPGiST index
Date
Msg-id CALT9ZEG1FOPVM0ewfgw13VgQ7RwjxrpoEbY9psRBexQgPH4q5Q@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Covering SPGiST index  (Pavel Borisov <pashkin.elfe@gmail.com>)
List pgsql-hackers
вт, 17 нояб. 2020 г. в 11:36, Pavel Borisov <pashkin.elfe@gmail.com>:
I've started to review this, and I've got to say that I really disagree
with this choice:

+ * If there are INCLUDE columns, they are stored after a key value, each
+ * starting from its own typalign boundary. Unlike IndexTuple, first INCLUDE
+ * value does not need to start from MAXALIGN boundary, so SPGiST uses private
+ * routines to access them.
Tom, I took a stab at making the code for tuple creation/decomposition more optimal. Now I see several options for this:
1. Included values can be added after key value as a whole index tuple. Pro of this: it reuses existing code perfectly. Con is that it will introduce extra (empty) index tuple header.
2. Existing state: pro is that in my opinion, it has the least possible gaps. The con is the need to duplicate much of the existing code with some modification. Frankly I don't like this duplication very much even if it is only a private spgist code.
2A. Existing state can be shifted into fewer changes in index_form_tuple and index_deform_tuple if I shift the null mask after the tuple header and before the key value (SpGistTupleHeader+nullmask chunk will be maxaligned). This is what I proposed in the previous answer. I tried to work on this variant but it will need to duplicate index_form_tuple and index_deform_tuple code into private version. The reason is that spgist tuple has its own header of different size and in my understanding, it can not be incorporated using index_form_tuple.
3. I can split index_form_tuple into two parts: a) header adding and size calculation,  b) filling attributes. External (a), which could be constructed differently for SpGist, and internal (b) being universal.
3A. I can make index_form_tuple accept pointer as an argument to create tuple in already palloced memory area (with the shift to its start). So external caller will be able to incorporate headers after calling index_form_tuple routine.

Maybe there is some other way I don't imagine yet. Which way do you think for me better to follow? What is your advice?
 
I also find it unacceptable that you stuck a tuple descriptor pointer into
the rd_amcache structure.  The relcache only supports that being a flat
blob of memory.  I see that you tried to hack around that by having
spgGetCache reconstruct a new tupdesc every time through, but (a) that's
actually worse than having no cache at all, and (b) spgGetCache doesn't
really know much about the longevity of the memory context it's called in.
This could easily lead to dangling tuple pointers, serious memory bloat
from repeated tupdesc construction, or quite possibly both.  Safer would
be to build the tupdesc during initSpGistState(), or maybe just make it
on-demand.  In view of the previous point, I'm also wondering if there's
any way to get the relcache's regular rd_att tupdesc to be useful here,
so we don't have to build one during scans at all.
I see that FormData_pg_attribute's inside TupleDescData are situated in a single memory chunk. If I add it at the ending of allocated SpGistCache as a copy of this chunk (using memcpy), not a pointer to it as it is now, will it be safe for use?
Or maybe it would still bel better to initialize tuple descriptor any time initSpGistState called without trying to cache it?

What will you advise?

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

pgsql-hackers by date:

Previous
From: Victor Yegorov
Date:
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Next
From: Peter Geoghegan
Date:
Subject: Re: Deleting older versions in unique indexes to avoid page splits