"Teodor Sigaev" <teodor@sigaev.ru> writes:
>> A closer reading, however, shows that at least for cases like intarray,
>> btree_gist, etc., the detoasting of an index value is being done in the
>> gist decompress function, so the value seen via GISTENTRY in the other
>> functions should already have been detoasted once.
>
> Right, any stored value form index should be decompressed by GiST decompress
> support method.
The problem is that this is the only place in the code where we make wholesale
assumptions that a datum that comes from a tuple (heap tuple or index tuple)
isn't toasted. There are other places but they're always flagged with big
comments explaining *why* the datum can't be toasted and they're minor
localized instances, not a whole subsystem.
This was one of the assumptions that the packed varlena code depended on: that
anyone looking at a datum from a tuple would always detoast it even if they
had formed the tuple themselves and never passed it through the toaster. The
*only* place this has come up as a problem is in GIST.
I would say we could just exempt all the GIST data types from packed varlenas
except that doesn't even solve the problem completely. There's at least one
place, _int_gist.c, where the entry type is just a plain array. So unless I
exempt *all* arrays the arrays it gets out of the index tuples it forms will
be packed and need to be detoasted.
So I'm leaning towards going through all of the GIST modules and replacing all
the (Type*)DatumGetPointers formulations with actually DatumGetType and all
the (Type*)PG_GETARG_POINTER() formulations with PG_GETARG_TYPE(). And having
those macros call PG_DETOAST_DATUM().
How would you feel about that?
There are two downsides I see:
It's an extra check against the toast flag bits which is extra cpu. But this
is how the rest of the Postgres source works and we don't think the extra cpu
cost is significant.
There may be places that assume they won't leak detoasted copies of datums. If
you could help point those places out they should just need PG_FREE_IF_COPY()
calls or in some cases a pg_detoast_datum_copy() call earlier in the correct
memeory context. This again is how the rest of the postgres source handles
this.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com