Composite Datums containing toasted fields are a bad idea(?) - Mailing list pgsql-hackers

From Tom Lane
Subject Composite Datums containing toasted fields are a bad idea(?)
Msg-id 29007.1396038881@sss.pgh.pa.us
List pgsql-hackers

Way way back in commit ae93e5fd6e8a7e2321e87d23165d9d7660cde598,
we established a coding rule that it was okay for composite Datums
to contain external (out-of-line) toasted fields, as long as such
toasting didn't go more than one level deep in any tuple.  This meant
that heap_form_tuple had to go through nontrivial pushups to maintain
that invariant: each composite field has to be inspected to see if any
of its component fields are external datums.  Surprisingly, no one has
complained about the cost of the lookups that are required to see
whether fields are composite in the first place.

However, in view of today's bug report from Jan Pecek, I'm wondering
if we shouldn't rethink this.  Jan pointed out that the array code was
failing to prevent composites-with-external-fields from getting into
arrays, and after a bit of looking around I'm afraid there are more such
bugs elsewhere.  One example is in the planner's evaluate_expr(), which
supposes that just PG_DETOAST_DATUM() is enough to make a value safe for
long-term storage in a plan tree.  Range types are making the same sort
of assumption as arrays (hm, can you have a range over a composite type?
Probably, considering we have sort operators for composites).  And there
are places in the index AMs that seem to assume likewise, which is an
issue for AMs in which an indexed value could be composite.

I think we might be better off to get rid of toast_flatten_tuple_attribute
and instead insist that composite Datums never contain any external toast
pointers in the first place.  That is, places that call heap_form_tuple
to make a composite Datum (rather than a tuple that's due to be stored
to disk) would be responsible for detoasting any fields with external
values first.  We could make a wrapper routine for heap_form_tuple to
take care of this rather than duplicating the code in lots of places.
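
To make the proposed division of labor concrete, here is a toy model in
plain C.  It is only a sketch and not backend code: ToyField, ToyTuple,
toy_toast_store, toy_form_tuple and toy_form_composite_datum are all
hypothetical names invented for illustration, and the "toast store" is
just a static array.  It assumes the wrapper's whole job is to replace
each external field with an inline copy before calling the low-level
tuple former.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_FIELDS 8

/* Toy field: either an inline value or a reference to out-of-line storage. */
typedef struct ToyField
{
    bool        is_external;    /* true: toast_id is valid; false: inline_val is */
    int         toast_id;
    const char *inline_val;
} ToyField;

typedef struct ToyTuple
{
    int         nfields;
    ToyField    fields[TOY_MAX_FIELDS];
} ToyTuple;

/* Toy out-of-line store, standing in for a TOAST table (assumption). */
static const char *toy_toast_store[] = {"long value A", "long value B"};

/* Stand-in for the low-level former: copies fields verbatim, external or not. */
static ToyTuple
toy_form_tuple(const ToyField *fields, int nfields)
{
    ToyTuple    tup;

    assert(nfields >= 0 && nfields <= TOY_MAX_FIELDS);
    memset(&tup, 0, sizeof(tup));
    tup.nfields = nfields;
    for (int i = 0; i < nfields; i++)
        tup.fields[i] = fields[i];
    return tup;
}

/* Hypothetical wrapper: fetch external fields first, then form the tuple. */
static ToyTuple
toy_form_composite_datum(const ToyField *fields, int nfields)
{
    ToyField    flat[TOY_MAX_FIELDS];

    assert(nfields >= 0 && nfields <= TOY_MAX_FIELDS);
    for (int i = 0; i < nfields; i++)
    {
        flat[i] = fields[i];
        if (flat[i].is_external)
        {
            /* "Detoast": swap the reference for the stored value. */
            flat[i].inline_val = toy_toast_store[flat[i].toast_id];
            flat[i].is_external = false;
            flat[i].toast_id = -1;
        }
    }
    return toy_form_tuple(flat, nfields);
}

/* Invariant check: does any field still point out of line? */
static bool
toy_has_external(const ToyTuple *tup)
{
    for (int i = 0; i < tup->nfields; i++)
        if (tup->fields[i].is_external)
            return true;
    return false;
}
```

In this model a caller building a tuple for storage on disk keeps using
toy_form_tuple, while anything that hands the result around as a
composite Datum goes through toy_form_composite_datum, so no consumer
ever has to check for external fields.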

From a performance standpoint this is probably a small win.  In cases
where a composite Datum is formed and ultimately saved to disk, it should
be a win, since we'd have to detoast those fields anyway, and we can avoid
the overhead of an extra disassembly and reassembly of the composite
value.  If the composite value is never sent to disk, it's a bit harder
to tell: we lose if the specific field value is never extracted from the
composite, but on the other hand we win if it's extracted more than once.
In any case, adding the extra syscache lookups involved in doing
toast_flatten_tuple_attribute in lots more places isn't appealing.

From a code correctness standpoint, the question really is whether we can
find all the places that build composite datums more easily than we can
find all the places that ought to be calling toast_flatten_tuple_attribute
and aren't.  I have to admit I'm not sure about that.  There seem to be
basically two places to fix in the main executor (ExecEvalRow and
ExecEvalFieldStore), and roughly half a dozen calls of heap_form_tuple in
the various PLs, but I may well have missed some cases.

One thing we could do to try to flush out missed cases is to remove
heap_form_tuple's setting of the tuple-Datum header fields, pushing
that functionality into the new wrapper routine.  Then, any un-updated
code would generate clearly invalid composite datums, rather than only
failing in infrequent corner cases.
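
A toy illustration of that idea, again in plain C rather than backend
code: ToyDatumHeader, ToyHdrTuple, toy_form_tuple_bare,
toy_form_datum_stamped and toy_datum_is_valid are hypothetical names,
and the header layout is invented for the example.  The only assumption
carried over from the real system is that a type OID of zero is invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy tuple-Datum header; zeroed fields mean "never stamped". */
typedef struct ToyDatumHeader
{
    uint32_t    datum_len;
    uint32_t    type_id;        /* 0 plays the role of an invalid type OID */
} ToyDatumHeader;

typedef struct ToyHdrTuple
{
    ToyDatumHeader hdr;
    uint32_t    payload_len;
} ToyHdrTuple;

/* Low-level former after the change: it no longer fills the Datum header. */
static ToyHdrTuple
toy_form_tuple_bare(uint32_t payload_len)
{
    ToyHdrTuple tup;

    memset(&tup, 0, sizeof(tup));
    tup.payload_len = payload_len;
    return tup;
}

/* Hypothetical wrapper: the only place that stamps the Datum header. */
static ToyHdrTuple
toy_form_datum_stamped(uint32_t type_id, uint32_t payload_len)
{
    ToyHdrTuple tup = toy_form_tuple_bare(payload_len);

    assert(type_id != 0);
    tup.hdr.datum_len = (uint32_t) sizeof(ToyDatumHeader) + payload_len;
    tup.hdr.type_id = type_id;
    return tup;
}

/* What a consumer of composite Datums would see. */
static bool
toy_datum_is_valid(const ToyHdrTuple *tup)
{
    return tup->hdr.type_id != 0 &&
        tup->hdr.datum_len == sizeof(ToyDatumHeader) + tup->payload_len;
}
```

Code that still calls the bare former and passes the result off as a
Datum fails the validity check on every call, not just when a field
happens to be toasted.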

Another issue is third-party code.  There seems to be risk
either way, but it would accrue to completely different code depending
on which way we try to fix this.

Thoughts?
        regards, tom lane
