Composite Datums containing toasted fields are a bad idea(?) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Composite Datums containing toasted fields are a bad idea(?) |
Date | |
Msg-id | 29007.1396038881@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Composite Datums containing toasted fields are a bad idea(?)
Re: Composite Datums containing toasted fields are a bad idea(?) |
List | pgsql-hackers |
Way way back in commit ae93e5fd6e8a7e2321e87d23165d9d7660cde598, we established a coding rule that it was okay for composite Datums to contain external (out-of-line) toasted fields, as long as such toasting didn't go more than one level deep in any tuple. This meant that heap_form_tuple had to go through nontrivial pushups to maintain that invariant: each composite field has to be inspected to see if any of its component fields are external datums. Surprisingly, no one has complained about the cost of the lookups that are required to see whether fields are composite in the first place. However, in view of today's bug report from Jan Pecek, I'm wondering if we shouldn't rethink this. Jan pointed out that the array code was failing to prevent composites-with-external-fields from getting into arrays, and after a bit of looking around I'm afraid there are more such bugs elsewhere. One example is in the planner's evaluate_expr(), which supposes that just PG_DETOAST_DATUM() is enough to make a value safe for long-term storage in a plan tree. Range types are making the same sort of assumption as arrays (hm, can you have a range over a composite type? Probably, considering we have sort operators for composites). And there are places in the index AMs that seem to assume likewise, which is an issue for AMs in which an indexed value could be composite. I think we might be better off to get rid of toast_flatten_tuple_attribute and instead insist that composite Datums never contain any external toast pointers in the first place. That is, places that call heap_form_tuple to make a composite Datum (rather than a tuple that's due to be stored to disk) would be responsible for detoasting any fields with external values first. We could make a wrapper routine for heap_form_tuple to take care of this rather than duplicating the code in lots of places. From a performance standpoint this is probably a small win. In cases where a composite Datum is formed and ultimately saved to disk, it should be a win, since we'd have to detoast those fields anyway, and we can avoid the overhead of an extra disassembly and reassembly of the composite value. If the composite value is never sent to disk, it's a bit harder to tell: we lose if the specific field value is never extracted from the composite, but on the other hand we win if it's extracted more than once. In any case, adding the extra syscache lookups involved in doing toast_flatten_tuple_attribute in lots more places isn't appealing. From a code correctness standpoint, the question really is whether we can find all the places that build composite datums more easily than we can find all the places that ought to be calling toast_flatten_tuple_attribute and aren't. I have to admit I'm not sure about that. There seem to be basically two places to fix in the main executor (ExecEvalRow and ExecEvalFieldStore), and roughly half a dozen calls of heap_form_tuple in the various PLs, but I'm not sure I've not missed some cases. One thing we could do to try to flush out missed cases is to remove heap_form_tuple's setting of the tuple-Datum header fields, pushing that functionality into the new wrapper routine. Then, any un-updated code would generate clearly invalid composite datums, rather than only failing in infrequent corner cases. Another issue is what about third-party code. There seems to be risk either way, but it would accrue to completely different code depending on which way we try to fix this. Thoughts? regards, tom lane
pgsql-hackers by date: