Thread: Virtual tuple slots versus TOAST: big problem
I looked into this 8.1 bug reported by Alexey Beschiokov: http://archives.postgresql.org/pgsql-bugs/2005-11/msg00192.php The executive summary is: it looks like a kluge solution isn't hard, but solving it in a more reasonable fashion is going to require some significant API changes inside the backend :-( The problem is that ExecInsert() does ExecMaterializeSlot() to get a physical tuple from the virtual tuple slot it's initially handed. After this step, the TupleTableSlot contains that same physical tuple. ExecInsert then passes the bare tuple to some subroutines and the Slot to other subroutines. In the particular case shown by Alexey, the sequence is:ExecConstraints is called with the slotheap_insert (and thence tuptoaster.c) is called with the bare tupleExecInsertIndexTuplesis called with the slot The problem is that ExecConstraints accesses some fields of the slot, causing partial extraction of tuple fields and setup of the tts_values[] array within the slot. After that, the tuptoaster proceeds to smash down the tuple to a size it likes, which it does by scribbling on the HeapTuple structure it's handed. That makes the TupleTableSlot structure inconsistent --- it has tts_values fields pointing at memory that's no longer part of the tuple actually stored in the slot. When ExecInsertIndexTuples then tries to extract more fields from the slot, a crash is not unlikely. (It's annoying that we didn't find this during beta, but a failure requires ExecInsertIndexTuples to try to access fields to the right of the last one fetched by ExecConstraints, and it only matters if the tuple actually got toasted in between, so it is a bit of a corner case.) ExecUpdate has the same bug. I don't think there are any other places, because ExecMaterializeSlot isn't used elsewhere ATM. The problem did not exist before 8.1 because TupleTableSlots didn't contain extra info beyond the bare tuple, so tuptoaster could hack that without rendering the Slot inconsistent. I think that a kluge fix is possible by setting tts_nvalid to zero after invoking heap_insert or heap_update, so that ExecInsertIndexTuples will be forced to recompute tts_values instead of reusing the previous data. This is probably the right thing to do in the 8.1 branch, but it's incredibly ugly and we need to fix it better going forward. "A better fix" seems to require passing the TupleTableSlot, not just the bare tuple, down to the toaster --- else there is no way for the toaster to update the data structure that it's accidentally invalidating. This seems like it might be a good idea anyway on performance grounds: we could save one cycle of heap_deform_tuple and heap_formtuple in the case where toasting is needed, if the toaster is invoked on the tuple while it's still in virtual-slot format. The problem is that given the current structure, that means changing the APIs of heap_insert and heap_update, or else making near-duplicate versions that take a TupleTableSlot instead of a bare tuple. Neither of these things seem real attractive. If we wanted to avoid forming a physical tuple until the last moment we'd also need to change the APIs associated with triggers, ie make them work on Slots not tuples. This'd be even more invasive. It would likely be cleaner and more efficient in the long run, but there's a lot of code to touch, and breaking user-defined triggers doesn't seem palatable at all. Any thoughts on the best way to proceed? regards, tom lane
On Sat, 2005-11-19 at 12:22 -0500, Tom Lane wrote: > "A better fix" seems to require passing the TupleTableSlot, not just the > bare tuple, down to the toaster --- else there is no way for the toaster > to update the data structure that it's accidentally invalidating. This > seems like it might be a good idea anyway on performance grounds: we > could save one cycle of heap_deform_tuple and heap_formtuple in the case > where toasting is needed, if the toaster is invoked on the tuple while > it's still in virtual-slot format. The problem is that given the > current structure, that means changing the APIs of heap_insert and > heap_update, or else making near-duplicate versions that take a > TupleTableSlot instead of a bare tuple. Neither of these things seem > real attractive. We changed the heap_insert API for 8,1 so would it be a problem to change it again for 8.2 Other API changes sound likely, so seems less of a problem. > If we wanted to avoid forming a physical tuple until > the last moment we'd also need to change the APIs associated with > triggers, ie make them work on Slots not tuples. This'd be even more > invasive. It would likely be cleaner and more efficient in the long > run, but there's a lot of code to touch, and breaking user-defined > triggers doesn't seem palatable at all. But if there are no triggers, then no problem. Not sure I like this idea, but we could support both old and new trigger APIs. If that was marked in the catalog, then we'd know when to change slots into tuples. What is the performance loss/gain by waiting? On a different note, I'd like to consider how we can reduce the overhead for a HeapTuple for when we do HashAgg and HashJoin. Currently the overhead is substantial, with at least 12 bytes unused in most cases, out of a total current overhead of > 20 bytes per tuple. Reducing that overhead would increase the limit at which we start to do sorts. Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > On Sat, 2005-11-19 at 12:22 -0500, Tom Lane wrote: >> ... The problem is that given the >> current structure, that means changing the APIs of heap_insert and >> heap_update, or else making near-duplicate versions that take a >> TupleTableSlot instead of a bare tuple. Neither of these things seem >> real attractive. > We changed the heap_insert API for 8,1 so would it be a problem to > change it again for 8.2 It's not so much heap_insert/update, it's simple_heap_insert/update, which are called in a *lot* of places. Nonetheless, we have made such changes before (the simple_xxx routines didn't exist at all a few years ago), so it's not out of the question to do it again. A more constrained change would just alter the tuptoaster API so that it hands back an entirely new tuple instead of scribbling on the header it's handed. This doesn't save any overhead but it fixes the problem in a reasonably robust way, instead of expecting callers to compensate. (I'm unconvinced that my quick hack of yesterday plugged all the holes.) We'd need to add a couple lines to the heapam.c routines to free the separately allocated tuples, but that's no big deal. > What is the performance loss/gain by waiting? It's really hard to tell, but in any case it's nil if the tuple isn't long enough to need toasting. So I'm not sure we should go through major pushups to change it. regards, tom lane
On 11/20/2005 11:23 AM, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: >> On Sat, 2005-11-19 at 12:22 -0500, Tom Lane wrote: >>> ... The problem is that given the >>> current structure, that means changing the APIs of heap_insert and >>> heap_update, or else making near-duplicate versions that take a >>> TupleTableSlot instead of a bare tuple. Neither of these things seem >>> real attractive. > >> We changed the heap_insert API for 8,1 so would it be a problem to >> change it again for 8.2 > > It's not so much heap_insert/update, it's simple_heap_insert/update, > which are called in a *lot* of places. Nonetheless, we have made such > changes before (the simple_xxx routines didn't exist at all a few years > ago), so it's not out of the question to do it again. > > A more constrained change would just alter the tuptoaster API so that > it hands back an entirely new tuple instead of scribbling on the header > it's handed. This doesn't save any overhead but it fixes the problem > in a reasonably robust way, instead of expecting callers to compensate. > (I'm unconvinced that my quick hack of yesterday plugged all the holes.) > We'd need to add a couple lines to the heapam.c routines to free the > separately allocated tuples, but that's no big deal. > >> What is the performance loss/gain by waiting? > > It's really hard to tell, but in any case it's nil if the tuple isn't > long enough to need toasting. So I'm not sure we should go through > major pushups to change it. Assuming that the saved header values don't need to be recomputed if the tuple doesn't need to be toasted at all, I think that toasting is expensive enough so that recomputing those values is hardly noticed. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes: > On 11/20/2005 11:23 AM, Tom Lane wrote: > Assuming that the saved header values don't need to be recomputed if the > tuple doesn't need to be toasted at all, I think that toasting is > expensive enough so that recomputing those values is hardly noticed. Yeah, probably so. I'll just make the localized API change then. One side effect of changing it as I suggest is that when control comes back out of heap_insert/update, the caller will still have the originally passed tuple and not the toasted version that actually went to disk. AFAICS this is OK (of course we have to make sure t_self and other header fields are updated in both copies). The main impact of the change is that FormIndexDatum will receive the pre-toasting version of the tuple, but that is actually a *good* thing --- right now, it just has to fetch back untoasted fields anyway, if there's anything it needs to do with them. regards, tom lane