Re: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger - Mailing list pgsql-bugs
From | Robert Haas |
---|---|
Subject | Re: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger |
Date | |
Msg-id | CA+TgmoYkOp=jDg15ugJDwxqoA6drAPBZmOJkhZJ11HZWhEYx6w@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger (Alexander Lakhin <exclusion@gmail.com>) |
Responses |
Re: BUG #17798: Incorrect memory access occurs when using BEFORE ROW UPDATE trigger
|
List | pgsql-bugs |
On Wed, Jan 10, 2024 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > As far as I can see, oldslot and newslot hold different pointer values; > e. g., with debug logging added (see attached (please forgive me dirty > hacks inside)), I get: > LOG: ExecBRUpdateTriggers after ExecGetUpdateNewTuple() 1; oldslot: 0x758e3c8, slot->tts_tupleDescriptor: 0x7362fc8, > slot->buffer: 1810, slot->buffer->refcount: 1, slot->tts_values: 0x758e438, i: 1, slot->tts_values[ii]: 0x9982c74 > STATEMENT: UPDATE bruttest SET cnt = cnt + 1; > LOG: ExecBRUpdateTriggers after ExecGetUpdateNewTuple() 1; newslot: 0x747fb48, slot->tts_tupleDescriptor: 0x7362fc8, > slot->buffer: 0, slot->buffer->refcount: 0, slot->tts_values: 0x747fbb8, i: 1, slot->tts_values[ii]: 0x9982c74 > STATEMENT: UPDATE bruttest SET cnt = cnt + 1; > ==00:00:00:05.052 3934530== Invalid read of size 1 > ==00:00:00:05.052 3934530== at 0x1EECAC: heap_compute_data_size (heaptuple.c:244) > ... > ==00:00:00:05.052 3934530== Address 0x9982c74 is in a rw- anonymous segment > ==00:00:00:05.052 3934530== > (when executing `make check` for the isolation test [1] under Valgrind) > > But as we can see, both slots use the same buffer (have pointers to the > same attribute values) as a result of the following operations: > EEOP_SCAN_FETCHSOME: > slot_getsomeattrs(scanslot, op->d.fetch.last_var); > ... > EEOP_ASSIGN_SCAN_VAR: > resultslot->tts_values[resultnum] = scanslot->tts_values[attnum] > (More details upthread in [2]) > > In the meantime, newslot doesn't hold a pin at all. OK, thanks for the further explanation. I think I'm starting to understand what's going on here. Maybe you already understand all of this, but in my own words: ExecGetUpdateNewTuple() calls ExecProject() which documents that the results are short-lived and you should materialize the slot if that's a problem. ExecModifyTable() does slot = ExecGetUpdateNewTuple(...), does not materialize it, and then passes that slot as the penultimate argument to ExecUpdate(), which calls ExecUpdatePrologue(), which calls ExecBRUpdateTriggers(), and the value returned by ExecModifyTable()'s call to ExecGetUpdateNewTuple() is now stored in newslot, but it still hasn't been materialized. So any operation that's performed on oldslot at this point invalidates newslot. But ExecBRUpdateTriggers() now calls GetTupleForTrigger() to clobber oldslot, so now newslot is also junk. That's technically OK, because we're not going to access the contents of newslot any more. We're instead going to refresh them: if (epqslot_candidate != NULL) { TupleTableSlot *epqslot_clean; epqslot_clean = ExecGetUpdateNewTuple(relinfo, epqslot_candidate, oldslot); if (newslot != epqslot_clean) ExecCopySlot(newslot, epqslot_clean); } Since the slot returned by this call to ExecGetUpdateNewTuple() is a function of the same projection info as when it was called from ExecModifyTable(), the slot returned here must be the same slot that was returned there, namely newslot. So, as you say, newslot == epqslot_clean, and we've just clobbered it with updated values. Therefore we're back in the situation where oldslot is fully valid and newslot is only valid until somebody does something to oldslot. Thus, the portion of your patch that simplifies this code doesn't fix any bug, but rather just simplifies some unnecessarily complex logic. But the next thing that happens in the existing code is: trigtuple = ExecFetchSlotHeapTuple(oldslot, true, &should_free_trig); That is going to again invalidate newslot, because it can materialize oldslot. That's a problem if we ever use newslot after this point, and we do, unless we materialize newslot first, so your patch adds ExecMaterializeSlot(newslot) right before this, fixing the problem. What I don't really understand is why failure to do this causes an invalid memory access in ExecFetchSlotHeapTuple(). According to the above analysis, the invalid memory access ought to occur the first time something touches newslot after this call to ExecFetchSlotHeapTuple(), but if I understand your valgrind trace correctly, it's showing that ExecFetchSlotHeapTuple() is touching something it shouldn't. Do you understand what's happening here? I think the question we should be asking ourselves here is whether we're really OK with having such a vast distance between the point where newslot is first populated and where materialization happens. It seems like playing with fire to populate a slot with ephemeral data in ExecModifyTable() and then press the materialize button four stack frames down. There's a lot of opportunity for things to go wrong there. On the other hand, this is very performance-sensitive code, so maybe materializing here is the only workable solution. If that's the case, though, we should probably add comments making the lifetime of various slots clearer, because neither Tom nor I was able to understand what the heck was happening here on first reading, which is probably not a good sign. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-bugs by date: