Thread: Breakage in trigger.c
I can't believe that the coding at trigger.c line 2010 ff is correct: /* * Skip executing cancelled events, and events done by * transactions that are not aborted. */ if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) || (event->dte_event & TRIGGER_DEFERRED_DONE && TransactionIdIsValid(event->dte_done_xid) && !TransactionIdDidAbort(event->dte_done_xid))) { Surely the sense of this is backwards, and it should be if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) && !(event->dte_event & TRIGGER_DEFERRED_DONE && TransactionIdIsValid(event->dte_done_xid) && !TransactionIdDidAbort(event->dte_done_xid))) { AFAICT we don't actually use TRIGGER_DEFERRED_CANCELED, so the existing coding never "skips" at all here, which makes it just a performance loss rather than visible misbehavior. I'm also concerned about the fact that the per-item states have dti_done_xid values distinct from the whole-event value. It's not obvious that a rollback of the subxact that did one item implies a rollback of the subxact that last marked the event as scanned. Can anyone offer a proof that that's OK? If it is OK, is it really necessary to have per-item dti_done_xid at all? Finally, surely the "Mark the event done" case should advance prev_event? As-is the code is capable of messing up the list links. regards, tom lane
On Mon, Sep 06, 2004 at 05:55:29PM -0400, Tom Lane wrote: > I can't believe that the coding at trigger.c line 2010 ff is correct: > > /* > * Skip executing cancelled events, and events done by > * transactions that are not aborted. > */ > if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) || > (event->dte_event & TRIGGER_DEFERRED_DONE && > TransactionIdIsValid(event->dte_done_xid) && > !TransactionIdDidAbort(event->dte_done_xid))) > { > > Surely the sense of this is backwards, and it should be > > if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) && > !(event->dte_event & TRIGGER_DEFERRED_DONE && > TransactionIdIsValid(event->dte_done_xid) && > !TransactionIdDidAbort(event->dte_done_xid))) > { Hmmm ... yes, I think you are right. I remember staring at that test for a long time, decomposing it on paper, I wrote the version you propose and then revert to the other one, etc. > AFAICT we don't actually use TRIGGER_DEFERRED_CANCELED, so the existing > coding never "skips" at all here, which makes it just a performance loss > rather than visible misbehavior. I think the fact that the items are also checked means that they are not done more than once. > I'm also concerned about the fact that the per-item states have > dti_done_xid values distinct from the whole-event value. It's > not obvious that a rollback of the subxact that did one item implies > a rollback of the subxact that last marked the event as scanned. > Can anyone offer a proof that that's OK? If it is OK, is it really > necessary to have per-item dti_done_xid at all? Not sure about this one. What happened is that I saw a "done" flag in the events, another one in the item, so I added a Xid flag to each to be able to check them indepently. All in all I don't think I grok this code enough to make a judgement either way. > Finally, surely the "Mark the event done" case should advance > prev_event? As-is the code is capable of messing up the list links. Not sure. Why doesn't the other arm of that conditional need it? All in all I think this highlights a big shortcoming of this project. If there was much more extensive regression testing, this code could be refactored with much more confidence, and this kind of bugs wouldn't be there on the first place. Sadly, it's too hard to do unit testing in this kind of code, which makes refactoring dangerous and costly. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Management by consensus: I have decided; you concede. (Leonard Liu)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Mon, Sep 06, 2004 at 05:55:29PM -0400, Tom Lane wrote: >> Finally, surely the "Mark the event done" case should advance >> prev_event? As-is the code is capable of messing up the list links. > Not sure. Why doesn't the other arm of that conditional need it? In that arm the event is being deleted from the list. prev_event should always point to the last nondeleted event. (BTW, seems this code is missing a bet. If the same subtransaction both adds and fires the event, then couldn't we delete it immediately? That would extend the immediately-deletable case to many more situations.) > All in all I think this highlights a big shortcoming of this project. > If there was much more extensive regression testing, this code could be > refactored with much more confidence, and this kind of bugs wouldn't be > there on the first place. Regression testing is unlikely to expose small performance bugs, which is what the first one is. The third one is only really going to come to light if you test a case where a single query inside a subtransaction adds both deletable and nondeletable entries (and does so in the right order). Again I'm unconvinced that regression tests could be expected to catch this; we may have tests that exercise such cases, but they aren't doing it inside a subtransaction --- and only someone who was already aware of the potential for a bug in this spot would be likely to think to try it. In short I have little confidence in "more tests!" as a panacea for anything. regards, tom lane
On Mon, 6 Sep 2004, Tom Lane wrote: > Surely the sense of this is backwards, and it should be > > if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) && > !(event->dte_event & TRIGGER_DEFERRED_DONE && > TransactionIdIsValid(event->dte_done_xid) && > !TransactionIdDidAbort(event->dte_done_xid))) > { > I'm also concerned about the fact that the per-item states have > dti_done_xid values distinct from the whole-event value. It's > not obvious that a rollback of the subxact that did one item implies > a rollback of the subxact that last marked the event as scanned. > Can anyone offer a proof that that's OK? If it is OK, is it really > necessary to have per-item dti_done_xid at all? I don't think that case can occur. The transaction marking the event will have seen one of the following states I believe: a) All items were marked by this transactionb) Some items were already marked by a parent transactionc) Some items were alreadymarked by a previous committed subtransaction In the first case, it's marked them so it's okay. In the second and third, I think the only way for the the item marking subtransaction to abort after marking the event would involve aborting a common parent transaction which would abort both. I think the per-item one is necessary for SET CONSTRAINTS (some of the deferred actions on a particular event may have been done as per b or c above). > Finally, surely the "Mark the event done" case should advance > prev_event? As-is the code is capable of messing up the list links. As something for the future, it looks to me like subtransactions won't delink items ever right now, where I think it'd be safe to do so for items generated from the same subtransaction but I haven't looked to see if we're keeping that info.