Thread: Breakage in trigger.c

Breakage in trigger.c

From
Tom Lane
Date:
I can't believe that the coding at trigger.c line 2010 ff is correct:
       /*        * Skip executing cancelled events, and events done by        * transactions that are not aborted.
 */       if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) ||           (event->dte_event & TRIGGER_DEFERRED_DONE &&
          TransactionIdIsValid(event->dte_done_xid) &&            !TransactionIdDidAbort(event->dte_done_xid)))
{

Surely the sense of this is backwards, and it should be
       if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) &&           !(event->dte_event & TRIGGER_DEFERRED_DONE &&
         TransactionIdIsValid(event->dte_done_xid) &&             !TransactionIdDidAbort(event->dte_done_xid)))
{

AFAICT we don't actually use TRIGGER_DEFERRED_CANCELED, so the existing
coding never "skips" at all here, which makes it just a performance loss
rather than visible misbehavior.

I'm also concerned about the fact that the per-item states have
dti_done_xid values distinct from the whole-event value.  It's
not obvious that a rollback of the subxact that did one item implies
a rollback of the subxact that last marked the event as scanned.
Can anyone offer a proof that that's OK?  If it is OK, is it really
necessary to have per-item dti_done_xid at all?

Finally, surely the "Mark the event done" case should advance
prev_event?  As-is the code is capable of messing up the list links.
        regards, tom lane


Re: Breakage in trigger.c

From
Alvaro Herrera
Date:
On Mon, Sep 06, 2004 at 05:55:29PM -0400, Tom Lane wrote:
> I can't believe that the coding at trigger.c line 2010 ff is correct:
> 
>         /*
>          * Skip executing cancelled events, and events done by
>          * transactions that are not aborted.
>          */
>         if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) ||
>             (event->dte_event & TRIGGER_DEFERRED_DONE &&
>              TransactionIdIsValid(event->dte_done_xid) &&
>              !TransactionIdDidAbort(event->dte_done_xid)))
>         {
> 
> Surely the sense of this is backwards, and it should be
> 
>         if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) &&
>             !(event->dte_event & TRIGGER_DEFERRED_DONE &&
>               TransactionIdIsValid(event->dte_done_xid) &&
>               !TransactionIdDidAbort(event->dte_done_xid)))
>         {

Hmmm ... yes, I think you are right.  I remember staring at that test
for a long time, decomposing it on paper, I wrote the version you
propose and then revert to the other one, etc.


> AFAICT we don't actually use TRIGGER_DEFERRED_CANCELED, so the existing
> coding never "skips" at all here, which makes it just a performance loss
> rather than visible misbehavior.

I think the fact that the items are also checked means that they are not
done more than once.

> I'm also concerned about the fact that the per-item states have
> dti_done_xid values distinct from the whole-event value.  It's
> not obvious that a rollback of the subxact that did one item implies
> a rollback of the subxact that last marked the event as scanned.
> Can anyone offer a proof that that's OK?  If it is OK, is it really
> necessary to have per-item dti_done_xid at all?

Not sure about this one.  What happened is that I saw a "done" flag in
the events, another one in the item, so I added a Xid flag to each to be
able to check them indepently.  All in all I don't think I grok this
code enough to make a judgement either way.

> Finally, surely the "Mark the event done" case should advance
> prev_event?  As-is the code is capable of messing up the list links.

Not sure.  Why doesn't the other arm of that conditional need it?


All in all I think this highlights a big shortcoming of this project.
If there was much more extensive regression testing, this code could be
refactored with much more confidence, and this kind of bugs wouldn't be
there on the first place.  Sadly, it's too hard to do unit testing in
this kind of code, which makes refactoring dangerous and costly.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Management by consensus: I have decided; you concede.
(Leonard Liu)



Re: Breakage in trigger.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Mon, Sep 06, 2004 at 05:55:29PM -0400, Tom Lane wrote:
>> Finally, surely the "Mark the event done" case should advance
>> prev_event?  As-is the code is capable of messing up the list links.

> Not sure.  Why doesn't the other arm of that conditional need it?

In that arm the event is being deleted from the list.  prev_event should
always point to the last nondeleted event.

(BTW, seems this code is missing a bet. If the same subtransaction both
adds and fires the event, then couldn't we delete it immediately?  That
would extend the immediately-deletable case to many more situations.)

> All in all I think this highlights a big shortcoming of this project.
> If there was much more extensive regression testing, this code could be
> refactored with much more confidence, and this kind of bugs wouldn't be
> there on the first place.

Regression testing is unlikely to expose small performance bugs, which
is what the first one is.  The third one is only really going to come to
light if you test a case where a single query inside a subtransaction
adds both deletable and nondeletable entries (and does so in the right
order).  Again I'm unconvinced that regression tests could be expected
to catch this; we may have tests that exercise such cases, but they aren't
doing it inside a subtransaction --- and only someone who was already
aware of the potential for a bug in this spot would be likely to think
to try it.

In short I have little confidence in "more tests!" as a panacea for
anything.
        regards, tom lane


Re: Breakage in trigger.c

From
Stephan Szabo
Date:
On Mon, 6 Sep 2004, Tom Lane wrote:

> Surely the sense of this is backwards, and it should be
>
>         if (!(event->dte_event & TRIGGER_DEFERRED_CANCELED) &&
>             !(event->dte_event & TRIGGER_DEFERRED_DONE &&
>               TransactionIdIsValid(event->dte_done_xid) &&
>               !TransactionIdDidAbort(event->dte_done_xid)))
>         {

> I'm also concerned about the fact that the per-item states have
> dti_done_xid values distinct from the whole-event value.  It's
> not obvious that a rollback of the subxact that did one item implies
> a rollback of the subxact that last marked the event as scanned.
> Can anyone offer a proof that that's OK?  If it is OK, is it really
> necessary to have per-item dti_done_xid at all?

I don't think that case can occur.

The transaction marking the event will have seen one of the following
states I believe:
a) All items were marked by this transactionb) Some items were already marked by a parent transactionc) Some items were
alreadymarked by a previous committed subtransaction
 

In the first case, it's marked them so it's okay. In the second and third,
I think the only way for the the item marking subtransaction to abort
after marking the event would involve aborting a common parent transaction
which would abort both.

I think the per-item one is necessary for SET CONSTRAINTS (some of the
deferred actions on a particular event may have been done as per b or c
above).

> Finally, surely the "Mark the event done" case should advance
> prev_event?  As-is the code is capable of messing up the list links.

As something for the future, it looks to me like subtransactions won't
delink items ever right now, where I think it'd be safe to do so for items
generated from the same subtransaction but I haven't looked to see if
we're keeping that info.