Re: Some other odd buildfarm failures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Some other odd buildfarm failures
Date
Msg-id 31271.1419607920@sss.pgh.pa.us
In response to Re: Some other odd buildfarm failures  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Some other odd buildfarm failures  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Still, I don't think this is a reasonable test design.  We have
>> absolutely no idea what behaviors are being triggered in the other
>> tests, except that they are unrelated to what those tests think they
>> are testing.

> I can of course move it to a separate parallel test, but I don't think
> that should be really necessary.

I've not proven this rigorously, but it seems obvious in hindsight:
what's happening is that when the object_address test drops everything
with DROP CASCADE, other processes are sometimes just starting to execute
the event trigger when the DROP commits.  When they go to look up the
trigger function, they don't find it, leading to "cache lookup failed for
function".  The fact that the complained-of OID is slightly variable, but
always in the range of OIDs that would be getting assigned around this
point in a "make check" run, buttresses the theory.

I thought about changing the object_address test so that it explicitly
drops the event trigger first.  But that would not be a fix; it would
just make the timing harder to hit (i.e., a victim process would need to
lose control for longer at the critical point).

Since I remain of the opinion that a test called object_address has no
damn business causing global side-effects, I think there are two
reasonable fixes:

1. Remove the event trigger.  This would slightly reduce the test's
coverage.

2. Run that whole test as a single transaction, so that the event trigger
is created and dropped in one transaction and is never seen as valid by
any concurrent test.
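
For illustration, option 2 might look roughly like the sketch below.  This
is not the actual object_address test; the names sketch_evt_func,
sketch_evt and sketch_tbl are hypothetical, and the body is reduced to a
single DDL statement.  Because the event trigger's catalog row is never
committed, concurrent sessions should never see it and so never try to
look up its function.

```sql
-- Hypothetical sketch: every object name here is invented for illustration.
-- Creating an event trigger requires superuser privileges.
BEGIN;

CREATE FUNCTION sketch_evt_func() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- tg_tag holds the command tag of the DDL that fired the trigger
    RAISE NOTICE 'event trigger fired for %', tg_tag;
END;
$$;

CREATE EVENT TRIGGER sketch_evt ON ddl_command_end
    EXECUTE PROCEDURE sketch_evt_func();

-- Stand-in for the test body: DDL that fires the trigger in this session only.
CREATE TABLE sketch_tbl (a int);
DROP TABLE sketch_tbl;

-- Drop the trigger before its function, still inside the same transaction,
-- so no other session ever sees a committed trigger.
DROP EVENT TRIGGER sketch_evt;
DROP FUNCTION sketch_evt_func();

COMMIT;
```

The same effect could be had by ending with ROLLBACK instead of the
explicit drops, at the cost of not exercising the DROP paths.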

A long-term idea is to try to fix things so that there's sufficient
locking to make dropping an event trigger and immediately dropping its
trigger function safe.  But I'm not sure that's either possible or a good
idea (the lock obtained by DROP would bring the entire database to a
standstill ...).
        regards, tom lane


