Re: FW: Intermittent Stats Failiures: firefly: HEAD - Mailing list pgsql-hackers

From Tom Lane
Subject Re: FW: Intermittent Stats Failiures: firefly: HEAD
Msg-id 27826.1137011321@sss.pgh.pa.us
In response to FW: Intermittent Stats Failiures: firefly: HEAD  ("Larry Rosenman" <lrosenman@pervasive.com>)
List pgsql-hackers
"Larry Rosenman" <lrosenman@pervasive.com> writes:
>> Ever since the stats collector changes, I've seen intermittent
>> failures on 'firefly' in the buildfarm.

Yeah, you're not the only one.  We haven't figured out what's causing
them.  But while fooling with Joachim Wieland's pg_sleep patch just
now, I was struck by an idea: on machines where select() is
interruptible by signals, it is possible that the do_sleep() function
won't wait as long as specified.  This could easily cause the observed
regression diff, if the test doesn't wait long enough for the stats
collector to update the stats.
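
To illustrate the failure mode, here is a minimal sketch in C.  It is
not the actual do_sleep() or pg_sleep code; the names now_seconds,
naive_sleep, full_sleep and on_alarm are hypothetical, made up for this
example.  It assumes a POSIX system where select() fails with EINTR
when a signal handler runs.  The naive version makes one select() call
and can return early; the other version recomputes the remaining time
against a fixed deadline and goes back to sleep.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading, in seconds. */
static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Hypothetical no-op handler; it exists only so that a signal can
 * interrupt select() in a demonstration. */
static void on_alarm(int signo)
{
    (void) signo;
}

/* Naive sleep: a single select() call used as a timer.  If a signal
 * handler runs meanwhile, select() returns -1 with errno == EINTR and
 * the caller has slept for less than the requested time. */
static void naive_sleep(double seconds)
{
    struct timeval tv;

    tv.tv_sec = (time_t) seconds;
    tv.tv_usec = (suseconds_t) ((seconds - (double) tv.tv_sec) * 1e6);
    (void) select(0, NULL, NULL, NULL, &tv);
}

/* Interrupt-tolerant sleep: fix a deadline up front, then keep calling
 * select() with whatever time remains until the deadline has passed. */
static void full_sleep(double seconds)
{
    double deadline;

    assert(seconds >= 0.0);
    deadline = now_seconds() + seconds;

    for (;;)
    {
        double remaining = deadline - now_seconds();
        struct timeval tv;

        if (remaining <= 0.0)
            break;

        tv.tv_sec = (time_t) remaining;
        tv.tv_usec = (suseconds_t) ((remaining - (double) tv.tv_sec) * 1e6);

        /* Whether select() timed out or failed with EINTR, loop around
         * and recheck the clock rather than trusting the return value. */
        (void) select(0, NULL, NULL, NULL, &tv);
    }
}
```

If a handler such as on_alarm is installed and a signal arrives partway
through, naive_sleep(0.3) returns as soon as the handler finishes,
while full_sleep(0.3) does not return until the full 0.3 seconds have
elapsed.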

It's not immediately obvious what signal might be arriving at the
backend, given that there aren't supposed to be any other database
operations going on.  It's barely possible that a SIGUSR1 (sinval
catchup interrupt) could be generated here, if one of the previous
group of tests was still in the process of shutting down its backend.
So I'm not sure about this theory ... but at least it's a theory.

If the theory is correct then the just-committed pg_sleep patch
should provide a permanent solution.  We'll have to wait and see
whether any more of those errors show up.

If we don't see any more such errors in HEAD for a while, it might
be worth back-patching the implementation of pg_sleep into the
older branches' regression tests, so we don't keep seeing intermittent
regression failures in them either.
        regards, tom lane

