Re: pg_terminate_backend() issues - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_terminate_backend() issues
Msg-id 10589.1208364398@sss.pgh.pa.us
In response to Re: pg_terminate_backend() issues  (Magnus Hagander <magnus@hagander.net>)
Responses Re: pg_terminate_backend() issues  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4
>> if there is some reasonable amount of testing done during this
>> development cycle to try to expose any problems.

> If someone can come up with an automated script to do this kind of
> testing, I can commit a VM or three to running this 24/7 for a month,
> easily... But I don't trust myself in coming up with a test-case that's
> good enough :-P

The closest thing I can think of to an automated test is to run repeated
sets of the parallel regression tests, and each time SIGTERM a randomly
chosen backend at a randomly chosen time.  Then see if anything "funny"
happens.  The hard part here is distinguishing expected from unexpected
regression outputs, especially in view of the fact that some of the
tests depend on database contents set up by earlier tests.
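
A minimal sketch of the "randomly chosen backend at a randomly chosen
time" step, in Python.  The function names, the max_delay_s parameter,
and the way the driver obtains the list of backend PIDs are all
assumptions made for illustration; none of them come from an existing
script.

```python
import os
import random
import signal


def pick_target(backend_pids, max_delay_s, rng=random):
    """Choose one victim PID and a delay in seconds before signalling it.

    backend_pids is whatever list of candidate PIDs the driver collected
    (hypothetical input; how it is gathered is up to the driver).
    Returns None when there is nothing to kill.
    """
    pids = list(backend_pids)
    if not pids:
        return None
    return rng.choice(pids), rng.uniform(0.0, max_delay_s)


def send_sigterm(pid):
    """Send SIGTERM to one process; return False if it had already exited."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False
    return True
```

A driver would call pick_target() once per regression run, sleep for the
returned delay, and then call send_sigterm() on the returned PID.  Passing
a seeded random.Random as rng makes a run reproducible, which should help
when a "funny" result needs to be chased down.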

I'm thinking that you could automatically discard the regression diff
for the specific test that got SIGTERM'd, as long as it looked like
the normal output up to the point where the "terminated by
administrator" error appears.  Then what you'd have is the potential for
downstream failures due to things not being created, which *should* fall
into a fairly stylized set of possible diffs.  So get the script to
throw away any diffs that exactly match ones seen previously.  Run it
for a while, and then hand-validate the set of diffs that it's saved
... or if any of 'em look funny, report.
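
The two filters described above could be sketched roughly as follows.
This is only an illustration: matches_until_marker, DiffArchive, and the
marker argument are hypothetical names, and the exact text of the
termination error is left as a parameter rather than assumed.

```python
import hashlib


def matches_until_marker(expected_lines, actual_lines, marker):
    """True if the actual output equals the expected output line by line
    up to the first line containing marker (the termination error).

    Returns False if the output diverges before the marker, or if the
    marker never appears at all.
    """
    for i, line in enumerate(actual_lines):
        if marker in line:
            return True
        if i >= len(expected_lines) or line != expected_lines[i]:
            return False
    return False


class DiffArchive:
    """Remember every distinct diff seen so far, keyed by content hash."""

    def __init__(self):
        self.seen = {}

    def add(self, diff_text):
        """Store diff_text; return True only if it was not seen before."""
        key = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
        if key in self.seen:
            return False
        self.seen[key] = diff_text
        return True
```

Only diffs for which add() returns True would need to be saved for hand
validation.  Exact matching as done here would be defeated by anything
that varies from run to run, so if the diff headers carry timestamps or
paths they would presumably have to be stripped before hashing.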

One gotcha I can think of is that killing the prepared_xacts test
can leave you with open 2PC transactions, which will interfere with
starting the next cycle of the tests (you have to kill them before you
can dropdb).  But you could add a "rollback prepared" to the driver
script to clean out any uncommitted prepared xact.
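
A sketch of that cleanup step, assuming the driver has already fetched
the identifiers of the leftover prepared transactions (for example from
the gid column of the pg_prepared_xacts view).  The helper name
rollback_statements is hypothetical.

```python
def rollback_statements(gids):
    """Build one ROLLBACK PREPARED statement per transaction identifier.

    Single quotes inside a gid are doubled so that the identifier is a
    valid SQL string literal; no other escaping is attempted here.
    """
    statements = []
    for gid in gids:
        quoted = gid.replace("'", "''")
        statements.append(f"ROLLBACK PREPARED '{quoted}';")
    return statements
```

The driver would then run each statement outside a transaction block,
connected to the database in which the transaction was prepared, before
attempting the dropdb.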

Whether this is workable or not depends on the size of the set of
"expected" downstream-failure diffs.  My gut feeling from many years of
watching regression test crashes is that it'd be large but not
completely impractical to look through by hand.

I haven't time to write something like that myself, but offhand it seems
like it could be done without more than a day or so's work, especially
if you start from the buildfarm infrastructure.

BTW, don't forget to include autovac workers in the set of SIGTERM
target candidates.
        regards, tom lane

