hyrax versus isolationtester.c's hard-wired timeouts - Mailing list pgsql-hackers

From Tom Lane
Subject hyrax versus isolationtester.c's hard-wired timeouts
Date
Msg-id 22964.1575842935@sss.pgh.pa.us
List pgsql-hackers
Buildfarm member hyrax has been intermittently failing the
deadlock-parallel isolation test ever since that went in.
I finally got around to looking at this closely, and what
seems to be happening is simply that isolationtester.c's
hard-wired three-minute timeout for the completion of any
one test step is triggering.  hyrax uses CLOBBER_CACHE_ALWAYS
and it seems to be a little slower than other animals using
CLOBBER_CACHE_ALWAYS, so it's unsurprising that it's showing
the symptom and nobody else is.

There are two things we could do about this:

1. Knock the hard-wired setting up a tad, maybe to 5 minutes.
Easy but doesn't seem terribly future-proof.

2. Make the limit configurable somehow, probably from an
environment variable.  There's precedent for that (PGCTLTIMEOUT),
and it would provide a way for owners of especially slow buildfarm
members to adjust things ... but it would require owners of
especially slow buildfarm animals to adjust things.

Any preferences?  (Actually, it wouldn't be unreasonable to do
both things, I suppose.)
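
For illustration, option 2 could be sketched roughly as below. This is only a
sketch under stated assumptions, not the actual isolationtester.c code: the
environment variable name EXAMPLE_ISOLATION_STEP_TIMEOUT and both function
names are hypothetical, and the fallback value passed by the caller stands in
for the hard-wired limit. Unset, empty, malformed, or non-positive settings
fall back to the default.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen for illustration only. */
#define STEP_TIMEOUT_ENVVAR "EXAMPLE_ISOLATION_STEP_TIMEOUT"

/*
 * Parse a timeout setting given in seconds.  Returns default_secs if the
 * string is NULL, empty, not a plain decimal integer, non-positive, or
 * too large to fit in an int.
 */
int
parse_step_timeout(const char *val, int default_secs)
{
    char       *end;
    long        secs;

    assert(default_secs > 0);

    if (val == NULL || *val == '\0')
        return default_secs;

    errno = 0;
    secs = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || secs <= 0 || secs > INT_MAX)
        return default_secs;    /* ignore unusable settings */

    return (int) secs;
}

/*
 * Look up the per-step timeout: the environment variable wins if it is
 * set to something usable, otherwise the built-in default applies.
 */
int
get_step_timeout(int default_secs)
{
    return parse_step_timeout(getenv(STEP_TIMEOUT_ENVVAR), default_secs);
}
```

Doing both things would then amount to calling get_step_timeout(300)
instead of get_step_timeout(180): a higher built-in default, plus an
override for owners of especially slow animals.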

BTW, I notice that isolationtester.c fails to print any sort of warning
notice when it decides it's waited too long.  This seems like a
spectacularly bad idea in hindsight: it's not that obvious why the test
case failed.  Plus there's no way to tell exactly which connection it
decided to send a PQcancel to.  So independently of the timeout-length
issue, I think we ought to also make it print something like
"isolationtester: waited too long for something to happen, canceling
step thus-and-so".
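
A minimal sketch of such a message follows. It assumes a hypothetical helper
that is handed the step name and the index of the connection about to be
canceled; neither the function names nor the parameter names come from
isolationtester.c, and the real code would issue the PQcancel itself right
after printing.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the timeout warning into buf.  Returns the value of snprintf(),
 * i.e. the length the full message needs, so the caller can detect
 * truncation.  step_name and conn_index are hypothetical parameters
 * identifying what is about to be canceled.
 */
int
format_timeout_warning(char *buf, size_t buflen,
                       const char *step_name, int conn_index)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen,
                    "isolationtester: waited too long for something to "
                    "happen, canceling step %s on connection %d\n",
                    step_name != NULL ? step_name : "(unknown)",
                    conn_index);
}

/*
 * Emit the warning on stderr just before the cancel request is sent,
 * so a failed test run shows why and where the cancel happened.
 */
void
report_timeout(const char *step_name, int conn_index)
{
    char        msg[512];

    format_timeout_warning(msg, sizeof(msg), step_name, conn_index);
    fputs(msg, stderr);
    fflush(stderr);
    /* the cancel request for that connection would be sent here */
}
```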

            regards, tom lane


