Re: [HACKERS] Instability in select_parallel regression test - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Instability in select_parallel regression test
Date
Msg-id 19120.1487346335@sss.pgh.pa.us
In response to Re: [HACKERS] Instability in select_parallel regression test  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] Instability in select_parallel regression test  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Fri, Feb 17, 2017 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In short, it looks to me like ExecShutdownGatherWorkers doesn't actually
>> wait for parallel workers to finish (as its comment suggests is
>> necessary), so that on not-too-speedy machines the worker slots may all
>> still be in use when the next command wants some.

> ExecShutdownGatherWorkers() does wait for workers to exit/finish, but it
> doesn't wait for the postmaster to free the used slots, and that is how
> that API is supposed to work.  There is a good chance that on slow
> machines the slots get freed up much later by the postmaster after the
> workers have exited.

That seems like a seriously broken design to me, first because it can make
for a significant delay in the slots becoming available (which is what's
evidently causing these regression failures), and second because it's
simply bad design to load extra responsibilities onto the postmaster.
Especially ones that involve touching shared memory.

I think this needs to be changed, and promptly.  Why in the world don't
you simply have the workers clearing their slots when they exit?
We don't have an expectation that regular backends are incompetent to
clean up after themselves.  (Obviously, a crash exit is a different
case.)
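
To make the idea concrete, here is a toy sketch of a worker releasing its
own slot on the normal exit path, with the supervising process stepping in
only after a crash.  This is not the actual bgworker code: the slot array,
the pool size, and every function name below are made up for illustration,
and a real shared-memory version would also need proper locking or memory
barriers, which the sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKER_SLOTS 4      /* hypothetical pool size */

/* Hypothetical stand-in for a slot array living in shared memory. */
typedef struct WorkerSlot
{
    bool        in_use;
} WorkerSlot;

static WorkerSlot slots[MAX_WORKER_SLOTS];

/* Launcher side: claim the first free slot, or return -1 if none is free. */
int
claim_slot(void)
{
    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/*
 * Worker side: run on the normal exit path.  The worker frees its own
 * slot, so the slot is reusable as soon as the worker is known to be done.
 */
void
worker_exit_cleanup(int slot)
{
    assert(slot >= 0 && slot < MAX_WORKER_SLOTS);
    slots[slot].in_use = false;
}

/*
 * Supervisor side: needed only when a worker crashed and therefore never
 * reached worker_exit_cleanup().
 */
void
supervisor_reap_crashed(int slot)
{
    assert(slot >= 0 && slot < MAX_WORKER_SLOTS);
    slots[slot].in_use = false;
}

/* Count the slots currently available to the next command. */
int
free_slot_count(void)
{
    int         n = 0;

    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!slots[i].in_use)
            n++;
    }
    return n;
}
```

With this arrangement, a command that waits for its workers to finish can
count on their slots being free by the time the wait returns, instead of
depending on when the supervising process gets around to it.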

> I think what we need to do
> here is to move the test that needs workers to execute before other
> parallel query tests where there is no such requirement.

That's not fixing the problem, it's merely averting your eyes from
the symptom.
        regards, tom lane


