Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
Date
Msg-id CA+TgmoYx6mynFL5aDs7+xjZ01QrY8smp+Zr=5BxAseODZdZPWA@mail.gmail.com
In response to Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
List pgsql-hackers
On Thu, Jun 15, 2017 at 5:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think you're right.  So here's a theory:
>
>>> 1. The ERROR mapping the DSM segment is just a case of the worker
>>> losing a race, and isn't a bug.
>
>> I concur that this is a possibility,
>
> Actually, no, it isn't.  I tried to reproduce the problem by inserting
> a sleep into ParallelWorkerMain, and could not.  After digging around
> in the code, I realize that the leader process *can not* exit the
> parallel query before the workers start, at least not without hitting
> an error first, which is not happening in these examples.  The reason
> is that nodeGather cannot deem the query done until it's seen EOF on
> each tuple queue, which it cannot see until each worker has attached
> to and then detached from the associated shm_mq.

Oh.  That's sad.  It definitely has to wait for any tuple queues that
have been attached to be detached, but it would be better if it didn't
have to wait for processes that haven't even attached yet.
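To make the two termination conditions concrete, here is a minimal C model. The enum, the state names, and both functions are hypothetical; they are not PostgreSQL's nodeGather or shm_mq code. The "current" predicate encodes the rule described in the quoted text: Gather is done only once every tuple queue has seen EOF, which requires each worker to attach and then detach. The "suggested" predicate encodes only the wish stated above, skipping queues no worker has attached to. It deliberately ignores the hard part, namely making sure a worker that attaches late cannot still produce tuples.

```c
#include <stdbool.h>

/* Hypothetical per-worker tuple-queue states (not the real shm_mq API). */
typedef enum
{
    QUEUE_UNATTACHED,   /* worker has not (yet) attached to its queue */
    QUEUE_ATTACHED,     /* worker attached and may still send tuples */
    QUEUE_DETACHED      /* worker attached and then detached: EOF seen */
} QueueState;

/*
 * Rule as described in the thread: the leader may finish only after it has
 * seen EOF on every queue, i.e. every worker attached and then detached.
 * A worker that never starts therefore keeps this predicate false.
 */
static bool
gather_done_current(const QueueState *queues, int nqueues)
{
    for (int i = 0; i < nqueues; i++)
    {
        if (queues[i] != QUEUE_DETACHED)
            return false;
    }
    return true;
}

/*
 * Suggested rule: wait only for queues that were actually attached.
 * Unattached queues are skipped. This sketch does NOT handle a worker
 * attaching after the leader has already decided it is done.
 */
static bool
gather_done_suggested(const QueueState *queues, int nqueues)
{
    for (int i = 0; i < nqueues; i++)
    {
        if (queues[i] == QUEUE_ATTACHED)
            return false;
    }
    return true;
}
```

With one finished worker and one that never attached, the current rule keeps waiting while the suggested rule lets the leader finish.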

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


