Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
Date	June 15, 2017 20:12:35
Msg-id	CA+TgmoYxQCzTVgKXrVYyobJNnrA=0gzN1rKh4uzdDapVTKAxpA@mail.gmail.com
In response to	Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests (Amit Kapila <amit.kapila16@gmail.com>) Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Thu, Jun 15, 2017 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, as Amit points out, there are entirely legitimate ways for that
>> to happen.  If the leader finishes the whole query itself before the
>> worker reaches the dsm_attach() call, it will call dsm_detach(),
>> destroying the segment, and the worker will hit this ERROR.  That
>> shouldn't happen very often in the real world, because we ought not to
>> select a parallel plan in the first place unless the query is going to
>> take a while to run, but the select_parallel test quite deliberately
>> disarms all of the guards that would tend to discourage such plans.
>
> But we know, from the subsequent failed assertion, that the leader was
> still trying to launch parallel workers.  So that particular theory
> doesn't hold water.

Is there any chance that it's already trying to launch parallel
workers for the *next* query?

>> Of course, as Amit also points out, it could also be the result of
>> some bug, but I'm not sure we have any reason to think so.
>
> The fact that we've only seen this on cygwin leads the mind in the
> direction of platform-specific problems.  Both this case and lorikeet's
> earlier symptoms could be explained if the parameters passed from leader
> to workers somehow got corrupted occasionally; so that's what I've been
> thinking about, but I'm not seeing anything.

Could be -- but it could also be timing-related.  If we are in fact
using cygwin's fork emulation, the documentation for it explains that
it's slow: https://www.cygwin.com/faq.html#faq.api.fork

Interestingly, it also mentions that making it work requires
suspending the parent while the child is starting up, which probably
does not happen on any other platform.  Of course, that suspension also
makes my theory that the child doesn't reach dsm_attach() before the
parent finishes the query pretty unlikely.
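
To make the ordering question concrete, here is a toy model of the race
described above: the leader detaching (and thereby destroying the segment)
before the worker attaches.  This is only a sketch under stated
assumptions.  Registry, SegmentGone and run_query are hypothetical names
invented for illustration, not PostgreSQL APIs, and the two "times" are
abstract units rather than measurements from lorikeet.

```python
# Toy model only: none of these names are PostgreSQL APIs.

class SegmentGone(Exception):
    """Raised when a worker tries to attach to a destroyed segment."""


class Registry:
    """Reference-counted segments; the last detach destroys the segment."""

    def __init__(self):
        self.refcounts = {}  # handle -> number of attached processes
        self.next_handle = 1

    def create(self):
        handle = self.next_handle
        self.next_handle += 1
        self.refcounts[handle] = 1  # the creator (leader) starts attached
        return handle

    def attach(self, handle):
        if handle not in self.refcounts:
            raise SegmentGone(f"could not map segment {handle}")
        self.refcounts[handle] += 1

    def detach(self, handle):
        self.refcounts[handle] -= 1
        if self.refcounts[handle] == 0:
            del self.refcounts[handle]  # nobody is left, segment goes away


def run_query(worker_startup_time, leader_query_time):
    """Return "attached" or "error" depending on which event happens first.

    On a tie the worker is assumed to attach first.
    """
    registry = Registry()
    handle = registry.create()
    events = sorted([
        (worker_startup_time, 0, "worker_attach"),
        (leader_query_time, 1, "leader_detach"),
    ])
    outcome = None
    for _, _, action in events:
        if action == "leader_detach":
            registry.detach(handle)
        else:
            try:
                registry.attach(handle)
                outcome = "attached"
            except SegmentGone:
                outcome = "error"
    return outcome
```

In this model, run_query(1, 10) returns "attached" and run_query(10, 1)
returns "error".  A slow worker start only matters if the leader is free
to keep running in the meantime, which is the assumption the fork
emulation point calls into question.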

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
