Re: [HACKERS] parallel.c oblivion of worker-startup failures - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date
Msg-id CA+TgmoYaqPQ5Uk5jdNGBdqeZHjMHw1TKEbGQLgOOfJuEV9ZFtQ@mail.gmail.com
In response to Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Dec 19, 2017 at 5:01 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think it would have been much easier to fix this problem if we would
> have some way to differentiate whether the worker has stopped
> gracefully or not.  Do you think it makes sense to introduce such a
> state in the background worker machinery?

I think it makes a lot more sense to just throw an ERROR when the
worker doesn't shut down cleanly, which is currently what happens in
nearly all cases.  It only fails to happen for fork() failure and
other errors that happen very early in startup.  I don't think there's
any point in trying to make this code more complicated to cater to
such cases.  If fork() is failing, the fact that parallel query is
erroring out rather than running with fewer workers is likely to be a
good thing.  Your principal concern in that situation is probably
whether your attempt to log into the machine and kill some processes
is also going to die with 'fork failure', and having PostgreSQL
consume every available process slot is not going to make that easier.
On the other hand, if workers are failing so early in startup that
they never attach to the error queue, then they're probably all
failing the same way and trying to cope with that problem in any way
other than throwing an error is going to result in parallelism being
silently disabled with no notification to the user, which doesn't seem
good to me either.

So basically I think it's right to treat these as error conditions,
rather than trying to continue the work.
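
To make the policy concrete, here is a minimal sketch in C of "any
worker that did not shut down cleanly is an error".  It is only a
model: the WorkerOutcome enum and check_worker_outcomes() are
hypothetical names made up for illustration, not anything in
parallel.c or the background worker machinery, and a real
implementation would presumably raise an ERROR rather than return a
status code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical outcomes for one parallel worker; not PostgreSQL's real types. */
typedef enum
{
    WORKER_EXITED_CLEANLY,      /* attached, did its work, shut down normally */
    WORKER_REPORTED_ERROR,      /* attached and reported an error via the error queue */
    WORKER_FORK_FAILED,         /* fork() failed, so the process never existed */
    WORKER_DIED_BEFORE_ATTACH   /* started, but died before attaching to the error queue */
} WorkerOutcome;

/*
 * Returns 0 if every worker exited cleanly.  Otherwise returns -1 and writes
 * a message describing the first failure into errbuf.  There is deliberately
 * no path that carries on with the surviving workers.
 */
int
check_worker_outcomes(const WorkerOutcome *outcomes, int nworkers,
                      char *errbuf, size_t errlen)
{
    for (int i = 0; i < nworkers; i++)
    {
        switch (outcomes[i])
        {
            case WORKER_EXITED_CLEANLY:
                break;
            case WORKER_REPORTED_ERROR:
                snprintf(errbuf, errlen, "worker %d reported an error", i);
                return -1;
            case WORKER_FORK_FAILED:
            case WORKER_DIED_BEFORE_ATTACH:
                /* Early-startup failures get the same treatment as any other. */
                snprintf(errbuf, errlen, "worker %d failed during startup", i);
                return -1;
        }
    }
    return 0;
}
```

The point of the sketch is that the two early-startup cases fall into
the same error path as everything else, so the caller never ends up
quietly running with fewer workers than it asked for.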

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company