Re: [HACKERS] parallel.c oblivion of worker-startup failures - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] parallel.c oblivion of worker-startup failures
Msg-id CA+TgmoYHWmC_pv8RkGbjJ-E-JoNL=Ts34vnfAyOVHcdaTAXM7g@mail.gmail.com
In response to Re: [HACKERS] parallel.c oblivion of worker-startup failures  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] parallel.c oblivion of worker-startup failures
List pgsql-hackers
On Wed, Jan 24, 2018 at 5:52 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> If we made the Gather node wait only for workers that actually reached
>> the Gather -- either by using a Barrier or by some other technique --
>> then this would be a lot less fragile.  And the same kind of technique
>> would work for parallel CREATE INDEX.
>
> The use of a barrier has problems of its own [1], though, of which one
> is unique to the parallel_leader_participation=off case. I thought
> that you yourself agreed with this [2] -- do you?
>
> Another argument against using a barrier in this way is that it seems
> like way too much mechanism to solve a simple problem. Why should a
> client of parallel.h not be able to rely on nworkers_launched (perhaps
> only after "verifying it can be trusted")? That seems like a pretty
> reasonable requirement for clients to have for any framework for
> parallel imperative programming.

Well, I've been resisting that approach from the very beginning of
parallel query.  Eventually, I hope that we're going to go in the
direction of changing our mind about how many workers parallel
operations use "on the fly".  For example, if there are 8 parallel
workers available and 4 of them are in use, and you start a query (or
index build) that wants 6 but only gets 4, it would be nice if the
other 2 could join later after the other operation finishes and frees
some up.  That, of course, won't work very well if parallel operations
are coded in such a way that the number of workers must be nailed down
at the very beginning.
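A minimal sketch of what "coded to let workers join later" can look like, assuming POSIX threads as a stand-in for PostgreSQL background workers. Participants claim blocks of work from a shared atomic counter, so the result does not depend on how many participants there are or when each one starts. Nothing here is parallel.c's actual API: `NBLOCKS`, `MAX_WORKERS`, `shared_scan`, `participate`, `worker_main` and `run_scan` are hypothetical names, and the per-worker start delay exists only to simulate helpers that show up late.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

#define NBLOCKS 1000      /* hypothetical amount of work, in blocks */
#define MAX_WORKERS 16

typedef struct {
    atomic_int   next_block;  /* next block nobody has claimed yet */
    atomic_llong total;       /* combined result of all participants */
} shared_scan;

typedef struct {
    shared_scan *scan;
    long start_delay_ns;      /* simulates a worker that shows up late */
} worker_arg;

/* Claim blocks one at a time until none remain.  Nothing here depends
 * on how many participants exist or when each one started. */
static void participate(shared_scan *s)
{
    for (;;) {
        int blk = atomic_fetch_add(&s->next_block, 1);
        if (blk >= NBLOCKS)
            return;
        atomic_fetch_add(&s->total, (long long) blk);  /* stand-in for real work */
    }
}

static void *worker_main(void *p)
{
    worker_arg *w = p;
    struct timespec delay = { 0, w->start_delay_ns };

    nanosleep(&delay, NULL);          /* arrive late on purpose */
    participate(w->scan);             /* late arrivals just find less work */
    return NULL;
}

/* The leader plus up to nworkers late-starting helpers. */
static long long run_scan(int nworkers)
{
    shared_scan s;
    pthread_t   tids[MAX_WORKERS];
    worker_arg  args[MAX_WORKERS];
    int         launched = 0;

    atomic_init(&s.next_block, 0);
    atomic_init(&s.total, 0);

    for (int i = 0; i < nworkers && i < MAX_WORKERS; i++) {
        args[i].scan = &s;
        args[i].start_delay_ns = (i + 1) * 200000L;   /* 0.2 ms apart */
        if (pthread_create(&tids[launched], NULL, worker_main, &args[i]) == 0)
            launched++;               /* a failed launch only means less help */
    }

    participate(&s);                  /* the leader works instead of waiting */

    /* Let helpers publish the blocks they claimed (s lives on this stack). */
    for (int i = 0; i < launched; i++)
        pthread_join(tids[i], NULL);
    assert(atomic_load(&s.next_block) >= NBLOCKS);
    return atomic_load(&s.total);
}
```

With this shape, `run_scan(0)`, `run_scan(3)` and `run_scan(16)` all return the same total (the sum of 0..999, i.e. 499500); only the split of work among participants changes.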

Now maybe all that seems like pie in the sky, and perhaps it is, but I
hold out hope.  For queries, there is another consideration, which is
that some queries may run with parallelism but actually finish quite
quickly - it's not desirable to make the leader wait for workers to
start when it could be busy computing.  That's a lesser consideration
for bulk operations like parallel CREATE INDEX, but even there I don't
think it's totally negligible.

For both reasons, it's much better, or so it seems to me, if parallel
operations are coded to work with the number of workers that show up,
rather than being inflexibly tied to a particular worker count.
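One way to combine that with the quoted idea of having the leader "wait only for workers that actually reached the Gather" is sketched below, again with POSIX threads standing in for background workers. This is an illustrative assumption, not the Barrier API and not what parallel.c does. Each worker registers (`attached`) before touching any work; once the leader runs out of work it sets a hypothetical `closed` flag so that later arrivals exit immediately, and then waits only until every attached worker has `detached`. A worker that fails during startup never attaches, so the leader has nothing to wait for from it. All names (`shared_state`, `do_work`, `run_parallel`, `fail_at_startup`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 1000      /* hypothetical amount of work, in blocks */
#define MAX_WORKERS 16

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  detach_cv;  /* signalled when an attached worker finishes */
    int       next_block;       /* next block nobody has claimed yet */
    long long total;            /* combined result */
    int       attached;         /* workers that actually showed up */
    int       detached;         /* attached workers that have finished */
    bool      closed;           /* leader is done; late arrivals must not start */
} shared_state;

typedef struct {
    shared_state *s;
    bool fail_at_startup;       /* simulates a worker that dies before attaching */
} worker_arg;

/* Claim a block under the lock, process it outside the lock, then publish. */
static void do_work(shared_state *s)
{
    for (;;) {
        pthread_mutex_lock(&s->lock);
        int blk = (s->next_block < NBLOCKS) ? s->next_block++ : -1;
        pthread_mutex_unlock(&s->lock);
        if (blk < 0)
            return;
        long long partial = blk;      /* stand-in for real per-block work */
        pthread_mutex_lock(&s->lock);
        s->total += partial;
        pthread_mutex_unlock(&s->lock);
    }
}

static void *worker_main(void *p)
{
    worker_arg   *w = p;
    shared_state *s = w->s;

    if (w->fail_at_startup)
        return NULL;                  /* never attaches, never signals */

    pthread_mutex_lock(&s->lock);
    if (s->closed) {                  /* arrived after the leader finished */
        pthread_mutex_unlock(&s->lock);
        return NULL;
    }
    s->attached++;
    pthread_mutex_unlock(&s->lock);

    do_work(s);

    pthread_mutex_lock(&s->lock);
    s->detached++;
    pthread_cond_signal(&s->detach_cv);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Launch nworkers helpers, the first nfailing of which fail at startup. */
static long long run_parallel(int nworkers, int nfailing)
{
    shared_state s = { .next_block = 0, .total = 0, .attached = 0,
                       .detached = 0, .closed = false };
    pthread_t  tids[MAX_WORKERS];
    worker_arg args[MAX_WORKERS];
    int        created = 0;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.detach_cv, NULL);

    for (int i = 0; i < nworkers && i < MAX_WORKERS; i++) {
        args[i].s = &s;
        args[i].fail_at_startup = (i < nfailing);
        if (pthread_create(&tids[created], NULL, worker_main, &args[i]) == 0)
            created++;
    }

    do_work(&s);                      /* leader participates */

    pthread_mutex_lock(&s.lock);
    s.closed = true;                  /* no attachments after this point */
    while (s.detached < s.attached)   /* wait only for workers that attached */
        pthread_cond_wait(&s.detach_cv, &s.lock);
    assert(s.next_block >= NBLOCKS);
    long long result = s.total;
    pthread_mutex_unlock(&s.lock);

    /* Reclaim threads (s lives on this stack); result is already final. */
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&s.lock);
    pthread_cond_destroy(&s.detach_cv);
    return result;
}
```

In this sketch, `run_parallel(4, 2)` completes with the same total as `run_parallel(4, 0)`. A leader that instead waited for one "finished" signal per launched worker would block forever on the two simulated startup failures.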

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

