Re: [HACKERS] parallel.c oblivion of worker-startup failures - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date	January 25, 2018 05:37:57
Msg-id	CA+TgmoYHWmC_pv8RkGbjJ-E-JoNL=Ts34vnfAyOVHcdaTAXM7g@mail.gmail.com
In response to	Re: [HACKERS] parallel.c oblivion of worker-startup failures (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: [HACKERS] parallel.c oblivion of worker-startup failures
List	pgsql-hackers

On Wed, Jan 24, 2018 at 5:52 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> If we made the Gather node wait only for workers that actually reached
>> the Gather -- either by using a Barrier or by some other technique --
>> then this would be a lot less fragile.  And the same kind of technique
>> would work for parallel CREATE INDEX.
>
> The use of a barrier has problems of its own [1], though, of which one
> is unique to the parallel_leader_participation=off case. I thought
> that you yourself agreed with this [2] -- do you?
>
> Another argument against using a barrier in this way is that it seems
> like way too much mechanism to solve a simple problem. Why should a
> client of parallel.h not be able to rely on nworkers_launched (perhaps
> only after "verifying it can be trusted")? That seems like a pretty
> reasonable requirement for clients to have for any framework for
> parallel imperative programming.

Well, I've been resisting that approach from the very beginning of
parallel query.  Eventually, I hope that we're going to go in the
direction of changing our mind about how many workers parallel
operations use "on the fly".  For example, if there are 8 parallel
workers available and 4 of them are in use, and you start a query (or
index build) that wants 6 but only gets 4, it would be nice if the
other 2 could join later after the other operation finishes and frees
some up.  That, of course, won't work very well if parallel operations
are coded in such a way that the number of workers must be nailed down
at the very beginning.

Now maybe all that seems like pie in the sky, and perhaps it is, but I
hold out hope.  For queries, there is another consideration, which is
that some queries may run with parallelism but actually finish quite
quickly - it's not desirable to make the leader wait for workers to
start when it could be busy computing.  That's a lesser consideration
for bulk operations like parallel CREATE INDEX, but even there I don't
think it's totally negligible.

For both reasons, it's much better, or so it seems to me, if parallel
operations are coded to work with the number of workers that show up,
rather than being inflexibly tied to a particular worker count.
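
As an illustration of that last point, here is a minimal sketch of an
operation coded to work with however many workers show up. It uses
Python threads rather than anything from parallel.c, and every name in
it (ChunkDispenser, participate, run, workers_started) is hypothetical.
The assumption is that work can be split into chunks that participants
claim from shared state, so nothing depends on a worker count fixed at
the start, and the leader starts computing right away instead of
waiting for workers to launch.

```python
import threading


class ChunkDispenser:
    """Hands out chunks of [0, total) to any participant that asks.

    No worker count is recorded anywhere: a participant that never
    starts simply never claims a chunk.
    """

    def __init__(self, total, chunk_size):
        self._total = total
        self._chunk_size = chunk_size
        self._next = 0
        self._lock = threading.Lock()

    def claim(self):
        """Return the next (start, end) range, or None when work is done."""
        with self._lock:
            if self._next >= self._total:
                return None
            start = self._next
            end = min(start + self._chunk_size, self._total)
            self._next = end
            return start, end


def participate(dispenser, results, results_lock):
    """Run by the leader and by each worker: claim chunks until none remain."""
    local_sum = 0
    while True:
        chunk = dispenser.claim()
        if chunk is None:
            break
        local_sum += sum(range(*chunk))
    with results_lock:
        results.append(local_sum)


def run(total, chunk_size, workers_requested, workers_started):
    """Sum range(total) with whatever subset of requested workers started.

    workers_started is a hypothetical stand-in for workers that failed
    to launch or were never available.
    """
    dispenser = ChunkDispenser(total, chunk_size)
    results = []
    results_lock = threading.Lock()

    started = min(workers_requested, workers_started)
    threads = [
        threading.Thread(target=participate,
                         args=(dispenser, results, results_lock))
        for _ in range(started)
    ]
    for t in threads:
        t.start()

    # The leader does not wait for workers to start; it begins work now.
    participate(dispenser, results, results_lock)

    # Wait only for the workers that actually started.
    for t in threads:
        t.join()
    return sum(results)
```

Calling run(1000, 64, 6, 4) and run(1000, 64, 6, 0) yields the same
total, because correctness depends only on every chunk being claimed
by someone, not on how many participants there are. A worker that
joined partway through would just start claiming whatever chunks
remain.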

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
