Home > mailing lists

Re: ExecGather() + nworkers - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: ExecGather() + nworkers
Date	January 11, 2016 00:44:51
Msg-id	CAM3SWZRs1mTvrKkAsY1XBShGZXkD6-HNxX3gq7x-p-dz0ZtkMg@mail.gmail.com Whole thread Raw
In response to	Re: ExecGather() + nworkers (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: ExecGather() + nworkers (Peter Geoghegan <pg@heroku.com>) Re: ExecGather() + nworkers (Robert Haas <robertmhaas@gmail.com>) Re: ExecGather() + nworkers (Amit Kapila <amit.kapila16@gmail.com>) Re: ExecGather() + nworkers (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

Tree view

On Sun, Jan 10, 2016 at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'm not sure why the test for nworkers following the
>> LaunchParallelWorkers() call doesn't look like this, though:
>>
>>     /* Set up tuple queue readers to read the results. */
>>     if (pcxt->nworkers_launched > 0)
>>     {
>>         ...
>>     }
>
> Hmm, yeah, I guess it could do that.

That would make it clearer as an example.

>> But going to this additional trouble (detecting no workers launched on
>> the basis of !nworkers_launched) suggests that simply testing
>> nworkers_launched would be wrong, which AFAICT it isn't. Can't we just
>> do that, and in so doing also totally remove the "for" loop shown
>> here?
>
> I don't see how the for loop goes away.

I meant that some code in the "for" loop goes away. Not all of it.
Just the more obscure code. As I said, I'm mostly pointing this out
out of concern for making it clearer as example code.

>> In the case of parallel sequential scan, it looks like one worker can
>> be helpful, because then the gather node (leader process) can run the
>> plan itself to some degree, and so there are effectively 2 processes
>> scanning at a minimum (unless 0 workers could be allocated to begin
>> with). How useful is it to have a parallel scan when this happens,
>> though?
>
> Empirically, that's really quite useful.  When you have 3 or 4
> workers, the leader really doesn't make a significant contribution to
> the work, but what I've seen in my testing is that 1 worker often runs
> almost twice as fast as 0 workers.

I suppose that makes sense, given that parallel sequential scan works
best when most tuples are eliminated in workers; there ought not to be
many tuples filling the single worker's queue anyway.

> I don't really understand why this should be so.  I thought the idea
> of parallel sort is (roughly) that each worker should read data until
> it fills work_mem, sort that data, and write a tape.  Repeat until no
> data remains.  Then, merge the tapes.  I don't see any reason at all
> why this shouldn't work just fine with a leader and 1 worker.

It will work fine with a leader and 1 worker -- the code will be
correct, and without any special cases. But it will be a suboptimal
use of resources. From the caller's point of view, there is no reason
to think it will be faster, and some reason to think it will be
slower. A particular concern for parallel sort is that the sort might
not use enough memory to need to be an external sort, but you
effectively force it to be one by making it a parallel sort (that is
not ideal in the long term, but it's a good compromise for 9.6's
parallel index build stuff). You're also consuming a
BackgroundWorkerSlot for the duration of the sort, in an environment
where evidently those are in short supply.

Now, you might wonder why it is that the leader cannot also sort runs,
just as a worker would. It's possible, but it isn't exactly
straightforward. You have to have special cases in several places,
even though it probably is going to be uncommon to only have one
BackgroundWorkerSlot available in practice. It's simpler to just
opt-out, and seems better given that max_parallel_degree is a way of
resource limiting based on available cores (it's certainly not about
the availability of shared memory for the BackgroundWorkerSlot array).

More importantly, I have other, entirely general concerns. Other major
RDBMSs have settings that are very similar to max_parallel_degree,
with a setting of 1 effectively disabling all parallelism. Both Oracle
and SQL Server have settings that they both call the "maximum degree
or parallelism". I think it's a bit odd that with Postgres,
max_parallel_degree = 1 can still use parallelism at all. I have to
wonder: are we conflating controlling the resources used by parallel
operations with how shared memory is doled out?

I could actually "give back" my parallel worker slots early if I
really wanted to (that would be messy, but the then-acquiesced workers
do nothing for the duration of the merge beyond conceptually owning
the shared tape temp files). I don't think releasing the slots early
makes sense, because I tend to think that hanging on to the workers
helps the DBA in managing the server's resources. The still-serial
merge phase is likely to become a big bottleneck with parallel sort.

With parallel sequential scan, a max_parallel_degree of 8 could result
in 16 processes scanning in parallel. That's a concern, and not least
because it happens only sometimes, when things are timed just right.
The fact that only half of those processes are "genuine" workers seems
to me like a distinction without a difference.

> I think that's probably over-engineered.  I mean, it wouldn't be that
> hard to have the workers just exit if you decide you don't want them,
> and I don't really want to make the signaling here more complicated
> than it really needs to be.

I worry about the additional overhead of constantly starting and
stopping a single worker in some cases (not so much with parallel
index build, but other uses of parallel sort beyond 9.6). Furthermore,
the coordination between worker and leader processes to make this
happen seems messy -- you actually have the postmaster launch
processes, but they must immediately get permission to do anything.

It wouldn't be that hard to offer a general way of doing this, so why not?
-- 
Peter Geoghegan

pgsql-hackers by date:

From: David Rowley
Date: 11 January 2016, 00:38:32
Subject: Re: WIP: bloom filter in Hash Joins with batches

From: Peter Geoghegan
Date: 11 January 2016, 00:57:49
Subject: Re: ExecGather() + nworkers

Re: ExecGather() + nworkers - Mailing list pgsql-hackers

Previous

Next