
From: Robert Haas
Subject: Re: ExecGather() + nworkers
Msg-id: CA+TgmoavVaRsaY15s-Ep+DdqBm1AjS+hK34L4qpTRAEMWNmorA@mail.gmail.com
In response to: Re: ExecGather() + nworkers (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Sun, Jan 10, 2016 at 4:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I don't really understand why this should be so.  I thought the idea
>> of parallel sort is (roughly) that each worker should read data until
>> it fills work_mem, sort that data, and write a tape.  Repeat until no
>> data remains.  Then, merge the tapes.  I don't see any reason at all
>> why this shouldn't work just fine with a leader and 1 worker.
>
> It will work fine with a leader and 1 worker -- the code will be
> correct, and without any special cases. But it will be a suboptimal
> use of resources. From the caller's point of view, there is no reason
> to think it will be faster, and some reason to think it will be
> slower. A particular concern for parallel sort is that the sort might
> not use enough memory to need to be an external sort, but you
> effectively force it to be one by making it a parallel sort (that is
> not ideal in the long term, but it's a good compromise for 9.6's
> parallel index build stuff). You're also consuming a
> BackgroundWorkerSlot for the duration of the sort, in an environment
> where evidently those are in short supply.
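
To make the run-then-merge scheme I described concrete, each worker
would do roughly the following.  Every name here is a hypothetical
placeholder, not the actual tuplesort.c interface:

    /* Per-worker loop: fill work_mem, sort in memory, spill one run. */
    static void
    worker_build_runs(SortInput *in, TapeSet *tapes, Size work_mem)
    {
        while (!input_exhausted(in))
        {
            TupleBuffer buf;
            Size        used = 0;

            tuple_buffer_init(&buf);
            /* Read tuples until the memory budget is full. */
            while (used < work_mem && !input_exhausted(in))
                used += tuple_buffer_add(&buf, read_next_tuple(in));
            sort_in_memory(&buf);            /* quicksort this run */
            write_run_to_tape(tapes, &buf);  /* spill it as a sorted run */
            tuple_buffer_free(&buf);
        }
    }

The leader then merges all the workers' tapes once they finish.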

Well, in general, the parallel sort code doesn't really get to pick
whether or not a BackgroundWorkerSlot gets used.  Whoever
created the parallel context decides how many workers to request, and
then the context gets as many of those as it can.  It then does
arbitrary computation, which at some point in the middle involves one
or more parallel sorts.  You can't just have one of those workers up
and exit in the middle.  Now, in the specific case of parallel index
build, you probably can do that, if you want to.  But to be honest,
I'd be inclined not to include that in the first version.  If you get
fewer workers than you asked for, just use the number you got.  Let's
see how that actually works out before we decide that we need a lot
more mechanism here.  You may find that it's surprisingly effective to
do it this way.
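
Concretely, I'd expect the setup to look about like this.  The entry
point and the sort functions are hypothetical, and the ParallelContext
details are from memory, so treat this as a sketch rather than gospel:

    ParallelContext *pcxt;

    EnterParallelMode();
    pcxt = CreateParallelContext(parallel_sort_main, nworkers_requested);
    InitializeParallelDSM(pcxt);
    LaunchParallelWorkers(pcxt);

    /*
     * Size the sort for the workers we actually got, which can be
     * anywhere from zero to nworkers_requested.
     */
    if (pcxt->nworkers_launched == 0)
        perform_sort_serially();                         /* hypothetical */
    else
        perform_parallel_sort(pcxt->nworkers_launched);  /* hypothetical */

    WaitForParallelWorkersToFinish(pcxt);
    DestroyParallelContext(pcxt);
    ExitParallelMode();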

> Now, you might wonder why it is that the leader cannot also sort runs,
> just as a worker would. It's possible, but it isn't exactly
> straightforward. You have to have special cases in several places,
> even though it probably is going to be uncommon to only have one
> BackgroundWorkerSlot available in practice. It's simpler to just
> opt-out, and seems better given that max_parallel_degree is a way of
> resource limiting based on available cores (it's certainly not about
> the availability of shared memory for the BackgroundWorkerSlot array).

I am surprised that this is not straightforward.  I don't see why it
shouldn't be, and it worries me that you think it isn't.
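
To sketch why I think it should be straightforward, reusing the
hypothetical names from the run-building sketch above: the leader just
acts as one more participant between launching the workers and waiting
for them.

    LaunchParallelWorkers(pcxt);

    /* The leader builds runs too, exactly as a worker would. */
    worker_build_runs(leader_input, leader_tapes, work_mem);

    WaitForParallelWorkersToFinish(pcxt);
    merge_all_tapes(tapes);      /* serial merge of everyone's runs */

The merge doesn't need to care which process produced which run.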

> More importantly, I have other, entirely general concerns. Other major
> RDBMSs have settings that are very similar to max_parallel_degree,
> with a setting of 1 effectively disabling all parallelism. Both Oracle
> and SQL Server have settings that they call the "maximum degree
> of parallelism". I think it's a bit odd that with Postgres,
> max_parallel_degree = 1 can still use parallelism at all. I have to
> wonder: are we conflating controlling the resources used by parallel
> operations with how shared memory is doled out?

We could redefine things so that max_parallel_degree = N means use N
- 1 workers, with a minimum value of 1 rather than 0, if there's a
consensus that that's better.  Personally, I prefer it the way we've
got it: it's real darned clear in my mind that max_parallel_degree=0
means "not parallel".  But I won't cry into my beer if a consensus
emerges that the other way would be better.
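
In planner terms, the two proposals differ only in how the GUC is
interpreted:

    /* Current semantics: the GUC counts workers, leader excluded. */
    nworkers = max_parallel_degree;        /* 0 means "not parallel" */

    /* Alternative: the GUC counts participants, leader included. */
    nworkers = max_parallel_degree - 1;    /* 1 means "not parallel" */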

> I could actually "give back" my parallel worker slots early if I
> really wanted to (that would be messy, but the then-quiescent workers
> do nothing for the duration of the merge beyond conceptually owning
> the shared tape temp files). I don't think releasing the slots early
> makes sense, because I tend to think that hanging on to the workers
> helps the DBA in managing the server's resources. The still-serial
> merge phase is likely to become a big bottleneck with parallel sort.

Like I say, the sort code better not know anything about this
directly, or it's going to break when embedded in a query.

> With parallel sequential scan, a max_parallel_degree of 8 could result
> in 16 processes scanning in parallel. That's a concern, and not least
> because it happens only sometimes, when things are timed just right.
> The fact that only half of those processes are "genuine" workers seems
> to me like a distinction without a difference.

This seems dead wrong.  A max_parallel_degree of 8 means you have a
leader and 8 workers.  Where are the other 7 processes coming from?
What you should have is 8 processes each of which is participating in
both the parallel seq scan and the parallel sort, not 8 processes
scanning and 8 separate processes sorting.
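
Schematically, the shape I'm describing is this (illustrative, not
actual EXPLAIN output):

    Gather  (1 leader + 8 workers)
      ->  Sort                     -- each process sorts what it scanned
            ->  Parallel Seq Scan  -- the same processes share one scan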

>> I think that's probably over-engineered.  I mean, it wouldn't be that
>> hard to have the workers just exit if you decide you don't want them,
>> and I don't really want to make the signaling here more complicated
>> than it really needs to be.
>
> I worry about the additional overhead of constantly starting and
> stopping a single worker in some cases (not so much with parallel
> index build, but other uses of parallel sort beyond 9.6). Furthermore,
> the coordination between worker and leader processes to make this
> happen seems messy -- you actually have the postmaster launch
> processes, but they must immediately get permission to do anything.
>
> It wouldn't be that hard to offer a general way of doing this, so why not?

Well, if these things become actual problems, fine, we can fix them.
But let's not add the API before we agree that we need it to solve
an actual problem.  We are not there yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


