Re: ExecGather() + nworkers - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: ExecGather() + nworkers
Msg-id CAFj8pRBWFRwHi=syrqBqSb0xOhhd+Tm5q9ge2Ebk++C8P=ygSg@mail.gmail.com
In response to Re: ExecGather() + nworkers  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


> More importantly, I have other, entirely general concerns. Other major
> RDBMSs have settings that are very similar to max_parallel_degree,
> with a setting of 1 effectively disabling all parallelism. Oracle and
> SQL Server both have a setting they call the "maximum degree of
> parallelism". I think it's a bit odd that with Postgres,
> max_parallel_degree = 1 can still use parallelism at all. I have to
> wonder: are we conflating control of the resources used by parallel
> operations with how shared memory is doled out?

We could redefine things so that max_parallel_degree = N means use N
- 1 workers, with a minimum value of 1 rather than 0, if there's a
consensus that that's better.  Personally, I prefer it the way we've
got it: it's real darned clear in my mind that max_parallel_degree=0
means "not parallel".  But I won't cry into my beer if a consensus
emerges that the other way would be better.


If max_parallel_degree is renamed to max_query_workers or something similar, then the new metric makes sense, and it could be more intuitive.
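
To illustrate the difference between the two conventions, here is a minimal
standalone C sketch (not PostgreSQL source; the helper names are invented):

    #include <stdio.h>

    /* Current convention: the setting is the number of background workers
     * to request; 0 means "not parallel", and the leader participates in
     * addition to the workers. */
    static int
    workers_current(int max_parallel_degree)
    {
        return max_parallel_degree;
    }

    /* Proposed convention: the setting counts every participant including
     * the leader; the minimum is 1 (serial), and N means N - 1 workers. */
    static int
    workers_proposed(int setting)
    {
        return setting - 1;
    }

    int
    main(void)
    {
        for (int setting = 0; setting <= 3; setting++)
            printf("setting=%d: current=%d worker(s), proposed=%d worker(s)\n",
                   setting,
                   workers_current(setting),
                   setting >= 1 ? workers_proposed(setting) : 0);
        return 0;
    }

Under the current convention a setting of 1 still launches one worker next to
the leader, which is the behaviour called surprising above; under the proposed
convention, a setting of 1 would mean strictly serial execution.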

Regards

Pavel

 
> I could actually "give back" my parallel worker slots early if I
> really wanted to (that would be messy, but the then-quiesced workers
> do nothing for the duration of the merge beyond conceptually owning
> the shared tape temp files). I don't think releasing the slots early
> makes sense, because I tend to think that hanging on to the workers
> helps the DBA in managing the server's resources. The still-serial
> merge phase is likely to become a big bottleneck with parallel sort.

Like I say, the sort code better not know anything about this
directly, or it's going to break when embedded in a query.

> With parallel sequential scan, a max_parallel_degree of 8 could result
> in 16 processes scanning in parallel. That's a concern, and not least
> because it happens only sometimes, when things are timed just right.
> The fact that only half of those processes are "genuine" workers seems
> to me like a distinction without a difference.

This seems dead wrong.  A max_parallel_degree of 8 means you have a
leader and 8 workers.  Where are the other 7 processes coming from?
What you should have is 8 processes each of which is participating in
both the parallel seq scan and the parallel sort, not 8 processes
scanning and 8 separate processes sorting.
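
To make the process count concrete, here is a hedged sketch in plain C, with
ordinary threads standing in for background workers (this is not PostgreSQL
code): the leader launches max_parallel_degree helpers and then joins the same
scan-and-sort work itself, so there are degree + 1 participants in total, not
one set of processes scanning and a separate set sorting.

    #include <pthread.h>
    #include <stdio.h>

    #define PARALLEL_DEGREE 8       /* stand-in for max_parallel_degree */

    /* Every participant, leader included, runs the same routine: it scans
     * its share of the data and feeds its part of the sort. */
    static void *
    scan_and_sort(void *arg)
    {
        int id = *(int *) arg;

        printf("participant %d: scanning and sorting\n", id);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t   workers[PARALLEL_DEGREE];
        int         ids[PARALLEL_DEGREE + 1];

        /* Launch the "workers" (participants 1..PARALLEL_DEGREE). */
        for (int i = 0; i < PARALLEL_DEGREE; i++)
        {
            ids[i] = i + 1;
            pthread_create(&workers[i], NULL, scan_and_sort, &ids[i]);
        }

        /* The leader is participant 0 and does the same work, so the total
         * is PARALLEL_DEGREE + 1 processes, not 2 * PARALLEL_DEGREE. */
        ids[PARALLEL_DEGREE] = 0;
        scan_and_sort(&ids[PARALLEL_DEGREE]);

        for (int i = 0; i < PARALLEL_DEGREE; i++)
            pthread_join(workers[i], NULL);

        return 0;
    }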

>> I think that's probably over-engineered.  I mean, it wouldn't be that
>> hard to have the workers just exit if you decide you don't want them,
>> and I don't really want to make the signaling here more complicated
>> than it really needs to be.
>
> I worry about the additional overhead of constantly starting and
> stopping a single worker in some cases (not so much with parallel
> index build, but other uses of parallel sort beyond 9.6). Furthermore,
> the coordination between worker and leader processes to make this
> happen seems messy -- you actually have the postmaster launch
> processes, but they must immediately get permission to do anything.
>
> It wouldn't be that hard to offer a general way of doing this, so why not?

Well, if these things become actual problems, fine, we can fix them.
But let's not decide to add the API before we're agreed that we need
it to solve an actual problem that we both agree we have.  We are not
there yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


