Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Msg-id CA+TgmoZDfwkwa+HfheHhDZWU2jH6E-__BffjH_OdXXCGsMwLNA@mail.gmail.com
In response to Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, May 4, 2017 at 9:37 PM, Andres Freund <andres@anarazel.de> wrote:
> Have those benchmarks, even in a very informal form, been shared /
> collected / referenced centrally?  I'd be very interested to know where
> the different contention points are. Possibilities:
>
> - in non-resident workloads: too many concurrent IOs submitted, leading
>   to too much random IO
> - internal contention in the parallel node, e.g. parallel seqscan
> - contention on PG components like buffer mapping, procarray, clog
> - contention on individual buffers, e.g. btree root pages, or small
>   tables in nestloop joins
> - just too many context switches, due to ineffective parallelization
>
> probably multiple of those are a problem, but without trying to figure
> them out, we're going to have a hard time developing a better costing
> model...

It's pretty easy (but IMHO not very interesting) to measure internal
contention in the Parallel Seq Scan node.  As David points out
downthread, that problem isn't trivial to fix, but it's not that hard,
either.  I do believe that there is a problem with too much concurrent
I/O on things like:

Gather
-> Hash Join
   -> Parallel Seq Scan on lineitem
   -> Seq Scan on lineitem

If that goes to multiple batches, you're probably wrenching the disk
head all over the place - multiple processes are now reading and
writing batch files at exactly the same time.
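
To make that concrete, here's a hedged sketch of settings and a query
shape that tend to produce that kind of plan (a TPC-H-style lineitem
self-join is assumed purely for illustration; whether the planner
actually picks this plan depends on costing):

SET max_parallel_workers_per_gather = 4;
SET work_mem = '1MB';  -- small work_mem forces the hash join into multiple batches

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM lineitem l1
JOIN lineitem l2 ON l1.l_orderkey = l2.l_orderkey;

With settings like these, each participating process spills its share
of both inputs into batch files and then re-reads them, all against
the same disk at the same time.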
I also strongly suspect that contention on individual buffers can turn
into a problem on queries like this:

Gather (Merge)
-> Merge Join
   -> Parallel Index Scan
   -> Index Scan

The parallel index scan surely has some upper limit on concurrency,
but if that is exceeded, what will tend to happen is that processes
will sleep.  On the inner side, though, every process is doing a full
index scan and chances are good that they are doing it more or less in
lock step, hitting the same buffers at the same time.
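
Here's a hedged sketch of a query shape that can end up with that plan
(table and column names are made up, and getting exactly this plan
also depends on the available indexes and on costing):

SET max_parallel_workers_per_gather = 4;

EXPLAIN (ANALYZE)
SELECT *
FROM fact f                  -- large table, btree index on f.key
JOIN dim d ON f.key = d.key  -- smaller table, btree index on d.key
ORDER BY f.key;

If the outer side is driven as a Parallel Index Scan while the inner
side stays a plain Index Scan, every worker walks the whole inner
index, and mostly the same part of it at the same moment.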

Also consider:

Finalize HashAggregate
-> Gather
   -> Partial HashAggregate
      -> Parallel Seq Scan

Suppose that the average group contains ten items, which will tend to
be widely spaced across the table.  As you add workers, the number of
workers across which any given group gets spread increases.  There's
probably a "sweet spot" here.  Up to a certain point, adding workers
makes it faster, because the work of the Seq Scan and the Partial
HashAggregate gets spread across more processes.  However, adding
workers also increases the number of rows that pass through the Gather
node, because as you add workers more groups end up being split across
workers, or across more workers.  That means more and more of the
aggregation starts happening in the Finalize HashAggregate rather than
the Partial HashAggregate.  If you had, for example, 20 workers here
almost nothing would be happening in the Partial HashAggregate,
because chances are good that each of the 10 rows in each group would
be encountered by a different worker, so that'd probably be
counterproductive.
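
To put a rough number on that, assume (purely for illustration) that
the ten rows of a group land on workers independently and uniformly at
random.  The expected number of distinct workers touching a given
group is then w * (1 - (1 - 1/w)^10), which is easy to tabulate:

SELECT w,
       round(w * (1 - power(1 - 1.0/w, 10)), 1) AS workers_per_group
FROM generate_series(2, 20, 2) AS w;

That works out to about 3.8 distinct workers per ten-row group at 4
workers and about 8.0 at 20, i.e. at 20 workers nearly every group is
split almost completely and the Partial HashAggregate has very little
left to do.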

I think there are two separate questions here:

1. How do we reduce or eliminate contention during parallel query execution?
2. How do we make the query planner smarter about picking the optimal
number of workers?

I think the second problem is both more difficult and more
interesting.  I think that no matter how much work we do on #1, there
are always going to be cases where the amount of effective parallelism
has some finite limit - and I think the limit will vary substantially
from query to query.  So, without #2, we'll either leave a lot of
money on the table for queries that can benefit from using a large
number of workers, or we'll throw extra workers uselessly at queries
where they don't help (or even make things worse).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


