Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Date
Msg-id 20170505023646.3uhnmf2hbwtm63lc@alap3.anarazel.de
In response to Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
List pgsql-hackers
Hi,

On 2017-05-05 14:20:48 +1200, David Rowley wrote:
> Yeah, I did get some time to look over the contention in Parallel Seq
> Scan a while back, and I discovered that on the machine that I was
> testing on, the lock obtained in heap_parallelscan_nextpage() was
> causing workers to have to wait for other workers to fetch their next
> task to work on.

Oh, if it's "just" that, it should be easy enough to address.  Two
approaches (a sketch of each follows below):
1) use atomic ops for the increment, with modulo afterwards to deal with
   wraparound in the synchronous scan
2) batching
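
To make (1) concrete, here's a minimal standalone sketch.  It uses C11
atomics in place of PostgreSQL's pg_atomic_fetch_add_u64(), and the
SharedScanState structure and its field names are invented for
illustration; they're not the actual ParallelHeapScanDesc:

#include <stdatomic.h>
#include <stdint.h>

#define InvalidBlockNumber ((uint32_t) 0xFFFFFFFF)

typedef struct SharedScanState
{
    uint32_t nblocks;               /* total pages in the relation */
    uint32_t startblock;            /* synchronized-scan start page */
    _Atomic uint64_t nallocated;    /* pages handed out so far */
} SharedScanState;

static uint32_t
next_page(SharedScanState *scan)
{
    /* fetch_add returns the old value, so every worker gets a distinct
     * index in [0, nblocks) with no lock held. */
    uint64_t claimed = atomic_fetch_add(&scan->nallocated, 1);

    if (claimed >= scan->nblocks)
        return InvalidBlockNumber;  /* relation fully handed out */

    /* Apply the modulo afterwards, so a synchronized scan that starts
     * mid-relation wraps around to block 0 correctly. */
    return (uint32_t) ((scan->startblock + claimed) % scan->nblocks);
}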


> I ended up writing the attached (which I'd not intended to post until
> some time closer to when the doors open for PG11). At the moment it's
> basically just a test patch to see how it affects things when we give
> workers a bit more to do before they come back to look for more work.
> In this case, I've just given them 10 pages to work on, instead of the
> 1 that's allocated in 9.6 and v10.

Right.
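
For comparison, here's a hedged sketch of the batching idea, i.e. what
I understand the attached patch to be doing, not the patch itself.  It
reuses SharedScanState and InvalidBlockNumber from the sketch above;
BATCH_PAGES and WorkerScanState are likewise invented names:

#define BATCH_PAGES 10              /* pages claimed per shared update */

typedef struct WorkerScanState     /* per-worker, zero-initialized */
{
    uint64_t next;                 /* next index within our batch */
    uint64_t end;                  /* one past the last index we own */
} WorkerScanState;

static uint32_t
next_page_batched(SharedScanState *scan, WorkerScanState *mine)
{
    if (mine->next >= mine->end)
    {
        /* Batch exhausted: one atomic op claims BATCH_PAGES pages,
         * touching the shared counter ~10x less often. */
        mine->next = atomic_fetch_add(&scan->nallocated, BATCH_PAGES);
        mine->end = mine->next + BATCH_PAGES;
    }
    if (mine->next >= scan->nblocks)
        return InvalidBlockNumber;  /* scan finished (a real version
                                     * would stop bumping the counter) */
    return (uint32_t) ((scan->startblock + mine->next++) % scan->nblocks);
}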


> A quick test on a pretty large table on a large machine shows:
> 
> Unpatched:
> 
> postgres=# select count(*) from a;
>    count
> ------------
>  1874000000
> (1 row)
> 
> Time: 5211.485 ms (00:05.211)
> 
> Patched:
> 
> postgres=# select count(*) from a;
>    count
> ------------
>  1874000000
> (1 row)
> 
> Time: 2523.983 ms (00:02.524)

Neat!


> I'd had thoughts that the 10 pages wouldn't be constant, but the
> batching size would depend on the size of the relation to be scanned.
> I'd had rough ideas to just try to make about 1 million batches. Something
> like batch_pages = Max(parallel_scan->phs_nblocks / 1000000, 1); so
> that we only take more than 1 page if there's some decent amount to
> process. We don't want to make the batches too big as we might end up
> having to wait on slow workers at the end of a scan.
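
To put rough numbers on that sizing (assuming the default 8 kB block
size; phs_nblocks and Max() are as in the quoted snippet):

    batch_pages = Max(parallel_scan->phs_nblocks / 1000000, 1);

    /* 8,000,000 blocks (~61 GB)  -> 8-page batches, ~1M batches   */
    /*   500,000 blocks (~3.8 GB) -> 1-page batches, same as today */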

I wonder how much the atomic-ops approach alone can help; unlike
batching, it doesn't have the issue that the work might be unevenly
distributed between pages.


> Anyway. I don't want to hijack this thread with discussions on this.
> I just wanted to mark that I plan to work on this in order to avoid
> any repeat developments or analysis. I'll probably start a new thread
> for this sometime nearer PG11's dev cycle.

Cool.  I think it might make sense to post about this soon, just to
give it some more visibility and reduce the potential for duplication.

- andres


