Hi,
On 2017-05-05 14:20:48 +1200, David Rowley wrote:
> Yeah, I did get some time to look over the contention in Parallel Seq
> Scan a while back and I discovered that on the machine that I was
> testing on. the lock obtained in heap_parallelscan_nextpage() was
> causing workers to have to wait for other workers to fetch their next
> task to work on.
Oh, if it's "just" that, it should be easy enough to address. Two
approaches:
1) use atomic ops for increment, modulo afterwards to deal with wraparound in the synchronous scan
2) batching
> I ended up writing the attached (which I'd not intended to post until
> some time closer to when the doors open for PG11). At the moment it's
> basically just a test patch to see how it affects things when we give
> workers a bit more to do before they come back to look for more work.
> In this case, I've just given them 10 pages to work on, instead of the
> 1 that's allocated in 9.6 and v10.
Right.
> A quick test on a pretty large table on a large machine shows:
>
> Unpatched:
>
> postgres=# select count(*) from a;
> count
> ------------
> 1874000000
> (1 row)
>
> Time: 5211.485 ms (00:05.211)
>
> Patched:
>
> postgres=# select count(*) from a;
> count
> ------------
> 1874000000
> (1 row)
>
> Time: 2523.983 ms (00:02.524)
Neat!
> I'd had thoughts that the 10 pages wouldn't be constant, but the
> batching size would depend on the size of the relation to be scanned.
> I'd rough ideas to just try to make about 1 million batches. Something
> like batch_pages = Max(parallel_scan->phs_nblocks / 1000000, 1); so
> that we only take more than 1 page if there's some decent amount to
> process. We don't want to make the batches too big as we might end up
> having to wait on slow workers at the end of a scan.
I wonder how much doing the atomic ops approach alone can help, that
doesn't have the issue that the work might be unevenly distributed
between pages.
> Anyway. I don't want to hi-jack this thread with discussions on this.
> I just wanted to mark that I plan to work on this in order to avoid
> any repeat developments or analysis. I'll probably start a new thread
> for this sometime nearer PG11's dev cycle.
Cool. I think it might sense to post about this soon, just to give it
some more visibility to reduce the potential for duplication.
- andres