Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Msg-id: CAH2-WzkK4qnRuM-kh+9w4AbyyPz9+WEUv_8wNFbPd9Eb9xn=2A@mail.gmail.com
In response to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List: pgsql-hackers
On Sat, Jan 27, 2018 at 12:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have posted the patch for the above API and posted it on a new
> thread [1]. Do let me know either here or on that thread if the patch
> suffices your need?

I've responded to you over on that thread. Thanks again for helping me.

I already have a revision of my patch lined up that is coded to target your new WaitForParallelWorkersToAttach() interface, plus some other changes. These include:

* Make the leader's worker Tuplesortstate complete before the main leader Tuplesortstate even begins, making it very clear that nbtsort.c does not rely on knowing the number of launched workers up-front. That should make Robert a bit happier about our ability to add additional workers fairly late in the process, in a future tuplesort client that finds that to be useful.

* I've added a new "parallel" argument to index_build(), which controls whether or not we even call the plan_create_index_workers() cost model. When this is false, we always do a serial build. This was added because I noticed that TRUNCATE REINDEXes the table at a point where parallelism couldn't possibly be useful, and yet a parallel build was still used. Best to have the top-level caller opt in or opt out.

* Polished the docs some more.

* Improved commentary on randomAccess/writable leader handling within logtape.c. We could still support that, if we were willing to make shared BufFiles that are opened within another backend writable. I'm not proposing to do that, but it's nice that we could.

I hesitate to post something that won't cleanly apply on the master branch's tip, but otherwise I am ready to send this new revision of the patch right away. It seems likely that Robert will commit your patch within a matter of days, once some minor issues are worked through, at which point I'll send what I have. If anyone prefers, I can post the patch immediately, and break out the WaitForParallelWorkersToAttach() changes as the second patch in a cumulative patch set. Right now, I'm out of things to work on here.

Notes on how I've stress-tested parallel CREATE INDEX:

I can recommend using the amcheck heapallindexed functionality [1] from the GitHub version of amcheck to test this patch. You will need to modify the call to IndexBuildHeapScan() that the extension makes, to add a new NULL "scan" argument, since parallel CREATE INDEX changes the signature of IndexBuildHeapScan(). That's trivial, though (a rough sketch of the change appears at the end of these notes).

Note that parallel CREATE INDEX should produce relfiles that are physically identical to those of a serial CREATE INDEX, since index tuplesorts are deterministic. IOW, we use a heap TID tie-breaker within tuplesort.c for B-Tree index tuples, which assures us that varying maintenance_work_mem won't affect the final output even in a tiny, insignificant way -- using parallelism should not change anything about the exact output, either. At one point I was testing this patch by verifying not only that indexes were sane, but that they were physically identical to what a serial sort (in the master branch) produced (I only needed to mask page LSNs).

Finally, yet another good way to test this patch is to verify that everything continues to work when MAX_PHYSICAL_FILESIZE is modified to be BLCKSZ (2^13 rather than 2^30) -- also sketched below. You will get many, many BufFile segments that way, which could in theory reveal bugs in rare edge cases that I haven't considered.
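To be concrete about that amcheck tweak: the patched IndexBuildHeapScan() just grows a trailing "scan" argument, so the call that the extension makes ends up looking roughly like the following. This is a sketch from memory -- the variable and callback names are only illustrative, so check them against the actual amcheck code before applying anything.

/*
 * Sketch of the adjusted IndexBuildHeapScan() call in the GitHub amcheck's
 * heapallindexed verification.  Only the new trailing "scan" argument is
 * the point here; passing NULL asks IndexBuildHeapScan() to begin its own
 * heap scan, which is what the extension wants.
 */
IndexBuildHeapScan(heapRelation,               /* heap relation to scan */
                   indexRelation,              /* index being verified */
                   indexInfo,                  /* IndexInfo for that index */
                   true,                       /* allow_sync */
                   bt_tuple_present_callback,  /* probe index for each heap tuple */
                   (void *) state,             /* verification state */
                   NULL);                      /* new "scan" argument */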
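And to be explicit about the MAX_PHYSICAL_FILESIZE idea, it is a one-line change in src/backend/storage/file/buffile.c. Sketch only, going from memory on the existing definition, so verify it against your tree first:

/*
 * Stress-testing tweak in buffile.c: shrink the per-segment size limit
 * from its normal 1GB (2^30 bytes) down to a single block, so that even
 * modest sorts spill across a very large number of BufFile segment files.
 */
#define MAX_PHYSICAL_FILESIZE	BLCKSZ		/* 2^13 bytes, instead of 0x40000000 */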
This is a strategy that led to my finding a bug in v10 at one point [2], as well as bugs in earlier versions of Thomas' parallel hash join patch set. It worked for me twice already, so it seems like a good strategy. It may be worth *combining* it with some other stress-testing strategy.

[1] https://github.com/petergeoghegan/amcheck#optional-heapallindexed-verification
[2] https://www.postgresql.org/message-id/CAM3SWZRWdNtkhiG0GyiX_1mUAypiK3dV6-6542pYe2iEL-foTA@mail.gmail.com

--
Peter Geoghegan