Re: [HACKERS] Block level parallel vacuum WIP - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Block level parallel vacuum WIP |
Date | |
Msg-id | CAD21AoAXmbFQDTDm=AiV95C6NhsM8Hh2zEWhR7CsrB5Ofyd1NA@mail.gmail.com |
In response to | Re: [HACKERS] Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: [HACKERS] Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:
>>> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>>>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>> Yes, it's taking a time to update logic and measurement but it's
>>>>>> coming along. Also I'm working on changing deadlock detection. Will
>>>>>> post new patch and measurement result.
>>>>>
>>>>> I think that we should push this patch out to v11. I think there are
>>>>> too many issues here to address in the limited time we have remaining
>>>>> this cycle, and I believe that if we try to get them all solved in the
>>>>> next few weeks we're likely to end up getting backed into some choices
>>>>> by time pressure that we may later regret bitterly. Since I created
>>>>> the deadlock issues that this patch is facing, I'm willing to try to
>>>>> help solve them, but I think it's going to require considerable and
>>>>> delicate surgery, and I don't think doing that under time pressure is
>>>>> a good idea.
>>>>>
>>>>> From a fairness point of view, a patch that's not in reviewable shape
>>>>> on March 1st should really be pushed out, and we're several days past
>>>>> that.
>>>>>
>>>>
>>>> Agreed. There are surely some rooms to discuss about the design yet,
>>>> and it will take long time. it's good to push this out to CF2017-07.
>>>> Thank you for the comment.
>>>
>>> I have marked this patch "Returned with Feedback." Of course you are
>>> welcome to submit this patch to the 2017-07 CF, or whenever you feel it
>>> is ready.
>>
>> Thank you!
>>
>
> I re-considered the basic design of parallel lazy vacuum.
> I didn't change the basic concept or usage of this feature: lazy vacuum
> still executes with some parallel workers. In the current design, dead
> tuple TIDs are shared among all vacuum workers, including the leader
> process, when the table has indexes. If we share dead tuple TIDs, we
> need two synchronization points: before starting vacuum and before
> clearing the dead tuple TIDs. Before starting vacuum we have to make
> sure that no more dead tuple TIDs are being added, and before clearing
> the dead tuple TIDs we have to make sure that they are no longer in use.
>
> For index vacuum, each index is assigned to a vacuum worker based on
> ParallelWorkerNumber. For example, if a table has 5 indexes and is
> vacuumed with 2 workers, the leader process and one vacuum worker are
> each assigned 2 indexes, and the other vacuum worker is assigned the
> remaining one. The following steps show how parallel vacuum proceeds
> when the table has indexes.
>
> 1. The leader process and workers scan the table in parallel using
> ParallelHeapScanDesc, and collect dead tuple TIDs into shared memory.
> 2. Before vacuuming the table, the leader process sorts the dead tuple
> TIDs in physical order once all workers have finished scanning the
> table.
> 3. When vacuuming the table, the leader process and workers reclaim
> garbage in the table in block-level parallel.
> 4. When vacuuming indexes, each index on the table is assigned to a
> particular parallel worker or to the leader process. The process
> assigned to an index vacuums that index.
> 5. Before going back to scanning the table, the leader process clears
> the dead tuple TIDs once all workers have finished vacuuming the table
> and indexes.
>
> Attached is the latest patch, but it is still a PoC version and
> contains some debug code. Note that this patch still requires another
> patch which moves the relation extension lock out of heavy-weight
> lock[1]. The parallel lazy vacuum patch could work even without the [1]
> patch but could fail during vacuum in some cases.
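To make the index assignment described above concrete, here is a minimal sketch in Python (not the patch's C code). It assumes a simple round-robin rule in which slot 0 stands for the leader and slot n stands for the worker whose ParallelWorkerNumber is n - 1. The function name and the round-robin rule itself are assumptions for illustration; the email only says that the assignment is based on ParallelWorkerNumber.

```python
from typing import Dict, List


def assign_indexes(nindexes: int, nworkers: int) -> Dict[int, List[int]]:
    """Distribute index numbers over the leader plus `nworkers` workers.

    Hypothetical helper: slot 0 is the leader, slots 1..nworkers are the
    workers. Round-robin distribution is an assumption, not taken from
    the patch.
    """
    nprocs = nworkers + 1  # the leader also vacuums indexes
    assignment: Dict[int, List[int]] = {slot: [] for slot in range(nprocs)}
    for idx in range(nindexes):
        assignment[idx % nprocs].append(idx)
    return assignment


# The example from the email: 5 indexes, 2 workers (3 processes in total).
print(assign_indexes(5, 2))  # {0: [0, 3], 1: [1, 4], 2: [2]}
```

With this rule the leader and one worker each get two indexes and the other worker gets the remaining one, which matches the 5-index example. It also suggests why no further speedup would be expected once the number of processes exceeds the number of indexes: the extra slots stay empty.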
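The two synchronization points can be illustrated with a small thread-based sketch in Python. This is only a model under stated assumptions: threads stand in for the leader and worker processes, a plain list stands in for the shared dead tuple TID array, and TIDs are modeled as (block, offset) tuples so that sorting them gives physical order. The function name is hypothetical. One difference from the description above: `threading.Barrier` runs its `action` in whichever thread arrives last, whereas in the proposed design the leader performs the sort and the clear.

```python
import threading
from typing import List, Tuple

Tid = Tuple[int, int]  # (block number, offset number), a stand-in for a TID


def run_vacuum_cycle(chunks: List[List[Tid]]) -> Tuple[List[List[Tid]], List[Tid]]:
    """Model one scan/vacuum cycle; chunks[i] is what participant i finds."""
    nprocs = len(chunks)
    dead_tids: List[Tid] = []          # stand-in for the shared TID array
    seen: List[List[Tid]] = [[] for _ in range(nprocs)]
    lock = threading.Lock()

    # Sync point 1: nobody adds TIDs any more, so they can be sorted.
    before_vacuum = threading.Barrier(nprocs, action=dead_tids.sort)
    # Sync point 2: nobody uses the TIDs any more, so they can be cleared.
    before_clear = threading.Barrier(nprocs, action=dead_tids.clear)

    def participant(slot: int) -> None:
        with lock:                     # step 1: collect dead tuple TIDs
            dead_tids.extend(chunks[slot])
        before_vacuum.wait()           # step 2: sort happens exactly once
        seen[slot] = list(dead_tids)   # steps 3-4: read-only use of the TIDs
        before_clear.wait()            # step 5: clear happens exactly once

    threads = [threading.Thread(target=participant, args=(s,)) for s in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen, dead_tids


seen, remaining = run_vacuum_cycle([[(7, 2), (3, 5)], [(3, 1)], [(10, 4)]])
print(seen[0])     # [(3, 1), (3, 5), (7, 2), (10, 4)]
print(remaining)   # []
```

Every participant sees the same fully sorted list between the two barriers, and the list is empty only after all of them have finished using it, which is the property the two synchronization points are meant to guarantee.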
>
> Also, I attached the results of a performance evaluation. The table size
> is approximately 300MB (> shared_buffers), and I deleted tuples on
> every block before executing vacuum so that vacuum visits every block.
> The server spec is
> * Intel Xeon E5620 @ 2.4GHz (8 cores)
> * 32GB RAM
> * ioDrive
>
> According to the results for the table with indexes, lazy vacuum
> performance improved up to the point where the number of indexes and
> the parallel degree are the same. If a table has 16 indexes and is
> vacuumed with 16 workers, parallel vacuum is 10x faster than
> single-process execution. Also, according to the results for the table
> with no indexes, parallel vacuum is 5x faster than single-process
> execution at a parallel degree of 8. Of course we can vacuum only for
> indexes
>
> I'm planning to work on that in PG11, and will register it for the next
> CF. Comments and feedback are very welcome.
>

Since the previous patch conflicts with current HEAD, I have attached the
latest version of the patch. Also, I measured the performance benefit with
a larger 4GB table and indexes, and attached the results.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers