Re: [HACKERS] Block level parallel vacuum WIP - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: [HACKERS] Block level parallel vacuum WIP |
| Date | |
| Msg-id | CAD21AoCLV7Vb_XvYcWhatFvj7q0hZct8Nu0CQSB=UZaOxVHkGw@mail.gmail.com |
| In response to | Re: [HACKERS] Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | Re: [HACKERS] Block level parallel vacuum WIP |
| List | pgsql-hackers |

On Tue, Aug 15, 2017 at 10:13 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:
>>>> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>>>>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>>> Yes, it's taking time to update the logic and measurements, but it's
>>>>>>> coming along. I'm also working on changing the deadlock detection. I
>>>>>>> will post a new patch and measurement results.
>>>>>>
>>>>>> I think that we should push this patch out to v11. I think there are
>>>>>> too many issues here to address in the limited time we have remaining
>>>>>> this cycle, and I believe that if we try to get them all solved in the
>>>>>> next few weeks we're likely to end up getting backed into some choices
>>>>>> by time pressure that we may later regret bitterly. Since I created
>>>>>> the deadlock issues that this patch is facing, I'm willing to try to
>>>>>> help solve them, but I think it's going to require considerable and
>>>>>> delicate surgery, and I don't think doing that under time pressure is
>>>>>> a good idea.
>>>>>>
>>>>>> From a fairness point of view, a patch that's not in reviewable shape
>>>>>> on March 1st should really be pushed out, and we're several days past
>>>>>> that.
>>>>>
>>>>> Agreed. There is surely still room to discuss the design, and that
>>>>> will take a long time. It's good to push this out to CF2017-07.
>>>>> Thank you for the comment.
>>>>
>>>> I have marked this patch "Returned with Feedback." Of course you are
>>>> welcome to submit this patch to the 2017-07 CF, or whenever you feel it
>>>> is ready.
>>>
>>> Thank you!
>>>
>>
>> I reconsidered the basic design of parallel lazy vacuum. I didn't
>> change the basic concept of this feature or its usage; the lazy vacuum
>> still executes with some parallel workers. In the current design, dead
>> tuple TIDs are shared with all vacuum workers, including the leader
>> process, when the table has an index. If we share dead tuple TIDs, we
>> have to create two synchronization points: before starting vacuum and
>> before clearing the dead tuple TIDs. Before starting vacuum we have to
>> make sure that no more dead tuple TIDs will be added, and before
>> clearing the dead tuple TIDs we have to make sure that they are no
>> longer in use.
>>
>> For index vacuum, each index is assigned to a vacuum worker based on
>> ParallelWorkerNumber. For example, if a table has 5 indexes and is
>> vacuumed with 2 workers, the leader process and one vacuum worker are
>> each assigned 2 indexes, and the other vacuum worker is assigned the
>> remaining one. The following steps describe how the parallel vacuum
>> proceeds when the table has indexes (a rough code sketch of this flow
>> follows the list).
>>
>> 1. The leader process and workers scan the table in parallel using
>> ParallelHeapScanDesc and collect dead tuple TIDs into shared memory.
>> 2. Before vacuuming the table, the leader process sorts the dead tuple
>> TIDs in physical order once all workers have completed scanning the
>> table.
>> 3. While vacuuming the table, the leader process and workers reclaim
>> garbage from the table in block-level parallel fashion.
>> 4. While vacuuming the indexes, each index on the table is assigned to
>> a particular parallel worker or to the leader process. The assigned
>> process vacuums that index.
>> 5. Before going back to scanning the table, the leader process clears
>> the dead tuple TIDs once all workers have completed vacuuming the
>> table and indexes.
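
For illustration, here is a minimal, self-contained C sketch of the round-robin index assignment and the per-pass flow described above. All names in it are hypothetical, not taken from the patch, and the two synchronization points are only marked as comments, since the real coordination happens between cooperating PostgreSQL processes via shared memory.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: participant p (0 = leader, 1..n = workers)
 * handles every (nparticipants)-th index, starting at its own number,
 * mirroring the ParallelWorkerNumber-based assignment described above.
 */
static void
vacuum_assigned_indexes(int my_number, int nparticipants, int nindexes)
{
	for (int i = my_number; i < nindexes; i += nparticipants)
		printf("participant %d vacuums index %d\n", my_number, i);
}

int
main(void)
{
	int		nindexes = 5;		/* table with 5 indexes, as in the example */
	int		nparticipants = 3;	/* leader + 2 workers */

	/*
	 * One pass of the flow from the list above:
	 *   1. all participants scan the heap, collecting dead tuple TIDs
	 *      into shared memory
	 *      -- sync point: no more dead TIDs will be added --
	 *   2. leader sorts the shared dead-TID array in physical order
	 *   3. all participants reclaim heap garbage, block by block
	 *   4. each participant vacuums its assigned indexes:
	 */
	for (int p = 0; p < nparticipants; p++)
		vacuum_assigned_indexes(p, nparticipants, nindexes);

	/*
	 *   -- sync point: dead TIDs are no longer in use --
	 *   5. leader clears the dead-TID array and the next pass begins
	 */
	return 0;
}
```

Running this prints indexes 0 and 3 for participant 0 (the leader), 1 and 4 for participant 1, and 2 for participant 2, matching the 5-indexes/2-workers example above where two processes get 2 indexes each and one gets the remaining 1.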
>>
>> Attached is the latest patch, but it's still a PoC version and
>> contains some debug code. Note that this patch still requires another
>> patch which moves the relation extension lock out of the heavy-weight
>> lock [1]. The parallel lazy vacuum patch can work even without the [1]
>> patch, but could fail during vacuum in some cases.
>>
>> Also, I attached the result of a performance evaluation. The table
>> size is approximately 300MB (> shared_buffers) and I deleted tuples in
>> every block before executing vacuum so that vacuum visits every block.
>> The server spec is:
>> * Intel Xeon E5620 @ 2.4GHz (8 cores)
>> * 32GB RAM
>> * ioDrive
>>
>> According to the results for the table with indexes, the performance
>> of lazy vacuum improves up to the point where the number of indexes
>> and the parallel degree are the same. If a table has 16 indexes and is
>> vacuumed with 16 workers, parallel vacuum is 10x faster than
>> single-process execution. Also, according to the results for the table
>> with no indexes, parallel vacuum is 5x faster than single-process
>> execution at parallel degree 8. Of course we can vacuum only for
>> indexes
>>
>> I'm planning to work on this in PG11 and will register it in the next
>> CF. Comments and feedback are very welcome.
>>
>
> Since the previous patch conflicts with current HEAD, I attached the
> latest version patch. Also, I measured the performance benefit with a
> larger 4GB table and indexes and attached the result.
>

Since the v4 patch conflicts with current HEAD, I attached the latest
version patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers