Re: [HACKERS] Block level parallel vacuum WIP - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Block level parallel vacuum WIP
Date
Msg-id CAD21AoAXmbFQDTDm=AiV95C6NhsM8Hh2zEWhR7CsrB5Ofyd1NA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Block level parallel vacuum WIP  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum WIP  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:
>>> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>>>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>> Yes, it's taking a time to update logic and measurement but it's
>>>>>> coming along. Also I'm working on changing deadlock detection. Will
>>>>>> post new patch and measurement result.
>>>>>
>>>>> I think that we should push this patch out to v11.  I think there are
>>>>> too many issues here to address in the limited time we have remaining
>>>>> this cycle, and I believe that if we try to get them all solved in the
>>>>> next few weeks we're likely to end up getting backed into some choices
>>>>> by time pressure that we may later regret bitterly.  Since I created
>>>>> the deadlock issues that this patch is facing, I'm willing to try to
>>>>> help solve them, but I think it's going to require considerable and
>>>>> delicate surgery, and I don't think doing that under time pressure is
>>>>> a good idea.
>>>>>
>>>>> From a fairness point of view, a patch that's not in reviewable shape
>>>>> on March 1st should really be pushed out, and we're several days past
>>>>> that.
>>>>>
>>>>
>>>> Agreed. There are surely some rooms to discuss about the design yet,
>>>> and it will take long time. it's good to push this out to CF2017-07.
>>>> Thank you for the comment.
>>>
>>> I have marked this patch "Returned with Feedback."  Of course you are
>>> welcome to submit this patch to the 2017-07 CF, or whenever you feel it
>>> is ready.
>>
>> Thank you!
>>
>
> I re-considered the basic design of parallel lazy vacuum. I didn't
> change the basic concept of this feature and usage, the lazy vacuum
> still executes with some parallel workers. In current design, dead
> tuple TIDs are shared with all vacuum workers including leader process
> when table has index. If we share dead tuple TIDs, we have to make two
> synchronization points: before starting vacuum and before clearing
> dead tuple TIDs. Before starting vacuum we have to make sure that the
> dead tuple TIDs are not added no more. And before clearing dead tuple
> TIDs we have to make sure that it's used no more.
>
> For index vacuum, each indexes is assigned to a vacuum workers based
> on ParallelWorkerNumber. For example, if a table has 5 indexes and
> vacuum with 2 workers, the leader process and one vacuum worker are
> assigned to 2 indexes, and another vacuum process is assigned the
> remaining one. The following steps are how the parallel vacuum
> processes if table has indexes.
>
> 1. The leader process and workers scan the table in parallel using
> ParallelHeapScanDesc, and collect dead tuple TIDs to shared memory.
> 2. Before vacuum on table, the leader process sort the dead tuple TIDs
> in physical order once all workers completes to scan the table.
> 3. In vacuum on table, the leader process and workers reclaim garbage
> on table in block-level parallel.
> 4. In vacuum on indexes, the indexes on table is assigned to
> particular parallel worker or leader process. The process assigned to
> a index vacuums on the index.
> 5. Before back to scanning the table, the leader process clears the
> dead tuple TIDs once all workers completes to vacuum on table and
> indexes.
>
> Attached the latest patch but it's still PoC version patch and
> contains some debug codes. Note that this patch still requires another
> patch which moves the relation extension lock out of heavy-weight
> lock[1]. The parallel lazy vacuum patch could work even without [1]
> patch but could fail during vacuum in some cases.
>
> Also, I attached the result of performance evaluation. The table size
> is approximately 300MB ( > shared_buffers) and I deleted tuples on
> every blocks before execute vacuum so that vacuum visits every blocks.
> The server spec is
> * Intel Xeon E5620 @ 2.4Ghz (8cores)
> * 32GB RAM
> * ioDrive
>
> According to the result of table with indexes, performance of lazy
> vacuum improved up to a point where the number of indexes and parallel
> degree are the same. If a table has 16 indexes and vacuum with 16
> workers, parallel vacuum  is 10x faster than single process execution.
> Also according to the result of table with no indexes, the parallel
> vacuum is 5x faster than single process execution at 8 parallel
> degree. Of course we can vacuum only for indexes
>
> I'm planning to work on that in PG11, will register it to next CF.
> Comment and feedback are very welcome.
>

Since the previous patch conflicts with current HEAD I attached the
latest version patch. Also, I measured performance benefit with more
large 4GB table and indexes and attached the result.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: "Moon Insung"
Date:
Subject: Re: [HACKERS] [PATCH] pageinspect function to decode infomasks
Next
From: Marko Tiikkaja
Date:
Subject: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]