Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CAD21AoB8oE4WtgXhs+YFwOH3ODAOvEN48afFLYL=vqMk0d25TA@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum
List: pgsql-hackers
On Sat, Nov 5, 2022 at 6:23 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Fri, Nov 4, 2022 at 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > For parallel heap pruning, multiple workers will insert key-value
> > pairs into the radix tree concurrently. The simplest solution would be a
> > single lock to protect writes, but the performance would not be good.
> > Another solution would be to divide the table into multiple
> > ranges so that keys derived from TIDs do not conflict with each
> > other, and have parallel workers process one or more ranges. That way,
> > parallel vacuum workers can build *sub-trees* and the leader process
> > can merge them. In the lazy vacuum use case, since the write phase and
> > the read phase are separated, the readers don't need to worry about
> > concurrent updates.
>
> It's a good idea to use ranges for a different reason -- readahead. See commit 56788d2156fc3, which aimed to improve
> readahead for sequential scans. It might work to use that as a model: Each worker prunes a range of 64 pages, keeping
> the dead tids in a local array. At the end of the range: lock the tid store, enter the tids into the store, unlock, free
> the local array, and get the next range from the leader. It's possible contention won't be too bad, and I suspect using
> small local arrays as-we-go would be faster and use less memory than merging multiple sub-trees at the end.

That seems like a promising idea. I think it might work well even in the
current parallel vacuum (i.e., a single writer). I mean, we could have a
single lwlock for the shared case in the first version. If the overhead
of acquiring the lwlock per key-value insertion turns out not to be
negligible, we might want to try this idea.
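
To make the batching idea concrete, here is a rough sketch of what the
worker side could look like. ParallelPruneState, prune_one_page(), and
shared_tidstore_insert() are made-up names for illustration, not existing
APIs; the point is only that each worker collects dead TIDs locally for a
64-page range and takes the shared lock once per range rather than once
per TID:

    /*
     * Rough sketch only -- ParallelPruneState, prune_one_page() and
     * shared_tidstore_insert() are hypothetical names, not existing APIs.
     */
    #include "postgres.h"
    #include "access/htup_details.h"
    #include "storage/itemptr.h"
    #include "storage/lwlock.h"

    #define PRUNE_RANGE_BLOCKS 64

    typedef struct ParallelPruneState
    {
        LWLock     *tidstore_lock;      /* protects the shared TID store */
        void       *shared_tidstore;    /* shared radix-tree TID store */
    } ParallelPruneState;

    /* Placeholder declarations for the sketch. */
    extern int  prune_one_page(BlockNumber blkno, ItemPointerData *dead_tids);
    extern void shared_tidstore_insert(void *store, ItemPointerData *tid);

    static void
    prune_block_range(ParallelPruneState *pstate,
                      BlockNumber start_blk, BlockNumber end_blk)
    {
        ItemPointerData *local_tids;
        int         ntids = 0;

        local_tids = palloc(sizeof(ItemPointerData) *
                            MaxHeapTuplesPerPage * PRUNE_RANGE_BLOCKS);

        /* Phase 1: prune the pages in this range, remembering dead TIDs locally. */
        for (BlockNumber blk = start_blk; blk < end_blk; blk++)
            ntids += prune_one_page(blk, &local_tids[ntids]);

        /* Phase 2: take the shared lock once per range, not once per TID. */
        LWLockAcquire(pstate->tidstore_lock, LW_EXCLUSIVE);
        for (int i = 0; i < ntids; i++)
            shared_tidstore_insert(pstate->shared_tidstore, &local_tids[i]);
        LWLockRelease(pstate->tidstore_lock);

        pfree(local_tids);
    }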

Apart from that, I'm going to incorporate the comments on the 0004 patch
and try pointer tagging.
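
For the pointer tagging, what I have in mind is roughly the following
(illustration only, not the patch itself): since node pointers are at
least 8-byte aligned, the low bits of a slot can carry a tag that says
whether the slot holds a child-node pointer or an embedded value.

    /*
     * Illustration of pointer tagging, not the actual 0004 patch code.
     * Assumes child-node pointers are at least 8-byte aligned, so the low
     * 3 bits of a slot are free to carry a tag.
     */
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define RT_TAG_MASK     ((uintptr_t) 0x7)
    #define RT_TAG_NODE     ((uintptr_t) 0x0)   /* slot points to a child node */
    #define RT_TAG_VALUE    ((uintptr_t) 0x1)   /* slot holds an embedded value */

    typedef uintptr_t rt_slot;

    static inline rt_slot
    rt_slot_from_node(void *node)
    {
        /* alignment guarantees the low bits are zero and usable as a tag */
        assert(((uintptr_t) node & RT_TAG_MASK) == 0);
        return (uintptr_t) node | RT_TAG_NODE;
    }

    static inline bool
    rt_slot_is_value(rt_slot slot)
    {
        return (slot & RT_TAG_MASK) == RT_TAG_VALUE;
    }

    static inline void *
    rt_slot_get_node(rt_slot slot)
    {
        return (void *) (slot & ~RT_TAG_MASK);
    }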

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


