Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCLvVkvf=0Jyu+wV9=GZTcEva+R0q-FzMU9viT9asoevA@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Tue, Nov 8, 2022 at 11:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Nov 5, 2022 at 6:23 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> >
> > On Fri, Nov 4, 2022 at 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > For parallel heap pruning, multiple workers will insert key-value
> > > pairs to the radix tree concurrently. The simplest solution would be a
> > > single lock to protect writes but the performance will not be good.
> > > Another solution would be that we can divide the tables into multiple
> > > ranges so that keys derived from TIDs are not conflicted with each
> > > other and have parallel workers process one or more ranges. That way,
> > > parallel vacuum workers can build *sub-trees* and the leader process
> > > can merge them. In use cases of lazy vacuum, since the write phase and
> > > read phase are separated the readers don't need to worry about
> > > concurrent updates.
> >
> > It's a good idea to use ranges for a different reason -- readahead. See
> > commit 56788d2156fc3, which aimed to improve readahead for sequential
> > scans. It might work to use that as a model: Each worker prunes a range
> > of 64 pages, keeping the dead tids in a local array. At the end of the
> > range: lock the tid store, enter the tids into the store, unlock, free
> > the local array, and get the next range from the leader. It's possible
> > contention won't be too bad, and I suspect using small local arrays
> > as-we-go would be faster and use less memory than merging multiple
> > sub-trees at the end.
>
> That seems like a promising idea. I think it might work well even in
> the current parallel vacuum (i.e., single writer). I mean, I think we
> can have a single lwlock for shared cases in the first version. If the
> overhead of acquiring the lwlock per key-value insertion is not
> negligible, we might want to try this idea.
>
> Apart from that, I'm going to incorporate the comments on the 0004
> patch and try pointer tagging.
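
To make that scheme concrete, here is a rough sketch of the per-worker
loop in C. Everything in it (SharedTidStore, ParallelLeader,
leader_get_next_range, prune_page_collect_tids, tidstore_insert_tids) is
a made-up placeholder to illustrate the idea, not an API from the
patches:

#include "postgres.h"
#include "access/htup_details.h"    /* MaxHeapTuplesPerPage */
#include "storage/block.h"
#include "storage/itemptr.h"
#include "storage/lwlock.h"

#define PRUNE_RANGE_PAGES 64

/* hypothetical shared TID store protected by a single lwlock */
typedef struct SharedTidStore
{
    LWLock      lock;
    /* ... shared radix tree holding dead TIDs ... */
} SharedTidStore;

typedef struct ParallelLeader ParallelLeader;   /* opaque, hypothetical */

/* hypothetical helpers; prototypes only */
extern bool leader_get_next_range(ParallelLeader *leader, BlockNumber *start);
extern int  prune_page_collect_tids(BlockNumber blk, ItemPointerData *tids);
extern void tidstore_insert_tids(SharedTidStore *store,
                                 ItemPointerData *tids, int ntids);

static void
worker_prune_loop(SharedTidStore *store, ParallelLeader *leader)
{
    BlockNumber start;
    int         ntids;
    ItemPointerData *tids;

    /* local array, reused across ranges instead of freed each time */
    tids = palloc(sizeof(ItemPointerData) *
                  PRUNE_RANGE_PAGES * MaxHeapTuplesPerPage);

    while (leader_get_next_range(leader, &start))
    {
        ntids = 0;

        /* prune each page in the range, collecting dead TIDs locally */
        for (BlockNumber blk = start; blk < start + PRUNE_RANGE_PAGES; blk++)
            ntids += prune_page_collect_tids(blk, tids + ntids);

        /*
         * One lock acquisition per 64-page range rather than per TID,
         * which should keep contention on the shared store low.
         */
        LWLockAcquire(&store->lock, LW_EXCLUSIVE);
        tidstore_insert_tids(store, tids, ntids);
        LWLockRelease(&store->lock);
    }

    pfree(tids);
}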

I'd like to share some progress on this work.

The 0004 patch is a new patch that supports pointer tagging of the node
kind. It also introduces the rt_node_ptr type we discussed, so that
internal functions use it rather than taking two arguments for the
encoded and decoded pointers. With this intermediate patch, the DSA
support patch became more readable and understandable. We could
probably shrink it further by moving the change that separates the
control object from radix_tree into the main patch (0002). The patch
still needs polish, but I'd like to check whether this idea is
worthwhile. If we agree on this direction, this patch will be merged
into the main radix tree implementation patch.
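
For context, here is a minimal, self-contained sketch of what tagging
the node kind into a pointer's low bits can look like. The rt_node_ptr
name follows the discussion above, but the accessor names and the exact
bit layout are assumptions for illustration, not the patch's actual
encoding:

#include <stdint.h>

/*
 * Radix tree nodes are at least 8-byte aligned, so the low 3 bits of a
 * node pointer are always zero and can carry the node kind.
 */
#define RT_KIND_MASK    ((uintptr_t) 0x07)

typedef struct rt_node rt_node;     /* opaque node type */

/* single value passed around internally: pointer plus embedded kind */
typedef struct rt_node_ptr
{
    uintptr_t   encoded;
} rt_node_ptr;

static inline rt_node_ptr
rt_ptr_encode(rt_node *node, uint8_t kind)
{
    rt_node_ptr ptr;

    ptr.encoded = (uintptr_t) node | (kind & RT_KIND_MASK);
    return ptr;
}

static inline rt_node *
rt_ptr_decode(rt_node_ptr ptr)
{
    return (rt_node *) (ptr.encoded & ~RT_KIND_MASK);
}

static inline uint8_t
rt_ptr_kind(rt_node_ptr ptr)
{
    return (uint8_t) (ptr.encoded & RT_KIND_MASK);
}

The point of wrapping the encoded value in a struct is type safety:
internal functions can take a single rt_node_ptr, and the compiler
prevents accidentally mixing raw and tagged pointers.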

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

