Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Date
Msg-id CAH2-WznYa63O+3uoJ9iZ065s_FKMznAAYvhj5p-xFqDR8Br2kw@mail.gmail.com
Whole thread Raw
In response to Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Feb 4, 2022 at 1:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > The constantly modified index will be entirely dependent on index
> > vacuuming here, and so an improved VACUUM design that allows that
> > particular index to be vacuumed more frequently could really improve
> > performance.
>
> Thanks for checking my work here - I wasn't 100% sure I had the right idea.

I should perhaps have emphasized individual leaf pages, rather than
total index size. Presumably we only need to store so many extra
versions per logical row at any one time, and we have a fair amount of
free space for extra versions on leaf pages. Typically 10%- 30% of the
space from the page (typical when it isn't already inevitable that the
page will eventually split due to simple inserts). A traditional
guarantee with B-Trees is that we get `ln(2)` space utilization with
random insertions, which leaves just over 30% of the page free for
later updates -- that's where I got 30% from.

There is a complementary effect with deduplication, since that buys us
time before the page has to split, making it much more likely that the
split will be avoided entirely. It's very nonlinear.

As I said, the competition between older snapshots and garbage
collection can still lead to version-driven page splits (especially
when non-hot updates are concentrated in one part of the key space, or
one leaf page). But that's arguably a good thing -- it naturally
relieves contention. There are actually designs that artificially
split B-Tree pages early [1], detecting concurrency control related
contention. Other systems need concurrency control in indexes, which
we avoid by having versions live in indexes.

[1] http://cidrdb.org/cidr2021/papers/cidr2021_paper21.pdf
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Windows now has fdatasync()
Next
From: Michael Banck
Date:
Subject: Re: Release notes for February minor releases