Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id CANWCAZbaaD8goNQ0KXZhGzF_yFuaGWmiYu90UHYpR919sjAS_A@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Dec 21, 2023 at 6:27 PM Andres Freund <andres@anarazel.de> wrote:
>
> Could either of you summarize what the design changes you've made in the last
> months are and why you've done them? Unfortunately this thread is very long,
> and the comments in the file just say "FIXME" in places that apparently are
> affected by design changes.  This makes it hard to catch up here.

I'd be happy to try, since we are about due for a summary. I was also
hoping to reach a coherent-enough state sometime in early January to
request your feedback, so good timing. Not sure how much detail to go
into, but here goes:

Back in May [1], the method of value storage shifted towards "combined
pointer-value slots", which was described and recommended in the
paper. There were some other changes for simplicity and efficiency,
but none as far-reaching as this.

This is enabled by using the template architecture that we adopted
long ago for different reasons. Fixed length values are either stored
in the slot of the last-level node (if the value fits into the
platform's pointer), or are a "single-value" leaf (otherwise).
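As a rough illustration of the combined pointer-value slot idea (this is
not code from the patch; Slot, WideValue, slot_store, slot_load and
slot_free are hypothetical names, and malloc stands in for whatever
allocator the tree really uses), a pointer-sized slot can hold the value
itself when it fits, and otherwise a pointer to a separately allocated
single-value leaf:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value type that is wider than a pointer on common platforms. */
typedef struct WideValue
{
    uint64_t a;
    uint64_t b;
} WideValue;

/* A slot is pointer-sized: it holds either the value or a pointer to a leaf. */
typedef uintptr_t Slot;

/* Store a fixed-length value, embedding it in the slot when it fits. */
static Slot
slot_store(const void *value, size_t size)
{
    Slot slot = 0;

    if (size <= sizeof(Slot))
        memcpy(&slot, value, size);     /* value lives in the slot itself */
    else
    {
        void *leaf = malloc(size);      /* "single-value" leaf */

        assert(leaf != NULL);
        memcpy(leaf, value, size);
        slot = (Slot) leaf;
    }
    return slot;
}

/* Read the value back, following the pointer only for the leaf case. */
static void
slot_load(Slot slot, void *out, size_t size)
{
    if (size <= sizeof(Slot))
        memcpy(out, &slot, size);
    else
        memcpy(out, (const void *) slot, size);
}

/* Release the leaf, if one was allocated for this value size. */
static void
slot_free(Slot slot, size_t size)
{
    if (size > sizeof(Slot))
        free((void *) slot);
}
```

In a template-style header the size comparison would be resolved at
compile time for each value type, so each instantiation only carries one
of the two branches.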

For tid store, we want to eventually support bitmap heap scans (in
addition to vacuum), and in doing so make it independent of heap AM.
That means value types similar to PagetableEntry in tidbitmap.c, but
with a variable number of bitmapwords.
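
To make "a variable number of bitmapwords" concrete, here is a minimal
sketch of what such a per-block value could look like. It is an
assumption for illustration only, not the struct used in the patch:
BlockEntry, entry_size, entry_create and entry_is_member are made-up
names, and bitmapword is fixed at 64 bits here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t bitmapword;            /* assumed 64-bit for this sketch */
#define BITS_PER_WORD 64

/* One entry per block: only as many words as the highest offset needs. */
typedef struct BlockEntry
{
    uint16_t    nwords;                 /* number of valid words below */
    bitmapword  words[];                /* flexible array member */
} BlockEntry;

/* Size in bytes of an entry holding nwords bitmap words. */
static size_t
entry_size(uint16_t nwords)
{
    return offsetof(BlockEntry, words) + (size_t) nwords * sizeof(bitmapword);
}

/* Build an entry with one bit set per tuple offset in the block. */
static BlockEntry *
entry_create(const uint16_t *offsets, int noffsets)
{
    uint16_t    maxoff = 0;
    uint16_t    nwords;
    BlockEntry *entry;

    assert(noffsets > 0);
    for (int i = 0; i < noffsets; i++)
        if (offsets[i] > maxoff)
            maxoff = offsets[i];

    nwords = maxoff / BITS_PER_WORD + 1;
    entry = calloc(1, entry_size(nwords));
    assert(entry != NULL);
    entry->nwords = nwords;

    for (int i = 0; i < noffsets; i++)
        entry->words[offsets[i] / BITS_PER_WORD] |=
            (bitmapword) 1 << (offsets[i] % BITS_PER_WORD);
    return entry;
}

/* Test whether a tuple offset is present; offsets past the end are absent. */
static int
entry_is_member(const BlockEntry *entry, uint16_t off)
{
    if (off / BITS_PER_WORD >= entry->nwords)
        return 0;
    return (int) ((entry->words[off / BITS_PER_WORD] >> (off % BITS_PER_WORD)) & 1);
}
```

Because the size of the entry depends on its contents, the container
storing it has to be told the size of each value rather than assuming
one size per value type.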

That required the radix tree to support variable length values. That has
been the main focus in the last several months, and it basically works
now.

To my mind, the biggest architectural issues in the patch today are:

- Variable-length values mean that pointers are passed around in
places. This will require shifting some responsibility for locking to
the caller, or longer-term maybe a callback interface. (This is new,
the below are pre-existing issues.)
- The tid store has its own "control object" (when shared memory is
needed) with its own lock, in addition to the same for the associated
radix tree. This leads to unnecessary double-locking. This area needs
some attention.
- Memory accounting is still unsettled. The current thinking is to cap
max block/segment size, scaled to a fraction of maintenance_work_mem
(m_w_m), but there are still open questions.
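
For the memory accounting item, the capping idea could look roughly
like the following. Every number here is an assumption chosen for
illustration (the divisor and the lower and upper bounds are not taken
from the patch), and max_block_size_for_limit is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* All three constants are illustrative assumptions, not patch values. */
#define LIMIT_DIVISOR   8                               /* block <= 1/8 of limit */
#define MIN_BLOCK_SIZE  ((size_t) 4 * 1024)             /* 4 kB floor */
#define MAX_BLOCK_SIZE  ((size_t) 8 * 1024 * 1024)      /* 8 MB ceiling */

/*
 * Derive the largest block/segment size the allocator may request from
 * the memory limit, so that one allocation can overshoot the limit by
 * only a bounded fraction.
 */
static size_t
max_block_size_for_limit(size_t limit_bytes)
{
    size_t cap;

    assert(limit_bytes > 0);
    cap = limit_bytes / LIMIT_DIVISOR;

    if (cap < MIN_BLOCK_SIZE)
        cap = MIN_BLOCK_SIZE;
    if (cap > MAX_BLOCK_SIZE)
        cap = MAX_BLOCK_SIZE;
    return cap;
}
```

With these assumed constants a 1 MB limit gives 128 kB blocks, while
very small and very large limits are clamped to the floor and ceiling
respectively.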

There has been some recent effort toward finishing work started
earlier, like shrinking nodes. There are a couple of places that can still
use either simplification or optimization, but otherwise work fine.
Most of the remaining fixmes/todos/wips are trivial; a few are
actually outdated now that I look again, and will be removed shortly.
The regression tests could use some tidying up.

-John

[1] https://www.postgresql.org/message-id/CAFBsxsFyWLxweHVDtKb7otOCR4XdQGYR4b%2B9svxpVFnJs08BmQ%40mail.gmail.com


