Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAFBsxsGuBoa5rrHtKn4ajYN-OXK8bJRN2gpmZk60fkQDerzotw@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Fri, Apr 7, 2023 at 4:55 PM John Naylor <john.naylor@enterprisedb.com> wrote:

> - Fixed-size PagetableEntry's are pretty large, but the tid compression scheme used in this thread (in addition to being complex) is not a great fit for tidbitmap because it makes it more difficult to track per-block metadata (see also next point). With the "combined pointer-value slots" technique, if a page's max tid offset is 63 or less, the offsets can be stored directly in the pointer for the exact case. The lowest bit can tag to indicate a pointer to a single-value leaf. That would complicate operations like union/intersection and tracking "needs recheck", but it would reduce memory use and node-traversal in common cases.

[just getting some thoughts out there before I have something concrete]

Thinking some more, this needn't be complicated at all. We'd just need to reserve some bits of a bitmapword for the tag, as well as flags for "ischunk" and "recheck". The other bits can be used for offsets. Getting/storing the offsets basically amounts to adjusting the shift by a constant. That way, this "embeddable PTE" could serve as both "PTE embedded in a node pointer" and also the first member of a full PTE. A full PTE is now just an array of embedded PTEs, except only the first one has the flags we need. That reduces the number of places that have to be different. Storing any set of offsets all less than ~60 would save allocation/traversal in a large number of real cases. Furthermore, that would reduce a full PTE to 40 bytes because there would be no padding.

This all assumes the key (block number) is no longer stored in the PTE, whether embedded or not. That would mean this technique:

> - With lazy expansion and single-value leaves, the root of a radix tree can point to a single leaf. That might get rid of the need to track TBMStatus, since setting a single-leaf tree should be cheap.

...is not a good trade off because it requires each leaf to have the key, and would thus reduce the utility of embedded leaves. We just need to make sure storing a single value is not costly, and I suspect it's not. (Currently the overhead avoided is allocating and zeroing a few kilobytes for a hash table). If it is not, then we don't need a special case in tidbitmap, which would be a great simplification. If it is, there are other ways to mitigate.

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: evtcache: EventTriggerCache vs Event Trigger Cache
Next
From: shveta malik
Date:
Subject: Re: Support logical replication of DDLs