From: Peter Geoghegan
Subject: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Date:
Msg-id: CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
List: pgsql-hackers
My ongoing project to make VACUUM more predictable over time through
proactive freezing [1] will significantly increase the overall number
of tuples frozen by VACUUM (at least in larger tables). It's important
that we avoid any new user-visible impact from the extra freezing,
though. I recently spent a lot of time adding high-level techniques
that aim to avoid extra freezing when that makes sense (e.g. by being
lazy about freezing). Low-level techniques aimed at making the
mechanical process of freezing cheaper might also help. (In any case
it's well worth optimizing.)

I'd like to talk about one such technique on this thread. The attached
WIP patch reduces the size of xl_heap_freeze_page records by applying
a simple deduplication process. This can be treated as independent
work (I think it can, at least). The patch doesn't change anything
about the conceptual model used by VACUUM's lazy_scan_prune function
to build xl_heap_freeze_page records for a page, and yet still manages
to make the WAL records for freezing over 5x smaller in many important
cases. They'll be ~4x-5x smaller with *most* workloads,
even.

Each individual tuple entry (each xl_heap_freeze_tuple) adds a full 12
bytes to the WAL record right now -- no matter what. So the existing
approach is rather space inefficient by any standard (perhaps because
it was developed under time pressure while fixing bugs in Postgres
9.3). More importantly, there is a lot of natural redundancy among the
xl_heap_freeze_tuple entries for a page -- the details tend to match
from tuple to tuple. We can usually get away with storing each
unique combination of values from xl_heap_freeze_tuple once per
xl_heap_freeze_page record, while storing associated page offset
numbers in a separate area, grouped by their canonical freeze plan
(which is a normalized version of the information currently stored in
xl_heap_freeze_tuple).
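
To make that concrete, the rough "shape" of the change looks something
like this (simplified sketch; the plan struct and its field names here
are only for illustration, see the attached patch for the real
details):

/*
 * Existing format, from src/include/access/heapam_xlog.h: one entry
 * like this is stored per tuple frozen, 12 bytes each after alignment
 */
typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber offset;
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       frz_flags;
} xl_heap_freeze_tuple;

/*
 * Deduplicated format (sketch): one "plan" is stored per distinct
 * combination of values, followed by an array of the page offset
 * numbers that the plan applies to
 */
typedef struct xl_heap_freeze_plan
{
    TransactionId xmax;
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       frz_flags;

    /* Number of page offset numbers covered by this plan */
    uint16      ntuples;
} xl_heap_freeze_plan;

With sizes like that (about 12 bytes per plan after alignment, plus a
2 byte OffsetNumber per tuple), a page where 100 frozen tuples all
share a single plan goes from 100 * 12 = 1200 bytes of tuple entries
down to about 12 + 100 * 2 = 212 bytes, which is roughly where the
4x-5x figure comes from.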

In practice most individual tuples that undergo any kind of freezing
only need to have their xmin field frozen. And when xmax is affected
at all, it'll usually just get set to InvalidTransactionId. And so the
actual low-level processing steps for xmax have a high chance of being
shared by other tuples on the page, even in ostensibly tricky cases.
While there are quite a few paths that lead to VACUUM setting a
tuple's xmax to InvalidTransactionId, they all end up with the same
instructional state in the final xl_heap_freeze_tuple entry.
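
Put another way, two xl_heap_freeze_tuple entries can share a single
freeze plan whenever everything other than the target offset number
matches. Something along these lines (illustrative only, not lifted
from the patch):

/*
 * Two tuples can be covered by one freeze plan when their
 * instructional state is identical; only the offset number then has
 * to be stored per tuple
 */
static inline bool
freeze_plans_match(const xl_heap_freeze_tuple *a,
                   const xl_heap_freeze_tuple *b)
{
    return a->xmax == b->xmax &&
        a->t_infomask2 == b->t_infomask2 &&
        a->t_infomask == b->t_infomask &&
        a->frz_flags == b->frz_flags;
}

The dedup pass itself could then be as simple as sorting the entries
on those fields, or doing a linear search against the handful of plans
already seen for the page.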

Note that there is a small chance that the patch will be less space
efficient by up to 2 bytes per tuple frozen per page, in cases where
we're allocating new Multis during VACUUM. I think that this should be
acceptable on its own -- even in rare bad cases we'll usually still
come out ahead. What are the chances that we won't make up the
difference on the same page? Or at least within the same VACUUM? And
that's before we talk about a future world in which freezing will
batch tuples together at the page level (you don't have to bring the
other VACUUM work into this discussion, I think, but it's not
*completely* unrelated either).
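
To put rough numbers on that bad case (again assuming a plan header of
about 12 bytes after alignment, plus a 2 byte offset number per
tuple): if every frozen tuple on the page ends up needing its own
distinct plan, say because each one gets a different newly allocated
Multi as its xmax, we pay about 12 + 2 = 14 bytes per tuple instead of
today's flat 12. That's where the "up to 2 bytes per tuple" figure
comes from.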

[1] https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
-- 
Peter Geoghegan

