On Fri, Sep 16, 2022 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After a quick benchmark, I've confirmed that the amount of WAL records
> for freezing 1 million tuples reduced to about one-fifth (1.2GB vs
> 250MB). Great.
I think that the really interesting thing about the patch is how this
changes the way we should think about freezing costs. It makes
page-level batching seem very natural.
The minimum possible size of a Heap2/FREEZE_PAGE record is 64 bytes,
once alignment and so on is taken into account (without the patch).
Once we already know that we have to freeze *some* tuples on a given
heap page, it becomes very reasonable to freeze as many as possible,
in batch, just because we know that it'll be much cheaper if we do it
now versus doing it later on instead. Even if this extra freezing ends
up "going to waste" due to updates against the same tuples that happen
a little later on, the *added* cost of freezing "extra" tuples will
have been so small that it's unlikely to matter. On the other hand, if
it's not wasted we'll be *much* better off.
It's very hard to predict the future, which is kinda what the current
FreezeLimit-based approach to freezing does. It's actually quite easy
to understand the cost of freezing now versus freezing later, though.
At a high level, it makes sense for VACUUM to focus on freezing costs
(including the risk that comes with *not* freezing with larger
tables), and not worry so much about making accurate predictions.
Making accurate predictions about freezing/workload characteristics is
overrated.
> True. I've not looked at the patch in depth yet but I think we need
> regression tests for this.
What did you have in mind?
I think that the best way to test something like this is with
wal_consistency_checking. That mostly works fine. However, note that
heap_mask() won't always be able to preserve the state of a tuple's
xmax when modified by freezing. We sometimes need "hint bits" to
actually reliably be set in REDO, when replaying the records for
freezing. At other times they really are just hints. We have to
conservatively assume that it's just a hint when masking. Not sure if
I can do much about that.
Note that this optimization is one level below lazy_scan_prune(), and
one level above heap_execute_freeze_tuple(). Neither function really
changes at all. This seems useful because there are rare
pg_upgrade-only paths where xvac fields need to be frozen. That's not
tested either.
--
Peter Geoghegan