Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Eager page freeze criteria clarification
Date
Msg-id CAH2-WzntrxejhQ4TO59W8nLA7Zv0m8zmMrk1HhrCLHUSyvVjWA@mail.gmail.com
In response to Re: Eager page freeze criteria clarification  (Andres Freund <andres@anarazel.de>)
Responses Re: Eager page freeze criteria clarification
List pgsql-hackers
On Wed, Sep 27, 2023 at 10:01 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-09-26 09:07:13 -0700, Peter Geoghegan wrote:
> I don't think doing this on a system wide basis with a metric like #unfrozen
> pages is a good idea. It's quite common to have short lived data in some
> tables while also having long-lived data in other tables. Making opportunistic
> freezing more aggressive in that situation will just hurt, without a benefit
> (potentially even slowing down the freezing of older data!). And even within a
> single table, making freezing more aggressive because there's a decent sized
> part of the table that is updated regularly and thus not frozen, doesn't make
> sense.

I never said that #unfrozen pages should be the sole criterion, for
anything. Just that it would influence the overall strategy, making
the system veer towards more aggressive freezing. It would complement
a more sophisticated algorithm that decides whether or not to freeze a
page based on its individual characteristics.

For example, maybe the page-level algorithm would have a random
component. That could potentially be where the global (or at least
table level) view gets to influence things -- the random aspect is
weighted by the global view of debt. That kind of thing seems like
an interesting avenue of investigation.
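
To make the idea concrete, here is a rough sketch in Python. It is
purely illustrative and not taken from any patch. The function names,
the per-page score in [0, 1], the definition of debt as the fraction
of unfrozen pages, and the linear blending formula are all
assumptions.

```python
import random


def freeze_probability(page_score, unfrozen_pages, total_pages):
    """Blend a per-page score with table-level freeze debt.

    page_score is a hypothetical value in [0, 1] produced by some
    page-level heuristic. Debt is assumed to be the fraction of the
    table's pages that are still unfrozen.
    """
    page_score = min(max(page_score, 0.0), 1.0)
    if total_pages <= 0:
        return page_score
    debt = min(max(unfrozen_pages / total_pages, 0.0), 1.0)
    # Debt pushes the probability toward 1, but with zero debt the
    # page-level score decides on its own.
    return page_score + (1.0 - page_score) * debt


def should_freeze(page_score, unfrozen_pages, total_pages, rng=random.random):
    """Random component: freeze when a uniform draw falls below the blend."""
    return rng() < freeze_probability(page_score, unfrozen_pages, total_pages)
```

With this shape, a table with little debt leaves the decision almost
entirely to the page-level score, while a table with a lot of debt
freezes most pages regardless of that score.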

> If we want to take global freeze debt into account, which I think is a good
> idea, we'll need a smarter way to represent the debt than just the number of
> unfrozen pages.  I think we would need to track the age of unfrozen pages in
> some way. If there are a lot of unfrozen pages with a recent xid, then it's
> fine, but if they are older and getting older, it's a problem and we need to
> be more aggressive.

Tables like pgbench_history will have lots of unfrozen pages with a
recent XID that get scanned during every VACUUM. We should be freezing
such pages at the earliest opportunity.

> The problem I see is how to track the age of unfrozen data -
> it'd be easy enough to track the mean(oldest-64bit-xid-on-page), but then we
> again have the issue of rare outliers moving the mean too much...

I think that XID age is mostly not very important compared to the
absolute number of unfrozen pages, and the cost profile of freezing
now versus later. (XID age *is* important in emergencies, but that's
mostly not what we're discussing right now.)

To be clear, that doesn't mean that XID age shouldn't play an
important role in helping VACUUM to differentiate between pages that
should not be frozen and pages that should be frozen. But even there
it should probably be treated as a cue. And, the relationship between
the XIDs on the page is probably more important than their absolute
age (or relationship to XIDs that appear elsewhere more generally).

For example, if a page is filled with heap tuples whose XIDs all match
(i.e. tuples that were all inserted by the same transaction), or XIDs
that are at least very close together, then that could make VACUUM
more enthusiastic about freezing now. OTOH if the XIDs are more
heterogeneous (but never very old), or if some xmin fields were
frozen, then VACUUM should show much less enthusiasm for freezing if
it's expensive.
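
A minimal sketch of that kind of cue, again in Python and again only
illustrative. The function name, the "high"/"low" result, and the
close_threshold value are assumptions. It also assumes 64-bit XIDs so
that plain subtraction is safe, and it represents an already-frozen
xmin as None.

```python
from typing import Optional, Sequence


def freeze_enthusiasm(xmins: Sequence[Optional[int]],
                      close_threshold: int = 100) -> str:
    """Classify a page by how its tuples' xmin values relate to each other.

    xmins holds one xmin per heap tuple on the page; None marks an xmin
    that is already frozen. close_threshold is a made-up cutoff for
    "very close together".
    """
    if not xmins:
        return "low"
    # A mix that includes already-frozen xmin fields suggests the page
    # has been modified over time, so be less eager.
    if any(x is None for x in xmins):
        return "low"
    spread = max(xmins) - min(xmins)
    # Matching or tightly clustered XIDs look like a bulk insert.
    return "high" if spread <= close_threshold else "low"
```

A page where every tuple was inserted by one transaction has a spread
of zero and comes out "high". A page whose XIDs are spread widely, or
that mixes frozen and unfrozen xmin fields, comes out "low".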

--
Peter Geoghegan


