
From: Peter Geoghegan
Subject: Re: Eager page freeze criteria clarification
Msg-id: CAH2-Wzk9yFf2RkadQKJ4O_094_UxGz63y06kqCdptLN2aBDKNA@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Sep 27, 2023 at 10:46 AM Andres Freund <andres@anarazel.de> wrote:
> I don't disagree that we should do something in that direction - I just don't
> see the raw number of unfrozen pages being useful in that regard. If you have
> a database where no pages live long, we don't need to freeze
> opportunistically, yet the fraction of unfrozen pages will be huge.

We don't know how to reliably predict the future. We can do our best
to ameliorate problems with such workloads, using a slew of different
strategies (including making the behaviors configurable, holding off
on freezing a second or a third time, etc.). Such workloads are not
very common, and won't necessarily suffer too much. I think that
they're pretty much limited to queue-like tables.

Any course of action will likely have some downsides.
Melanie/you/whoever will need to make a trade-off, knowing that
somebody somewhere isn't going to be completely happy about it. I just
don't think that anybody will ever be able to come up with an
algorithm that can generalize well enough for things to not work out
that way. I really care about the problems in this area being
addressed more comprehensively, so I'm certainly not going to be the
one that just refuses to accept a trade-off (within reason, of
course).

> > > If we want to take global freeze debt into account, which I think is a good
> > > idea, we'll need a smarter way to represent the debt than just the number of
> > > unfrozen pages.  I think we would need to track the age of unfrozen pages in
> > > some way. If there are a lot of unfrozen pages with a recent xid, then it's
> > > fine, but if they are older and getting older, it's a problem and we need to
> > > be more aggressive.
> >
> > Tables like pgbench_history will have lots of unfrozen pages with a
> > recent XID that get scanned during every VACUUM. We should be freezing
> > such pages at the earliest opportunity.
>
> I think we ought to be able to aggressively freeze tables with as simple a
> workload as pgbench_history without taking a global freeze debt into
> account.

Agreed.

> We definitely *also* should take the number of unfrozen pages into account. I
> just don't think determining freeze debt primarily using the number of unfrozen
> pages will be useful. The presence of unfrozen pages that are likely to be
> updated again soon is not a problem and makes the simple metric pretty much
> useless.

I never said "primarily".

> > To be clear, that doesn't mean that XID age shouldn't play an
> > important role in helping VACUUM to differentiate between pages that
> > should not be frozen and pages that should be frozen.
>
> I think we need to take it into account to determine a useful freeze debt on a
> table level (and potentially system wide too).

If we reach the stage where XID age starts to matter, we're no longer
talking about performance stability IMV. We're talking about avoiding
wraparound, which is mostly a different problem.

Recall that I started this whole line of discussion about debt in
reaction to a point of Robert's about lazy freezing. My original point
was that (as a general rule) we can afford to be lazy when we're not
risking too much -- when the eventual cost of catching up (having
gotten it wrong) isn't too painful (i.e. doesn't lead to a big balloon
payment down the road). Laziness is the thing that requires
justification. Putting off vital maintenance work for as long as
possible only makes sense under fairly limited circumstances.

A complicating factor here is the influence of the free space map. The
way it works (the total lack of hysteresis to discourage reusing space
from old pages) probably makes the problems with freezing harder to
solve. Maybe that should be put in scope.
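
To illustrate what I mean by hysteresis, here is a toy sketch only (not
actual FSM code; CandidatePage, settled_for, SETTLED_THRESHOLD and
prefer_for_insert are all invented for the example). The idea is that
insert-target selection would penalize pages that have gone unmodified
for a long time, so a settled (likely all-frozen) page isn't dirtied
again just because it happens to have free space:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy sketch only (not PostgreSQL code): pick between two candidate
 * pages for an insert of "need" bytes, preferring a recently modified
 * page over one that has been left alone for a long time.  The
 * settled_for field and SETTLED_THRESHOLD are invented; in reality
 * some proxy (XID distance, time, etc.) would be needed.
 */
typedef struct
{
    uint32_t    blkno;
    uint32_t    free_bytes;
    uint64_t    settled_for;    /* how long since the page was last modified */
} CandidatePage;

#define SETTLED_THRESHOLD 100000    /* arbitrary illustration value */

static const CandidatePage *
prefer_for_insert(const CandidatePage *a, const CandidatePage *b, uint32_t need)
{
    bool        a_fits = a->free_bytes >= need;
    bool        b_fits = b->free_bytes >= need;
    bool        a_settled;
    bool        b_settled;

    if (a_fits != b_fits)
        return a_fits ? a : b;

    /*
     * Hysteresis: among pages that fit, avoid un-freezing a page that
     * stopped changing long ago just because it has free space.
     */
    a_settled = a->settled_for > SETTLED_THRESHOLD;
    b_settled = b->settled_for > SETTLED_THRESHOLD;

    if (a_settled != b_settled)
        return a_settled ? b : a;

    return a->free_bytes >= b->free_bytes ? a : b;
}

int
main(void)
{
    CandidatePage old_page = {10, 400, 5000000};    /* settled, likely frozen */
    CandidatePage hot_page = {900, 300, 42};        /* recently dirtied */

    printf("chosen block: %u\n",
           prefer_for_insert(&old_page, &hot_page, 200)->blkno);
    return 0;
}

That is obviously not how the FSM works today; the point is just that
without some rule along those lines, freezing and free space reuse keep
working against each other.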

> Assuming we could compute it cheaply enough, if we had an approximate median
> oldest-64bit-xid-on-page and the number of unfrozen pages, we could
> differentiate between tables that have lots of recent unfrozen pages (the
> median will be low) and tables with lots of unfrozen pages that are unlikely to
> be updated again (the median will be high and growing).  Something like the
> median 64bit xid would be interesting because it'd not get "invalidated" if
> relfrozenxid is increased.
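
If I'm following, something along these lines is what you have in mind.
This is a rough sketch only: classify_freeze_debt, the enum, and both
cutoffs are invented for illustration, and cheaply maintaining an
approximate median is left as the hard part.

#include <stdint.h>
#include <stdio.h>

/*
 * Rough sketch only (not PostgreSQL code): combine the number of
 * unfrozen pages with an approximate median oldest-64-bit-XID-on-page
 * to classify a table's freeze debt.  A 64-bit median has the property
 * mentioned above: it is not invalidated when relfrozenxid advances.
 */
typedef enum
{
    FREEZE_DEBT_LOW,            /* few unfrozen pages: nothing to do */
    FREEZE_DEBT_RECENT,         /* many unfrozen pages, but recently written */
    FREEZE_DEBT_HIGH            /* many unfrozen pages that stopped changing */
} FreezeDebtClass;

static FreezeDebtClass
classify_freeze_debt(uint64_t unfrozen_pages, uint64_t total_pages,
                     uint64_t next_full_xid, uint64_t median_oldest_unfrozen_xid)
{
    uint64_t    median_age = next_full_xid - median_oldest_unfrozen_xid;

    if (total_pages == 0 || unfrozen_pages * 10 < total_pages)
        return FREEZE_DEBT_LOW;     /* < 10% unfrozen (arbitrary cutoff) */

    if (median_age < 1000000)       /* arbitrary XID-age cutoff */
        return FREEZE_DEBT_RECENT;  /* typical unfrozen page is still hot */

    return FREEZE_DEBT_HIGH;        /* old and getting older: freeze eagerly */
}

int
main(void)
{
    /* 80% of pages unfrozen, typical unfrozen page is ~40M XIDs old */
    printf("class = %d\n",
           (int) classify_freeze_debt(80000, 100000, 50000000, 10000000));
    return 0;
}

With something like that, a pgbench_history-style table would land in
the HIGH bucket quickly, while a queue-like table would tend to stay in
RECENT.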

I'm glad that you're mostly of the view that we should be freezing a
lot more aggressively overall, but I think that you're still too
focussed on avoiding small problems. I understand why novel new
problems are generally more of a concern than established old
problems, but there needs to be a sense of proportion. Performance
stability is incredibly important, and isn't zero cost.

--
Peter Geoghegan


