Re: Eagerly scan all-visible pages to amortize aggressive vacuum - Mailing list pgsql-hackers

From: Melanie Plageman
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Msg-id: CAAKRu_bQC8hQuugxPrMqUTfTBjLCWpqhxLka_R4mDv9Meb+AZw@mail.gmail.com
In response to: Re: Eagerly scan all-visible pages to amortize aggressive vacuum (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Feb 4, 2025 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Feb 4, 2025 at 2:57 PM Robert Treat <rob@xzilla.net> wrote:
> > > Yea, I thought that counting them as failures made sense because we
> > > did fail to freeze them. However, now that you mention it, we didn't
> > > fail to freeze them because of age, so maybe we don't want to count
> > > them as failures. I don't expect us to have a bunch of contended
> > > all-visible pages, so I think the question is about what makes it more
> > > clear in the code. What do you think? Should I reset was_eager_scanned
> > > to false if we don't get the cleanup lock?
> >
> > I feel like if we are making the trade-off in resources to attempt
> > eager scanning, and we weren't making progress for whatever reason
> > (and in the lock failure cases, wouldn't some of those be things that
> > would prevent us from freezing?) then it would generally be ok to bias
> > towards bailing sooner rather than later.
>
> Failures to acquire cleanup locks are, hopefully, rare, so it may not
> matter that much. Having said that, if we skip a page because we can't
> acquire a cleanup lock on it, I think that means that it was already
> present in shared_buffers, which means that we didn't have to do an
> I/O to get it. Since I think the point of the failure cap is mostly to
> limit wasted I/O, I would lean toward NOT counting such cases as
> failures.

I think I misspoke when I said we are unlikely to have contended
all-visible pages. I suppose it is trivial to concoct a scenario where
there are many pinned all-visible pages.

Initially I agreed with you that eagerly scanned pages we failed to
freeze because we didn't get the cleanup lock shouldn't count as
failures. If the page is pinned in shared buffers, it necessarily
doesn't cost us a read -- and the read is the main overhead of a
failed eager freeze.

However, if we don't count an eagerly scanned page as a failure when
we don't get the cleanup lock, on the grounds that we didn't incur a
read, then why would we count any eagerly scanned page already in
shared buffers as a failure? When we actually try freezing a page and
fail because its tuples are too new, that failure gives us information
about whether we should keep trying to freeze. So it is not just about
the cost of the read but also about what the failure tells us about
the data.

Interestingly, we call heap_tuple_should_freeze() in
lazy_scan_noprune(), so we could actually tell whether the page has
tuples old enough to have triggered freezing had we gotten the cleanup
lock. One option would be to add a new output parameter to
lazy_scan_noprune() indicating whether there were tuples with xids
older than the FreezeLimit, and count the page as a failed eager scan
only if there were not.
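Something like this (very rough sketch -- the new parameter name and
the failure-counting helper are made up, and the real code would need
more care):

    /* lazy_scan_noprune() already calls heap_tuple_should_freeze() */
    static bool
    lazy_scan_noprune(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
                      Page page, bool *has_lpdead_items,
                      bool *has_freeze_aged_tuples)  /* new output param */
    {
        *has_freeze_aged_tuples = false;
        ...
        /* inside the existing per-tuple loop */
        if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
                                     &NoFreezePageRelfrozenXid,
                                     &NoFreezePageRelminMxid))
            *has_freeze_aged_tuples = true;
        ...
    }

    /* back in lazy_scan_heap(), when we couldn't get the cleanup lock */
    if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items,
                          &has_freeze_aged_tuples))
    {
        /* page was handled without the cleanup lock */
        if (was_eager_scanned && !has_freeze_aged_tuples)
            count_eager_freeze_failure(vacrel);   /* made-up helper */
        ...
    }

That way, failing to get the cleanup lock on a page we would have had
to freeze anyway wouldn't eat into the failure cap, while a page whose
tuples are all newer than the FreezeLimit still would.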

- Melanie


