Re: Trigger more frequent autovacuums of heavy insert tables - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Trigger more frequent autovacuums of heavy insert tables
Date
Msg-id CAAKRu_aSHL5gn0q8KotSzeJtRs=oj-4nd2e8VNRMm6UKYmUY8A@mail.gmail.com
Whole thread Raw
In response to Re: Trigger more frequent autovacuums of heavy insert tables  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Trigger more frequent autovacuums of heavy insert tables
List pgsql-hackers
On Mon, Feb 24, 2025 at 4:53 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 07:35:32PM -0500, Melanie Plageman wrote:
> > Attache v7 doesn't cap the result for manual stats updating done with
> > relation_statistics_update().
>
> Unfortunately, this already needs a rebase...

Thanks! Attached v8 does just that.

I've also taken a pass at updating the stats import tests to include
relallfrozen. I'm not totally sure how I feel about the results. I
erred on the side of putting relallfrozen wherever relallvisible was.
But, that means relallfrozen is included in test cases where it
doesn't seem important to the test case. But, that was true with
relallvisible too.

Anyway, I talked to Corey off-list last week. Maybe he will have a
chance to weigh in on the test cases. Also, I had forgotten to include
relallfrozen in pg_clear_relation_stats(). I've fixed that in v8, but
now I'm worried there are some other places I may have missed.

> > I did, however, keep the cap for the
> > places where vacuum/analyze/create index update the stats. There the
> > number for relallfrozen is coming directly from visibilitymap_count(),
> > so it should be correct. I could perhaps add an assert instead, but I
> > didn't think that really made sense. An assert is meant to help the
> > developer and what could the developer do about the visibility map
> > being corrupted.
>
> This might just be personal preference, but I think this is exactly the
> sort of thing an assertion is meant for.  If we expect the value to always
> be correct, and it's not, then our assumptions were wrong or someone has
> broken something.  In both of these cases, I as a developer would really
> like to know so that I don't introduce a latent bug.  If we want Postgres
> to gracefully handle or detect visibility map corruption, then maybe we
> should do both or PANIC.

I'm on the fence about adding a PANIC. We do PANIC in other places
where we notice corruption (like PageAddItemExtended()).  But, in most
of the cases, it seems like we are PANICing because there isn't a
reasonable way to accomplish the intended task. In this case, we
probably can't trust the visibility map counts for that page, but the
pg_class columns are just estimates, so just capping relallfrozen
might be good enough.

I will note that in the other place where we may notice corruption in
the VM, in lazy_scan_prune(), we do visibilitymap_clear() and print a
WARNING -- as opposed to PANICing. Perhaps that's because there is no
need to panic, since we are already fixing the problem with
visibiliytmap_clear().

An assert would only help if the developer did something while
developing that corrupted the visibility map. It doesn't help keep
bogus values out of pg_class if a user's visibility map got corrupted
in some way. But, maybe that isn't needed.

> I do see that heap_vacuum_rel() already caps relallvisible to relpages, but
> it's not clear to me whether that's something that we expect to regularly
> happen in practice or what the adverse effects might be.  So perhaps I'm
> misunderstanding the scope and severity of bogus results from
> visibilitymap_count().  Commit e6858e6, which added this code, doesn't say
> anything about safety, but commit 3d351d9 changed the comment in question
> to its current wording.  After a very quick skim of the latter's thread
> [0], I don't see any discussion about this point, either.

Thanks for doing this archaeology.

I am hesitant to keep the current cap on relallvisible in
heap_vacuum_rel() but then not include an equivalent cap for
relallfrozen. And I think it would be even more confusing for
relallvisible to be capped but relallfrozen has an assert instead.

None of the other locations where relallvisible is updated
(do_analyze_rel(), index_update_stats()) do this capping of
relallvisible. It seems like we should make it consistent. Perhaps we
should just remove it from heap_vacuum_rel(). Then add an assert in
all these places to at least protect development mistakes.

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Next
From: Michael Paquier
Date:
Subject: Re: Add Pipelining support in psql