Thread: Re: Trigger more frequent autovacuums of heavy insert tables

Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Tue, Oct 22, 2024 at 03:12:53PM -0400, Melanie Plageman wrote:
> By considering only the unfrozen portion of the table when calculating
> the vacuum insert threshold, we can trigger vacuums more proactively
> on insert-heavy tables. This changes the definition of
> insert_scale_factor to a percentage of "active" table size. The
> attached patch does this.

I think this is a creative idea.  My first reaction is to question whether
it makes sense to have two strategies for this sort of thing:
autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
Perhaps we don't want to more aggressively clean up bloat (except for the
very largest tables via the hard cap), but we do want to more aggressively
mark newly-inserted tuples frozen.  I'm curious what you think.

> I've estimated the unfrozen percentage of the table by adding a new
> field to pg_class, relallfrozen, which is updated in the same places
> as relallvisible.

Wouldn't relallvisible be sufficient here?  We'll skip all-visible pages
unless this is an anti-wraparound vacuum, at which point I would think the
insert threshold goes out the window.

> More frequent vacuums means each vacuum scans fewer pages, but, more
> interestingly, the first vacuum after a checkpoint is much more
> efficient. With the patch, the first vacuum after a checkpoint emits
> half as many FPIs. You can see that only 18 pages were newly dirtied.
> So, with the patch, the pages being vacuumed are usually still in
> shared buffers and still dirty.

Are you aware of any scenarios where your proposed strategy might make
things worse?  From your test results, it sounds like these vacuums ought
to usually be relatively efficient, so sending insert-only tables to the
front of the line is normally okay, but maybe that's not always true.

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 22, 2024 at 03:12:53PM -0400, Melanie Plageman wrote:
> > By considering only the unfrozen portion of the table when calculating
> > the vacuum insert threshold, we can trigger vacuums more proactively
> > on insert-heavy tables. This changes the definition of
> > insert_scale_factor to a percentage of "active" table size. The
> > attached patch does this.
>
> I think this is a creative idea.

Indeed. I can't take much credit for it -- Andres suggested this
direction during an off-list conversation where I was complaining
about how difficult it was to benchmark my vacuum eager scanning patch
set [1] because normal vacuums were so rarely triggered for
insert-only tables after the first aggressive vacuum.

> My first reaction is to question whether
> it makes sense to have two strategies for this sort of thing:
> autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
> Perhaps we don't want to more aggressively clean up bloat (except for the
> very largest tables via the hard cap), but we do want to more aggressively
> mark newly-inserted tuples frozen.  I'm curious what you think.

The goal with insert-only tables is to mark whole pages frozen in
the VM. So, the number of pages is more important than the total
number of tuples inserted. With updates/deletes, by contrast, it
seems like the total amount of garbage (# of tuples) needing cleanup
is more important.

My intuition (maybe wrong) is that it is more common to have a bunch
of pages with a single (or a few) updates/deletes than a bunch of
pages with a single insert. This patch is mostly meant to trigger
vacuums sooner on large insert-only or bulk-loaded tables. That said,
it is more common to have a cluster of hot pages than uniformly
distributed updates and deletes...

> > I've estimated the unfrozen percentage of the table by adding a new
> > field to pg_class, relallfrozen, which is updated in the same places
> > as relallvisible.
>
> Wouldn't relallvisible be sufficient here?  We'll skip all-visible pages
> unless this is an anti-wraparound vacuum, at which point I would think the
> insert threshold goes out the window.

It's a great question. There are a couple of reasons why I don't think so.

I think this might lead to triggering vacuums too often for
insert-mostly tables. For those tables, the pages that are not
all-visible will largely be just those with data that is new since the
last vacuum. And if we trigger vacuums based on the % not
all-visible, we might decrease the number of cases where we are able
to vacuum inserted data and freeze it the first time it is vacuumed --
thereby increasing the total amount of work.
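
To make that concrete, here is a minimal standalone sketch
(illustrative only -- not the actual server code, though it is loosely
modeled on relation_needs_vacanalyze() in autovacuum.c; the function
name and signature are made up for this example) of how the patch
scales the insert threshold by the unfrozen fraction of the table:

    #include <stdbool.h>

    /*
     * Illustrative sketch of the patched trigger logic.  Plain C
     * doubles stand in for the pg_class fields.
     */
    static bool
    needs_insert_vacuum(double reltuples, double relpages,
                        double relallfrozen, double ins_since_vacuum,
                        double ins_base_thresh, double ins_scale_factor)
    {
        /* With no page data, treat the whole table as unfrozen. */
        double  pcnt_unfrozen = 1.0;

        if (relpages > 0)
            pcnt_unfrozen = 1.0 - (relallfrozen / relpages);

        /* Scale insert_scale_factor by only the "active" portion. */
        return ins_since_vacuum >
            ins_base_thresh + ins_scale_factor * reltuples * pcnt_unfrozen;
    }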

As for your point about us skipping all-visible pages except in
anti-wraparound vacuums -- that's not totally true. Autovacuums
triggered by the insert or update/delete thresholds and not by
autovacuum_freeze_max_age can also be aggressive (that's based on
vacuum_freeze_table_age). Aggressive vacuums scan all-visible pages.
And we actually want to trigger more normal aggressive (non-anti-wrap)
vacuums because anti-wraparound vacuums are not canceled by
conflicting lock requests (like those needed by DDL) -- see
PROC_VACUUM_FOR_WRAPAROUND in ProcSleep().
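
For reference, the aggressiveness rule amounts to roughly the
following standalone sketch (the real logic, around
vacuum_get_cutoffs(), uses wraparound-aware XID comparisons; the plain
integer math here is only an approximation):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Approximate sketch: any vacuum, however triggered, becomes
     * aggressive once the table's relfrozenxid is more than
     * vacuum_freeze_table_age transactions old.  Real XID comparisons
     * must handle wraparound; uint32_t subtraction is a stand-in.
     */
    static bool
    vacuum_is_aggressive(uint32_t relfrozenxid, uint32_t next_xid,
                         uint32_t vacuum_freeze_table_age)
    {
        return (next_xid - relfrozenxid) >= vacuum_freeze_table_age;
    }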

We also scan a surprising number of all-visible pages in practice due
to SKIP_PAGES_THRESHOLD. I was pretty taken aback while testing [1]
how many all-visible pages we scan due to this optimization. And, I'm
planning on merging [1] in the next few days, so this will also
increase the number of all-visible pages scanned during normal
vacuums.

> > More frequent vacuums means each vacuum scans fewer pages, but, more
> > interestingly, the first vacuum after a checkpoint is much more
> > efficient. With the patch, the first vacuum after a checkpoint emits
> > half as many FPIs. You can see that only 18 pages were newly dirtied.
> > So, with the patch, the pages being vacuumed are usually still in
> > shared buffers and still dirty.
>
> Are you aware of any scenarios where your proposed strategy might make
> things worse?  From your test results, it sounds like these vacuums ought
> to usually be relatively efficient, so sending insert-only tables to the
> front of the line is normally okay, but maybe that's not always true.

So, of course they aren't exactly at the front of the line since we
autovacuum based on the order in pg_class. But, I suppose if you spend
a bunch of time vacuuming an insert-mostly table you previously would
have skipped instead of some other table -- that is effectively
prioritizing the insert-mostly tables.

For insert-only/mostly tables, what you are ideally doing is vacuuming
more frequently and handling a small number of pages each vacuum of
the relation, so it has a low performance impact. I suppose if you
only have a few autovacuum workers and an equal number of massive
insert-only tables, you could end up starving other actively updated
tables of vacuum resources. But, those insert-only tables would have
to be vacuumed eventually -- and I imagine that the impact of a
massive aggressive vacuum of all of the data in those tables would be
more disruptive than some extra bloat in your other tables.

I'd be interested if other people with more field experience can
imagine starvation scenarios that would be much worse with this patch.
What kinds of starvation scenarios do you normally see?

In terms of specific, dramatic differences in behavior (since this
wouldn't be hidden behind a GUC), people might be surprised by how soon
tables start being vacuumed after a huge COPY FREEZE.

- Melanie



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Fri, Feb 7, 2025 at 3:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Feb 07, 2025 at 02:21:07PM -0500, Melanie Plageman wrote:
> > On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >>
> >> Wouldn't relallvisible be sufficient here?  We'll skip all-visible pages
> >> unless this is an anti-wraparound vacuum, at which point I would think the
> >> insert threshold goes out the window.
> >
> > It's a great question. There are a couple of reasons why I don't think so.
> >
> > I think this might lead to triggering vacuums too often for
> > insert-mostly tables. For those tables, the pages that are not
> > all-visible will largely be just those with data that is new since the
> > last vacuum. And if we trigger vacuums based on the % not
> > all-visible, we might decrease the number of cases where we are able
> > to vacuum inserted data and freeze it the first time it is vacuumed --
> > thereby increasing the total amount of work.
>
> Rephrasing to make sure I understand correctly: you're saying that using
> all-frozen would trigger less frequent insert vacuums, which would give us
> a better chance of freezing more than more frequent insert vacuums
> triggered via all-visible?  My suspicion is that the difference would tend
> to be quite subtle in practice, but I have no concrete evidence to back
> that up.

You understood me correctly.

As for relallfrozen, one of the justifications for adding it to
pg_class is actually for the visibility it would provide. We have no
way of knowing how many all-visible but not all-frozen pages there are
on users' systems without pg_visibility. If users had this
information, they could potentially tune their freeze-related settings
more aggressively. Regularly reading the whole visibility map with
pg_visibilitymap_summary() is pretty hard to justify on most
production systems. But querying pg_class every 10 minutes or
something is much more reasonable.
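
For example, a monitoring job could be as simple as the following
libpq sketch (hypothetical code, assuming a server version that has
the relallfrozen column; build with -lpq):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* Connection parameters come from the environment (PGHOST etc.). */
        PGconn     *conn = PQconnectdb("");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* Tables with the most all-visible but not all-frozen pages. */
        res = PQexec(conn,
                     "SELECT relname, relallvisible, relallfrozen "
                     "FROM pg_class WHERE relkind = 'r' "
                     "ORDER BY relallvisible - relallfrozen DESC LIMIT 10");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            for (int i = 0; i < PQntuples(res); i++)
                printf("%s: all-visible=%s all-frozen=%s\n",
                       PQgetvalue(res, i, 0), PQgetvalue(res, i, 1),
                       PQgetvalue(res, i, 2));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }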

- Melanie



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Fri, Feb 07, 2025 at 03:57:49PM -0500, Melanie Plageman wrote:
> As for relallfrozen, one of the justifications for adding it to
> pg_class is actually for the visibility it would provide. We have no
> way of knowing how many all-visible but not all-frozen pages there are
> on users' systems without pg_visibility. If users had this
> information, they could potentially tune their freeze-related settings
> more aggressively. Regularly reading the whole visibility map with
> pg_visibilitymap_summary() is pretty hard to justify on most
> production systems. But querying pg_class every 10 minutes or
> something is much more reasonable.

If we need it anyway, then I have no objections to using a freeze-related
metric for a freeze-related feature.

Okay, I'll actually look at the patches next...

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Fri, Feb 07, 2025 at 03:05:09PM -0600, Nathan Bossart wrote:
> Okay, I'll actually look at the patches next...

Ugh, it's already been 10 days since I said that.  A couple of thoughts on
0001:

I'm not sure I understand the reason for capping relallfrozen to
relallvisible.  From upthread, I gather this is mostly to deal with manual
statistics manipulation, but my first reaction is that we should just let
those values be bogus.  Is there something that fundamentally requires
relallfrozen to be <= relallvisible?  These are only estimates, so I don't
think it would be that surprising for them to defy this expectation.

Should we allow manipulating relallfrozen like we do relallvisible?  My
assumption is that would even be required for the ongoing statistics
import/export work.

Upthread, you mentioned that you weren't seeing relallfrozen in
pg_class_d.h.  I checked on my machine and see it there as expected.  Are
you still missing it?

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Mon, Feb 17, 2025 at 11:11 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Fri, Feb 07, 2025 at 03:05:09PM -0600, Nathan Bossart wrote:
> > Okay, I'll actually look at the patches next...

Thanks for taking a look!

> I'm not sure I understand the reason for capping relallfrozen to
> relallvisible.  From upthread, I gather this is mostly to deal with manual
> statistics manipulation, but my first reaction is that we should just let
> those values be bogus.  Is there something that fundamentally requires
> relallfrozen to be <= relallvisible?  These are only estimates, so I don't
> think it would be that surprising for them to defy this expectation.

I wasn't quite sure what to do here. I see your perspective: for
example, reltuples could be set to a value that is impossible given
relpages, and we don't do any validation of that. My rationale wasn't
exactly principled, so I'll change it to not cap relallfrozen.

This makes me think I should also not cap relallfrozen when using it
in relation_needs_vacanalyze(). There I cap it to relallvisible and
relallvisible is capped to relpages. One of the ideas behind letting
people modify these stats in pg_class is that they can change a single
field to see what the effect on their system is, right?

> Should we allow manipulating relallfrozen like we do relallvisible?  My
> assumption is that would even be required for the ongoing statistics
> import/export work.

Why would it be required for the statistics import/export work?

> Upthread, you mentioned that you weren't seeing relallfrozen in
> pg_class_d.h.  I checked on my machine and see it there as expected.  Are
> you still missing it?

I see it now. No idea what was happening.

- Melanie



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Wed, Feb 19, 2025 at 04:36:05PM -0500, Melanie Plageman wrote:
> This makes me think I should also not cap relallfrozen when using it
> in relation_needs_vacanalyze(). There I cap it to relallvisible and
> relallvisible is capped to relpages. One of the ideas behind letting
> people modify these stats in pg_class is that they can change a single
> field to see what the effect on their system is, right?

Right.  Capping these values to reflect reality seems like it could make
that more difficult.

>> Should we allow manipulating relallfrozen like we do relallvisible?  My
>> assumption is that would even be required for the ongoing statistics
>> import/export work.
> 
> Why would it be required for the statistics import/export work?

It's probably not strictly required, but my naive expectation would be that
we'd handle relallfrozen just like relallvisible, which appears to be
dumped in the latest stats import/export patch.  Is there any reason we
shouldn't do the same for relallfrozen?

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Wed, Feb 19, 2025 at 4:59 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 04:36:05PM -0500, Melanie Plageman wrote:
> > This makes me think I should also not cap relallfrozen when using it
> > in relation_needs_vacanalyze(). There I cap it to relallvisible and
> > relallvisible is capped to relpages. One of the ideas behind letting
> > people modify these stats in pg_class is that they can change a single
> > field to see what the effect on their system is, right?
>
> Right.  Capping these values to reflect reality seems like it could make
> that more difficult.

Attached v7 doesn't cap the result for manual stats updating done with
relation_statistics_update(). I did, however, keep the cap for the
places where vacuum/analyze/create index update the stats. There the
number for relallfrozen is coming directly from visibilitymap_count(),
so it should be correct. I could perhaps add an assert instead, but I
didn't think that really made sense. An assert is meant to help the
developer, and what could the developer do about the visibility map
being corrupted?
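
For clarity, the cap kept in those paths amounts to something like
this sketch (illustrative, with plain types; in a healthy cluster the
invariant already holds because the counts come straight from
visibilitymap_count()):

    #include <stdint.h>

    static void
    clamp_vm_counts(uint32_t relpages, uint32_t *relallvisible,
                    uint32_t *relallfrozen)
    {
        /* Only matters if the visibility map is corrupted. */
        if (*relallvisible > relpages)
            *relallvisible = relpages;
        if (*relallfrozen > *relallvisible)
            *relallfrozen = *relallvisible;
    }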

> >> Should we allow manipulating relallfrozen like we do relallvisible?  My
> >> assumption is that would even be required for the ongoing statistics
> >> import/export work.
> >
> > Why would it be required for the statistics import/export work?
>
> It's probably not strictly required, but my naive expectation would be that
> we'd handle relallfrozen just like relallvisible, which appears to be
> dumped in the latest stats import/export patch.  Is there any reason we
> shouldn't do the same for relallfrozen?

Nope, I don't think so, but I also don't know how people are
envisioning using a manually updated relallvisible.

- Melanie

Attachment

Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Thu, Feb 20, 2025 at 07:35:32PM -0500, Melanie Plageman wrote:
> Attached v7 doesn't cap the result for manual stats updating done with
> relation_statistics_update().

Unfortunately, this already needs a rebase...

> I did, however, keep the cap for the
> places where vacuum/analyze/create index update the stats. There the
> number for relallfrozen is coming directly from visibilitymap_count(),
> so it should be correct. I could perhaps add an assert instead, but I
> didn't think that really made sense. An assert is meant to help the
> developer and what could the developer do about the visibility map
> being corrupted.

This might just be personal preference, but I think this is exactly the
sort of thing an assertion is meant for.  If we expect the value to always
be correct, and it's not, then our assumptions were wrong or someone has
broken something.  In both of these cases, I as a developer would really
like to know so that I don't introduce a latent bug.  If we want Postgres
to gracefully handle or detect visibility map corruption, then maybe we
should do both or PANIC.

I do see that heap_vacuum_rel() already caps relallvisible to relpages, but
it's not clear to me whether that's something that we expect to regularly
happen in practice or what the adverse effects might be.  So perhaps I'm
misunderstanding the scope and severity of bogus results from
visibilitymap_count().  Commit e6858e6, which added this code, doesn't say
anything about safety, but commit 3d351d9 changed the comment in question
to its current wording.  After a very quick skim of the latter's thread
[0], I don't see any discussion about this point, either.

>> >> Should we allow manipulating relallfrozen like we do relallvisible?  My
>> >> assumption is that would even be required for the ongoing statistics
>> >> import/export work.
>> >
>> > Why would it be required for the statistics import/export work?
>>
>> It's probably not strictly required, but my naive expectation would be that
>> we'd handle relallfrozen just like relallvisible, which appears to be
>> dumped in the latest stats import/export patch.  Is there any reason we
>> shouldn't do the same for relallfrozen?
> 
> Nope, I don't think so, but I also don't know how people are
> envisioning using a manually updated relallvisible.

That does seem unlikely.  I'd expect it to be more useful for porting
statistics over during pg_upgrade.

[0] https://postgr.es/m/flat/F02298E0-6EF4-49A1-BCB6-C484794D9ACC%40thebuild.com

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Mon, Feb 24, 2025 at 4:53 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 07:35:32PM -0500, Melanie Plageman wrote:
> > Attached v7 doesn't cap the result for manual stats updating done with
> > relation_statistics_update().
>
> Unfortunately, this already needs a rebase...

Thanks! Attached v8 does just that.

I've also taken a pass at updating the stats import tests to include
relallfrozen. I'm not totally sure how I feel about the results. I
erred on the side of putting relallfrozen wherever relallvisible was.
But, that means relallfrozen is included in test cases where it
doesn't seem important to the test case. But, that was true with
relallvisible too.

Anyway, I talked to Corey off-list last week. Maybe he will have a
chance to weigh in on the test cases. Also, I had forgotten to include
relallfrozen in pg_clear_relation_stats(). I've fixed that in v8, but
now I'm worried there are some other places I may have missed.

> > I did, however, keep the cap for the
> > places where vacuum/analyze/create index update the stats. There the
> > number for relallfrozen is coming directly from visibilitymap_count(),
> > so it should be correct. I could perhaps add an assert instead, but I
> > didn't think that really made sense. An assert is meant to help the
> > developer and what could the developer do about the visibility map
> > being corrupted.
>
> This might just be personal preference, but I think this is exactly the
> sort of thing an assertion is meant for.  If we expect the value to always
> be correct, and it's not, then our assumptions were wrong or someone has
> broken something.  In both of these cases, I as a developer would really
> like to know so that I don't introduce a latent bug.  If we want Postgres
> to gracefully handle or detect visibility map corruption, then maybe we
> should do both or PANIC.

I'm on the fence about adding a PANIC. We do PANIC in other places
where we notice corruption (like PageAddItemExtended()).  But, in most
of the cases, it seems like we are PANICing because there isn't a
reasonable way to accomplish the intended task. In this case, we
probably can't trust the visibility map counts for that page, but the
pg_class columns are just estimates, so just capping relallfrozen
might be good enough.

I will note that in the other place where we may notice corruption in
the VM, in lazy_scan_prune(), we do visibilitymap_clear() and print a
WARNING -- as opposed to PANICing. Perhaps that's because there is no
need to panic, since we are already fixing the problem with
visibilitymap_clear().

An assert would only help if the developer did something while
developing that corrupted the visibility map. It doesn't help keep
bogus values out of pg_class if a user's visibility map got corrupted
in some way. But, maybe that isn't needed.

> I do see that heap_vacuum_rel() already caps relallvisible to relpages, but
> it's not clear to me whether that's something that we expect to regularly
> happen in practice or what the adverse effects might be.  So perhaps I'm
> misunderstanding the scope and severity of bogus results from
> visibilitymap_count().  Commit e6858e6, which added this code, doesn't say
> anything about safety, but commit 3d351d9 changed the comment in question
> to its current wording.  After a very quick skim of the latter's thread
> [0], I don't see any discussion about this point, either.

Thanks for doing this archaeology.

I am hesitant to keep the current cap on relallvisible in
heap_vacuum_rel() but then not include an equivalent cap for
relallfrozen. And I think it would be even more confusing for
relallvisible to be capped but relallfrozen has an assert instead.

None of the other locations where relallvisible is updated
(do_analyze_rel(), index_update_stats()) do this capping of
relallvisible. It seems like we should make it consistent. Perhaps we
should just remove it from heap_vacuum_rel(). Then add an assert in
all these places to at least protect development mistakes.
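
The contemplated assertion would be a one-liner along these lines
(Assert() is the standard macro from c.h and is compiled out of
production builds):

    Assert(relallfrozen <= relallvisible && relallvisible <= relpages);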

- Melanie

Attachment

Re: Trigger more frequent autovacuums of heavy insert tables

From
wenhui qiu
Date:
Hi Melanie
> relallvisible. It seems like we should make it consistent. Perhaps we
> should just remove it from heap_vacuum_rel(). Then add an assert in
>  all these places to at least protect development mistakes.
I think there's some objection to that. 

Thanks


Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Mon, Feb 24, 2025 at 10:30 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>
> Hi Melanie
> > relallvisible. It seems like we should make it consistent. Perhaps we
> > should just remove it from heap_vacuum_rel(). Then add an assert in
> >  all these places to at least protect development mistakes.
> I think there's some objection to that.

Could you elaborate a bit?

There were new merge conflicts, so v9 is attached.

- Melanie

Attachment

Re: Trigger more frequent autovacuums of heavy insert tables

From
Robert Haas
Date:
On Mon, Feb 24, 2025 at 6:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I'm on the fence about adding a PANIC. We do PANIC in other places
> where we notice corruption (like PageAddItemExtended()).  But, in most
> of the cases, it seems like we are PANICing because there isn't a
> reasonable way to accomplish the intended task. In this case, we
> probably can't trust the visibility map counts for that page, but the
> pg_class columns are just estimates, so just capping relallfrozen
> might be good enough.

+1 for not unnecessarily inflating the log level. I agree with the
principle you articulate here: a PANIC is reasonable when there's no
sane way to continue, but for other corruption events, a WARNING is
better. Note that one really bad consequence of using ERROR or any
higher level in VACUUM is that VACUUM doesn't ever complete. We just
keep retrying and dying. Even if it's only an ERROR, now we're no
longer controlling bloat or preventing wraparound. That's extremely
painful for users, so we don't want to end up there if we have a
reasonable alternative.

Also, I think that it's usually a bad idea to use an Assert to test
for data corruption scenarios. The problem is that programmers are
more or less entitled to assume that an assertion will always hold
true, barring bugs, and just ignore completely what the code might do
if the assertion turns out to be false. But data does get corrupted on
disk, so you SHOULD think about what the code is going to do in such
scenarios. Depending on the details, you might want to WARNING or
ERROR or PANIC or try to repair it or try to just tolerate it being
wrong or something else -- but you should be thinking about what the
code is actually going to do, not just going "eh, well, that can't
happen".

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Tue, Feb 25, 2025 at 10:39 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Feb 24, 2025 at 6:35 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > I'm on the fence about adding a PANIC. We do PANIC in other places
> > where we notice corruption (like PageAddItemExtended()).  But, in most
> > of the cases, it seems like we are PANICing because there isn't a
> > reasonable way to accomplish the intended task. In this case, we
> > probably can't trust the visibility map counts for that page, but the
> > pg_class columns are just estimates, so just capping relallfrozen
> > might be good enough.
>
> +1 for not unnecessarily inflating the log level. I agree with the
> principle you articulate here: a PANIC is reasonable when there's no
> sane way to continue, but for other corruption events, a WARNING is
> better. Note that one really bad consequence of using ERROR or any
> higher level in VACUUM is that VACUUM doesn't ever complete. We just
> keep retrying and dying. Even if it's only an ERROR, now we're no
> longer controlling bloat or preventing wraparound. That's extremely
> painful for users, so we don't want to end up there if we have a
> reasonable alternative.

That makes sense. I think I'll add a WARNING if we don't cap the value.

> Also, I think that it's usually a bad idea to use an Assert to test
> for data corruption scenarios. The problem is that programmers are
> more or less entitled to assume that an assertion will always hold
> true, barring bugs, and just ignore completely what the code might do
> if the assertion turns out to be false. But data does get corrupted on
> disk, so you SHOULD think about what the code is going to do in such
> scenarios. Depending on the details, you might want to WARNING or
> ERROR or PANIC or try to repair it or try to just tolerate it being
> wrong or something else -- but you should be thinking about what the
> code is actually going to do, not just going "eh, well, that can't
> happen".

I agree with this. I don't see what the assertion would catch in this
case. Even as I try to think of a scenario where this would help the
developer, if you write code that corrupts the visibility map, I don't
think waiting for an assert at the end of a vacuum will be your first
or best signal.

This does however leave me with the question of how to handle the
original question of whether or not to cap the proposed relallfrozen
to the value of relallvisible when updating stats at the end of
vacuum. The current code in heap_vacuum_rel() caps relallvisible to
relpages, so capping relallfrozen to relallvisible would follow that
pattern. However, the other places relallvisible is updated do no such
capping (do_analyze_rel(), index_update_stats()). It doesn't seem like
there is a good reason to do it one place and not the others. So, I
suggest either removing all the caps and adding a WARNING or capping
the value in all places. Because users can now manually update these
values in pg_class, there wouldn't be a way to detect the difference
between a bogus relallfrozen value due to VM corruption or a bogus
value due to manual statistics intervention. This led me to think that
a WARNING and no cap would be more effective for heap_vacuum_rel().

- Melanie



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Tue, Feb 25, 2025 at 10:38:48AM -0500, Robert Haas wrote:
> On Mon, Feb 24, 2025 at 6:35 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>> I'm on the fence about adding a PANIC. We do PANIC in other places
>> where we notice corruption (like PageAddItemExtended()).  But, in most
>> of the cases, it seems like we are PANICing because there isn't a
>> reasonable way to accomplish the intended task. In this case, we
>> probably can't trust the visibility map counts for that page, but the
>> pg_class columns are just estimates, so just capping relallfrozen
>> might be good enough.
> 
> +1 for not unnecessarily inflating the log level. I agree with the
> principle you articulate here: a PANIC is reasonable when there's no
> sane way to continue, but for other corruption events, a WARNING is
> better. Note that one really bad consequence of using ERROR or any
> higher level in VACUUM is that VACUUM doesn't ever complete. We just
> keep retrying and dying. Even if it's only an ERROR, now we're no
> longer controlling bloat or preventing wraparound. That's extremely
> painful for users, so we don't want to end up there if we have a
> reasonable alternative.
> 
> Also, I think that it's usually a bad idea to use an Assert to test
> for data corruption scenarios. The problem is that programmers are
> more or less entitled to assume that an assertion will always hold
> true, barring bugs, and just ignore completely what the code might do
> if the assertion turns out to be false. But data does get corrupted on
> disk, so you SHOULD think about what the code is going to do in such
> scenarios. Depending on the details, you might want to WARNING or
> ERROR or PANIC or try to repair it or try to just tolerate it being
> wrong or something else -- but you should be thinking about what the
> code is actually going to do, not just going "eh, well, that can't
> happen".

In this case, it sounds to me like we want production builds to tolerate
visibility map corruption (at least, to some extent), but we don't really
expect to see it regularly.  If that's true, then my inclination would be
to add assertions to make sure nobody has fundamentally broken the code,
but to at most either warn or restrict the values for non-assert builds.
TBH my gut feeling is that, outside of assert-enabled builds, we should
just let the values be bogus if they are indeed just estimates that can be
wildly inaccurate anyway.  I see little value in trying to gently corral
the value in certain cases, lest we want to mask bugs and give credence to
bad assumptions about the possible values.

I think the argument for PANIC-ing in this case is that visibility map
corruption seems likely to be the tip of the iceberg.  If we see that one
file is corrupted, I would expect a DBA to be extremely wary of trusting
the other files on the system, too.  (I will freely admit that I might be
missing situations where race conditions or the like cause us to retrieve
seemingly invalid values for relpages and relallvisible, so please pardon
my ignorance as I dive deeper into this code.)

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Tue, Feb 25, 2025 at 11:02:40AM -0500, Melanie Plageman wrote:
> This does however leave me with the question of how to handle the
> original question of whether or not to cap the proposed relallfrozen
> to the value of relallvisible when updating stats at the end of
> vacuum. The current code in heap_vacuum_rel() caps relallvisible to
> relpages, so capping relallfrozen to relallvisible would follow that
> pattern. However, the other places relallvisible is updated do no such
> capping (do_analyze_rel(), index_update_stats()). It doesn't seem like
> there is a good reason to do it one place and not the others. So, I
> suggest either removing all the caps and adding a WARNING or capping
> the value in all places. Because users can now manually update these
> values in pg_class, there wouldn't be a way to detect the difference
> between a bogus relallfrozen value due to VM corruption or a bogus
> value due to manual statistics intervention. This led me to think that
> a WARNING and no cap would be more effective for heap_vacuum_rel().

I'm currently leaning towards the "remove all caps" idea.  But I'm not sure
I totally understand the need for a WARNING.  What do we expect users to do
with that information?  If it's intended to alert them of possible
corruption, then IMHO a WARNING may be too gentle.  I guess we could warn
and suggest a way to fix the value with the new statistics manipulation
functions if it's not that big of a deal, but at that point we might as
well just cap it on our own again.  If we don't really expect users to have
to do anything about it, then isn't it just adding unnecessary log noise?

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Greg Sabino Mullane
Date:
On Tue, Feb 25, 2025 at 11:03 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> Because users can now manually update these values in pg_class, there wouldn't be a way to detect the difference
> between a bogus relallfrozen value due to VM corruption or a bogus value due to manual statistics intervention.

Er... you had me until this. If manual monkeying of the system catalogs leads to a "bogus" error that resembles a real one, then sow the wind, and reap the whirlwind. I don't think that should be a consideration here.

--
Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: Trigger more frequent autovacuums of heavy insert tables

From
Robert Treat
Date:
On Tue, Feb 25, 2025 at 11:32 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> On Tue, Feb 25, 2025 at 11:02:40AM -0500, Melanie Plageman wrote:
> > This does however leave me with the question of how to handle the
> > original question of whether or not to cap the proposed relallfrozen
> > to the value of relallvisible when updating stats at the end of
> > vacuum. The current code in heap_vacuum_rel() caps relallvisible to
> > relpages, so capping relallfrozen to relallvisible would follow that
> > pattern. However, the other places relallvisible is updated do no such
> > capping (do_analyze_rel(), index_update_stats()). It doesn't seem like
> > there is a good reason to do it one place and not the others. So, I
> > suggest either removing all the caps and adding a WARNING or capping
> > the value in all places. Because users can now manually update these
> > values in pg_class, there wouldn't be a way to detect the difference
> > between a bogus relallfrozen value due to VM corruption or a bogus
> > value due to manual statistics intervention. This led me to think that
> > a WARNING and no cap would be more effective for heap_vacuum_rel().
>
> I'm currently leaning towards the "remove all caps" idea.  But I'm not sure
> I totally understand the need for a WARNING.  What do we expect users to do
> with that information?  If it's intended to alert them of possible
> corruption, then IMHO a WARNING may be too gentle.  I guess we could warn
> and suggest a way to fix the value with the new statistics manipulation
> functions if it's not that big of a deal, but at that point we might as
> well just cap it on our own again.  If we don't really expect users to have
> to do anything about it, then isn't it just adding unnecessary log noise?
>

If the end user is manipulating numbers to test some theory, I think
it's valuable feedback that they have probably bent the system too
far, because we are now seeing numbers that don't make sense. If they
aren't mucking with the system, then it's valuable feedback that they
may have an underlying system problem that could be about to get
worse.

Robert Treat
https://xzilla.net



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Tue, Feb 25, 2025 at 12:36:40PM -0500, Robert Treat wrote:
> On Tue, Feb 25, 2025 at 11:32 AM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>> I'm currently leaning towards the "remove all caps" idea.  But I'm not sure
>> I totally understand the need for a WARNING.  What do we expect users to do
>> with that information?  If it's intended to alert them of possible
>> corruption, then IMHO a WARNING may be too gentle.  I guess we could warn
>> and suggest a way to fix the value with the new statistics manipulation
>> functions if it's not that big of a deal, but at that point we might as
>> well just cap it on our own again.  If we don't really expect users to have
>> to do anything about it, then isn't it just adding unnecessary log noise?
> 
> If the end user is manipulating numbers to test some theory, I think
> it's valuable feedback that they have probably bent the system too
> far, because we are now seeing numbers that don't make sense.

If we do that in other cases, then that seems reasonable.  But it feels
weird to me to carve out just this one inaccuracy.  For example, do we warn
if someone sets reltuples too high for relpages, or if they set relpages
too low for reltuples?  I'm not seeing why the relallfrozen <=
relallvisible <= relpages case is especially important to uphold.  If we
were consistent about enforcing these kinds of invariants everywhere, then
I think I would be more on board with capping the values and/or ERROR-ing
when folks tried to set bogus ones.  But if the only outcome of bogus
values is that you might get weird plans or autovacuum might prioritize the
table differently, then I'm not sure I see the point.  You can accomplish
both of those things with totally valid values already.

> If they
> aren't mucking with the system, then it's valuable feedback that they
> may have an underlying system problem that could be about to get
> worse.

Maybe a WARNING is as much as we can realistically do in this case, but I
think it could be easily missed in the server logs.  I dunno, it just feels
wrong to me to deal with potential corruption by gently notifying the user
and proceeding normally.  I guess that's what we do already elsewhere,
though (e.g., for relfrozenxid/relminmxid in vac_update_relstats()).

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Robert Haas
Date:
On Tue, Feb 25, 2025 at 11:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> This does however leave me with the question of how to handle the
> original question of whether or not to cap the proposed relallfrozen
> to the value of relallvisible when updating stats at the end of
> vacuum. The current code in heap_vacuum_rel() caps relallvisible to
> relpages, so capping relallfrozen to relallvisible would follow that
> pattern. However, the other places relallvisible is updated do no such
> capping (do_analyze_rel(), index_update_stats()). It doesn't seem like
> there is a good reason to do it one place and not the others. So, I
> suggest either removing all the caps and adding a WARNING or capping
> the value in all places. Because users can now manually update these
> values in pg_class, there wouldn't be a way to detect the difference
> between a bogus relallfrozen value due to VM corruption or a bogus
> value due to manual statistics intervention. This led me to think that
> a WARNING and no cap would be more effective for heap_vacuum_rel().

I mean, does it really make any difference one way or the other?

Given that users could manually update the catalog, we have to be able
to tolerate bad data in the catalogs without the world ending. If that
code has to exist anyway, then it's not mandatory to cap. On the other
hand, there's no great virtue in refusing to correct data that we know
to be wrong. Unless there is some other consideration which makes one
way better than the other, this feels like author's choice.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Tue, Feb 25, 2025 at 01:52:28PM -0500, Robert Haas wrote:
> Given that users could manually update the catalog, we have to be able
> to tolerate bad data in the catalogs without the world ending. If that
> code has to exist anyway, then it's not mandatory to cap. On the other
> hand, there's no great virtue in refusing to correct data that we know
> to be wrong. Unless there is some other consideration which makes one
> way better than the other, this feels like author's choice.

Maybe the most conservative choice is to simply follow the example of
surrounding code.  If it's careful to cap relallvisible to relpages, also
have it cap relallfrozen.  If not, don't.  *shrug*

In any case, I don't want to hold up this patch on this relatively minor
point.  This seems like something we could pretty easily change in the
future if needed.

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Tue, Feb 25, 2025 at 11:36 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
>
> On Tue, Feb 25, 2025 at 11:03 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
>>
>> Because users can now manually update these values in pg_class, there wouldn't be a way to detect the difference
>> between a bogus relallfrozen value due to VM corruption or a bogus value due to manual statistics intervention.
>
> Er... you had me until this. If manual monkeying of the system catalogs leads to a "bogus" error that resembles a real
> one, then sow the wind, and reap the whirlwind. I don't think that should be a consideration here.

Oh, the WARNING would only show up in a case of actual VM corruption.
The WARNING proposed is after calling visibilitymap_count() before
updating pg_class, if the value we get from the VM for relallfrozen
exceeds relallvisible. If you manually change pg_class
relallfrozen/relallvisible to bogus values, you wouldn't get a
warning. I meant there wouldn't be a way to detect the difference when
viewing pg_class between bogus values because of a corrupt VM and
bogus values because of manual updates to pg_class.
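
Sketched in server terms, the check would sit roughly here (the
placement, helper name, and message wording below are all
hypothetical):

    #include "postgres.h"
    #include "access/visibilitymap.h"
    #include "utils/rel.h"

    static void
    count_and_check_vm(Relation rel, BlockNumber *relallvisible,
                       BlockNumber *relallfrozen)
    {
        visibilitymap_count(rel, relallvisible, relallfrozen);

        /* Freshly counted VM bits should never contradict each other. */
        if (*relallfrozen > *relallvisible)
            ereport(WARNING,
                    (errmsg("relation \"%s\" has more all-frozen than all-visible pages; the visibility map may be corrupted",
                            RelationGetRelationName(rel))));
    }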

- Melanie



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Tue, Feb 25, 2025 at 1:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Feb 25, 2025 at 11:03 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > This does however leave me with the question of how to handle the
> > original question of whether or not to cap the proposed relallfrozen
> > to the value of relallvisible when updating stats at the end of
> > vacuum. The current code in heap_vacuum_rel() caps relallvisible to
> > relpages, so capping relallfrozen to relallvisible would follow that
> > pattern. However, the other places relallvisible is updated do no such
> > capping (do_analyze_rel(), index_update_stats()). It doesn't seem like
> > there is a good reason to do it one place and not the others. So, I
> > suggest either removing all the caps and adding a WARNING or capping
> > the value in all places. Because users can now manually update these
> > values in pg_class, there wouldn't be a way to detect the difference
> > between a bogus relallfrozen value due to VM corruption or a bogus
> > value due to manual statistics intervention. This led me to think that
> > a WARNING and no cap would be more effective for heap_vacuum_rel().
>
> I mean, does it really make any difference one way or the other?
>
> Given that users could manually update the catalog, we have to be able
> to tolerate bad data in the catalogs without the world ending. If that
> code has to exist anyway, then it's not mandatory to cap. On the other
> hand, there's no great virtue in refusing to correct data that we know
> to be wrong. Unless there is some other consideration which makes one
> way better than the other, this feels like author's choice.

I realized that whether or not we add a WARNING is an independent
question from whether or not we cap these values. In these instances,
we happen to have just read the whole VM and so we can tell you if it
is broken in a particular way. If I want to write a patch to warn
users of visibility map corruption after calling
visibilitymap_count(), I could do that and it might be a good idea,
but it should probably be a separate commit anyway.

- Melanie



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Tue, Feb 25, 2025 at 3:05 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 25, 2025 at 01:52:28PM -0500, Robert Haas wrote:
> > Given that users could manually update the catalog, we have to be able
> > to tolerate bad data in the catalogs without the world ending. If that
> > code has to exist anyway, then it's not mandatory to cap. On the other
> > hand, there's no great virtue in refusing to correct data that we know
> > to be wrong. Unless there is some other consideration which makes one
> > way better than the other, this feels like author's choice.
>
> Maybe the most conservative choice is to simply follow the example of
> surrounding code.  If it's careful to cap relallvisible to relpages, also
> have it cap relallfrozen.  If not, don't.  *shrug*

Agreed. I've done this in attached v10. I handle relallfrozen values >
relpages in the second patch in the set when using the relallfrozen
value, so I think we are all good.

> In any case, I don't want to hold up this patch on this relatively minor
> point.  This seems like something we could pretty easily change in the
> future if needed.

Yes, though one thing you haven't said yet is whether you are +1 on
going forward with these patches in general.

As for the code, I'm not 100% convinced I've got all the stats
import/export bits perfect (those are changing under my feet right now
anyway).


- Melanie

Attachment

Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Tue, Feb 25, 2025 at 05:19:30PM -0500, Melanie Plageman wrote:
> Yes, so one thing you haven't said yet is if you are +1 on going
> forward with these patches in general.

Sorry, yes, I'm +1 in general.  It conceptually makes sense to me that we
should disregard frozen pages when deciding whether to do an insert vacuum,
and it's hard to argue with the results in your original post.  I also am
not overly concerned about worker starvation.  While this patch does give
higher priority to insert-only/mostly tables, it's also reducing the amount
of resources required to vacuum them, anyway.

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Robert Treat
Date:
On Tue, Feb 25, 2025 at 4:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Tue, Feb 25, 2025 at 11:36 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> >
> > On Tue, Feb 25, 2025 at 11:03 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> >>
> >> Because users can now manually update these values in pg_class, there wouldn't be a way to detect the difference
> >> between a bogus relallfrozen value due to VM corruption or a bogus value due to manual statistics intervention.
> >
> > Er... you had me until this. If manual monkeying of the system catalogs leads to a "bogus" error that resembles a
> > real one, then sow the wind, and reap the whirlwind. I don't think that should be a consideration here.
>
> Oh, the WARNING would only show up in a case of actual VM corruption.
> The WARNING proposed is after calling visibilitymap_count() before
> updating pg_class, if the value we get from the VM for relallfrozen
> exceeds relallvisible. If you manually change pg_class
> relallfrozen/relallvisible to bogus values, you wouldn't get a
> warning. I meant there wouldn't be a way to detect the difference when
> viewing pg_class between bogus values because of a corrupt VM and
> bogus values because of manual updates to pg_class.
>

It strikes me as a bit odd to have this extra wording in the pg_class
documentation:

+ Every all-frozen page must also be marked
+       all-visible in the visibility map, so
+       <structfield>relallfrozen</structfield> should be less than or equal to
+       <structfield>relallvisible</structfield>. However, if either field is
+       updated manually or if the visibility map is corrupted, it is possible
+       for <structfield>relallfrozen</structfield> to exceed
+       <structfield>relallvisible</structfield>.

For example, we don't document that relallvisible should never exceed
relpages, and we aren't normally in the habit of documenting weird
behavior that might happen if people go updating the system catalogs.
Maybe it's just me, but when I read this earlier, I thought there
might be some intended use case for updating the catalog manually that
you had in mind and so the comments were warranted (and indeed, it's
part of why I thought the warning would be useful for users). But upon
reading the thread more and another pass through your updated patches,
this doesn't seem to be the case, and I wonder if this language might
be more encouraging of people updating catalogs than we would
typically be.

Robert Treat
https://xzilla.net



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Wed, Feb 26, 2025 at 11:17:06AM -0500, Robert Treat wrote:
> It strikes me as a bit odd to have this extra wording in the pg_class
> documentation:
> 
> + Every all-frozen page must also be marked
> +       all-visible in the visibility map, so
> +       <structfield>relallfrozen</structfield> should be less than or equal to
> +       <structfield>relallvisible</structfield>. However, if either field is
> +       updated manually or if the visibility map is corrupted, it is possible
> +       for <structfield>relallfrozen</structfield> to exceed
> +       <structfield>relallvisible</structfield>.
> 
> For example, we don't document that relallvisible should never exceed
> relpages, and we aren't normally in the habit of documenting weird
> behavior that might happen if people go updating the system catalogs.
> Maybe it's just me, but when I read this earlier, I thought there
> might be some intended use case for updating the catalog manually that
> you had in mind and so the comments were warranted (and indeed, it's
> part of why I thought the warning would be useful for users). But upon
> reading the thread more and another pass through your updated patches,
> this doesn't seem to be the case, and I wonder if this language might
> be more encouraging of people updating catalogs than we would
> typically be.

+1.  If we did want to add more information about the ordinary expectations
of relallfrozen and friends here, I'd suggest doing so in a separate patch.
IMHO the usual "This is only an estimate..." wording is sufficient for the
introduction of relallfrozen.

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Wed, Feb 26, 2025 at 12:25 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Wed, Feb 26, 2025 at 11:17:06AM -0500, Robert Treat wrote:
> > It strikes me as a bit odd to have this extra wording in the pg_class
> > documentation:
> >
> > + Every all-frozen page must also be marked
> > +       all-visible in the visibility map, so
> > +       <structfield>relallfrozen</structfield> should be less than or equal to
> > +       <structfield>relallvisible</structfield>. However, if either field is
> > +       updated manually or if the visibility map is corrupted, it is possible
> > +       for <structfield>relallfrozen</structfield> to exceed
> > +       <structfield>relallvisible</structfield>.
> >
> > For example, we don't document that relallvisible should never exceed
> > relpages, and we aren't normally in the habit of documenting weird
> > behavior that might happen if people go updating the system catalogs.
> > Maybe it's just me, but when I read this earlier, I thought there
> > might be some intended use case for updating the catalog manually that
> > you had in mind and so the comments were warranted (and indeed, it's
> > part of why I thought the warning would be useful for users). But upon
> > reading the thread more and another pass through your updated patches,
> > this doesn't seem to be the case, and I wonder if this language might
> > be more encouraging of people updating catalogs than we would
> > typically be.
>
> +1.  If we did want to add more information about the ordinary expectations
> of relallfrozen and friends here, I'd suggest doing so in a separate patch.
> IMHO the usual "This is only an estimate..." wording is sufficient for the
> introduction of relallfrozen.

Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
wording and is rebased.

- Melanie

Attachment

Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Wed, Feb 26, 2025 at 04:48:20PM -0500, Melanie Plageman wrote:
> Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
> wording and is rebased.

0001 LGTM.

>          <para>
> -         Specifies a fraction of the table size to add to
> -         <varname>autovacuum_vacuum_insert_threshold</varname>
> -         when deciding whether to trigger a <command>VACUUM</command>.
> -         The default is <literal>0.2</literal> (20% of table size).
> -         This parameter can only be set in the <filename>postgresql.conf</filename>
> -         file or on the server command line;
> -         but the setting can be overridden for individual tables by
> -         changing table storage parameters.
> +        Specifies a fraction of the active (unfrozen) table size to add to
> +        <varname>autovacuum_vacuum_insert_threshold</varname>
> +        when deciding whether to trigger a <command>VACUUM</command>.
> +        The default is <literal>0.2</literal> (20% of active table size).
> +        This parameter can only be set in the <filename>postgresql.conf</filename>
> +        file or on the server command line;
> +        but the setting can be overridden for individual tables by
> +        changing table storage parameters.
>          </para>

nitpick: There might be an unintentional indentation change here.

I'm wondering about the use of the word "active," too.  While it's
qualified by the "(unfrozen)" after it, I'm worried it might not be
descriptive enough.  For example, I might consider a frozen page that's in
the buffer cache and is being read by queries to be "active."  And it
doesn't seem clear to me that it's referring to unfrozen pages and not
unfrozen tuples.  Perhaps we should say something like "a fraction of the
unfrozen pages in the table to add...".

> +        /*
> +         * If we have data for relallfrozen, calculate the unfrozen percentage
> +         * of the table to modify insert scale factor. This helps us decide
> +         * whether or not to vacuum an insert-heavy table based on the number
> +         * of inserts to the "active" part of the table.
> +         */
> +        if (relpages > 0 && relallfrozen > 0)

So, if we don't have this data, we just use reltuples, which is the
existing behavior and should trigger vacuums less aggressively than if we
_did_ have the data.  That seems like the correct choice to me.

> +            /*
> +             * It could be the stats were updated manually and relallfrozen >
> +             * relpages. Clamp relallfrozen to relpages to avoid nonsensical
> +             * calculations.
> +             */
> +            relallfrozen = Min(relallfrozen, relpages);
> +            pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);

Makes sense.
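
To convince myself the arithmetic works out, here's a rough standalone
sketch (main() and the hard-coded inputs are made up for illustration;
only the variable names and the clamp/threshold logic follow the
patch):

    #include <stdio.h>

    typedef float float4;
    #define Min(x, y) ((x) < (y) ? (x) : (y))

    int
    main(void)
    {
        int     vac_ins_base_thresh = 1000;   /* autovacuum_vacuum_insert_threshold */
        float4  vac_ins_scale_factor = 0.2f;  /* autovacuum_vacuum_insert_scale_factor */
        float4  reltuples = 1000000.0f;
        int     relpages = 10000;
        int     relallfrozen = 7500;          /* 75% of the table is frozen */
        float4  pcnt_unfrozen = 1.0f;         /* no data: treat it all as unfrozen */
        float4  vacinsthresh;

        if (relpages > 0 && relallfrozen > 0)
        {
            /* clamp in case the stats were updated manually */
            relallfrozen = Min(relallfrozen, relpages);
            pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
        }

        /* scale the insert threshold by the unfrozen fraction of the table */
        vacinsthresh = (float4) vac_ins_base_thresh +
            vac_ins_scale_factor * reltuples * pcnt_unfrozen;

        printf("insert vacuum after %.0f inserted tuples\n", vacinsthresh);
        return 0;
    }

With those made-up numbers, the table is queued for an insert vacuum
after about 51,000 inserts, versus the ~201,000 that scaling by all of
reltuples would require.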

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
wenhui qiu
Date:
Hi 
> + * It could be the stats were updated manually and relallfrozen >
> + * relpages. Clamp relallfrozen to relpages to avoid nonsensical
> + * calculations.
> + */
> + relallfrozen = Min(relallfrozen, relpages);
> + pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
> + }
> +
Based on the comments, the pcnt_unfrozen value could potentially be 0, which would indicate that everything is frozen. Is it necessary to handle the case where the value is 0?



 



Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Sat, Mar 01, 2025 at 08:57:52AM +0800, wenhui qiu wrote:
> Based on the comments, the pcnt_unfrozen value could potentially be 0,
> which would indicate that everything is frozen. Is it necessary to
> handle the case where the value is 0?

How so?  If it's 0, then the insert threshold calculation would produce
autovacuum_vacuum_insert_threshold, just like for an empty table.  That
seems like the intended behavior to me.
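
To spell that out with the defaults (base threshold 1000, scale factor
0.2) and a fully frozen table of, say, a million tuples:

    vacinsthresh = 1000 + 0.2 * 1000000 * 0.0 = 1000

i.e., the base threshold alone, same as for an empty table.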

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Fri, Feb 28, 2025 at 12:54 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Wed, Feb 26, 2025 at 04:48:20PM -0500, Melanie Plageman wrote:
> > Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
> > wording and is rebased.
>
> 0001 LGTM.

Cool. Corey checked over the stats import tests off-list and +1'd
them, so I went ahead and pushed this.

> >          <para>
> > -         Specifies a fraction of the table size to add to
<--snip-->
> >          </para>
>
> nitpick: There might be an unintentional indentation change here.

Yep, fixed it in attached v12, thanks.

> I'm wondering about the use of the word "active," too.  While it's
> qualified by the "(unfrozen)" after it, I'm worried it might not be
> descriptive enough.  For example, I might consider a frozen page that's in
> the buffer cache and is being read by queries to be "active."  And it
> doesn't seem clear to me that it's referring to unfrozen pages and not
> unfrozen tuples.  Perhaps we should say something like "a fraction of the
> unfrozen pages in the table to add...".

Done. Thanks for the suggestion.

I noticed the docs wording is a bit different from that in
postgresql.conf.sample. The docs mention that the scale factor gets
added to the threshold and postgresql.conf.sample does not (in master
as well). I just wanted to make sure my updates to
postgresql.conf.sample sound accurate.

> > +             /*
> > +              * If we have data for relallfrozen, calculate the unfrozen percentage
> > +              * of the table to modify insert scale factor. This helps us decide
> > +              * whether or not to vacuum an insert-heavy table based on the number
> > +              * of inserts to the "active" part of the table.
> > +              */
> > +             if (relpages > 0 && relallfrozen > 0)
>
> So, if we don't have this data, we just use reltuples, which is the
> existing behavior and should trigger vacuums less aggressively than if we
> _did_ have the data.  That seems like the correct choice to me.

Yep.

- Melanie

Attachment

Re: Trigger more frequent autovacuums of heavy insert tables

From
Nathan Bossart
Date:
On Mon, Mar 03, 2025 at 12:18:37PM -0500, Melanie Plageman wrote:
> I noticed the docs wording is kind of different than that in
> postgresql.conf.sample. The docs wording mentions that the scale
> factor gets added to the threshold and postgresql.conf.sample does not
> (in master as well). I just wanted to make sure my updates to
> postgresql.conf.sample sound accurate.

> -#autovacuum_vacuum_insert_scale_factor = 0.2    # fraction of inserts over table
> -                        # size before insert vacuum
> +#autovacuum_vacuum_insert_scale_factor = 0.2    # fraction of unfrozen pages
> +            # inserted to before insert vacuum

Hm.  So for autovacuum_vacuum_scale_factor and
autovacuum_analyze_scale_factor, we have

    "fraction of table size before vacuum"
    "fraction of table size before analyze"

And currently, for autovacuum_vacuum_insert_scale_factor, we have

    "fraction of inserts over table size before insert vacuum"

I think the key change here is that we are replacing "table size" with
"unfrozen pages," which would give us

    "fraction of inserts over unfrozen pages before insert vacuum"

However, I find that unclear, if for no other reason than I'm not sure what
"over" means in this context.  A reader might think that refers to the
fraction (i.e., inserts / unfrozen pages), or they might think it means
"on" or "above."  IMHO your proposed wording is much clearer.  The only
part that feels a bit awkward to me is "to before".  My mind wants the "to"
to be followed with a verb, so I stumbled the first time I read it and had
to read it again.  Perhaps we could just say

    "fraction of unfrozen pages before insert vacuum"

That more closely matches the other scale factor parameters.  It's
admittedly quite terse, but so are the vast majority of other descriptions
in the sample file.  I don't think those are intended to serve as
substitutes for the "real" documentation.

I would make the same argument for leaving out the base threshold.  While
it is important information, these extremely concise descriptions don't
really have enough room for the nitty-gritty details.
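
Putting that together, the sample-file entry would end up reading
something like (whitespace approximate):

    #autovacuum_vacuum_insert_scale_factor = 0.2    # fraction of unfrozen pages before insert vacuum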

-- 
nathan



Re: Trigger more frequent autovacuums of heavy insert tables

From
Melanie Plageman
Date:
On Mon, Mar 3, 2025 at 1:11 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
>         "fraction of unfrozen pages before insert vacuum"
>
> That more closely matches the other scale factor parameters.  It's
> admittedly quite terse, but so are the vast majority of other descriptions
> in the sample file.  I don't think those are intended to serve as
> substitutes for the "real" documentation.

Cool, I've changed it to this, pushed, and marked the commitfest entry
as committed.
Thanks so much for your attention and review!

- Melanie