On Fri, Feb 7, 2025 at 3:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Feb 07, 2025 at 02:21:07PM -0500, Melanie Plageman wrote:
> > On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >>
> >> Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
> >> unless this is an anti-wraparound vacuum, at which point I would think the
> >> insert threshold goes out the window.
> >
> > It's a great question. There are a couple reasons why I don't think so.
> >
> > I think this might lead to triggering vacuums too often for
> > insert-mostly tables. For those tables, the pages that are not
> > all-visible will largely be just those with data that is new since the
> > last vacuum. And if we trigger vacuums based off of the % not
> > all-visible, we might decrease the number of cases where we are able
> > to vacuum inserted data and freeze it the first time it is vacuumed --
> > thereby increasing the total amount of work.
>
> Rephrasing to make sure I understand correctly: you're saying that using
> all-frozen would trigger less frequent insert vacuums, which would give us
> a better chance of freezing more than more frequent insert vacuums
> triggered via all-visible? My suspicion is that the difference would tend
> to be quite subtle in practice, but I have no concrete evidence to back
> that up.

You understood me correctly.

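
To make the concern concrete, here is a rough Python sketch. It assumes,
hypothetically, that the insert threshold is the usual base threshold plus
scale factor times reltuples, scaled down by the fraction of pages that are
not marked in the visibility map (not all-frozen in one variant, not
all-visible in the other). The function name and the table numbers are made
up for illustration; this is not the patch's actual code. Since
relallfrozen can never exceed relallvisible, the all-visible variant always
gives a threshold at least as low, so it fires at least as often.

```python
def insert_threshold(base_thresh, scale_factor, reltuples, relpages, relall):
    """Hypothetical insert-vacuum threshold scaled by the fraction of pages
    NOT marked in the visibility map. `relall` is either relallvisible or
    relallfrozen, depending on which variant is being modeled."""
    if relpages <= 0:
        # No page statistics yet: fall back to the unscaled threshold.
        return base_thresh + scale_factor * reltuples
    # Clamp in case the (separately updated) estimates are out of sync.
    pct_unmarked = 1.0 - min(relall, relpages) / relpages
    return base_thresh + scale_factor * reltuples * pct_unmarked


# Made-up insert-mostly table: 1M tuples on 10,000 pages. Most older pages
# are all-visible, but only half of them have been frozen so far.
base, scale = 1000, 0.2  # defaults of the autovacuum insert threshold GUCs
reltuples, relpages = 1_000_000, 10_000
relallvisible, relallfrozen = 9_000, 4_500

by_visible = insert_threshold(base, scale, reltuples, relpages, relallvisible)
by_frozen = insert_threshold(base, scale, reltuples, relpages, relallfrozen)
print(f"threshold using relallvisible: {by_visible:.0f}")
print(f"threshold using relallfrozen:  {by_frozen:.0f}")
```

With these numbers the all-visible variant would trigger after roughly a
fifth as many inserts, so freshly inserted pages get vacuumed sooner and
have less chance of being frozen on that first pass.
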
As for relallfrozen, one of the justifications for adding it to
pg_class is actually the visibility it would provide. We have no
way of knowing how many all-visible but not all-frozen pages there are
on users' systems without the pg_visibility extension. If users had this
information, they could potentially tune their freeze-related settings
more aggressively. Regularly reading the whole visibility map with
pg_visibility_map_summary() is pretty hard to justify on most
production systems, but querying pg_class every 10 minutes or
so is much more reasonable.

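For example, a periodic monitoring job could derive the backlog of
all-visible but not all-frozen pages from pg_class with simple arithmetic.
This is a hypothetical Python sketch: the dataclass stands in for rows a
real job would fetch with a query, the sample values are invented, and
relallfrozen is the proposed column, not one that existing releases have.

```python
from dataclasses import dataclass


@dataclass
class PgClassSample:
    """Hypothetical snapshot of one table's pg_class statistics."""
    relname: str
    relpages: int
    relallvisible: int
    relallfrozen: int  # proposed column; not present in existing releases


def unfrozen_visible_pages(s: PgClassSample) -> int:
    """Pages marked all-visible but not all-frozen: pages that a future
    aggressive vacuum would still have to visit to freeze."""
    # The estimates are updated independently; clamp so a stale
    # relallfrozen never produces a negative count.
    return max(s.relallvisible - s.relallfrozen, 0)


def unfrozen_visible_fraction(s: PgClassSample) -> float:
    """Same backlog expressed as a fraction of the table's pages."""
    if s.relpages <= 0:
        return 0.0
    return unfrozen_visible_pages(s) / s.relpages


samples = [
    PgClassSample("orders", relpages=10_000, relallvisible=9_000, relallfrozen=4_500),
    PgClassSample("events", relpages=2_000, relallvisible=1_000, relallfrozen=1_000),
]
for s in samples:
    print(f"{s.relname}: {unfrozen_visible_pages(s)} pages "
          f"({unfrozen_visible_fraction(s):.0%}) all-visible but not all-frozen")
```

Watching that number over time is the kind of signal that could tell a
user whether freezing is keeping up before tightening freeze settings.
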
- Melanie