Re: Remaining case where reltuples can become distorted across multiple VACUUM operations - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
Date
Msg-id CAEze2WhPL90hdRPc-UoegT0t6yXqNe4vOebDkjVgJNeorg-mDg@mail.gmail.com
In response to Re: Remaining case where reltuples can become distorted across multiple VACUUM operations  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
List pgsql-hackers
On Mon, 8 Aug 2022 at 17:49, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Aug 8, 2022 at 8:33 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > For example, if currently the measured 2% of the pages contains more
> > than 100% of the previous count of tuples, or with your patch the last
> > page contains more than 100% of the previous count of tuples, that
> > new count is ignored, which seems silly considering that the vacuum
> > count is supposed to be authoritative.
>
> The 2% thing is conditioned on the new relpages value precisely
> matching the existing relpages from pg_class -- which makes it very
> targeted. I don't see why scanned_tuples greatly exceeding the
> existing reltuples from pg_class is interesting (any more interesting
> than the other way around).

Because if a subset of a relation's pages contains more tuples than
the current expected total for the whole table, you should update your
expectation regardless of which blocks, or how many blocks, you
scanned: the previously stored value is a strictly worse estimate than
your latest measurement.
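To illustrate the invariant I mean, here is a rough sketch (a
hypothetical helper, not anything in the tree): tuples actually
counted during the scan are a hard lower bound on the true total, so
any stored estimate below that bound is provably stale.

```python
def clamp_reltuples(stored_reltuples, scanned_tuples):
    # Hypothetical illustration: the tuples just counted in the scanned
    # subset are a hard lower bound on the table's true tuple count, so
    # an estimate below that bound should be replaced, no matter how
    # few pages were scanned.
    return max(stored_reltuples, scanned_tuples)

# Stored estimate says 1000 tuples, but a partial scan already counted
# 1500 of them: the stored value is a strictly worse estimate.
print(clamp_reltuples(1000, 1500))  # 1500
```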

> We'll always accept scanned_tuples as authoritative when VACUUM
> actually scans all pages, no matter what. Currently it isn't possible
> for VACUUM to skip pages in a table that is 32 pages or less in size.
> So even the new "single page" thing from the patch cannot matter
> there.

A 33-block relation whose first 32 pages hold 1 tuple each can still
have a last page with 250 tuples; under that scheme the last page's
count would be ignored, leaving a total tuple count of around 33.
Sure, this is an artificial sample, but you can construct similarly
misleading vacuum samples: two classes of tuples with distinct update
regimes, one with 32-byte tuples and one with
MaxFreeSpaceRequestSize-sized tuples. Once you start running VACUUM
against these separate classes of updated blocks, you'll see that the
relation's tuple count also tends toward 1 * nblocks, due to the
disjoint nature of these updates and the tendency to ignore all
updates to densely packed blocks.
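The arithmetic behind the 33-block example above can be spelled out
(hypothetical numbers for illustration, not the output of any
PostgreSQL routine):

```python
# 32 sparse pages with 1 tuple each, plus one dense last page.
pages = [1] * 32 + [250]
true_total = sum(pages)           # 282 tuples actually in the table

# If the scan result for the dense last page is discarded, the
# estimate stays near one tuple per block:
naive_estimate = len(pages) * 1   # 33

print(true_total, naive_estimate)  # 282 33
```

So discarding the one "implausible" page costs the estimate almost 90%
of the real tuple count in this contrived case.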

With the current code, we ignore the high counts of those
often-updated blocks and expect low density in the relation, precisely
because we disregard areas that are extremely dense and that get
updated in different VACUUM cycles than the bloated blocks.

Kind regards,

Matthias van de Meent


