Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id 20230118210844.2oy62up7gozhuwbb@awork3.anarazel.de
Whole thread Raw
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Andres Freund <andres@anarazel.de>)
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2023-01-18 12:15:17 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 11:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Jan 18, 2023 at 1:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > pgstat_report_analyze() will totally override the
> > > tabentry->dead_tuples information that drives autovacuum.c, based on
> > > an estimate derived from a random sample -- which seems to me to be an
> > > approach that just doesn't have any sound theoretical basis.
> >
> > Yikes. I think we don't have a choice but to have a method to correct
> > the information somehow, because AFAIK the statistics system is not
> > crash-safe. But that approach does seem to carry significant risk of
> > overwriting correct information with wrong information.

I suggested nearby to only have ANALYZE dead_tuples it if there's been no
[auto]vacuum since the stats entry was created. That allows recovering from
stats resets, be it via crashes or explicitly. What do you think?

To add insult to injury, we overwrite accurate information gathered by VACUUM
with bad information gathered by ANALYZE if you do VACUUM ANALYZE.



One complicating factor is that VACUUM sometimes computes an incrementally
more bogus n_live_tup when it skips pages due to the VM, whereas ANALYZE
computes something sane. I unintentionally encountered one when I was trying
something while writing this email, reproducer attached.


VACUUM (DISABLE_PAGE_SKIPPING) foo;
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relid = 'foo'::regclass;
┌────────────┬────────────┐
│ n_live_tup │ n_dead_tup │
├────────────┼────────────┤
│    9000001 │     500000 │
└────────────┴────────────┘

after one VACUUM:
┌────────────┬────────────┐
│ n_live_tup │ n_dead_tup │
├────────────┼────────────┤
│    8549905 │     500000 │
└────────────┴────────────┘

after 9 more VACUUMs:
┌────────────┬────────────┐
│ n_live_tup │ n_dead_tup │
├────────────┼────────────┤
│    5388421 │     500000 │
└────────────┴────────────┘
(1 row)


I briefly tried it out, and it does *not* reproduce in 11, but does in
12. Haven't dug into what the cause is, but we probably use the wrong
denominator somewhere...

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Extracting cross-version-upgrade knowledge from buildfarm client
Next
From: Peter Geoghegan
Date:
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation