
From: Andres Freund
Subject: Testing autovacuum wraparound (including failsafe)
Msg-id: 20210423204306.5osfpkt2ggaedyvy@alap3.anarazel.de
Hi,

I started to write a test for $Subject, which I think we sorely need.

Currently my approach (a rough TAP sketch follows the list) is to:
- start a cluster, create a few tables with test data
- acquire a SHARE UPDATE EXCLUSIVE lock in a prepared transaction, to prevent
  autovacuum from doing anything
- cause dead tuples to exist
- restart
- run pg_resetwal -x 2000027648
- do things like acquiring pins on pages that block vacuum from progressing
- commit prepared transaction
- wait for template0, template1 datfrozenxid to increase
- wait for relfrozenxid for most relations in postgres to increase
- release buffer pin
- wait for postgres datfrozenxid to increase
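
In rough TAP form that setup would look something like the following (a sketch
only, not the actual test: the table, lock name, row counts, and the poll
condition are placeholders, and the buffer-pin step is just a comment):

use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;

my $node = get_new_node('wraparound');
$node->init;
$node->append_conf('postgresql.conf', 'max_prepared_transactions = 2');
$node->start;

# Test data, plus a prepared transaction whose SHARE UPDATE EXCLUSIVE lock
# keeps autovacuum away from the table while the rest of the setup runs.
$node->safe_psql('postgres', q{
    CREATE TABLE small_trunc (id int PRIMARY KEY, filler text);
    INSERT INTO small_trunc
        SELECT g, repeat('x', 100) FROM generate_series(1, 15000) g;
    BEGIN;
    LOCK TABLE small_trunc IN SHARE UPDATE EXCLUSIVE MODE;
    PREPARE TRANSACTION 'block_autovacuum';
    -- create dead tuples (this runs outside the prepared transaction)
    DELETE FROM small_trunc WHERE id > 1000;
});

# Advance nextXid far enough to trigger anti-wraparound vacuums; the value
# has to be the first xid on a clog page, see 3) below.
$node->stop;
command_ok([ 'pg_resetwal', '-x', '2000027648', $node->data_dir ],
    'advanced nextXid with pg_resetwal');
$node->start;

# ... acquire buffer pins that temporarily block vacuum here ...

$node->safe_psql('postgres', q{COMMIT PREPARED 'block_autovacuum'});

# Wait for anti-wraparound vacuum to advance datfrozenxid, first for the
# template databases, eventually for postgres itself.
$node->poll_query_until('postgres', q{
    SELECT datfrozenxid::text::bigint > 2000000000
    FROM pg_database WHERE datname = 'template0'});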

So far so good. But I've encountered a few things that stand in the way of
enabling such a test by default:

1) During startup StartupSUBTRANS() zeroes out all pages between
   oldestActiveXID and nextXid. That takes 8s on my workstation, but only
   because I have plenty of memory - pg_subtrans ends up at 14GB as I
   currently do
   the test. Clearly not something we could do on the BF.

2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
   failsafe mode: we can't really create 4GB relations on the BF. While
   writing the tests I've lowered this to 4MB...

3) pg_resetwal -x requires carefully choosing an xid: it needs to be the
   first xid on a clog page. It's not hard to determine which xids qualify,
   but it depends on BLCKSZ and a few constants in clog.c. For now I've
   hardcoded a value appropriate for 8KB, but ...


I have 2 1/2 ideas for addressing 1):

- We could expose functionality to advance nextXid to a future value at
  runtime, without filling in clog/subtrans pages. Would probably have to live
  in varsup.c and be exposed via regress.so or such?

- The only reason StartupSUBTRANS() does that work is the prepared
  transaction holding back oldestActiveXID. That transaction in turn exists to
  prevent autovacuum from doing anything before we do the test setup
  steps.

  Perhaps it'd be sufficient to set autovacuum_naptime really high initially,
  perform the test setup, set naptime to something lower, and reload the
  config (see the sketch after this list of ideas). But I'm worried that
  might not be reliable: if something ends up allocating an xid we'd
  potentially reach the path in GetNewTransactionId() that wakes up the
  launcher?  But probably there wouldn't be anything doing so?

  Another reason this might not be a good choice is that it actually seems
  relevant to be able to test cases where there are very old, still-running
  transactions...

- As a variant of the previous idea: if that turns out to be unreliable, we
  could instead set nextXid, start in single-user mode, create a blocking 2PC
  transaction, and then start normally. Because there's no old active xid,
  we wouldn't run into the StartupSUBTRANS problem.
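
A minimal sketch of the naptime idea above, reusing $node from the earlier
sketch (the values are arbitrary, and whether the reload sequence is actually
race-free is exactly the open question):

# Keep autovacuum effectively dormant while the test data is set up.
$node->append_conf('postgresql.conf', "autovacuum_naptime = '1d'");
$node->start;

# ... create test data, dead tuples, run pg_resetwal -x, etc. ...

# Then let the launcher wake up frequently; the later entry in
# postgresql.conf overrides the earlier one.
$node->append_conf('postgresql.conf', "autovacuum_naptime = '1s'");
$node->reload;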


For 2), I don't really have a better idea than making that configurable
somehow?

3) is probably tolerable for now; we could skip the test if BLCKSZ isn't 8KB,
or we could hardcode the calculation for different block sizes.
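
The calculation itself is small: assuming the current clog.c constants (2
status bits per xact, so 4 xacts per byte and BLCKSZ * 4 per page), something
along these lines would do (the helper name is made up):

use POSIX qw(ceil);

# Round a target xid up to the first xid on a clog page.  With the default
# 8KB BLCKSZ a clog page holds 32768 xids, and 61036 * 32768 = 2000027648,
# the value hardcoded above.
sub first_xid_on_clog_page
{
    my ($target_xid, $blcksz) = @_;
    my $xids_per_page = $blcksz * 4;    # CLOG_XACTS_PER_PAGE
    return ceil($target_xid / $xids_per_page) * $xids_per_page;
}

For example, first_xid_on_clog_page(2_000_000_000, 8192) yields 2000027648.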



I noticed one minor bug that's likely new:

2021-04-23 13:32:30.899 PDT [2027738] LOG:  automatic aggressive vacuum to prevent wraparound of table "postgres.public.small_trunc": index scans: 1
        pages: 400 removed, 28 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 14000 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 2000027651
        buffer usage: 735 hits, 1262 misses, 874 dirtied
        index scan needed: 401 pages from table (1432.14% of total) had 14000 dead item identifiers removed
        index "small_trunc_pkey": pages: 43 in total, 37 newly deleted, 37 currently deleted, 0 reusable
        avg read rate: 559.048 MB/s, avg write rate: 387.170 MB/s
        system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
        WAL usage: 1809 records, 474 full page images, 3977538 bytes

'1432.14% of total' - looks like removed pages need to be added back before
the percentage calculation? 401 of the 28 remaining pages is indeed 1432.14%,
whereas 401 of the 428 pages that existed before truncation would be ~94%.

Greetings,

Andres Freund


