Re: Testing autovacuum wraparound (including failsafe) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Testing autovacuum wraparound (including failsafe) |
Date | |
Msg-id | CAH2-Wz=f2dfc-0BcM5X5ttXTeRNPg+JtxEMCuSZR82a7XHfRkg@mail.gmail.com |
In response to | Testing autovacuum wraparound (including failsafe) (Andres Freund <andres@anarazel.de>) |
Responses | Re: Testing autovacuum wraparound (including failsafe) |
List | pgsql-hackers |
On Fri, Apr 23, 2021 at 1:43 PM Andres Freund <andres@anarazel.de> wrote:
> I started to write a test for $Subject, which I think we sorely need.

+1

> Currently my approach is to:
> - start a cluster, create a few tables with test data
> - acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
>   autovacuum from doing anything
> - cause dead tuples to exist
> - restart
> - run pg_resetwal -x 2000027648
> - do things like acquiring pins on pages that block vacuum from progressing
> - commit prepared transaction
> - wait for template0, template1 datfrozenxid to increase
> - wait for relfrozenxid for most relations in postgres to increase
> - release buffer pin
> - wait for postgres datfrozenxid to increase

Just having a standard-ish way to do stress testing like this would
add something.

> 2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
> failsafe mode, we can't really create 4GB relations on the BF. While
> writing the tests I've lowered this to 4MB...

The only reason that I chose 4GB for FAILSAFE_MIN_PAGES is because the
related VACUUM_FSM_EVERY_PAGES constant was 8GB -- the latter limits
how often we'll consider the failsafe in the single-pass/no-indexes
case.

I see no reason why it cannot be changed now. VACUUM_FSM_EVERY_PAGES
also frustrates FSM testing in the single-pass case in about the same
way, so maybe that should be considered as well? Note that the FSM
handling for the single pass case is actually a bit different to the
two pass/has-indexes case, since the single pass case calls
lazy_vacuum_heap_page() directly in its first and only pass over the
heap (that's the whole point of having it of course).

> 3) pg_resetwal -x requires to carefully choose an xid: It needs to be the
> first xid on a clog page. It's not hard to determine which xids are but it
> depends on BLCKSZ and a few constants in clog.c. I've for now hardcoded a
> value appropriate for 8KB, but ...

Ugh.
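To make point 3) concrete: the "first xid on a clog page" constraint comes down to modular arithmetic. The sketch below assumes the clog.c layout of 2 status bits per transaction (so 4 xids per byte), which gives BLCKSZ * 4 xids per page. The function names are made up for illustration; they are not part of PostgreSQL or pg_resetwal.

```python
# Assumption: mirrors the constants in clog.c (2 status bits per xid).
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT  # 4 xids per byte


def clog_xacts_per_page(blcksz: int = 8192) -> int:
    """Number of transaction ids whose status fits on one clog page."""
    return blcksz * CLOG_XACTS_PER_BYTE


def is_first_xid_on_clog_page(xid: int, blcksz: int = 8192) -> bool:
    """True if xid is the first xid stored on its clog page."""
    return xid % clog_xacts_per_page(blcksz) == 0


def first_xid_on_clog_page(target_xid: int, blcksz: int = 8192) -> int:
    """Round target_xid down to the first xid on its clog page.

    Hypothetical helper: a test could use this to pick a value for
    'pg_resetwal -x' from the cluster's block size.
    """
    per_page = clog_xacts_per_page(blcksz)
    return (target_xid // per_page) * per_page


if __name__ == "__main__":
    print(clog_xacts_per_page(8192))
    print(is_first_xid_on_clog_page(2000027648))
    print(first_xid_on_clog_page(2000027651))
```

With the default 8KB block size a clog page covers 32768 xids, and 2000027648 is a multiple of that, which matches the hardcoded value in the quoted steps. A test could compute the value from the block size rather than hardcoding it.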
> For 2), I don't really have a better idea than making that configurable
> somehow?

That could make sense as a developer/testing option, I suppose. I just
doubt that it makes sense as anything else.

> 2021-04-23 13:32:30.899 PDT [2027738] LOG: automatic aggressive vacuum to prevent wraparound of table "postgres.public.small_trunc": index scans: 1
> pages: 400 removed, 28 remain, 0 skipped due to pins, 0 skipped frozen
> tuples: 14000 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 2000027651
> buffer usage: 735 hits, 1262 misses, 874 dirtied
> index scan needed: 401 pages from table (1432.14% of total) had 14000 dead item identifiers removed
> index "small_trunc_pkey": pages: 43 in total, 37 newly deleted, 37 currently deleted, 0 reusable
> avg read rate: 559.048 MB/s, avg write rate: 387.170 MB/s
> system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
> WAL usage: 1809 records, 474 full page images, 3977538 bytes
>
> '1432.14% of total' - looks like removed pages need to be added before the
> percentage calculation?

Clearly this needs to account for removed heap pages in order to
consistently express the percentage of pages with LP_DEAD items in
terms of a percentage of the original table size. I can fix this
shortly.

--
Peter Geoghegan
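For reference, here is the arithmetic behind the '1432.14% of total' figure in the quoted log, and what it looks like once removed pages are counted in the denominator. This is only a sketch of the calculation using the numbers from that log (401 pages with LP_DEAD items, 28 pages remaining, 400 removed). The function name is hypothetical, and this is neither the actual vacuumlazy.c code nor the eventual fix.

```python
def lpdead_page_percent(lpdead_pages: int,
                        remaining_pages: int,
                        removed_pages: int = 0) -> float:
    """Percentage of heap pages that had LP_DEAD items.

    Passing removed_pages expresses the result relative to the original
    table size (remaining + removed) rather than the post-truncation size.
    """
    total = remaining_pages + removed_pages
    if total == 0:
        return 0.0
    return 100.0 * lpdead_pages / total


if __name__ == "__main__":
    # Denominator is only the 28 remaining pages, as in the quoted log.
    print(f"{lpdead_page_percent(401, 28):.2f}%")
    # Denominator also counts the 400 removed pages.
    print(f"{lpdead_page_percent(401, 28, 400):.2f}%")
```

Dividing by the 28 remaining pages reproduces 1432.14%; dividing by the original 428 pages gives roughly 93.69%.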