Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From David Rowley
Subject Re: Berserk Autovacuum (let's save next Mandrill)
Date
Msg-id CAApHDvok5hmXr+bUbJe7+2sQzWo4B_QzSk7RKFR9fP6BjYXx5g@mail.gmail.com
Whole thread Raw
In response to Re: Berserk Autovacuum (let's save next Mandrill)  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Mon, 30 Mar 2020 at 19:49, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll see if I can come up with some way to do this in a more
> deterministic way to determine which tables to add vacuums for, rather
> than waiting for and reacting post-failure.

I ended up running make installcheck on an instance with
autovacuum_naptime set to 1s with a small additional debug line in
autovacuum.c, namely:

diff --git a/src/backend/postmaster/autovacuum.c
b/src/backend/postmaster/autovacuum.c
index 7e97ffab27..ad81e321dc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3099,6 +3099,9 @@ relation_needs_vacanalyze(Oid relid,
                *dovacuum = force_vacuum || (vactuples > vacthresh) ||
                                        (vac_ins_base_thresh >= 0 &&
instuples > vacinsthresh);
                *doanalyze = (anltuples > anlthresh);
+
+               if (vac_ins_base_thresh >= 0 && instuples > vacinsthresh)
+                       elog(LOG, "******** %s", NameStr(classForm->relname));
        }
        else
        {

I grepped the log after the installcheck to grab the table names that
saw an insert vacuum during the test then grepped the test output to
see if the table appears to pose a risk of test instability.

I've classed each table with a risk factor. "VeryLow" seems like
there's almost no risk because we don't ever look at EXPLAIN.  Low
risk tables look at EXPLAIN, but I feel are not quite looking in
enough detail to cause issues. Medium risk look at EXPLAIN and I feel
there's a risk of some change, I think these are all Append nodes
which do order subnodes based on their cost. High risk.... those are
the ones I'm about to look into changing.

The full results of my analysis are:

Table: agg_group_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: agg_hash_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: atest12 privileges.out. Lots of looking at EXPLAIN, but nothing
appears to look into row estimates in detail. Risk:Low
Table: brin_test brin.out.  Test already does VACUUM ANALYZE. Risk:VeryLow
Table: bt_f8_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_i4_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_name_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_txt_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: dupindexcols create_index.out. Some looking at EXPLAIN plans,
but nothing appears to look into row estimates in detail. Risk:Low
Table: fast_emp4000 create_am.out, create_index.out, create_misc.out.
Lots of looking at EXPLAIN, but nothing appears to look into row
estimates in detail. Risk:Low
Table: functional_dependencies stats_ext.out. Lots of looking at
EXPLAIN output. Test looks at row estimates. Risk:High
Table: gist_tbl gist.out. Lots of looking at EXPLAIN, but nothing
appears to look into row estimates in detail. Risk:Low
Table: hash_f8_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_i4_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_name_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_txt_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: kd_point_tbl create_index_spgist.out. Lots of looking at
EXPLAIN, but nothing appears to look into row estimates in detail.
Risk:Low
Table: mcv_lists stats_ext.out. Lots of looking at EXPLAIN, but tests
appear to VACUUM after loading rows. Risk:Low
Table: mcv_lists_arrays stats_ext.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: mcv_lists_bool stats_ext.out.  Lots of looking at EXPLAIN
output. Test looks at row estimates. Risk:High
Table: ndistinct stats_ext.out.  Lots of looking at EXPLAIN output.
Test looks at row estimates. Only 1000 rows are loaded initially and
then 5000 after a truncate. 1000 rows won't trigger the auto-vacuum.
Risk:High
Table: onek Lots of files. Sees a VACUUM in sanity_check test,
however, some tests run before sanity_check, e.g. create_index,
select, copy, none of which appear to pay particular attention to
anything vacuum might change. Risk:Low
Table: pagg_tab_ml_p2_s1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p2_s2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p3 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pg_attribute Seen in several tests. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: pg_depend Seen in several tests. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: pg_largeobject Seen in several tests. Nothing appears to look
at EXPLAIN. Risk:VeryLow
Table: quad_box_tbl box.out. Sees some use of EXPLAIN, but nothing
looks critical. Risk:Low
Table: quad_box_tbl_ord_seq1 box.out. No EXPLAIN usage. Risk:VeryLow
Table: quad_box_tbl_ord_seq2 box.out. No EXPLAIN usage. Risk:VeryLow
Table: quad_point_tbl create_index_spgist.out Sees some use of
EXPLAIN. Index Only Scans are already being used. Risk:Low
Table: quad_poly_tbl polygon.out Some usages of EXPLAIN. Risk:Low
Table: radix_text_tbl create_index_spgist.out Some usages of EXPLAIN. Risk:Low
Table: road various tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: slow_emp4000 various tests. Nothing appears to look at EXPLAIN.
Risk:VeryLow
Table: spgist_box_tbl spgist.out. Nothing appears to look at EXPLAIN.
Risk:VeryLow
Table: spgist_point_tbl spgist.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: spgist_text_tbl spgist.out. Nothing appears to look at EXPLAIN.
Risk:VeryLow
Table: tenk1 aggregates.out, groupingsets.out, join.out, limit.out,
misc_functions.out, rowtypes.out,select_distinct.out,
select_parallel.out, subselect.out, tablesample.out, tidscan.out,
union.out, window.out and write_parallel.out are after vacuum in
sanity_check. EXPLAIN used in create_index.out and inherit.out, which
are all run before sanity_check does the vacuum. Risk:Medium
Table: tenk2 Only sees EXPLAIN usages in select_parallel.out, which is
after the table is vacuumed in sanity_check. Risk:Low
Table: test_range_gist rangetypes.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: test_range_spgist rangetypes.out. Some EXPLAIN usage. Risk:Low
Table: testjsonb jsonb.out. Some EXPLAIN usage. Risk:Low
Table: transition_table_level2 plpgsql.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: transition_table_status plpgsql.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow

I'd like to wait to see if we get failures for the ones I've classed
as medium risk.

David



pgsql-hackers by date:

Previous
From: "Tom Turelinckx"
Date:
Subject: Re: snapper vs. HEAD
Next
From: Amit Kapila
Date:
Subject: Re: WAL usage calculation patch