Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Berserk Autovacuum (let's save next Mandrill) |
Date | |
Msg-id | CAApHDvok5hmXr+bUbJe7+2sQzWo4B_QzSk7RKFR9fP6BjYXx5g@mail.gmail.com Whole thread Raw |
In response to | Re: Berserk Autovacuum (let's save next Mandrill) (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On Mon, 30 Mar 2020 at 19:49, David Rowley <dgrowleyml@gmail.com> wrote: > I'll see if I can come up with some way to do this in a more > deterministic way to determine which tables to add vacuums for, rather > than waiting for and reacting post-failure. I ended up running make installcheck on an instance with autovacuum_naptime set to 1s with a small additional debug line in autovacuum.c, namely: diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 7e97ffab27..ad81e321dc 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -3099,6 +3099,9 @@ relation_needs_vacanalyze(Oid relid, *dovacuum = force_vacuum || (vactuples > vacthresh) || (vac_ins_base_thresh >= 0 && instuples > vacinsthresh); *doanalyze = (anltuples > anlthresh); + + if (vac_ins_base_thresh >= 0 && instuples > vacinsthresh) + elog(LOG, "******** %s", NameStr(classForm->relname)); } else { I grepped the log after the installcheck to grab the table names that saw an insert vacuum during the test then grepped the test output to see if the table appears to pose a risk of test instability. I've classed each table with a risk factor. "VeryLow" seems like there's almost no risk because we don't ever look at EXPLAIN. Low risk tables look at EXPLAIN, but I feel are not quite looking in enough detail to cause issues. Medium risk look at EXPLAIN and I feel there's a risk of some change, I think these are all Append nodes which do order subnodes based on their cost. High risk.... those are the ones I'm about to look into changing. The full results of my analysis are: Table: agg_group_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow Table: agg_hash_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow Table: atest12 privileges.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low Table: brin_test brin.out. Test already does VACUUM ANALYZE. Risk:VeryLow Table: bt_f8_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: bt_i4_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: bt_name_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: bt_txt_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: dupindexcols create_index.out. Some looking at EXPLAIN plans, but nothing appears to look into row estimates in detail. Risk:Low Table: fast_emp4000 create_am.out, create_index.out, create_misc.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low Table: functional_dependencies stats_ext.out. Lots of looking at EXPLAIN output. Test looks at row estimates. Risk:High Table: gist_tbl gist.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low Table: hash_f8_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: hash_i4_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: hash_name_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: hash_txt_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: kd_point_tbl create_index_spgist.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low Table: mcv_lists stats_ext.out. Lots of looking at EXPLAIN, but tests appear to VACUUM after loading rows. Risk:Low Table: mcv_lists_arrays stats_ext.out. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: mcv_lists_bool stats_ext.out. Lots of looking at EXPLAIN output. Test looks at row estimates. Risk:High Table: ndistinct stats_ext.out. Lots of looking at EXPLAIN output. Test looks at row estimates. Only 1000 rows are loaded initially and then 5000 after a truncate. 1000 rows won't trigger the auto-vacuum. Risk:High Table: onek Lots of files. Sees a VACUUM in sanity_check test, however, some tests run before sanity_check, e.g. create_index, select, copy, none of which appear to pay particular attention to anything vacuum might change. Risk:Low Table: pagg_tab_ml_p2_s1 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pagg_tab_ml_p2_s2 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pagg_tab_ml_p3_s1 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pagg_tab_ml_p3_s2 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pagg_tab_para_p1 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pagg_tab_para_p2 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pagg_tab_para_p3 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium Table: pg_attribute Seen in several tests. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: pg_depend Seen in several tests. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: pg_largeobject Seen in several tests. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: quad_box_tbl box.out. Sees some use of EXPLAIN, but nothing looks critical. Risk:Low Table: quad_box_tbl_ord_seq1 box.out. No EXPLAIN usage. Risk:VeryLow Table: quad_box_tbl_ord_seq2 box.out. No EXPLAIN usage. Risk:VeryLow Table: quad_point_tbl create_index_spgist.out Sees some use of EXPLAIN. Index Only Scans are already being used. Risk:Low Table: quad_poly_tbl polygon.out Some usages of EXPLAIN. Risk:Low Table: radix_text_tbl create_index_spgist.out Some usages of EXPLAIN. Risk:Low Table: road various tests. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: slow_emp4000 various tests. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: spgist_box_tbl spgist.out. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: spgist_point_tbl spgist.out. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: spgist_text_tbl spgist.out. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: tenk1 aggregates.out, groupingsets.out, join.out, limit.out, misc_functions.out, rowtypes.out,select_distinct.out, select_parallel.out, subselect.out, tablesample.out, tidscan.out, union.out, window.out and write_parallel.out are after vacuum in sanity_check. EXPLAIN used in create_index.out and inherit.out, which are all run before sanity_check does the vacuum. Risk:Medium Table: tenk2 Only sees EXPLAIN usages in select_parallel.out, which is after the table is vacuumed in sanity_check. Risk:Low Table: test_range_gist rangetypes.out. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: test_range_spgist rangetypes.out. Some EXPLAIN usage. Risk:Low Table: testjsonb jsonb.out. Some EXPLAIN usage. Risk:Low Table: transition_table_level2 plpgsql.out. Nothing appears to look at EXPLAIN. Risk:VeryLow Table: transition_table_status plpgsql.out. Nothing appears to look at EXPLAIN. Risk:VeryLow I'd like to wait to see if we get failures for the ones I've classed as medium risk. David
pgsql-hackers by date: