Re: another autovacuum scheduling thread - Mailing list pgsql-hackers
| From | Sami Imseih | 
|---|---|
| Subject | Re: another autovacuum scheduling thread | 
| Date | |
| Msg-id | CAA5RZ0tpV3PRHejTGG5-LSsqNKKV0qP=SDvurs8wj7pTk7jYJw@mail.gmail.com |
| In response to | Re: another autovacuum scheduling thread (Sami Imseih <samimseih@gmail.com>) | 
| List | pgsql-hackers | 
> FWIW, I've been putting some scripts together to test some workloads
> and I will share shortly what I have.

Here is my attempt to test the behavior with the new prioritization. I wanted a way to run the same tests with different workloads, both with and without the prioritization patch, and to see whether anything stands out as suspicious in terms of autovacuum or autoanalyze activity, such as certain tables showing too little or too much of it.

The scripts I put together (attached) run a busy update workload (OLTP) and a separate batch workload. They use pgbench to execute custom scripts that are generated on the fly. The results are summarized as the average number of autovacuum and autoanalyze runs *per table*, along with some other DML activity stats to confirm that the workloads being compared have similar DML activity.

Using the scripts: place the attached scripts in a directory of your choice, and modify the section under "Caller should adjust these values" in run_workloads.sh to adjust the workload. The scripts assume you have a running cluster with your config file already adjusted for the test. Once ready, call run_workloads.sh, and a summary like the one below is printed at the end. Hopefully it works for you :) The summary.sh script can also be run while the workloads are executing.
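To give an idea of what the summary contains, the per-group rollup boils down to a query along these lines. This is illustrative only: it assumes the batch tables can be told apart by a name prefix, and the attached summary.sh is what actually runs.

```sql
-- Illustrative sketch of the per-group rollup; summary.sh is authoritative.
-- Assumes batch tables share a 'batch' name prefix; the attached scripts
-- may group the tables differently.
SELECT CASE WHEN relname LIKE 'batch%' THEN 'batch_tables'
            ELSE 'numbered_tables' END AS table_group,
       round(avg(autovacuum_count), 3)  AS avg_autovacuum_count,
       round(avg(autoanalyze_count), 3) AS avg_autoanalyze_count,
       round(avg(vacuum_count), 3)      AS avg_vacuum_count,
       round(avg(analyze_count), 3)     AS avg_analyze_count,
       sum(n_tup_ins)                   AS rows_inserted,
       sum(n_tup_upd)                   AS rows_updated,
       sum(n_tup_hot_upd)               AS rows_hot_updated,
       count(*)                         AS table_count
FROM pg_stat_user_tables
GROUP BY 1;
```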
Here is an example of a test I wanted to run based on the discussion [0]. This scenario is one that was mentioned there, but there are other cases in which a batch process performing only inserts is prioritized over the update workload. I ran this test for 10 minutes, using 200 clients for the update workload and 5 clients for the batch workload, with the following configuration:

```
max_connections = 1000
autovacuum_naptime = '10s'
shared_buffers = '4GB'
autovacuum_max_workers = 6
```

-- HEAD

```
Total Activity

-[ RECORD 1 ]-------------+----------
total_n_dead_tup          | 985183
total_n_mod_since_analyze | 220294866
total_reltuples           | 247690373
total_autovacuum_count    | 137
total_autoanalyze_count   | 470
total_n_tup_upd           | 7720012
total_n_tup_ins           | 446683000
table_count               | 105

Activity By Workload Type

-[ RECORD 1 ]-----------------+----------------
table_group                   | batch_tables
** avg_autovacuum_count       | 7.400
** avg_autoanalyze_count      | 8.000
avg_vacuum_count              | 0.000
avg_analyze_count             | 0.000
rows_inserted                 | 436683000
rows_updated                  | 0
rows_hot_updated              | 0
table_count                   | 5
-[ RECORD 2 ]-----------------+----------------
table_group                   | numbered_tables
** avg_autovacuum_count       | 1.000
** avg_autoanalyze_count      | 4.300
avg_vacuum_count              | 1.000
avg_analyze_count             | 0.000
rows_inserted                 | 10000000
rows_updated                  | 7720012
rows_hot_updated              | 7094573
table_count                   | 100
```

-- with v7 applied

```
Total Activity

-[ RECORD 1 ]-------------+----------
total_n_dead_tup          | 1233045
total_n_mod_since_analyze | 137843507
total_reltuples           | 350704437
total_autovacuum_count    | 146
total_autoanalyze_count   | 605
total_n_tup_upd           | 7896354
total_n_tup_ins           | 487974000
table_count               | 105

Activity By Workload Type

-[ RECORD 1 ]-----------------+----------------
table_group                   | batch_tables
** avg_autovacuum_count       | 11.000
** avg_autoanalyze_count      | 13.200
avg_vacuum_count              | 0.000
avg_analyze_count             | 0.000
rows_inserted                 | 477974000
rows_updated                  | 0
rows_hot_updated              | 0
table_count                   | 5
-[ RECORD 2 ]-----------------+----------------
table_group                   | numbered_tables
** avg_autovacuum_count       | 0.910
** avg_autoanalyze_count      | 5.390
avg_vacuum_count              | 1.000
avg_analyze_count             | 0.000
rows_inserted                 | 10000000
rows_updated                  | 7896354
rows_hot_updated              | 7123134
table_count                   | 100
```

The results above show what I expected: the batch tables receive higher priority, as seen from the per-table averages of autovacuum and autoanalyze runs. This behavior is expected, but it may catch some users by surprise after an upgrade, since certain tables will now receive more attention than others. Longer tests might also show more bloat accumulating on heavily updated tables. In such cases, a user may need to adjust autovacuum settings on a per-table basis to restore the previous behavior.
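For instance, lowering the per-table thresholds on the update-heavy tables makes them cross the vacuum trigger sooner, so they should be picked up earlier. A minimal illustration, where the table name and values are placeholders rather than anything taken from the attached scripts:

```sql
-- Placeholder table name and illustrative values; the point is only that
-- these storage parameters can be tuned per table if the prioritization
-- change leaves some update-heavy tables waiting too long.
ALTER TABLE numbered_table_1 SET (
    autovacuum_vacuum_scale_factor  = 0.05,  -- default 0.2
    autovacuum_analyze_scale_factor = 0.02   -- default 0.1
);
```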
So I am not quite sure what the best way to test is, other than trying to find these non-steady-state workloads and observing the impact of the prioritization change on (auto)vacuum/analyze activity. Maybe there is a better way?

[0] https://www.postgresql.org/message-id/aQI7tGEs8IOPxG64%40nathan

--
Sami Imseih
Amazon Web Services (AWS)

Attachment