Thread: Autoanalyze of the autovacuum daemon ...
Hi All,

We have been using PostgreSQL since 2007 (we now use PostgreSQL 8.4), and until recently we performed the deletion of old data, then the VACUUM and ANALYZE tasks, ourselves (every night at 0:00 CET). We reached a point where these tasks were taking so much time that we decided to delete old data every hour instead, and let the autovacuum daemon manage the VACUUM and ANALYZE tasks. However, we have some difficulty understanding how the autovacuum daemon triggers autoanalyze tasks.

Let me explain our database structure. We have two kinds of tables:
- the first kind, on which we perform a complete TRUNCATE followed by a COPY to fill each table (partitioned tables, 288 partitions, one per 5 minutes, re-used every day);
- the second kind, on which we insert new data every five minutes (avg ~200 rows) and delete old data about every hour (avg ~1000 rows).

For complete understanding: we need up-to-date stats for the second kind because the recurrent deletion can take a long time (~1 minute for fewer than 1000 deleted rows, because the planner uses a seq scan instead of an index scan).

The autovacuum daemon performs autoanalyze on the first kind as soon as the process (TRUNCATE + COPY) is done. But autoanalyze is not that effective for the second kind.

We tried to reduce autovacuum_analyze_threshold (50 => 10) and autovacuum_analyze_scale_factor (0.1 => 0.005) for the second kind of tables (ALTER TABLE ... SET (...)) without any conclusive effect.

We cannot find where the total number of tuples inserted or updated since the last ANALYZE is stored. Could someone give us the answer?

We took a look at relfrozenxid, but our tables do not have a large relfrozenxid value (< 10000000).

Most of the time there is no autovacuum analyze query in pg_stat_activity, although we set autovacuum_naptime to 15s to try to start new analyze tasks more often.

We do not understand why we cannot obtain any improvement with the previous changes. Did we do something wrong?
Thank you all for your kind advice.

Regards,
Baptiste.

---
Baptiste LHOSTE
blhoste@alaloop.com
ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
Téléphone : +33 (0) 5 59 41 51 10
www.alaloop.com
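As an aside on the per-table overrides mentioned above: on 8.4 the syntax uses storage parameters, i.e. `ALTER TABLE ... SET (...)` rather than `SET PARAMETERS`. A minimal sketch, with a placeholder table name:

```sql
-- Per-table autoanalyze overrides (8.4 storage-parameter syntax).
-- "second_kind_table" is a placeholder name.
ALTER TABLE second_kind_table SET (
    autovacuum_analyze_threshold = 10,
    autovacuum_analyze_scale_factor = 0.005
);

-- The overrides currently in effect are stored in pg_class.reloptions:
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;
```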
Hi,

> We cannot find where the total number of tuples inserted or
> updated since the last ANALYZE is stored. Could someone give us the
> answer?

In pg_stat_user_tables. Not the count since the last time ANALYZE ran, but you do have the number of reltuples from pg_class, which is used to calculate the ratio.

> Most of the time there is no autovacuum analyze query in
> pg_stat_activity, although we set autovacuum_naptime to 15s to try to
> start new analyze tasks more often.
>
> We do not understand why we cannot obtain any improvement with the
> previous changes. Did we do something wrong?

Everything is relative to the size of the table. What is the content of pg_class for the second kind of tables? (relpages/reltuples)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
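To illustrate the ratio: autoanalyze fires once the number of changed tuples exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples. On 8.4 the changes-since-last-ANALYZE counter itself lives inside the stats collector and is not exposed, so the sketch below can only compare the threshold against the cumulative counters, which over-estimate pending changes; treat it as a rough indicator:

```sql
-- Rough indicator of how far each table is from its autoanalyze threshold.
-- n_tup_ins + n_tup_upd + n_tup_del count since the last stats reset,
-- not since the last ANALYZE, so total_changes is only an upper bound.
-- Note: per-table reloptions, if set, override these global settings.
SELECT s.relname,
       c.reltuples,
       s.n_tup_ins + s.n_tup_upd + s.n_tup_del AS total_changes,
       current_setting('autovacuum_analyze_threshold')::bigint
         + current_setting('autovacuum_analyze_scale_factor')::float8
           * c.reltuples                       AS analyze_threshold,
       s.last_autoanalyze
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.last_autoanalyze NULLS FIRST;
```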
Hi,

> in pg_stat_user_tables, not since the last time ANALYZE run, but you have the
> number of reltuples from pg_class that is used to calculate the ratio.

So how does the autovacuum daemon determine the number of obsolete rows since its last run on a specific table?

> Everything is relative to the size of the table, what is the content of
> pg_class for the second kind of tables ? (relpages/reltuples)

 relname | last_autoanalyze (pg_stat_all_tables) |  reltuples  | relpages | n_tup_ins + n_tup_upd + n_tup_del (pg_stat_all_tables)
---------+---------------------------------------+-------------+----------+--------------------------------------------------------
 table1  | 2012-10-15 21:12:04.362647+02         | 2.53296e+06 |    50789 | 28542483
 table2  | 2012-10-15 18:27:30.181583+02         | 2.57144e+06 |    54833 | 31127737
 table3  | 2012-10-15 13:13:09.569758+02         | 2.58646e+06 |    63208 | 27167944
 table4  | 2012-10-15 12:05:39.094652+02         | 2.58655e+06 |    63181 | 27160409
 table5  | 2012-10-15 12:05:32.347738+02         | 2.58611e+06 |    57272 | 27171661
 table6  | 2012-10-15 10:43:40.558078+02         | 2.57925e+06 |    63145 | 27155875
 table7  | 2012-10-15 10:40:25.469625+02         | 2.59551e+06 |    57254 | 27161876
 table8  | 2012-10-15 10:25:10.776152+02         | 2.56987e+06 |    63147 | 27177907
 table9  | 2012-10-15 09:30:17.407181+02         | 2.5923e+06  |    57231 | 27152953
 table10 | 2012-10-15 08:29:25.427207+02         | 2.56237e+06 |    63085 | 31770581
 table11 | 2012-10-15 08:26:14.550742+02         | 2.56503e+06 |    63065 | 31769609
 table12 | 2012-10-15 06:39:01.737914+02         | 2.57833e+06 |    63023 | 31767543
 table13 | 2012-10-15 06:35:36.43683+02          | 2.57217e+06 |    63047 | 31759954
 table14 | 2012-10-15 05:08:02.858623+02         | 2.59068e+06 |    57119 | 27154555
 table15 | 2012-10-15 04:40:48.695142+02         | 2.57449e+06 |    63002 | 31768034
 table16 | 2012-10-10 07:51:49.625226+02         | 2.48584e+06 |    54014 | 22110223
 table17 | 2012-10-09 10:03:36.649331+02         | 2.58047e+06 |    63742 | 23423257
 table18 | 2012-10-09 10:02:13.549362+02         | 2.57854e+06 |    63585 | 23364035
 table19 | 2012-10-09 09:18:55.447895+02         | 2.58851e+06 |    63516 | 23375180

Regards,
Baptiste Lhoste
> relname | last_autoanalyze (pg_stat_all_tables) | reltuples | relpages | n_tup_ins + n_tup_upd + n_tup_del (pg_stat_all_tables)
> table1 | 2012-10-15 21:12:04.362647+02 | 2.53296e+06 | 50789 | 28542483
> table2 | 2012-10-15 18:27:30.181583+02 | 2.57144e+06 | 54833 | 31127737
> table3 | 2012-10-15 13:13:09.569758+02 | 2.58646e+06 | 63208 | 27167944
> table4 | 2012-10-15 12:05:39.094652+02 | 2.58655e+06 | 63181 | 27160409
> table5 | 2012-10-15 12:05:32.347738+02 | 2.58611e+06 | 57272 | 27171661
> table6 | 2012-10-15 10:43:40.558078+02 | 2.57925e+06 | 63145 | 27155875
> table7 | 2012-10-15 10:40:25.469625+02 | 2.59551e+06 | 57254 | 27161876
> table8 | 2012-10-15 10:25:10.776152+02 | 2.56987e+06 | 63147 | 27177907
> table9 | 2012-10-15 09:30:17.407181+02 | 2.5923e+06 | 57231 | 27152953
> table10 | 2012-10-15 08:29:25.427207+02 | 2.56237e+06 | 63085 | 31770581
> table11 | 2012-10-15 08:26:14.550742+02 | 2.56503e+06 | 63065 | 31769609
> table12 | 2012-10-15 06:39:01.737914+02 | 2.57833e+06 | 63023 | 31767543
> table13 | 2012-10-15 06:35:36.43683+02 | 2.57217e+06 | 63047 | 31759954
> table14 | 2012-10-15 05:08:02.858623+02 | 2.59068e+06 | 57119 | 27154555
> table15 | 2012-10-15 04:40:48.695142+02 | 2.57449e+06 | 63002 | 31768034
> table16 | 2012-10-10 07:51:49.625226+02 | 2.48584e+06 | 54014 | 22110223
> table17 | 2012-10-09 10:03:36.649331+02 | 2.58047e+06 | 63742 | 23423257
> table18 | 2012-10-09 10:02:13.549362+02 | 2.57854e+06 | 63585 | 23364035
> table19 | 2012-10-09 09:18:55.447895+02 | 2.58851e+06 | 63516 | 23375180
So an auto-ANALYZE should run roughly every 2.5-3 hours.
If it does not, check that you do not have some process still doing manual maintenance or DDL on those tables; they might lock the table and cancel autovacuum.
(Check pg_stat_activity, and set log_autovacuum_min_duration = 0 to log every job done by autovacuum.)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
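A concrete way to run the two checks above (column names as in 8.4, where pg_stat_activity exposes procpid and current_query):

```sql
-- 1. Are autovacuum workers currently running?
SELECT procpid, query_start, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';

-- 2. Is anything waiting on locks on the tables concerned?
SELECT l.pid, c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted;
```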
Hi,

> So that an auto-ANALYZE should run each 2.5-3 hours.
>
> If it does not proceed, check that you do not have some process still doing
> manual maintenance or DDL on those tables, they might lock the table and
> kill autovacuum
>
> (check the pg_stat_activity, and set log_autovacuum_min_duration=0 to track
> all job done by autovacuum)

Sorry for my late answer, but over the last two days I noticed some improvements, so I decided to wait a little to confirm them.

Finally, the autovacuum daemon seems to work correctly. It does not run an auto-ANALYZE every 3 hours (it is more like every 12 hours), but it is much better than at the beginning.

The fact is that the autovacuum daemon took about 10 days to become effective on both kinds of tables; maybe it needs this period to catch up and become effective.

Thanks for your time,
Regards,
Baptiste.
Hi,

Today I consulted the log of my PostgreSQL server and I saw that autovacuum tasks took too much time to do their work. I thought that ANALYZE was a fast operation?

2012-11-09 10:52:33 CET LOG: automatic analyze of table "flows.public.agg_t20_outgoing_a41_src_net_and_dst_net_f5" system usage: CPU 0.20s/0.63u sec elapsed 137.12 sec
2012-11-09 10:52:58 CET LOG: automatic vacuum of table "flows.public.agg_t20_outgoing_a51_src_port_and_proto_f5": index scans: 0 pages: 0 removed, 36044 remain tuples: 0 removed, 3495897 remain system usage: CPU 0.53s/0.62u sec elapsed 307.28 sec
2012-11-09 10:53:33 CET LOG: automatic analyze of table "flows.public.agg_t387_incoming_a44_dst_port_and_proto_f5" system usage: CPU 0.24s/0.46u sec elapsed 174.73 sec
2012-11-09 10:55:50 CET LOG: automatic analyze of table "flows.public.agg_t20_outgoing_a51_src_port_and_proto_f5" system usage: CPU 0.20s/0.48u sec elapsed 152.16 sec
2012-11-09 10:55:55 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5": index scans: 0 pages: 0 removed, 29849 remain tuples: 0 removed, 1349327 remain system usage: CPU 0.35s/0.28u sec elapsed 141.52 sec
2012-11-09 10:58:18 CET LOG: automatic vacuum of table "flows.public.agg_t205_incoming_a41_src_net_and_dst_net_f5": index scans: 0 pages: 0 removed, 34435 remain tuples: 0 removed, 3339035 remain system usage: CPU 0.66s/0.55u sec elapsed 340.06 sec
2012-11-09 10:59:32 CET LOG: automatic analyze of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5" system usage: CPU 0.26s/0.54u sec elapsed 216.95 sec
2012-11-09 10:59:45 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a44_dst_port_and_proto_f5": index scans: 0 pages: 0 removed, 28733 remain tuples: 0 removed, 1327333 remain system usage: CPU 0.24s/0.23u sec elapsed 234.73 sec
2012-11-09 11:01:24 CET LOG: automatic analyze of table "flows.public.agg_t205_incoming_a41_src_net_and_dst_net_f5" system usage: CPU 0.22s/0.55u sec elapsed 186.27 sec
2012-11-09 11:01:50 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5": index scans: 0 pages: 0 removed, 29849 remain tuples: 0 removed, 1349527 remain system usage: CPU 0.24s/0.23u sec elapsed 124.28 sec
2012-11-09 11:02:04 CET LOG: automatic vacuum of table "flows.public.agg_t205_incoming_a40_dst_net_f5": index scans: 0 pages: 0 removed, 19817 remain tuples: 0 removed, 2008020 remain system usage: CPU 0.33s/0.40u sec elapsed 151.76 sec
2012-11-09 11:03:22 CET LOG: automatic analyze of table "flows.public.agg_t205_incoming_a40_dst_net_f5" system usage: CPU 0.14s/0.37u sec elapsed 78.40 sec
2012-11-09 11:04:11 CET LOG: automatic vacuum of table "flows.public.agg_t387_incoming_a40_dst_net_f5": index scans: 0 pages: 0 removed, 26294 remain tuples: 0 removed, 1316071 remain system usage: CPU 0.25s/0.28u sec elapsed 140.99 sec
2012-11-09 11:05:08 CET LOG: automatic vacuum of table "flows.public.agg_t20_outgoing_a39_src_net_f5": index scans: 0 pages: 0 removed, 26821 remain tuples: 0 removed, 2869526 remain system usage: CPU 0.46s/0.42u sec elapsed 223.09 sec
2012-11-09 11:05:57 CET LOG: automatic analyze of table "flows.public.agg_t387_incoming_a40_dst_net_f5" system usage: CPU 0.28s/0.43u sec elapsed 105.64 sec
2012-11-09 11:06:39 CET LOG: automatic analyze of table "flows.public.agg_t20_outgoing_a39_src_net_f5" system usage: CPU 0.27s/0.44u sec elapsed 90.82 sec

Regards,
Baptiste.
Baptiste LHOSTE wrote:

> Today I consulted the log of my PostgreSQL server and I saw that
> autovacuum tasks took too much time to do their work. I thought that
> ANALYZE was a fast operation ?

That depends on configuration settings and on whether the computer (or VM) is so swamped that the autovacuum task is starved for cycles. Also on any overrides of statistics targets for those tables.

Please show us the output from running the query on this page:

http://wiki.postgresql.org/wiki/Server_Configuration

Have you overridden any statistics targets?

A description of the environment would also be good. Hardware? Load?

-Kevin
Hi,

> That depends on configuration settings and on whether the computer
> (or VM) is so swamped that the autovacuum task is starved for cycles.
> Also on any overrides of statistics targets for those tables.
>
> Please show us the output from running this query:
> http://wiki.postgresql.org/wiki/Server_Configuration

            name             | current_setting
-----------------------------+------------------------------------------------------------------------------------------------------
 version                     | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
 autovacuum                  | on
 autovacuum_naptime          | 15s
 checkpoint_segments         | 80
 constraint_exclusion        | partition
 custom_variable_classes     | alaloop
 effective_cache_size        | 14GB
 external_pid_file           | /var/run/postgresql/8.4-main.pid
 lc_collate                  | en_US.UTF-8
 lc_ctype                    | en_US.UTF-8
 listen_addresses            | *
 log_autovacuum_min_duration | 0
 log_line_prefix             | %t
 maintenance_work_mem        | 1GB
 max_connections             | 100
 max_locks_per_transaction   | 256
 max_prepared_transactions   | 150
 max_stack_depth             | 5MB
 port                        | 5432
 server_encoding             | UTF8
 shared_buffers              | 756MB
 ssl                         | on
 temp_buffers                | 1536
 TimeZone                    | localtime
 unix_socket_directory       | /var/run/postgresql
 wal_buffers                 | 1MB
 work_mem                    | 756MB

> Have you overridden any statistics targets?

We have set some specific autovacuum parameters on the tables that take so much time to be analyzed. To do that we used the following kind of query:

ALTER TABLE tablename SET (autovacuum_vacuum_threshold = 20, autovacuum_vacuum_scale_factor = 0.01, autovacuum_analyze_threshold = 10, autovacuum_analyze_scale_factor = 0.005, autovacuum_vacuum_cost_delay = 10, autovacuum_vacuum_cost_limit = 1000);

These tables have two timestamp columns and a btree index on both timestamp columns. Would it be more efficient for us to configure the autovacuum analyze task to run only on those columns? If yes, how can we do that?

> A description of the environment would also be good. Hardware? Load?
Debian 6, 64-bit
8 x Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
16 GB RAM, 4 GB swap
4 x 300 GB RAID 0

Regards,
Baptiste.
Baptiste LHOSTE wrote:

>> Please show us the output from running this query:
>>
>> http://wiki.postgresql.org/wiki/Server_Configuration
>
> [very reasonable settings except for a very large work_mem]

Make sure that work_mem setting isn't driving you into swapping or near-zero caching. A shortage of cache space could explain this, because it looks like about 8.5 ms for each page read. About the only other thing I can think to recommend is to decrease autovacuum_vacuum_cost_delay to 10 ms and see if that helps.

> These tables have two timestamp columns and a btree index on both
> timestamp columns. Would it be more efficient for us to configure the
> autovacuum analyze task to run only on those columns?

No.

> 4 x 300 GB RAID 0

You do realize that if any of those four drives fails you will need to use your backup, right?

-Kevin
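The 8.5 ms figure presumably comes from dividing elapsed time by pages touched in the log excerpt earlier in the thread, e.g. the 307.28-second vacuum over 36044 pages:

```sql
-- Elapsed seconds / pages, converted to milliseconds per page:
SELECT round(307.28 / 36044 * 1000, 2) AS ms_per_page;  -- ≈ 8.53
```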
>> Please show us the output from running this query:
>>
>> http://wiki.postgresql.org/wiki/Server_Configuration
>
> Make sure that work_mem setting isn't driving you into swapping or
> near-zero caching. A shortage of cache space could explain this
> because it looks like about 8.5 ms for each page read. About the only
> other thing I can think to recommend is to decrease
> autovacuum_cost_delay to 10ms and see if that helps.

When I run htop on the server I cannot see any swap issue:

Mem: 4 GB / 16 GB
Swp: 175B / 4 GB

I am sorry, but I do not understand why you suggest decreasing autovacuum_cost_delay to 10 ms. We already set autovacuum_vacuum_cost_delay to 10 ms for those tables.

>> 4 x 300 GB RAID 0
>
> You do realize that if any of those four drives fails you will need to
> use your backup, right?

Yes, we do. We already used our backup when we replaced some old servers.

Regards,
Baptiste.