Thread: Autoanalyze of the autovacuum daemon ...

Autoanalyze of the autovacuum daemon ...

From
Baptiste LHOSTE
Date:
Hi All,

We are using postgreSQL since 2007 (now we use postgreSQL 8.4) and until recently we used to perform deletion of old
datathen vacuum and analyze tasks by ourself (every night at 0am CET). 
Nevertheless we reached a point where these tasks are taking so much time that why we decide to perform deletion of old
dataevery hour and let the autovacuum daemon manage vacuum and analyze tasks. 

But we have some difficulties to understand how work the trigger of the autovacuum daemon for autoanalyze tasks.

Let me explain our database structure. We have two kind of tables :
- first one on which we perform a complete truncate and a copy to fill each (partitioned tables - 288 partitions, one
per5mn re-used every day)  
- second one on which we insert some new data every five minutes (avg~200 rows) and delete old data about every 1 hour
(avg~1000rows).  
For complete understanding, we need up-to-date stats for the second one because the recurrent deletion might take a
longtime, (~1mn for less than 1000 deleted rows because planer uses seq scan instead of index scan).  

The autovacuum perform autoanalyze tasks on first kind as soon as the process (truncate + copy) is done.

But the autoanalyze is not that effective for second kind.

We tried to reduce autovacuum_analyze_threshold (50 => 10) and autovacuum_analyze_scale_factor (0.1 => 0.005) for the
secondkind of tables (ALTER TABLE ... SET PARAMETERS ...) without any conclusive effect.  

We can not find where is stored the total number of tuples inserted or updated since the last ANALYZE. Could someone
giveus the answer ? 

We take a look at the relfrozenxid but our tables do not have a big value of relfrozenxid (< 10000000).

Most of the time there is no autovacuum analyze query in the pg_stat_activity, althought we set the autovacuum_naptime
to15s to try to start new analyze task more often. 

We do not understand why we can't obtain some improvments with previous changes. Did we do something wrong ?

Thank you all for your kind advices,

Regards, Baptiste.

---
Baptiste LHOSTE
blhoste@alaloop.com

ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
Téléphone : +33 (0) 5 59 41 51 10
www.alaloop.com


Re: Autoanalyze of the autovacuum daemon ...

From
Cédric Villemain
Date:
Hi,

> second one on which we insert some new data every five minutes (avg~200
> rows) and delete old data about every 1 hour (avg~1000 rows). For complete
> understanding, we need up-to-date stats for the second one because the
> recurrent deletion might take a long time, (~1mn for less than 1000
> deleted rows because planer uses seq scan instead of index scan).
>
> The autovacuum perform autoanalyze tasks on first kind as soon as the
> process (truncate + copy) is done.
>
> But the autoanalyze is not that effective for second kind.
>
> We tried to reduce autovacuum_analyze_threshold (50 => 10) and
> autovacuum_analyze_scale_factor (0.1 => 0.005) for the second kind of
> tables (ALTER TABLE ... SET PARAMETERS ...) without any conclusive effect.
>
> We can not find where is stored the total number of tuples inserted or
> updated since the last ANALYZE. Could someone give us the answer ?

in pg_stat_user_tables, not since the last time ANALYZE run, but you have the
number of reltuples from pg_class that is used to calculate the ratio.

> We take a look at the relfrozenxid but our tables do not have a big value
> of relfrozenxid (< 10000000).
>
> Most of the time there is no autovacuum analyze query in the
> pg_stat_activity, althought we set the autovacuum_naptime to 15s to try to
> start new analyze task more often.
>
> We do not understand why we can't obtain some improvments with previous
> changes. Did we do something wrong ?

Everything is relative to the size of the table, what is the content of
pg_class for the second kind of tables ? (relpages/reltuples)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Attachment

Re: Autoanalyze of the autovacuum daemon ...

From
Baptiste LHOSTE
Date:
Hi,

> in pg_stat_user_tables, not since the last time ANALYZE run, but you have the
> number of reltuples from pg_class that is used to calculate the ratio.

So how does the autovacuum daemon to determine the number of obsolete row since its last work on a specific table ?

> Everything is relative to the size of the table, what is the content of
> pg_class for the second kind of tables ? (relpages/reltuples)

relname | last_autoanalyse              | reltuples   | relpages | n_tup_ins + n_tup_upd + n_tup_del
        | (from pg_stat_all_tables)     |             |          | (from pg_stat_all_tables)
        |                               |             |          |
table1  | 2012-10-15 21:12:04.362647+02 | 2.53296e+06 |    50789 | 28542483
table2  | 2012-10-15 18:27:30.181583+02 | 2.57144e+06 |    54833 | 31127737
table3  | 2012-10-15 13:13:09.569758+02 | 2.58646e+06 |    63208 | 27167944
table4  | 2012-10-15 12:05:39.094652+02 | 2.58655e+06 |    63181 | 27160409
table5  | 2012-10-15 12:05:32.347738+02 | 2.58611e+06 |    57272 | 27171661
table6  | 2012-10-15 10:43:40.558078+02 | 2.57925e+06 |    63145 | 27155875
table7  | 2012-10-15 10:40:25.469625+02 | 2.59551e+06 |    57254 | 27161876
table8  | 2012-10-15 10:25:10.776152+02 | 2.56987e+06 |    63147 | 27177907
table9  | 2012-10-15 09:30:17.407181+02 |  2.5923e+06 |    57231 | 27152953
table10 | 2012-10-15 08:29:25.427207+02 | 2.56237e+06 |    63085 | 31770581
table11 | 2012-10-15 08:26:14.550742+02 | 2.56503e+06 |    63065 | 31769609
table12 | 2012-10-15 06:39:01.737914+02 | 2.57833e+06 |    63023 | 31767543
table13 | 2012-10-15 06:35:36.43683+02  | 2.57217e+06 |    63047 | 31759954
table14 | 2012-10-15 05:08:02.858623+02 | 2.59068e+06 |    57119 | 27154555
table15 | 2012-10-15 04:40:48.695142+02 | 2.57449e+06 |    63002 | 31768034
table16 | 2012-10-10 07:51:49.625226+02 | 2.48584e+06 |    54014 | 22110223
table17 | 2012-10-09 10:03:36.649331+02 | 2.58047e+06 |    63742 | 23423257
table18 | 2012-10-09 10:02:13.549362+02 | 2.57854e+06 |    63585 | 23364035
table19 | 2012-10-09 09:18:55.447895+02 | 2.58851e+06 |    63516 | 23375180

Regards, Baptiste Lhoste


Re: Autoanalyze of the autovacuum daemon ...

From
Cédric Villemain
Date:

> relname | last_autoanalyse | reltuples | relpages |

> n_tup_ins + n_tup_upd + n_tup_del

>

> | (from pg_stat_all_tables) | | | (from

> | pg_stat_all_tables)

>

> table1 | 2012-10-15 21:12:04.362647+02 | 2.53296e+06 | 50789 | 28542483

> table2 | 2012-10-15 18:27:30.181583+02 | 2.57144e+06 | 54833 | 31127737

> table3 | 2012-10-15 13:13:09.569758+02 | 2.58646e+06 | 63208 | 27167944

> table4 | 2012-10-15 12:05:39.094652+02 | 2.58655e+06 | 63181 | 27160409

> table5 | 2012-10-15 12:05:32.347738+02 | 2.58611e+06 | 57272 | 27171661

> table6 | 2012-10-15 10:43:40.558078+02 | 2.57925e+06 | 63145 | 27155875

> table7 | 2012-10-15 10:40:25.469625+02 | 2.59551e+06 | 57254 | 27161876

> table8 | 2012-10-15 10:25:10.776152+02 | 2.56987e+06 | 63147 | 27177907

> table9 | 2012-10-15 09:30:17.407181+02 | 2.5923e+06 | 57231 | 27152953

> table10 | 2012-10-15 08:29:25.427207+02 | 2.56237e+06 | 63085 | 31770581

> table11 | 2012-10-15 08:26:14.550742+02 | 2.56503e+06 | 63065 | 31769609

> table12 | 2012-10-15 06:39:01.737914+02 | 2.57833e+06 | 63023 | 31767543

> table13 | 2012-10-15 06:35:36.43683+02 | 2.57217e+06 | 63047 | 31759954

> table14 | 2012-10-15 05:08:02.858623+02 | 2.59068e+06 | 57119 | 27154555

> table15 | 2012-10-15 04:40:48.695142+02 | 2.57449e+06 | 63002 | 31768034

> table16 | 2012-10-10 07:51:49.625226+02 | 2.48584e+06 | 54014 | 22110223

> table17 | 2012-10-09 10:03:36.649331+02 | 2.58047e+06 | 63742 | 23423257

> table18 | 2012-10-09 10:02:13.549362+02 | 2.57854e+06 | 63585 | 23364035

> table19 | 2012-10-09 09:18:55.447895+02 | 2.58851e+06 | 63516 | 23375180

 

 

So that an auto-ANALYZE should run each 2.5-3 hours.

If it does not proceed, check that you do not have some process still doing manual maintenance or DDL on those tables, they might lock the table and kill autovacuum

(check the pg_stat_activity, and set log_autovacuum_min_duration=0 to track all job done by autovacuum)

 

--

Cédric Villemain +33 (0)6 20 30 22 52

http://2ndQuadrant.fr/

PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Attachment

Re: Autoanalyze of the autovacuum daemon ...

From
Baptiste LHOSTE
Date:
Hi,

> So that an auto-ANALYZE should run each 2.5-3 hours.
>
> If it does not proceed, check that you do not have some process still doing manual maintenance or DDL on those
tables,they might 
> lock the table and kill autovacuum
>
> (check the pg_stat_activity, and set log_autovacuum_min_duration=0 to track all job done by autovacuum)

Sorry for my late answer, but since two days I noticed some improvements so I decided to wait a little bit to confirm
these.

Finally the autovacuum daemon seems to work correctly. It does not proceed an auto-ANALYZE each 3 hours, it is more
abouteach 12 hours but it is much better than at the beginning. 

The fact is the autovacuum daemon took about 10 days to become effective on both kind of tables, maybe it needs this
periodto catch up and be effective. 

Thanks for your time, Regards, Baptiste.


Re: Autoanalyze of the autovacuum daemon ...

From
Baptiste LHOSTE
Date:
Hi,

Today I consulted the log of my PostgreSQL server and I saw that autovacuum tasks took to much time to do their work. I
thoughtthat ANALYZE was a fast operation ? 

2012-11-09 10:52:33 CET LOG:  automatic analyze of table "flows.public.agg_t20_outgoing_a41_src_net_and_dst_net_f5"
systemusage: CPU 0.20s/0.63u sec elapsed 137.12 sec 
2012-11-09 10:52:58 CET LOG:  automatic vacuum of table "flows.public.agg_t20_outgoing_a51_src_port_and_proto_f5":
indexscans: 0 
    pages: 0 removed, 36044 remain
    tuples: 0 removed, 3495897 remain
    system usage: CPU 0.53s/0.62u sec elapsed 307.28 sec
2012-11-09 10:53:33 CET LOG:  automatic analyze of table "flows.public.agg_t387_incoming_a44_dst_port_and_proto_f5"
systemusage: CPU 0.24s/0.46u sec elapsed 174.73 sec 
2012-11-09 10:55:50 CET LOG:  automatic analyze of table "flows.public.agg_t20_outgoing_a51_src_port_and_proto_f5"
systemusage: CPU 0.20s/0.48u sec elapsed 152.16 sec 
2012-11-09 10:55:55 CET LOG:  automatic vacuum of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5":
indexscans: 0 
    pages: 0 removed, 29849 remain
    tuples: 0 removed, 1349327 remain
    system usage: CPU 0.35s/0.28u sec elapsed 141.52 sec
2012-11-09 10:58:18 CET LOG:  automatic vacuum of table "flows.public.agg_t205_incoming_a41_src_net_and_dst_net_f5":
indexscans: 0 
    pages: 0 removed, 34435 remain
    tuples: 0 removed, 3339035 remain
    system usage: CPU 0.66s/0.55u sec elapsed 340.06 sec
2012-11-09 10:59:32 CET LOG:  automatic analyze of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5"
systemusage: CPU 0.26s/0.54u sec elapsed 216.95 sec 
2012-11-09 10:59:45 CET LOG:  automatic vacuum of table "flows.public.agg_t387_incoming_a44_dst_port_and_proto_f5":
indexscans: 0 
    pages: 0 removed, 28733 remain
    tuples: 0 removed, 1327333 remain
    system usage: CPU 0.24s/0.23u sec elapsed 234.73 sec
2012-11-09 11:01:24 CET LOG:  automatic analyze of table "flows.public.agg_t205_incoming_a41_src_net_and_dst_net_f5"
systemusage: CPU 0.22s/0.55u sec elapsed 186.27 sec 
2012-11-09 11:01:50 CET LOG:  automatic vacuum of table "flows.public.agg_t387_incoming_a41_src_net_and_dst_net_f5":
indexscans: 0 
    pages: 0 removed, 29849 remain
    tuples: 0 removed, 1349527 remain
    system usage: CPU 0.24s/0.23u sec elapsed 124.28 sec
2012-11-09 11:02:04 CET LOG:  automatic vacuum of table "flows.public.agg_t205_incoming_a40_dst_net_f5": index scans: 0
    pages: 0 removed, 19817 remain
    tuples: 0 removed, 2008020 remain
    system usage: CPU 0.33s/0.40u sec elapsed 151.76 sec
2012-11-09 11:03:22 CET LOG:  automatic analyze of table "flows.public.agg_t205_incoming_a40_dst_net_f5" system usage:
CPU0.14s/0.37u sec elapsed 78.40 sec 
2012-11-09 11:04:11 CET LOG:  automatic vacuum of table "flows.public.agg_t387_incoming_a40_dst_net_f5": index scans: 0
    pages: 0 removed, 26294 remain
    tuples: 0 removed, 1316071 remain
    system usage: CPU 0.25s/0.28u sec elapsed 140.99 sec
2012-11-09 11:05:08 CET LOG:  automatic vacuum of table "flows.public.agg_t20_outgoing_a39_src_net_f5": index scans: 0
    pages: 0 removed, 26821 remain
    tuples: 0 removed, 2869526 remain
    system usage: CPU 0.46s/0.42u sec elapsed 223.09 sec
2012-11-09 11:05:57 CET LOG:  automatic analyze of table "flows.public.agg_t387_incoming_a40_dst_net_f5" system usage:
CPU0.28s/0.43u sec elapsed 105.64 sec 
2012-11-09 11:06:39 CET LOG:  automatic analyze of table "flows.public.agg_t20_outgoing_a39_src_net_f5" system usage:
CPU0.27s/0.44u sec elapsed 90.82 sec 

Regards, Baptiste.


Re: Autoanalyze of the autovacuum daemon ...

From
"Kevin Grittner"
Date:
Baptiste LHOSTE wrote:

> Today I consulted the log of my PostgreSQL server and I saw that
> autovacuum tasks took to much time to do their work. I thought that
> ANALYZE was a fast operation ?

That depends on configuration settings and on whether the computer
(or VM) is so swamped that the autovacuum task is starved for cycles.
Also on any overrides of statistics targets for those tables.

Please show us the output from running this query:

http://wiki.postgresql.org/wiki/Server_Configuration

Have you overridden any statistics targets?

A description of the environment would also be good. Hardware? Load?

-Kevin


Re: Autoanalyze of the autovacuum daemon ...

From
Baptiste LHOSTE
Date:
Hi,

> That depends on configuration settings and on whether the computer
> (or VM) is so swamped that the autovacuum task is starved for cycles.
> Also on any overrides of statistics targets for those tables.

> Please show us the output from running this query:

> http://wiki.postgresql.org/wiki/Server_Configuration

            name             |                                           current_setting
            

-----------------------------+------------------------------------------------------------------------------------------------------
 version                     | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8)
4.4.5,64-bit 
 autovacuum                  | on
 autovacuum_naptime          | 15s
 checkpoint_segments         | 80
 constraint_exclusion        | partition
 custom_variable_classes     | alaloop
 effective_cache_size        | 14GB
 external_pid_file           | /var/run/postgresql/8.4-main.pid
 lc_collate                  | en_US.UTF-8
 lc_ctype                    | en_US.UTF-8
 listen_addresses            | *
 log_autovacuum_min_duration | 0
 log_line_prefix             | %t
 maintenance_work_mem        | 1GB
 max_connections             | 100
 max_locks_per_transaction   | 256
 max_prepared_transactions   | 150
 max_stack_depth             | 5MB
 port                        | 5432
 server_encoding             | UTF8
 shared_buffers              | 756MB
 ssl                         | on
 temp_buffers                | 1536
 TimeZone                    | localtime
 unix_socket_directory       | /var/run/postgresql
 wal_buffers                 | 1MB
 work_mem                    | 756MB


> Have you overridden any statistics targets?

We have put some specific values to autovacuum parameters for tables which take so much time to be analyzed. To do that
weused following kind of query : 

ALTER TABLE tablename SET (autovacuum_vacuum_threshold=20, autovacuum_vacuum_scale_factor=0.01,
autovacuum_analyze_threshold=10,autovacuum_analyze_scale_factor=0.005, autovacuum_vacuum_cost_delay=10,
autovacuum_vacuum_cost_limit=1000);

These tables have two timestamp columns and a btree index on both timestamp column.
Will it be more efficient for us to configure the autovacuum daemon analyze task only on those columns ? If yes, how
canwe do that ? 


> A description of the environment would also be good. Hardware? Load?
Debian 6 64bits
8  Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
16 Go RAM - 4 Go Swap
4 * 300 Go Raid 0


Regards, Baptiste.


Re: Autoanalyze of the autovacuum daemon ...

From
"Kevin Grittner"
Date:
Baptiste LHOSTE wrote:

>> Please show us the output from running this query:
>>
>> http://wiki.postgresql.org/wiki/Server_Configuration

> [very reasonable settings except for a very large work_mem]

Make sure that work_mem setting isn't driving you into swapping or
near-zero caching. A shortage of cache space could explain this
because it looks like about 8.5 ms for each page read. About the only
other thing I can think to recommend is to decrease
autovacuum_cost_delay to 10ms and see if that helps.

> These tables have two timestamp columns and a btree index on both
> timestamp column. Will it be more efficient for us to configure the
> autovacuum daemon analyze task only on those columns ?

No.

> 4 * 300 Go Raid 0

You do realize that if any of those four drives fail you will need to
use your backup, right?

-Kevin


Re: Autoanalyze of the autovacuum daemon ...

From
Baptiste LHOSTE
Date:
>> Please show us the output from running this query:
>>
>> http://wiki.postgresql.org/wiki/Server_Configuration

> [very reasonable settings except for a very large work_mem]

> Make sure that work_mem setting isn't driving you into swapping or
> near-zero caching. A shortage of cache space could explain this
> because it looks like about 8.5 ms for each page read. About the only
> other thing I can think to recommend is to decrease
> autovacuum_cost_delay to 10ms and see if that helps.

When I run a command htop on my server I can not see any swap issue :

Mem 4Go / 16Go
Swp 175B / 4Go

I am sorry but I do not understand we you suggest to decrease autovacuum_cost_delay to 10 ms.
We already set autovacuum_vacuum_cost_delay to 10ms for those tables.

>> 4 * 300 Go Raid 0

> You do realize that if any of those four drives fail you will need to
> use your backup, right?

Yes, we do. We already used our backup when we changed some old servers.

Regards, Baptiste.