Re: Auto-analyse on insert operations - Mailing list pgsql-general

From Bertrand Roos
Subject Re: Auto-analyse on insert operations
Date
Msg-id 563CDFC6.3010101@areal.fr
Whole thread Raw
In response to Re: Auto-analyse on insert operations  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Auto-analyse on insert operations
List pgsql-general

Le 04/11/2015 16:57, Adrian Klaver a écrit :
> On 11/04/2015 07:43 AM, Bertrand Roos wrote:
>>
>>
>> Le 04/11/2015 14:55, Bill Moran a écrit :
>>> On Wed, 4 Nov 2015 14:32:37 +0100
>>> Bertrand Roos <bertrand.roos@areal.fr> wrote:
>>>> I try to configure auto-analyse task with postgresql 9.4.
>>>> I have the following configuration (default configuration):
>>>> track_counts = on
>>>> autovacuum = on
>>>> log_autovacuum_min_duration = -1
>>>> autovacuum_max_workers = 3
>>>> autovacuum_naptime = 300s
>>>> autovacuum_vacuum_threshold = 50
>>>> autovacuum_analyze_threshold = 50
>>>> autovacuum_vacuum_scale_factor = 0.2
>>>> autovacuum_analyze_scale_factor = 0.2
>>>> autovacuum_freeze_max_age = 200000000
>>>> autovacuum_multixact_freeze_max_age = 400000000
>>>> autovacuum_vacuum_cost_delay = 20ms
>>>> autovacuum_vacuum_cost_limit = -1
>>>>
>>>> With this configuration, I can observe that some tables are
>>>> auto-analysed, but some others are not. Even if there are millions of
>>>> insert operations on an empty table (all tables are in cluster mode).
>>>> In fact it seems that tables with update operations are the only ones
>>>> that are auto-analysed.
>>>> I'm quite suprised because the documentation says that daemon check
>>>> the
>>>> count of insert, update and delete operations.
>>>> What could it be the reason ? Why tables which have only update
>>>> operation, aren't analysed ?
>>>> Are update operations really taken into account ?
>>> Given that autoanalyze is pretty critical to the way the system
>>> functions,
>>> it's unlikely that it just doesn't work (someone else would have
>>> noticed).
>>>
>>> A more likely scenario is that you've found some extremely obscure edge
>>> case. If that's the case, you're going to have to give very specific
>>> details as to how you're testing it before anyone is liable to be able
>>> to help you.
>>>
>>> I get the impression that you're somewhat new to Postgres, in which
>>> case
>>> it's very likely that the problem is that you're not testing the
>>> situation
>>> correctly. In that case, we're going to need specific details on how
>>> you're
>>> observing that tables are or are not being analysed.
>>>
>>> As a wild-guess theory: the process that does the analyze only wakes up
>>> to check tables every 5 minutes (based on the config you show) ...
>>> so are
>>> you doing the inserts then checking the table without leaving enough
>>> time
>>> in between for the system to wake up and notice the change?
>>>
>> Thanks for your answer Bill.
>> Indeed, I'm pretty new to Postgres and I don't exclude that I'm doing
>> something wrong. But I did my test on a more than 1 day duration, so
>> it's not an issue of autovacuum_naptime (I insert 760 lignes each 30
>> seconds during 36 hours).
>> I can't give all the details of this test because it is to complicated
>> with triggers and partman (and your objective is not to solve
>> configuration issues of others).
>>
>> In fact, I was telling the question because I have read on some forums
>> that the auto vacuum deamon only count dead tuple so only update and
>> delete operations can cause the scheduling of auto-analyse.
>> So if it's the case it perfectly explain why my test case doesn't work.
>> But in the other hand the documentation says that delete, update and
>> insert operations are counted.
>> Is it an know issue that insert operations are not counted for the
>> trigger of auto-analyse ?
>
> No, see below:
>
> http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#AUTOVACUUM
>
>
> 23.1.3. Updating Planner Statistics
>
> "The autovacuum daemon, if enabled, will automatically issue ANALYZE
> commands whenever the content of a table has changed sufficiently.
> However, administrators might prefer to rely on manually-scheduled
> ANALYZE operations, particularly if it is known that update activity
> on a table will not affect the statistics of "interesting" columns.
> The daemon schedules ANALYZE strictly as a function of the number of
> rows inserted or updated; it has no knowledge of whether that will
> lead to meaningful statistical changes."
>
>
> Is the partman you refer to this?:
>
> https://github.com/keithf4/pg_partman
>
> Can you give an outline view of what you are doing and how you are
> determining the status of analyze?
>
>>
>> If it's not, I can try to reproduce this weird behaviour with a simpler
>> test and give you all the details of the test.
>>
>> Bertrand
>>
>>
>>
>>
>
>
I have done multiple tests trying to reproduce my issue and finally find
a partial explanation.

My non-working use case is :
With a database with configuration by default (autovacuum=on,
autovacuum_max_workers=3 ...)
A table ('table_name') partioned with partman with a child table per
month (@Adrian Klaver, yes it's https://github.com/keithf4/pg_partman)
A remote deamon that calls an analyze every day on the table 'table_name'
A remote application that calls multiple stored procedures doing an
insert operation on 'table_name' (785 inserts each 30s), with no delete,
no update operation
This stored procedures are sometimes blocked by a short lock (they are
doing some update operations on a cache table of 'table_name')

It happened that :
The current child table has no statistics :
- SELECT * FROM pg_stat WHERE tablename = '$table_name_month'; => empty row
- SELECT * FROM pg_stat_user_tables WHERE relname = '$table_name_month';
=> statistics are good (coherent with values I should have) but I saw no
value for last_autoanalyze or last_analyze (even if the test was running
for a few days and the table start with no data and grow to 6 million rows)

The behaviour of the auto-analyze I have observed :
Insert operations on a table trigger auto-analyze (with track_counts=on).
Stored procedure on a table trigger auto-analyze (even if
track_functions=off)
Insert operations on a partitioned table trigger auto-analyze.
Insert operations on a daily analyze table may not trigger auto-analyze,
because an analyze reset the count of operation for auto-analyze.
An analyze on a partitioned table update statistics of the parent table
but not for children tables.
An analyze on a partitioned table do not reset the count of operations
for auto-analyze of children tables.

I may not be able yet to explain the reason why the auto-analyze is
rarely not triggered.
But at least I found a way to fix pg_stat of the current table.

@Bill Moran, Are you sure that a dump of the table could help you ? The
values of the table are correct, the main issue is that statistics are
becoming invalid and so postgres is choosing dummy request paths.

Bertrand



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: is there any difference DROP PRIMARY KEY in oracle and postgres?
Next
From: Michael Convey
Date:
Subject: Two different (postgresql & postgresql93) packages installed?