Re: Auto-analyse on insert operations - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Auto-analyse on insert operations
Msg-id 563D5430.3030502@aklaver.com
In response to Re: Auto-analyse on insert operations  (Bertrand Roos <bertrand.roos@areal.fr>)
List pgsql-general
On 11/06/2015 09:13 AM, Bertrand Roos wrote:
>
>
> On 04/11/2015 16:57, Adrian Klaver wrote:
>> On 11/04/2015 07:43 AM, Bertrand Roos wrote:
>>>
>>>
>>> On 04/11/2015 14:55, Bill Moran wrote:
>>>> On Wed, 4 Nov 2015 14:32:37 +0100
>>>> Bertrand Roos <bertrand.roos@areal.fr> wrote:
>>>>> I am trying to configure the auto-analyse task with PostgreSQL 9.4.
>>>>> I have the following configuration (the default configuration):
>>>>> track_counts = on
>>>>> autovacuum = on
>>>>> log_autovacuum_min_duration = -1
>>>>> autovacuum_max_workers = 3
>>>>> autovacuum_naptime = 300s
>>>>> autovacuum_vacuum_threshold = 50
>>>>> autovacuum_analyze_threshold = 50
>>>>> autovacuum_vacuum_scale_factor = 0.2
>>>>> autovacuum_analyze_scale_factor = 0.2
>>>>> autovacuum_freeze_max_age = 200000000
>>>>> autovacuum_multixact_freeze_max_age = 400000000
>>>>> autovacuum_vacuum_cost_delay = 20ms
>>>>> autovacuum_vacuum_cost_limit = -1
>>>>>
>>>>> With this configuration, I can observe that some tables are
>>>>> auto-analysed, but others are not, even when there are millions of
>>>>> insert operations on an initially empty table (all tables are in
>>>>> cluster mode). In fact, it seems that tables with update operations
>>>>> are the only ones that are auto-analysed.
>>>>> I'm quite surprised, because the documentation says that the daemon
>>>>> checks the count of insert, update and delete operations.
>>>>> What could the reason be? Why aren't tables that have only insert
>>>>> operations analysed?
>>>>> Are insert operations really taken into account?
>>>> Given that autoanalyze is pretty critical to the way the system
>>>> functions,
>>>> it's unlikely that it just doesn't work (someone else would have
>>>> noticed).
>>>>
>>>> A more likely scenario is that you've found some extremely obscure edge
>>>> case. If that's the case, you're going to have to give very specific
>>>> details as to how you're testing it before anyone is liable to be able
>>>> to help you.
>>>>
>>>> I get the impression that you're somewhat new to Postgres, in which
>>>> case
>>>> it's very likely that the problem is that you're not testing the
>>>> situation
>>>> correctly. In that case, we're going to need specific details on how
>>>> you're
>>>> observing that tables are or are not being analysed.
>>>>
>>>> As a wild-guess theory: the process that does the analyze only wakes up
>>>> to check tables every 5 minutes (based on the config you show) ...
>>>> so are
>>>> you doing the inserts then checking the table without leaving enough
>>>> time
>>>> in between for the system to wake up and notice the change?
>>>>
>>> Thanks for your answer Bill.
>>> Indeed, I'm pretty new to Postgres and I don't rule out that I'm doing
>>> something wrong. But I ran my test for more than a day, so it's not an
>>> issue of autovacuum_naptime (I insert 760 rows every 30 seconds, for 36
>>> hours).
>>> I can't give all the details of this test because it is too complicated,
>>> with triggers and partman (and your objective is not to solve other
>>> people's configuration issues).
>>>
>>> In fact, I asked the question because I have read on some forums that
>>> the autovacuum daemon only counts dead tuples, so only update and delete
>>> operations can cause auto-analyse to be scheduled.
>>> If that's the case, it perfectly explains why my test case doesn't work.
>>> But on the other hand, the documentation says that delete, update and
>>> insert operations are counted.
>>> Is it a known issue that insert operations are not counted when
>>> triggering auto-analyse?
>>
>> No, see below:
>>
>> http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#AUTOVACUUM
>>
>>
>> 23.1.3. Updating Planner Statistics
>>
>> "The autovacuum daemon, if enabled, will automatically issue ANALYZE
>> commands whenever the content of a table has changed sufficiently.
>> However, administrators might prefer to rely on manually-scheduled
>> ANALYZE operations, particularly if it is known that update activity
>> on a table will not affect the statistics of "interesting" columns.
>> The daemon schedules ANALYZE strictly as a function of the number of
>> rows inserted or updated; it has no knowledge of whether that will
>> lead to meaningful statistical changes."
>>
>>
>> Is the partman you refer to this?:
>>
>> https://github.com/keithf4/pg_partman
>>
>> Can you give an outline view of what you are doing and how you are
>> determining the status of analyze?
>>
>>>
>>> If it's not, I can try to reproduce this weird behaviour with a simpler
>>> test and give you all the details of the test.
>>>
>>> Bertrand
>>>
>>>
>>>
>>>
>>
>>
> I have done multiple tests trying to reproduce my issue and finally found
> a partial explanation.
>
> My non-working use case is:
> A database with the default configuration (autovacuum=on,
> autovacuum_max_workers=3 ...)
> A table ('table_name') partitioned with partman, with a child table per
> month (@Adrian Klaver, yes, it's https://github.com/keithf4/pg_partman)
> A remote daemon that calls ANALYZE on the table 'table_name' every day

So the above is a manual ANALYZE run by another program, correct?
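
If it helps, a quick way to tell which kind of ANALYZE last touched a table
is the standard statistics view (the table name below is just the one from
your message, adjust as needed):

  SELECT relname, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 'table_name';

A manual ANALYZE fills in last_analyze, while the autovacuum daemon fills in
last_autoanalyze.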

> A remote application that calls multiple stored procedures doing insert
> operations on 'table_name' (785 inserts every 30 s), with no delete and no
> update operations

So a batch of 785 inserts over 30 seconds?

> These stored procedures are sometimes blocked by a short lock (they do
> some update operations on a cache table of 'table_name')

I am not following the above.

You have another table that 'shadows' the 'table_name' table?

>
> It happened that:
> The current child table has no statistics:
> - SELECT * FROM pg_stat WHERE tablename = '$table_name_month'; => returns no rows

You are actually talking about pg_stats above and below, correct?
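
For reference, pg_stats shows the per-column statistics that ANALYZE
produces, while pg_stat_user_tables shows the activity counters kept by the
statistics collector. A rough check against the child table (substitute the
real child-table name):

  SELECT tablename, attname, n_distinct
  FROM pg_stats
  WHERE tablename = 'table_name_month';

That returns no rows until the table has been analyzed at least once, and it
only shows rows for tables the current user has permission to read.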

> - SELECT * FROM pg_stat_user_tables WHERE relname = '$table_name_month';
> => the statistics are good (consistent with the values I expect), but I see
> no value for last_autoanalyze or last_analyze (even though the test has
> been running for a few days and the table started empty and grew to 6
> million rows)

So what user are you running the above as, and does that user have
permissions on 'table_name'?
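
One quick way to check, if you are unsure (the table name is a placeholder):

  SELECT current_user,
         has_table_privilege('table_name', 'SELECT') AS can_select;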

>
> The auto-analyze behaviour I have observed:
> Insert operations on a table trigger auto-analyze (with track_counts=on).
> Stored procedures on a table trigger auto-analyze (even if
> track_functions=off)

Well:

http://www.postgresql.org/docs/9.4/static/runtime-config-statistics.html#GUC-TRACK-FUNCTIONS

  track_functions (enum)

     Enables tracking of function call counts and time used. Specify pl
to track only procedural-language functions, all to also track SQL and C
language functions. The default is none, which disables function
statistics tracking. Only superusers can change this setting.

So it tracks (or not) function usage, not what the function does. If the
function is doing INSERTs, UPDATEs, or DELETEs, then those are covered by
track_counts.
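
With track_counts on, the inserts your procedures perform show up in
pg_stat_user_tables (n_tup_ins and, in 9.4, n_mod_since_analyze), and
autovacuum schedules an ANALYZE once the modifications since the last
analyze exceed autovacuum_analyze_threshold +
autovacuum_analyze_scale_factor * reltuples. A rough check using the 50 and
0.2 from your settings:

  SELECT s.relname,
         s.n_mod_since_analyze,
         50 + 0.2 * c.reltuples AS analyze_threshold
  FROM pg_stat_user_tables s
  JOIN pg_class c ON c.oid = s.relid;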

> Insert operations on a partitioned table trigger auto-analyze.
> Insert operations on a table that is analyzed daily may not trigger
> auto-analyze, because an ANALYZE resets the operation count used by
> auto-analyze.
> An ANALYZE on a partitioned table updates statistics for the parent table
> but not for the child tables.
> An ANALYZE on a partitioned table does not reset the operation count used
> for auto-analyze of the child tables.

I do not do enough partitioning to be of much help here. One thing I
would say is that it might be helpful to see the schema partman sets up
when it does the partitioning.
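
If you can share it, even a quick catalog query would help, e.g. listing the
child tables and any trigger partman installed on the parent (the table name
is the one from your post):

  SELECT c.relname AS child_table
  FROM pg_inherits i
  JOIN pg_class c ON c.oid = i.inhrelid
  WHERE i.inhparent = 'table_name'::regclass;

  SELECT tgname
  FROM pg_trigger
  WHERE tgrelid = 'table_name'::regclass AND NOT tgisinternal;

Otherwise, the output of pg_dump --schema-only for the parent and one child
table would work just as well.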

>
> I am not yet able to explain why the auto-analyze is occasionally not
> triggered.
> But at least I found a way to fix pg_stat for the current table.
>
> @Bill Moran, are you sure that a dump of the table would help you? The
> values in the table are correct; the main issue is that the statistics
> become invalid and so Postgres chooses bad query plans.
>
> Bertrand
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

