Re: Occasional performance issue after changing table partitions - Mailing list pgsql-performance

From Nathan Ward
Subject Re: Occasional performance issue after changing table partitions
Date
Msg-id E7E17E09-3419-45B1-BFC0-B91D0AB15A07@daork.net
Whole thread Raw
In response to Re: Occasional performance issue after changing table partitions  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Occasional performance issue after changing table partitions  (Nathan Ward <lists+postgresql@daork.net>)
List pgsql-performance
> On 11/07/2022, at 4:05 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Jul 11, 2022 at 03:21:38PM +1200, Nathan Ward wrote:
>>> Note that postgres doesn't automatically analyze parent tables, so you should
>>> maybe do that whenever the data changes enough for it to matter.
>>
>> Hmm. This raises some stuff I’m not familiar with - does analysing a parent table do anything?
>
> Yes
>
> You could check if you have stats now (maybe due to a global ANALYZE or
> analyzedb) and how the query plans change if you analyze.
> The transaction may be overly conservative.
>
> SELECT COUNT(1) FROM pg_stats WHERE tablename=PARENT;
> SELECT last_analyze, last_autoanalyze, relname FROM pg_stat_all_tables WHERE relname=PARENT;
> begin;
> SET default_statistics_target=10;
> ANALYZE;
> explain SELECT [...];
> rollback;

I have a development database which gets a mirror of about 50% of the data coming in, and ran a global ANALYZE earlier
on- and note that the disk IO is actually a lot higher since which is interesting and not desirable obviously, so I
havesome more fiddling to do.. 
The behaviour during the ANALYZE was very similar to what happens on my production database when things go funny
though,so, this feels like it’s getting me close. 

The above is going to be a bit tricky to do I think - the ingest process runs a stored procedure, and behaviour varies
quitea bit if I stick in synthetic values. 

I think probably my approach for now will be to turn on auto explain with some sampling, and see what happens.


Side note, in the auto_explain docs, there is a note in a callout box saying that log_analyze has a high impact even if
thequery isn’t logged - if I use sampling, is this still the case - i.e. all queries are impacted - or is it only the
sampledqueries? 

>> I got the impression that analysing the parent was just shorthand for analysing all of the attached partitions.
>
> Could you let us know if the documentation left that impression ?
>
> See here (this was updated recently).
>
> https://www.postgresql.org/docs/13/sql-analyze.html#id-1.9.3.46.8
>
> For partitioned tables, ANALYZE gathers statistics by sampling rows from all partitions; in addition, it will recurse
intoeach partition and update its statistics. Each leaf partition is analyzed only once, even with multi-level
partitioning.No statistics are collected for only the parent table (without data from its partitions), because with
partitioningit's guaranteed to be empty. 
>
> By contrast, if the table being analyzed has inheritance children, ANALYZE gathers two sets of statistics: one on the
rowsof the parent table only, and a second including rows of both the parent table and all of its children. This second
setof statistics is needed when planning queries that process the inheritance tree as a whole. The child tables
themselvesare not individually analyzed in this case. 
>
> The autovacuum daemon does not process partitioned tables, nor does it process inheritance parents if only the
childrenare ever modified. It is usually necessary to periodically run a manual ANALYZE to keep the statistics of the
tablehierarchy up to date. 


It was this part:
“””
No statistics are collected for *only* the parent table (without data from its partitions), because with partitioning
it'sguaranteed to be empty. 
“””

Emphasis around “only” is mine - I think my brain skipped that word, but, it’s obviously critical.

I also note this:
“””
It is usually necessary to periodically run a manual ANALYZE to keep the statistics of the table hierarchy up to date.
“””
This seems really important and is something I was entirely unaware of - maybe this should be in one of those callout
boxes.


>> Perhaps because I attach a table with data, the parent sometimes decides it needs to run analyse on a bunch of
things?
>
> No, that doesn't happen.

Ack.

>> Or, maybe it uses the most recently attached partition, with bad statistics, to plan queries that only touch other
partitions?
>
> This is closer to what I was talking about.
>
> To be clear, you are using relkind=p partitions (added in v10), and not
> inheritance parents, right ?

Yes, relkind=p.

--
Nathan Ward




pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Occasional performance issue after changing table partitions
Next
From: arjun shetty
Date:
Subject: functionality difference-performance postgreSQLv14-GCC-llvm-clang