Re: Occasional performance issue after changing table partitions - Mailing list pgsql-performance

From Nathan Ward
Subject Re: Occasional performance issue after changing table partitions
Date
Msg-id 4954694D-9E02-4209-815B-07903F57779A@daork.net
Whole thread Raw
In response to Re: Occasional performance issue after changing table partitions  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Occasional performance issue after changing table partitions  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
> On 11/07/2022, at 2:05 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sun, Jul 10, 2022 at 04:55:34PM +1200, Nathan Ward wrote:
>> I am running Postgres 13 on CentOS 7, installed from the yum.postgresql.org <http://yum.postgresql.org/> repo.
>
> It doesn't sound relevant, but what kind of storage systems is postgres using ?
> Filesystem, raid, device.

It’s an NVME SSD backed SAN over 2x16G fibre channel. The postgres server is in a VM (vmware). I can pretty comfortably
do10Gbit/s or more to the disk (I’ve only personally tested that, because initial replication from other sites runs at
aroundthat sort of speed limited by ethernet interfaces). The normal IO is between 1MB/s and maybe 15MB/s - writes
only.Reads are pretty minimal. 
FS is XFS.

> Is the high CPU use related to to autovacuum/autoanalyze ?

Good question - I don’t know. I’ve set the server to debug1 log level so I can see that - I see you have some notes
belowabout autovacuum logs so I’ll see what that shows me. 
Since setting that log level I haven’t yet had the issue occur - I watched it tick over midnight last night and it was
normal(as it is, most days). 

>> The issue I am having, is that when the daily data usage aggregation runs, sometimes we have a big performance
impact,with the following characteristics which happen *after* the aggregation job runs in it usual fast time of 12s or
so:
>> - The aggregation runs fast as per normal
>> - Load on the server goes to 30-40 - recall we have quite high “max connections” to keep throughput high when the
clientis far (16ms) from the server 
>
> I suggest to install and enable autoexplain to see what's running slowly here,
> and what its query plans are.  It seems possible that when the daily
> aggregation script drops the old partitions, the plan changes for the worse.
> I'm not sure what the fix is - maybe you just need to run vacuum or analyze on
> the new partitions soon after populating them.

Hmm, I’ll check it out. I hadn’t thought that the query planner could be doing something different, that’s a good
point.

Note that the normal data ingest queries don’t hit the new partition - the new partition is for data 3 days ago, and
theingest queries only hit the partitions covering the last ~2 hours. 

> For good measure, also set log_autovacuum_min_duration=0 (or something other
> than -1) (and while you're at it, log_checkpoints=on, and log_lock_waits=on if
> you haven't already).

Wilco.

> Note that postgres doesn't automatically analyze parent tables, so you should
> maybe do that whenever the data changes enough for it to matter.

Hmm. This raises some stuff I’m not familiar with - does analysing a parent table do anything? I got the impression
thatanalysing the parent was just shorthand for analysing all of the attached partitions. 

Perhaps because I attach a table with data, the parent sometimes decides it needs to run analyse on a bunch of things?
Or, maybe it uses the most recently attached partition, with bad statistics, to plan queries that only touch other
partitions?

--
Nathan Ward




pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Occasional performance issue after changing table partitions
Next
From: Justin Pryzby
Date:
Subject: Re: Occasional performance issue after changing table partitions