Re: How to analyze a slowdown in 9.3.5? - Mailing list pgsql-general

From Tomas Vondra
Subject Re: How to analyze a slowdown in 9.3.5?
Date
Msg-id 54B085ED.805@2ndquadrant.com
Whole thread Raw
In response to How to analyze a slowdown in 9.3.5?  (Michael Nolan <htfoot@gmail.com>)
Responses Re: How to analyze a slowdown in 9.3.5?  (Andy Colson <andy@squeakycode.net>)
Re: How to analyze a slowdown in 9.3.5?  (Michael Nolan <htfoot@gmail.com>)
List pgsql-general
On 9.1.2015 23:14, Michael Nolan wrote:
> I'm running 9.3.5 on a virtual machine with 5 cores and 24 GB of
> memory.  Disk is on a SAN.
>
> I have a task that runs weekly that processes possibly as many as
> 120 months worth of data, one month at a time. Since moving to 9.3.5
> (from 8.2!!) the average time for a month has been 3 minutes or less.

Congrats to migrating to a supported version!

Please, comparison of the configuration used on 8.2 and 9.3.5 would be
helpful (i.e. how you've updated the config on the new version?).

> However, when this job ran this Tuesday, it ran fine for a number of
> months, but then started slowing down dramatically, 300 minutes for
> one month and then 167 minutes for the next. I stopped and restarted
> postgresql, the next block also ran really slow (157 minutes.) I
> then rebooted the server and the remaining blocks ran at the usual
> fast speed again, so restarting postgresql didn't fix the problem
> but rebooting the server did.

What amounts of data are we talking about? Gigabytes? Tens of gigabytes?

>
> Looking at the logs, I see queries with a function call that would
> normally take no more than 100-200 milliseconds, usually far less,
> that were taking 100 seconds or longer. This function gets called
> thousands of times for each month, so that appears to be one source
> of the slowdown.

But why are the functions taking so much longer? Are they eating CPU,
I/O or are generally waiting for something (e.g. locks)?

>
> I don't suspect a memory leak in the calling program (in php),
> because since moving to this server in December this weekly task has
> run several times over the same range of months, making pretty much
> the same function calls each time. I also ran the entire range
> several times during testing.
>
> One change made to the server since the previous week's run was that
> I moved up to the latest Centos kernel (Linux version
> 3.10.0-123.13.2.el7.x86_64).

And what was the previous kernel version?

However, if it worked fine after rebooting the server, it may not be a
kernel issue (unless it somehow depends on uptime). Is there something
in the /var/log/messages?

> As far as I can tell, the other virtual servers weren't being slowed
> down, so I don't suspect problems with the virtual server or the SAN.
>
> If this happens again, what sorts of settings in postgresq.conf or
> other tools should I be using to try to track down what's causing
> this?

Well, we don't know what the function is doing, so it'd be nice to get
some basic description first. Is it querying the database? Is it
inserting or updating large amounts of data? Or is it something more
complex?

Aside from that, some basic system stats would be really helpful, so
that we can identify the bottleneck (is that a CPU, I/O, locking, ...).
A few lines from these commands should give us some basic idea:

  iostat -x -k 5
  vmstat -w 5

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-general by date:

Previous
From: Brent Tubbs
Date:
Subject: unexpected PQresultStatus: 8 with simple logical replication
Next
From: Michael Paquier
Date:
Subject: Re: unexpected PQresultStatus: 8 with simple logical replication