Re: Occasional spike in query response time (jumps from 200ms to 45 seconds) - Mailing list pgsql-admin

From Robert Voinea
Subject Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)
Date
Msg-id 13209807.nWtyJFh2Fe@shu
Whole thread Raw
In response to Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-admin
Hi

On Thursday 07 November 2013 06:40:00 Kevin Grittner wrote:
> Robert Voinea <rvoinea@gmail.com> wrote:
> > From time to time (1-2 months) one of the production systems I
> > manage starts acting crazy... and this is starting to become a
> > problem.
> >
> > Every query I send to the server has a very long running time
> > (sometimes it reaches 45+ seconds). Even the simples queries like
> > "SELECT NOW()" run in 4-5 seconds... or more.
> >
> > This goes on for a few minutes. After that, everything is back to
> > normal.
>
> A likely cause for this is transparent huge page defrag.  If you
> run `vmstat 1` during an episode and see a lot of system CPU time,
> that tends to confirm this.  If you run `perf top` during an
> episode and see the kernel spending a lot of time in spinlock
> functions, that pretty much nails it.  You might want to turn off
> transparent huge page support.
I need to read about this first.

I forgot to mention that this is a 32 bit system... and due to the known
limitations we are forced to use PAE on the kernel...
Unfortunately we cannot use a 64 bit operating system.

> > I'm guessing that, somehow, there is heavy I/O usage at that time
>
> Which might be another possibility if the `vmstat 1` shows a lot of
> I/O wait time.
Another colleague of mine managed to somehow reproduce the issue.
It seems that the disk + network interface could be the bottle neck (iostat &
iotop shows average wait times very high) and the peak throughput of the disk
at 70MB/s.
This is not the actual production system, but a similar one (as hardware).
We'll try to re-do the tests with the clients and the server connected via a
Gigabit network and see if this helps.

> > I've seen that postgres 9.1.8 Changelog contains some bug fixes
> > related to performance of autovaccum:
> If there is high I/O and you see autovacuum running wraparound
> prevention vacuums or vacuuming tables which have recently had a
> large number of rows deleted, you might solve the problem by
> updating.  I would strongly recommend updating anyway, because
> otherwise you are running with known bugs and security
> vulnerabilities, including one very serious one.
>
> Why knowingly put yourself through dealing with bugs that others
> have already encountered, reported, diagnosed, and fixed?
I agree with you on this, but the update requires downtime and has to be
scheduled in advance.

Thank you for replying.
I will investigate this even further and see what I may come up with.
I'll keep you informed with what I find.

--
Robert Voinea
Software Engineer
+4 0740 467 262

Don't take life too seriously. You'll never get out of it alive.
(Elbert Hubbard)



pgsql-admin by date:

Previous
From: Robert Voinea
Date:
Subject: Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)
Next
From: Nagaraj Shindagi
Date:
Subject: during the maintenance facing error