Re: Occasional spike in query response time (jumps from 200ms to 45 seconds) - Mailing list pgsql-admin

From Kevin Grittner
Subject Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)
Date
Msg-id 1383835200.12074.YahooMailNeo@web162905.mail.bf1.yahoo.com
Whole thread Raw
In response to Occasional spike in query response time (jumps from 200ms to 45 seconds)  (Robert Voinea <rvoinea@gmail.com>)
Responses Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)  (Robert Voinea <rvoinea@gmail.com>)
List pgsql-admin
Robert Voinea <rvoinea@gmail.com> wrote:

> From time to time (1-2 months) one of the production systems I
> manage starts acting crazy... and this is starting to become a
> problem.
>
> Every query I send to the server has a very long running time
> (sometimes it reaches 45+ seconds). Even the simples queries like
> "SELECT NOW()" run in 4-5 seconds... or more.
>
> This goes on for a few minutes. After that, everything is back to
> normal.

A likely cause for this is transparent huge page defrag.  If you
run `vmstat 1` during an episode and see a lot of system CPU time,
that tends to confirm this.  If you run `perf top` during an
episode and see the kernel spending a lot of time in spinlock
functions, that pretty much nails it.  You might want to turn off
transparent huge page support.

> I'm guessing that, somehow, there is heavy I/O usage at that time

Which might be another possibility if the `vmstat 1` shows a lot of
I/O wait time.

> I've seen that postgres 9.1.8 Changelog contains some bug fixes
> related to performance of autovaccum:
>
> // BEGIN QUOTE
> Fix performance problems with autovacuum truncation in busy
> workloads (Jan Wieck)
>
> Truncation of empty pages at the end of a table requires
> exclusive lock, but autovacuum was coded to fail (and release the
> table lock) when there are conflicting lock requests. Under load,
> it is easily possible that truncation would never occur,
> resulting in table bloat. Fix by performing a partial truncation,
> releasing the lock, then attempting to re-acquire the lock and
> continue. This fix also greatly reduces the average time before
> autovacuum releases the lock after a conflicting request arrives.
>
> Fix error in vacuum_freeze_table_age implementation (Andres
> Freund)
>
> In installations that have existed for more than
> vacuum_freeze_min_age transactions, this mistake prevented
> autovacuum from using partial-table scans, so that a full-table
> scan would always happen instead.
> // END QUOTE

If there is high I/O and you see autovacuum running wraparound
prevention vacuums or vacuuming tables which have recently had a
large number of rows deleted, you might solve the problem by
updating.  I would strongly recommend updating anyway, because
otherwise you are running with known bugs and security
vulnerabilities, including one very serious one.

Why knowingly put yourself through dealing with bugs that others
have already encountered, reported, diagnosed, and fixed?

http://www.postgresql.org/support/versioning/

http://www.postgresql.org/support/security/faq/2013-04-04/

Note that in PostgreSQL, a minor release does not contain any new
features or behavior changes except those deemed to be clear bugs.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-admin by date:

Previous
From: bricklen
Date:
Subject: Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)
Next
From: bricklen
Date:
Subject: Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)