Re: Simple queries slowdown, maybe related to 3-minute long " in transaction"? - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Simple queries slowdown, maybe related to 3-minute long " in transaction"?
Date
Msg-id CAHyXU0w4tD4_J0ZpeyPdb+V+6+BGkh8bLLTruWYgw4MsgfiMdw@mail.gmail.com
Whole thread Raw
In response to Simple queries slowdown, maybe related to 3-minute long " in transaction"?  (Dmitry Koterov <dmitry@koterov.ru>)
List pgsql-general
On Tue, Nov 20, 2012 at 1:25 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
> Hello.
>
> Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the
> simplest queries which typically take 1ms or less (e.g. selection of a row
> by its primary key) take 300ms or even more. It is related to all queries
> within the connection, not the single one: once upon a time all fast queries
> start to work 300 times slower. In a couple of minutes the effect
> disappears. The machine is not overloaded, number of connections is not too
> high, mtr shows no packet loss.
>
> E.g. see the following log from a particular website page I've been
> refreshing (times are in milliseconds, you may see that typically all 22
> queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx
> - even simplest queries took more than 300ms instead of 1ms, so the total
> time is many seconds).
>
> web-01: [2012-11-20 20:20:57] web07725] total=250        cpu=210
> php=203        dbq=22      db=55ms
> web-01: [2012-11-20 20:21:06] web07752] total=6254       cpu=220
> php=219        dbq=22      db=6035ms    <--
> web-01: [2012-11-20 20:21:48] web07718] total=7506       cpu=220
> php=222        dbq=22      db=7284ms    <--
> web-01: [2012-11-20 20:22:15] web07717] total=11143      cpu=224
> php=225        dbq=22      db=10918ms   <--
> web-01: [2012-11-20 20:23:44] web07725] total=4402       cpu=224
> php=220        dbq=22      db=4182ms    <--
> web-01: [2012-11-20 20:24:03] web07725] total=268        cpu=212
> php=211        dbq=22      db=57ms
>
> The machine is not overloaded during all this period (I've been logging
> "top" and "pstree" each 10 seconds, results are quite typical):
>
> Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]
> Nov 20 20:23:07 pg-m03 mon: init-+-crond
> Nov 20 20:23:07 pg-m03 mon:      |-master-+-pickup
> Nov 20 20:23:07 pg-m03 mon:      |        `-qmgr
> Nov 20 20:23:07 pg-m03 mon:      |-postmaster---32*[postmaster]
> Nov 20 20:23:07 pg-m03 mon:
> |-screen---sh---su---bash---monitoring.sh-+-logger
> Nov 20 20:23:07 pg-m03 mon:      |
> `-monitoring.sh---pstree
> Nov 20 20:23:07 pg-m03 mon:      |-sshd---sshd---perl
> Nov 20 20:23:07 pg-m03 mon:      |-syslog-ng
> Nov 20 20:23:07 pg-m03 mon:      |-udevd
> Nov 20 20:23:07 pg-m03 mon:      |-vzctl---bash---mc---bash
> Nov 20 20:23:07 pg-m03 mon:      `-xinetd
> Nov 20 20:23:07 pg-m03 mon:
> Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days,  3:29,  0 users,
> load average: 2.16, 2.10, 2.00
> Nov 20 20:23:08 pg-m03 mon: Tasks:   1 total,   0 running,   1 sleeping,   0
> stopped,   0 zombie
> Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us,  2.8%sy,  0.0%ni, 72.7%id,
> 0.8%wa,  0.0%hi,  0.0%si,  0.0%st
> Nov 20 20:23:08 pg-m03 mon: Mem:   1463616k total,  1097776k used,   365840k
> free,        0k buffers
>
> Also I've been logging all queries which are active in the database each 10
> seconds. During the slowdown the database was not overloaded, no strange
> queries, except one thing: during the whole slowdown period two pg processes
> was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column -
> it's a "time from start of the query") these processes disappeared, and the
> slowdown disappeared too:
>
>                                      | duration        | query
> Nov 20 20:22:05 pg-m03 mon:    21586 | 00:00:37.571057 | <IDLE> in
> transaction
> Nov 20 20:22:05 pg-m03 mon:    21567 | 00:00:37.557606 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon:    21586 | 00:01:09.119639 | <IDLE> in
> transaction
> Nov 20 20:22:36 pg-m03 mon:    21567 | 00:01:09.106188 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon:    21586 | 00:01:19.637888 | <IDLE> in
> transaction
> Nov 20 20:22:47 pg-m03 mon:    21567 | 00:01:19.624437 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon:    21586 | 00:01:30.155492 | <IDLE> in
> transaction
> Nov 20 20:22:57 pg-m03 mon:    21567 | 00:01:30.142041 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon:    21586 | 00:02:54.294799 | <IDLE> in
> transaction
> Nov 20 20:24:21 pg-m03 mon:    21567 | 00:02:54.281348 | <IDLE> in
> transaction
> Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared
> after 3 minutes timeout?)
>
> Maybe these are correlated? But how could it be that two "idle" processes
> cause slowdown of other connections (note: not "timeout", but exactly -
> slowdown from 1ms to 300ms)?
>
> What could be the problem of all these things? Such slowdown typically
> happens 2-3 times per day, no direct correlation with checkpoints, disks
> loads or something else. It just starts to work slow even on simplest
> queries, and then it restores normal operations.

Hm. Maybe application is leaking transaction and locks along with it?
Using any connection pooling -- in particular, one managed on client
side?  One of the first things to check in slow down period is
pg_locks view: both in terms of raw # of granted locks and any
ungranted ones.

merlin


pgsql-general by date:

Previous
From: "Rhys A.D. Stewart"
Date:
Subject: get column name passed to a function
Next
From: Adrian Klaver
Date:
Subject: Re: get column name passed to a function