Re: Simple queries slowdown, maybe related to 3-minute long
" in transaction"? - Mailing list pgsql-general
From | Merlin Moncure |
---|---|
Subject | Re: Simple queries slowdown, maybe related to 3-minute long
" |
Date | |
Msg-id | CAHyXU0w4tD4_J0ZpeyPdb+V+6+BGkh8bLLTruWYgw4MsgfiMdw@mail.gmail.com Whole thread Raw |
In response to |
Simple queries slowdown, maybe related to 3-minute long " |
List | pgsql-general |
On Tue, Nov 20, 2012 at 1:25 PM, Dmitry Koterov <dmitry@koterov.ru> wrote: > Hello. > > Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the > simplest queries which typically take 1ms or less (e.g. selection of a row > by its primary key) take 300ms or even more. It is related to all queries > within the connection, not the single one: once upon a time all fast queries > start to work 300 times slower. In a couple of minutes the effect > disappears. The machine is not overloaded, number of connections is not too > high, mtr shows no packet loss. > > E.g. see the following log from a particular website page I've been > refreshing (times are in milliseconds, you may see that typically all 22 > queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx > - even simplest queries took more than 300ms instead of 1ms, so the total > time is many seconds). > > web-01: [2012-11-20 20:20:57] web07725] total=250 cpu=210 > php=203 dbq=22 db=55ms > web-01: [2012-11-20 20:21:06] web07752] total=6254 cpu=220 > php=219 dbq=22 db=6035ms <-- > web-01: [2012-11-20 20:21:48] web07718] total=7506 cpu=220 > php=222 dbq=22 db=7284ms <-- > web-01: [2012-11-20 20:22:15] web07717] total=11143 cpu=224 > php=225 dbq=22 db=10918ms <-- > web-01: [2012-11-20 20:23:44] web07725] total=4402 cpu=224 > php=220 dbq=22 db=4182ms <-- > web-01: [2012-11-20 20:24:03] web07725] total=268 cpu=212 > php=211 dbq=22 db=57ms > > The machine is not overloaded during all this period (I've been logging > "top" and "pstree" each 10 seconds, results are quite typical): > > Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012] > Nov 20 20:23:07 pg-m03 mon: init-+-crond > Nov 20 20:23:07 pg-m03 mon: |-master-+-pickup > Nov 20 20:23:07 pg-m03 mon: | `-qmgr > Nov 20 20:23:07 pg-m03 mon: |-postmaster---32*[postmaster] > Nov 20 20:23:07 pg-m03 mon: > |-screen---sh---su---bash---monitoring.sh-+-logger > Nov 20 20:23:07 pg-m03 mon: | > `-monitoring.sh---pstree > Nov 20 20:23:07 pg-m03 mon: |-sshd---sshd---perl > Nov 20 20:23:07 pg-m03 mon: |-syslog-ng > Nov 20 20:23:07 pg-m03 mon: |-udevd > Nov 20 20:23:07 pg-m03 mon: |-vzctl---bash---mc---bash > Nov 20 20:23:07 pg-m03 mon: `-xinetd > Nov 20 20:23:07 pg-m03 mon: > Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days, 3:29, 0 users, > load average: 2.16, 2.10, 2.00 > Nov 20 20:23:08 pg-m03 mon: Tasks: 1 total, 0 running, 1 sleeping, 0 > stopped, 0 zombie > Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us, 2.8%sy, 0.0%ni, 72.7%id, > 0.8%wa, 0.0%hi, 0.0%si, 0.0%st > Nov 20 20:23:08 pg-m03 mon: Mem: 1463616k total, 1097776k used, 365840k > free, 0k buffers > > Also I've been logging all queries which are active in the database each 10 > seconds. During the slowdown the database was not overloaded, no strange > queries, except one thing: during the whole slowdown period two pg processes > was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column - > it's a "time from start of the query") these processes disappeared, and the > slowdown disappeared too: > > | duration | query > Nov 20 20:22:05 pg-m03 mon: 21586 | 00:00:37.571057 | <IDLE> in > transaction > Nov 20 20:22:05 pg-m03 mon: 21567 | 00:00:37.557606 | <IDLE> in > transaction > Nov 20 20:22:36 pg-m03 mon: 21586 | 00:01:09.119639 | <IDLE> in > transaction > Nov 20 20:22:36 pg-m03 mon: 21567 | 00:01:09.106188 | <IDLE> in > transaction > Nov 20 20:22:47 pg-m03 mon: 21586 | 00:01:19.637888 | <IDLE> in > transaction > Nov 20 20:22:47 pg-m03 mon: 21567 | 00:01:19.624437 | <IDLE> in > transaction > Nov 20 20:22:57 pg-m03 mon: 21586 | 00:01:30.155492 | <IDLE> in > transaction > Nov 20 20:22:57 pg-m03 mon: 21567 | 00:01:30.142041 | <IDLE> in > transaction > Nov 20 20:24:21 pg-m03 mon: 21586 | 00:02:54.294799 | <IDLE> in > transaction > Nov 20 20:24:21 pg-m03 mon: 21567 | 00:02:54.281348 | <IDLE> in > transaction > Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared > after 3 minutes timeout?) > > Maybe these are correlated? But how could it be that two "idle" processes > cause slowdown of other connections (note: not "timeout", but exactly - > slowdown from 1ms to 300ms)? > > What could be the problem of all these things? Such slowdown typically > happens 2-3 times per day, no direct correlation with checkpoints, disks > loads or something else. It just starts to work slow even on simplest > queries, and then it restores normal operations. Hm. Maybe application is leaking transaction and locks along with it? Using any connection pooling -- in particular, one managed on client side? One of the first things to check in slow down period is pg_locks view: both in terms of raw # of granted locks and any ungranted ones. merlin
pgsql-general by date: