Simple queries slowdown, maybe related to 3-minute long " in transaction"? - Mailing list pgsql-general

From Dmitry Koterov
Subject Simple queries slowdown, maybe related to 3-minute long " in transaction"?
Date
Msg-id CA+CZih6qz4aZKLoBJSfBNTy_RL2nZDqzE13_PprKE_X2--eLjw@mail.gmail.com
Whole thread Raw
Responses Re: Simple queries slowdown, maybe related to 3-minute long " in transaction"?
List pgsql-general
Hello.

Sometimes I see a strange slowdown on my PG 9.1 server: it looks like the simplest queries which typically take 1ms or less (e.g. selection of a row by its primary key) take 300ms or even more. It is related to all queries within the connection, not the single one: once upon a time all fast queries start to work 300 times slower. In a couple of minutes the effect disappears. The machine is not overloaded, number of connections is not too high, mtr shows no packet loss.

E.g. see the following log from a particular website page I've been refreshing (times are in milliseconds, you may see that typically all 22 queries take near 55ms, but within 3 minutes - from 20:21 till 20:23 approx - even simplest queries took more than 300ms instead of 1ms, so the total time is many seconds).

web-01: [2012-11-20 20:20:57] web07725] total=250        cpu=210             php=203        dbq=22      db=55ms
web-01: [2012-11-20 20:21:06] web07752] total=6254       cpu=220             php=219        dbq=22      db=6035ms    <--
web-01: [2012-11-20 20:21:48] web07718] total=7506       cpu=220             php=222        dbq=22      db=7284ms    <--
web-01: [2012-11-20 20:22:15] web07717] total=11143      cpu=224             php=225        dbq=22      db=10918ms   <--
web-01: [2012-11-20 20:23:44] web07725] total=4402       cpu=224             php=220        dbq=22      db=4182ms    <--
web-01: [2012-11-20 20:24:03] web07725] total=268        cpu=212             php=211        dbq=22      db=57ms

The machine is not overloaded during all this period (I've been logging "top" and "pstree" each 10 seconds, results are quite typical):

Nov 20 20:23:07 pg-m03 mon: [Tue Nov 20 20:23:07 MSK 2012]
Nov 20 20:23:07 pg-m03 mon: init-+-crond
Nov 20 20:23:07 pg-m03 mon:      |-master-+-pickup
Nov 20 20:23:07 pg-m03 mon:      |        `-qmgr
Nov 20 20:23:07 pg-m03 mon:      |-postmaster---32*[postmaster]
Nov 20 20:23:07 pg-m03 mon:      |-screen---sh---su---bash---monitoring.sh-+-logger
Nov 20 20:23:07 pg-m03 mon:      |                                         `-monitoring.sh---pstree
Nov 20 20:23:07 pg-m03 mon:      |-sshd---sshd---perl
Nov 20 20:23:07 pg-m03 mon:      |-syslog-ng
Nov 20 20:23:07 pg-m03 mon:      |-udevd
Nov 20 20:23:07 pg-m03 mon:      |-vzctl---bash---mc---bash
Nov 20 20:23:07 pg-m03 mon:      `-xinetd
Nov 20 20:23:07 pg-m03 mon:
Nov 20 20:23:08 pg-m03 mon: top - 20:23:08 up 113 days,  3:29,  0 users,  load average: 2.16, 2.10, 2.00
Nov 20 20:23:08 pg-m03 mon: Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Nov 20 20:23:08 pg-m03 mon: Cpu(s): 23.8%us,  2.8%sy,  0.0%ni, 72.7%id,  0.8%wa,  0.0%hi,  0.0%si,  0.0%st
Nov 20 20:23:08 pg-m03 mon: Mem:   1463616k total,  1097776k used,   365840k free,        0k buffers

Also I've been logging all queries which are active in the database each 10 seconds. During the slowdown the database was not overloaded, no strange queries, except one thing: during the whole slowdown period two pg processes was in "<IDLE> in transaction" state, and in 3 minutes (see 3rd column - it's a "time from start of the query") these processes disappeared, and the slowdown disappeared too:

                                     | duration        | query
Nov 20 20:22:05 pg-m03 mon:    21586 | 00:00:37.571057 | <IDLE> in transaction
Nov 20 20:22:05 pg-m03 mon:    21567 | 00:00:37.557606 | <IDLE> in transaction
Nov 20 20:22:36 pg-m03 mon:    21586 | 00:01:09.119639 | <IDLE> in transaction
Nov 20 20:22:36 pg-m03 mon:    21567 | 00:01:09.106188 | <IDLE> in transaction
Nov 20 20:22:47 pg-m03 mon:    21586 | 00:01:19.637888 | <IDLE> in transaction
Nov 20 20:22:47 pg-m03 mon:    21567 | 00:01:19.624437 | <IDLE> in transaction
Nov 20 20:22:57 pg-m03 mon:    21586 | 00:01:30.155492 | <IDLE> in transaction
Nov 20 20:22:57 pg-m03 mon:    21567 | 00:01:30.142041 | <IDLE> in transaction
Nov 20 20:24:21 pg-m03 mon:    21586 | 00:02:54.294799 | <IDLE> in transaction
Nov 20 20:24:21 pg-m03 mon:    21567 | 00:02:54.281348 | <IDLE> in transaction
Nov 20 20:24:31 - no more "<IDLE> in transaction" (seems they disappeared after 3 minutes timeout?)

Maybe these are correlated? But how could it be that two "idle" processes cause slowdown of other connections (note: not "timeout", but exactly - slowdown from 1ms to 300ms)?

What could be the problem of all these things? Such slowdown typically happens 2-3 times per day, no direct correlation with checkpoints, disks loads or something else. It just starts to work slow even on simplest queries, and then it restores normal operations.

pgsql-general by date:

Previous
From: Matthew Vernon
Date:
Subject: Re: COPY FROM in psql
Next
From: "Rhys A.D. Stewart"
Date:
Subject: get column name passed to a function