Connections getting stuck sending data to client - Mailing list pgsql-general

From Chris Butler
Subject Connections getting stuck sending data to client
Date
Msg-id 20081031153635.GM4519@zedcore.com
Whole thread Raw
Responses Re: Connections getting stuck sending data to client  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-general
I've been having intermittent problems with our DB server (running
postgresql 8.3.3) reaching its connection limit, all because of a SELECT
statement that's stuck while sending data. This gets stuck because there's a
transaction waiting to do an ALTER TABLE, then the subsequent SELECTs wait
for the ALTER.

The problem seems to be that the client connection drops while the server is
sending data to the client. On the server I end up with:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
[...]
tcp        1  17376 db.zedcore.com:postgres     portia.healthjobsuk.c:36520 CLOSE_WAIT

By the time our monitoring system has picked up the problem, netstat on the
client contains no trace of the offending port.

As you can see, the Send-Q is quite full. Also, if I strace the postgres
process, I get:

[root@db2:~]# strace -fp 13572
Process 13572 attached - interrupt to quit
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 1) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 44
sendto(8, "\0\tPermanent\0\0\0\0010\0\0\0\0010\0\0\0\n1225843"..., 7200, 0, NULL, 0

It appears that the server is blocking on the send operation, I guess
because the send buffer is already full. Also, I was unable to kill the
backend process with either SIGINT or TERM (possibly a bug?).

A tcpdump of traffic for that port:

14:34:07.975493 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 2086302610:2086304058(1448) ack 381974125
win78 <nop,nop,timestamp 3025282443 595580714> 
14:34:07.976483 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable,
length556 
14:36:07.960308 IP db.zedcore.com.postgres > portia.healthjobsuk.com.36520: . 0:1448(1448) ack 1 win 78
<nop,nop,timestamp3025402443 595580714> 
14:36:07.961565 IP portia.healthjobsuk.com > db.zedcore.com: ICMP portia.healthjobsuk.com tcp port 36520 unreachable,
length556 


A bit of googling led me towards changing the TCP keepalive settings in
postgresql.conf, although I don't think they've made any difference.

Does anyone have any ideas what's happening, and whether there's anything I
can do to stop the problem?

--
Chris Butler
Zedcore Systems Ltd
UK tel: 0114 238 1828 ext 72

pgsql-general by date:

Previous
From: "tomas.kejzlar@gmail.com"
Date:
Subject: plruby on windows
Next
From: Joao Ferreira
Date:
Subject: Re: speed up restore from dump