Thread: CentOS & PostgreSQL help re: TIME_WAIT
Zombie PostgreSQL processes in a "TIME_WAIT" state are consuming all available sockets on a web server I'm running. I've Googled & RTFM'ed but am still stumped. Sure would appreciate any ideas.

I've recently migrated a PHP-based web app running against PostgreSQL from a single server running FreeBSD to a cluster consisting of:

- two virtual machines, both running CentOS 5.4, Linux version 2.6.18-14.10.1.el5, both with 3 GB RAM allocated, both with two dual-core Intel processors allocated.
- the web server is running Apache 2.2.14 & PHP 5.31.
- the database server is running PostgreSQL 8.4.1, with pg_hba.conf set up to trust the webserver on port 5432.
- both Apache & PostgreSQL are set to accept 225 max connections, otherwise the conf's are pretty much default.
- web server is running OpenSSL for secure login, but serving general html pages without https.
- tcp_keepalive_time in both is default 7200 seconds (which, as I read in various posts, etc., shouldn't really matter anyway, but...)

Various posts suggest that this could be a PHP programming issue, but as the problem just surfaced with the migration, I'm inclined to think it's probably either a PostgreSQL configuration issue or something related to the OS?

A cron job restarting Apache every hour is keeping the webserver alive, but I'd sure like a better solution...

Any ideas would be greatly appreciated...

Thanks!
"Reggie Euser" <reggie@busicast.com> writes:
> Zombie PostgreSQL processes in a "TIME_WAIT" state are consuming all
> available sockets on a web server I'm running. I've Googled & RTFM'ed but am
> still stumped. Sure would appreciate any ideas.

That seems a bit confused. There's no such thing as a "process in a TIME_WAIT state". A TCP network socket could be in TIME_WAIT but it's not a process, and certainly not zombie. Please be a little clearer.

In general, sockets sitting a long time in TIME_WAIT would be a network problem. That state means the user process already closed the socket and the network stack is waiting for the other end to acknowledge connection closure. If it's not getting the ACK then you have either buggy network code in one kernel or the other, or a network-level problem (maybe an overaggressive firewall in between?).

Postgres processes sitting in zombie state would indicate that the postmaster has somehow gotten wedged and is failing to notice its dead children. That shouldn't happen really --- are you still able to make connections to the database? It doesn't seem like there'd be any direct linkage between that and a network problem, but ...

regards, tom lane
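Since zombie processes and TIME_WAIT sockets are separate conditions, the process side can be checked on its own. What follows is only a minimal Python sketch and not something posted in the thread: it assumes the input text was captured on Linux with `ps -eo pid,ppid,stat,comm`, and `find_zombies` is a hypothetical helper name. A state that begins with `Z` marks a zombie, and the PPID shows which parent has not reaped it.

```python
def find_zombies(ps_text):
    """Return (pid, ppid, command) for each process whose state starts with 'Z'.

    Assumes the column layout of `ps -eo pid,ppid,stat,comm` (an assumption,
    not taken from the thread): PID, PPID, STAT, COMMAND.
    """
    zombies = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)       # command may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies
```

If the list comes back empty on the database server while the socket counts stay high, that would point toward the network side rather than a wedged postmaster.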
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In general, sockets sitting a long time in TIME_WAIT would be a
> network problem. That state means the user process already closed
> the socket and the network stack is waiting for the other end to
> acknowledge connection closure. If it's not getting the ACK then
> you have either buggy network code in one kernel or the other, or
> a network-level problem (maybe an overaggressive firewall in
> between?).

Not to discount those possibilities, but I've seen one other cause: a storm of connection attempts. It could be a DoS attack or a poorly written client.

-Kevin
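A rough sense of scale for the connection-storm case can be had with a little arithmetic. The sketch below is an illustration under stated assumptions, not a measurement from this system: it assumes a Linux client that holds each closed socket in TIME_WAIT for 60 seconds and an ephemeral port range of 32768-61000 (check /proc/sys/net/ipv4/ip_local_port_range for the real values). Both function names are hypothetical.

```python
def steady_state_time_wait(conn_per_sec, time_wait_secs=60.0):
    """Sockets sitting in TIME_WAIT once the rate has been steady for a while.

    Each closed connection lingers for time_wait_secs, so the backlog is
    simply rate * lifetime. The 60 s default is an assumption (Linux).
    """
    return conn_per_sec * time_wait_secs


def max_sustainable_rate(port_low=32768, port_high=61000, time_wait_secs=60.0):
    """Highest new-connection rate to ONE server address and port before the
    client runs out of ephemeral ports. The port range is an assumed default."""
    ports = port_high - port_low + 1
    return ports / time_wait_secs


# 100 short-lived connections per second leave about 6000 sockets in TIME_WAIT.
print(steady_state_time_wait(100))
# With the assumed defaults the ceiling is roughly 470 new connections per second.
print(round(max_sustainable_rate()))
```

Under these assumptions, a client that opens and closes a database connection for every page element can approach the ceiling at fairly modest traffic levels.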
Thanks to both of you.

Kevin, it's not a DoS, I'm certain; client problem, maybe.

Tom - sorry for my confusion. I'm chasing the network/firewall possibilities as most likely causes. PostgreSQL is running quite smoothly. FWIW, what little I know about PostgreSQL, I've picked up by using it, reading documentation on the web and, most helpfully, reading your comments.

Many thanks again.
Do you have an F5 load balancer in front of these web servers?

Renato Oliveira
Systems Administrator

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Reggie Euser
Sent: 29 January 2010 15:14
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] CentOS & PostgreSQL help re: TIME_WAIT

> Thanks to both of you. Kevin, it's not a DoS, I'm certain; client problem, maybe.
> Tom - sorry for my confusion. I'm chasing the network/firewall possibilities as
> most likely causes. PostgreSQL is running quite smoothly.

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
On Thu, Jan 28, 2010 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In general, sockets sitting a long time in TIME_WAIT would be a network
> problem. That state means the user process already closed the socket
> and the network stack is waiting for the other end to acknowledge
> connection closure.

I think you're describing FIN_WAIT. TIME_WAIT is after the finack has been sent and the connection is well and truly dead. The same host/port pair can't be reused for 2*MSL in case the finack needs to be resent or a duplicate fin arrives.

Normally the server uses SO_LINGER and skips TIME_WAIT so it can listen on the same port immediately and accept more connections. So only the client enters TIME_WAIT and it's for some random high-numbered port that the OS won't hand out until it expires.

If you're seeing the postgres *server* with sockets in TIME_WAIT state for port 5432 or whatever your postgres port is then I think that's a bug and you should report it in more detail. Please send the output of netstat -an or whatever data you have showing the problem.

If you're seeing the web server's outgoing ports in TIME_WAIT state then I think that's normal and shouldn't be causing you problems.

--
greg
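The two cases above (port 5432 as the local port versus the remote port) can be told apart mechanically from saved `netstat -an` output. Here is a minimal Python sketch of that tally; it is an illustration, not a tool anyone in the thread used. It assumes the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and `tally_states` and `port_of` are hypothetical names.

```python
from collections import Counter

PG_PORT = "5432"  # port named in the original post; change it if yours differs


def port_of(address):
    """Return the port part of a netstat address such as '10.0.0.3:5432'."""
    return address.rsplit(":", 1)[-1]


def tally_states(netstat_text, port=PG_PORT):
    """Count TCP sockets touching `port`, keyed by (side, state).

    side is 'local' when the local port matches (the server end) and
    'remote' when the foreign port matches (the client end).
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, UDP rows and anything without a State column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if port_of(local) == port:
            counts[("local", state)] += 1
        elif port_of(foreign) == port:
            counts[("remote", state)] += 1
    return counts
```

Run against output captured on the web server, a large ("remote", "TIME_WAIT") count would correspond to the outgoing-port case; a large ("local", "TIME_WAIT") count on the database server would correspond to the case worth reporting in more detail.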
Greg Stark <gsstark@mit.edu> writes:
> On Thu, Jan 28, 2010 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In general, sockets sitting a long time in TIME_WAIT would be a network
>> problem. That state means the user process already closed the socket
>> and the network stack is waiting for the other end to acknowledge
>> connection closure.

> I think you're describing FIN_WAIT. TIME_WAIT is after the finack has
> been sent and the connection is well and truly dead. The same
> host/port pair can't be reused for 2*msl in case the finack needs to
> be resent or a duplicate fin arrives.

> Normally the server uses SO_LINGER and skips TIME_WAIT so it can
> listen on the same port immediately and accept more connections. So
> only the client enters TIME_WAIT and its for some random high-numbered
> port that the OS won't hand out until it expires.

Hmm. This may well be a platform-specific behavior of the network stack. On my mail machine (running a pretty ancient HPUX release) I can normally see a dozen or so inbound connections to port 25 that are sitting in TIME_WAIT state, probably reflecting spambots that didn't bother to close the connection gracefully. It's possible that a newer network stack would bypass that state.

But in any case the userland process has definitely dropped the connection, right? So it's not Postgres' issue, it's a networking issue.

regards, tom lane
No...

----- Original Message -----
From: "Renato Oliveira" <renato.oliveira@grant.co.uk>
To: "Reggie Euser" <reggie@busicast.com>; <pgsql-admin@postgresql.org>
Sent: Friday, January 29, 2010 10:15 AM
Subject: Re: [ADMIN] CentOS & PostgreSQL help re: TIME_WAIT

> Do you have an F5 load balancer in front of these web servers?