Thread: CentOS & PostgreSQL help re: TIME_WAIT

CentOS & PostgreSQL help re: TIME_WAIT

From

"Reggie Euser"

Date:

28 January 2010, 16:36:47

Zombie PostgreSQL processes in a "TIME_WAIT" state are consuming all
available sockets on a web server I'm running. I've Googled & RTFM'ed but am
still stumped.  Sure would appreciate any ideas.

I've recently migrated a PHP-based web app running against PostgreSQL from a
single server running FreeBSD to a cluster consisting of:

-  two virtual machines, both running CentOS 5.4, Linux version
2.6.18-14.10.1.el5, both with 3 GB RAM allocated, both with two dual-core
Intel processors allocated.

-  the web server is running Apache 2.2.14 & PHP 5.3.1.

-  the database server is running PostgreSQL 8.4.1, with pg_hba.conf set up
to trust the webserver on port 5432.

-  both Apache & PostgreSQL are set to accept 225 max connections; otherwise
the configs are pretty much default.

-  web server is running OpenSSL for secure login, but serving general html
pages without https.

-  tcp_keepalive_time in both is default 7200 seconds (which, as I read in
various posts, etc., shouldn't really matter anyway, but...)

Various posts suggest that this could be a PHP programming issue, but as
the problem just surfaced with the migration, I'm inclined to think it's
probably either a PostgreSQL configuration issue or something related to the
OS?

A cron job restarting Apache every hour is keeping the webserver alive, but
I'd sure like a better solution...

Any ideas would be greatly appreciated...

Thanks!

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

Tom Lane

Date:

28 January 2010, 16:55:56

"Reggie Euser" <reggie@busicast.com> writes:
> Zombie PostgreSQL processes in a "TIME_WAIT" state are consuming all
> available sockets on a web server I'm running. I've Googled & RTFM'ed but am
> still stumped.  Sure would appreciate any ideas.

That seems a bit confused.  There's no such thing as a "process in a
TIME_WAIT state".  A TCP network socket could be in TIME_WAIT but
it's not a process, and certainly not zombie.  Please be a little
clearer.

In general, sockets sitting a long time in TIME_WAIT would be a network
problem.  That state means the user process already closed the socket
and the network stack is waiting for the other end to acknowledge
connection closure.  If it's not getting the ACK then you have either
buggy network code in one kernel or the other, or a network-level
problem (maybe an overaggressive firewall in between?).

Postgres processes sitting in zombie state would indicate that the
postmaster has somehow gotten wedged and is failing to notice its
dead children.  That shouldn't happen really --- are you still able
to make connections to the database?  It doesn't seem like there'd be
any direct linkage between that and a network problem, but ...

            regards, tom lane
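
The distinction drawn above (zombie is a process state, TIME_WAIT is a socket state) can be checked separately for each. As a minimal sketch for the process half, the function below scans text shaped like the output of `ps -eo pid,ppid,stat,comm` for entries whose state starts with "Z". The column layout, the function name and the sample data are assumptions made for illustration; none of them come from the thread.

```python
def find_zombies(ps_output):
    """Return (pid, ppid, command) tuples for processes in state 'Z'.

    Assumes text shaped like `ps -eo pid,ppid,stat,comm` output,
    with a header line first. That layout is an assumption of this sketch.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


# Made-up sample data, not taken from the poster's machines.
SAMPLE = """\
  PID  PPID STAT COMMAND
 2301     1 Ss   postmaster
 2310  2301 Ss   postgres
 2399  2301 Z    postgres <defunct>
"""

if __name__ == "__main__":
    for pid, ppid, comm in find_zombies(SAMPLE):
        print(f"zombie pid={pid} parent={ppid} command={comm}")
```

An empty result on the database server would suggest the postmaster is reaping its children normally, leaving the socket states as the thing to investigate.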

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

"Kevin Grittner"

Date:

28 January 2010, 17:04:21

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> In general, sockets sitting a long time in TIME_WAIT would be a
> network problem.  That state means the user process already closed
> the socket and the network stack is waiting for the other end to
> acknowledge connection closure.  If it's not getting the ACK then
> you have either buggy network code in one kernel or the other, or
> a network-level problem (maybe an overaggressive firewall in
> between?).

Not to discount those possibilities, but I've seen one other cause:
a storm of connection attempts.  It could be a DoS attack or a
poorly written client.

-Kevin
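
A rough way to see why a connection storm matters: every connection closed by the client leaves one local port parked in TIME_WAIT for a fixed interval, so the backlog is about rate times interval. The sketch below works through that arithmetic. The 60-second TIME_WAIT duration and the 32768-61000 ephemeral port range are assumptions (they are the usual defaults on Linux kernels of this era, but nobody in the thread states them); the actual range can be read from /proc/sys/net/ipv4/ip_local_port_range.

```python
def time_wait_backlog(connections_per_second, time_wait_seconds=60):
    """Steady-state count of sockets sitting in TIME_WAIT on the closing side."""
    return connections_per_second * time_wait_seconds


def max_sustainable_rate(port_low=32768, port_high=61000, time_wait_seconds=60):
    """Highest new-connection rate to a single destination address and port
    before the local ephemeral ports run out (assumed defaults, see above)."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


if __name__ == "__main__":
    rate = 500  # hypothetical: new database connections opened per second
    print("sockets in TIME_WAIT:", time_wait_backlog(rate))
    print("sustainable rate:", round(max_sustainable_rate(), 1), "per second")
```

Under these assumptions, a client that opens a fresh database connection for every page request and sustains more than a few hundred requests per second to one database address could exhaust its local ports without any network fault being involved.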

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

"Reggie Euser"

Date:

29 January 2010, 11:14:41

Thanks to both of you. Kevin, it's not a DoS, I'm certain; client problem,
maybe.

Tom - sorry for my confusion. I'm chasing the network/firewall possibilities
as most likely causes.  PostgreSQL is running quite smoothly.

FWIW, what little I know about PostgreSQL, I've picked up by using it,
reading documentation on the web and, most helpfully, reading your comments.
Many thanks again.

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

Renato Oliveira

Date:

29 January 2010, 11:17:12

Do you have an F5 load balancer in front of these web servers?

Renato Oliveira
Systems Administrator
e-mail: renato.oliveira@grant.co.uk

Tel: +44 (0)1763 260811
Fax: +44 (0)1763 262410
http://www.grant.co.uk/

Grant Instruments (Cambridge) Ltd

Company registered in England, registration number 658133

Registered office address:
29 Station Road,
Shepreth,
CAMBS SG8 6GB
UK

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

Greg Stark

Date:

29 January 2010, 14:30:51

On Thu, Jan 28, 2010 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In general, sockets sitting a long time in TIME_WAIT would be a network
> problem.  That state means the user process already closed the socket
> and the network stack is waiting for the other end to acknowledge
> connection closure.

I think you're describing FIN_WAIT. TIME_WAIT is after the finack has
been sent and the connection is well and truly dead. The same
host/port pair can't be reused for 2*msl in case the finack needs to
be resent or a duplicate fin arrives.

Normally the server uses SO_LINGER and skips TIME_WAIT so it can
listen on the same port immediately and accept more connections.  So
only the client enters TIME_WAIT and it's for some random high-numbered
port that the OS won't hand out until it expires.

If you're seeing the postgres *server* with sockets in TIME_WAIT state
for port 5432 or whatever your postgres port is then I think that's a
bug and you should report it in more detail. Please send the output of
netstat -an or whatever data you have showing the problem.

If you're seeing the web server's outgoing ports in TIME_WAIT state
then I think that's normal and shouldn't be causing you problems.

--
greg
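
To separate the two cases described above (TIME_WAIT on the server's own port 5432 versus TIME_WAIT on the web server's outgoing ports), the `netstat -an` output can be tallied by state and by which end owns the PostgreSQL port. This is only a sketch: it assumes the Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function name, addresses and sample lines are invented for illustration.

```python
from collections import Counter


def count_states(netstat_output, port=5432):
    """Tally TCP states for connections touching `port`.

    Keys are (side, state): side is "local" when this host owns the port
    (server end) and "remote" when this host connected out to it (client end).
    """
    tally = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers, UDP and UNIX-socket lines
        local, remote, state = fields[3], fields[4], fields[5]
        if local.rsplit(":", 1)[-1] == str(port):
            tally[("local", state)] += 1
        elif remote.rsplit(":", 1)[-1] == str(port):
            tally[("remote", state)] += 1
    return tally


# Made-up sample as it might look on the web server; not real data.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:48211          10.0.0.6:5432           TIME_WAIT
tcp        0      0 10.0.0.5:48212          10.0.0.6:5432           TIME_WAIT
tcp        0      0 10.0.0.5:48213          10.0.0.6:5432           ESTABLISHED
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
"""

if __name__ == "__main__":
    for (side, state), n in sorted(count_states(SAMPLE).items()):
        print(side, state, n)
```

Run on the web server, a large ("remote", "TIME_WAIT") count corresponds to the outgoing-port case; run on the database server, a large ("local", "TIME_WAIT") count corresponds to the server-side case that would warrant a more detailed report.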

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

Tom Lane

Date:

29 January 2010, 15:26:43

Greg Stark <gsstark@mit.edu> writes:
> On Thu, Jan 28, 2010 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In general, sockets sitting a long time in TIME_WAIT would be a network
>> problem.  That state means the user process already closed the socket
>> and the network stack is waiting for the other end to acknowledge
>> connection closure.

> I think you're describing FIN_WAIT. TIME_WAIT is after the finack has
> been sent and the connection is well and truly dead. The same
> host/port pair can't be reused for 2*msl in case the finack needs to
> be resent or a duplicate fin arrives.

> Normally the server uses SO_LINGER and skips TIME_WAIT so it can
> listen on the same port immediately and accept more connections.  So
> only the client enters TIME_WAIT and it's for some random high-numbered
> port that the OS won't hand out until it expires.

Hmm.  This may well be a platform-specific behavior of the network
stack.  On my mail machine (running a pretty ancient HPUX release)
I can normally see a dozen or so inbound connections to port 25
that are sitting in TIME_WAIT state, probably reflecting spambots
that didn't bother to close the connection gracefully.  It's possible
that a newer network stack would bypass that state.  But in any case
the userland process has definitely dropped the connection, right?
So it's not Postgres' issue, it's a networking issue.

            regards, tom lane
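
Since this exchange turns on which end lands in TIME_WAIT and on the SO_LINGER option, here is a small loopback sketch of what SO_LINGER with a zero timeout does: close() then aborts the connection with a reset instead of the normal FIN exchange, so the closing socket never passes through TIME_WAIT. It does not show what PostgreSQL, Apache or any particular server actually does; the helper names are hypothetical, and the `struct.pack("ii", ...)` layout assumes a Unix-like platform where `struct linger` is two ints.

```python
import socket
import struct


def close_with_reset(sock):
    """Abortive close: linger on with a zero timeout makes close() send RST."""
    linger_on_zero_timeout = struct.pack("ii", 1, 0)  # l_onoff=1, l_linger=0
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_on_zero_timeout)
    sock.close()


def abortive_close_outcome():
    """Connect over loopback, abort one end, and report what the peer sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    close_with_reset(client)
    try:
        data = server_side.recv(1)  # blocks until FIN, RST or data arrives
        outcome = "eof" if data == b"" else "data"
    except ConnectionResetError:
        outcome = "reset"
    finally:
        server_side.close()
        listener.close()
    return outcome


if __name__ == "__main__":
    print("peer saw:", abortive_close_outcome())
```

With an ordinary close() instead, the peer would see a clean end-of-file and the end that closed first would hold the TIME_WAIT entry, which is why the side that initiates the close matters when counting these sockets.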

Re: CentOS & PostgreSQL help re: TIME_WAIT

From

"Reggie Euser"

Date:

29 January 2010, 18:21:14

No...

----- Original Message -----
From: "Renato Oliveira" <renato.oliveira@grant.co.uk>
To: "Reggie Euser" <reggie@busicast.com>; <pgsql-admin@postgresql.org>
Sent: Friday, January 29, 2010 10:15 AM
Subject: Re: [ADMIN] CentOS & PostgreSQL help re: TIME_WAIT

Do you have an F5 load balancer in front of these web servers?
