Re: stopping processes, preventing connections - Mailing list pgsql-general

From Greg Smith
Subject Re: stopping processes, preventing connections
Msg-id 4BA11C76.8070605@2ndquadrant.com
In response to Re: stopping processes, preventing connections  (Herouth Maoz <herouth@unicell.co.il>)
Responses Re: stopping processes, preventing connections
List pgsql-general
Herouth Maoz wrote:
> Aren't socket writes supposed to have timeouts of some sort? Stupid
> policies notwithstanding, processes on the client side can disappear
> for any number of reasons (bugs, power failures, whatever), and this is
> not something that is supposed to cause a backend to hang, I would
> assume.
>

Note that at the point where this is stuck, you're not in the PostgreSQL
code; you're deep in the libc socket code. Guaranteeing well-behaved
sockets at the OS level is not always possible, due to TCP/IP's emphasis
on robust delivery. See section 2.8, "Why does it take so long to detect
that the peer died?", at http://www.faqs.org/faqs/unix-faq/socket/ for
some background, and note that where you're stuck is inside the
database's keepalive handling, which is trying to do the right thing
here.
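For illustration only, here is a minimal Python sketch of what enabling TCP keepalive looks like at the OS socket level. It is not PostgreSQL's code: the function names and the idle/interval/count values are hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are platform-dependent (Linux has all three), so the sketch only sets the ones the platform exposes. The point is that a silent peer is only declared dead after roughly idle + interval * count seconds.

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive probes for sock (illustrative values)."""
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below are platform-specific, so set only those present.
    tuning = (("TCP_KEEPIDLE", idle),       # seconds idle before first probe
              ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
              ("TCP_KEEPCNT", count))       # unanswered probes before giving up
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

def worst_case_detection_seconds(idle: int = 60, interval: int = 10,
                                 count: int = 5) -> int:
    """Rough upper bound on how long an idle dead peer goes unnoticed."""
    return idle + interval * count

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    print(worst_case_detection_seconds())  # 110 with the defaults above
    s.close()
```

Keep in mind that keepalive probes only run while the connection is idle; a write that is waiting on unacknowledged data is governed by the kernel's retransmission timeout instead, which on Linux can take many minutes to give up.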

As a general comment on this area: most of the unkillable backends I've
seen, which usually get noticed when the server won't shut down, have
been the result of bad socket behavior. It's a really tricky area to get
right, and it's hard to guarantee that the database backends will be
robust against every possible weird OS behavior.
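To make this failure mode concrete, here is a hedged Python sketch (not PostgreSQL code) of a writer whose peer stops reading: once the kernel buffers on both ends fill up, a plain blocking send() simply waits, potentially forever. The sketch bounds the wait with a socket timeout; the helper name, chunk size and 0.5 second timeout are illustrative assumptions, and it runs over loopback rather than a real network.

```python
import socket

def bytes_sent_before_stall(timeout_s: float = 0.5, chunk: int = 65536) -> int:
    """Write to a peer that never reads until send() blocks for timeout_s."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    stalled_peer, _ = server.accept()  # accepted, but never read from
    # Without this timeout, the send() below would block indefinitely once
    # the kernel buffers on both ends are full.
    client.settimeout(timeout_s)
    payload = b"x" * chunk
    sent = 0
    try:
        while True:
            sent += client.send(payload)
    except socket.timeout:
        pass  # the write stalled: the peer is not draining its buffer
    finally:
        for s in (client, stalled_peer, server):
            s.close()
    return sent

if __name__ == "__main__":
    print(bytes_sent_before_stall(), "bytes buffered before the write stalled")
```

The number printed depends on the platform's socket buffer sizes.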

However, if you can repeatably get the server into this bad state at
will, it may be worth spending some more time digging into this in hopes
there is something valuable to learn about your situation that can
improve the keepalive handling on the server side.  Did you mention your
PostgreSQL server version and platform? In a quick look at the code
involved (using a snapshot of recent development), I didn't see the
exact code path you're stuck in, which makes me wonder whether this
problem has already been resolved in a newer version.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

