On Thu, 2009-07-30 at 11:41 +0200, Greg Stark wrote:
> I know this is a popular feeling. But you're throwing away decades of
> work in making TCP reliable. You would change feelings quickly if you
> ever faced this scenario too. All it takes is some bad memory or a bad
> wire and you would be turning a performance drain into random
> connection drops.
But if I get bad memory or bad wire I'll get much worse problems
already, and don't tell me it will work more reliably if you don't kill
the connection. It's a lot better to find out sooner that you have those
problems and fix them than having spurious errors which you'll get even
if you don't kill the connection in case of such problems.
> Well it ought to have eventually died. Your patience may have ran out
> before the keep-alive timeouts fired though.
Well it lived for at least one hour (could be more, I don't remember for
sure) keeping vacuum from doing it's job on a heavily updated DB. It was
not so much about my patience as about starting to have abysmal
performance, AFTER we fixed the initial cause of the crash, and without
any warning, except of course I did find out immediately that bloat
happens and found the idle transactions and killed them, but I imagine
the hair-pulling for a less experienced postgres DBA. I would have also
preferred that postgres solves this issue on it's own - the network
stack is clearly not fast enough in resolving it.
Cheers,
Csaba.