Thread: BUG #5465: dblink TCP connection hangs blocking translation from being terminated

Hi

In our setup, we make extensive use of dblink.
Because some queries take a while to complete and the link runs over the
internet, the server process (the transaction that runs the dblink queries)
sometimes hangs when the link goes down. It keeps holding locks on several
records plus some advisory locks, which freezes most of the database.

What I have found is this bug, which is remarkably similar (if not identical)
to what we are experiencing.


http://postgresql.1045698.n5.nabble.com/BUG-5465-dblink-TCP-connection-hangs-blocking-translation-from-being-terminated-td2132419.html#a2132420

The bug report dates from May 2010 and has had no update since.

One of the comments states that some work was done towards a fix for
version 9.0, but I haven't seen anything related to this in the changelog of
any version starting with the one we are using (9.1.12).

<quote>
I believe this is a known issue in dblink, where it's not possible to
cancel it when it's waiting in the TCP layer in the kernel.
Unfortunately, there is no fix ATM - there was some work towards it
for 9.0 at one point, but I think this is actually the first real
bug-report on the issue...
</quote>

Has there been any progress in this direction?

Thank you!

--
Robert Voinea
Software Engineer
+4 0740 467 262

Don't take life too seriously. You'll never get out of it alive.
(Elbert Hubbard)
Robert Voinea <rvoinea@gmail.com> writes:
> In our setup, we make extensive use of dblink.
> Because some queries take a while to complete and the link runs over the
> internet, the server process (the transaction that runs the dblink queries)
> sometimes hangs when the link goes down. It keeps holding locks on several
> records plus some advisory locks, which freezes most of the database.
>
> What I have found is this bug, which is remarkably similar (if not identical)
> to what we are experiencing.
>
> http://postgresql.1045698.n5.nabble.com/BUG-5465-dblink-TCP-connection-hangs-blocking-translation-from-being-terminated-td2132419.html#a2132420

That does not sound like a Postgres bug to me.  What you are unhappy about
is that the kernel isn't timing out a lost TCP connection more quickly.
The default timeout is long (>1 hour probably), but that's required by
Internet standards.  The appropriate fix for this is to use aggressive
keepalive parameters on the connection.  You can set libpq's keepalive
parameters in the connection string given to dblink.
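
As a rough illustration of the suggestion above, the sketch below builds a
libpq connection string carrying the keepalive parameters (keepalives,
keepalives_idle, keepalives_interval, keepalives_count) and wraps it in a
dblink_connect call. The helper names, host, database name, connection name,
and timing values are assumptions chosen for illustration; none of them come
from this thread.

```python
def keepalive_conninfo(host, dbname, idle=30, interval=10, count=3):
    """Build a libpq conninfo string with aggressive TCP keepalives.

    The defaults are illustrative assumptions: start probing after 30 s of
    idle time, probe every 10 s, give up after 3 unanswered probes.
    Values containing spaces would need libpq quoting, which this sketch
    does not handle.
    """
    params = {
        "host": host,
        "dbname": dbname,
        "keepalives": 1,                  # enable client-side keepalives
        "keepalives_idle": idle,          # seconds of inactivity before probing
        "keepalives_interval": interval,  # seconds between probes
        "keepalives_count": count,        # unanswered probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def dblink_connect_sql(connname, conninfo):
    """Return a dblink_connect statement; single quotes are doubled for SQL."""
    def quote(text):
        return "'" + text.replace("'", "''") + "'"
    return f"SELECT dblink_connect({quote(connname)}, {quote(conninfo)});"


if __name__ == "__main__":
    # Hypothetical host, database, and connection name.
    conninfo = keepalive_conninfo("remote.example.com", "appdb")
    print(dblink_connect_sql("remote_link", conninfo))
```

With the assumed values, an idle connection whose peer has vanished should be
declared dead after roughly a minute on systems that honor all three
settings, instead of waiting for the kernel defaults.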

            regards, tom lane
Hi

On Friday 14 March 2014 09:45:22 Tom Lane wrote:
> That does not sound like a Postgres bug to me.  What you are unhappy about
> is that the kernel isn't timing out a lost TCP connection more quickly.
> The default timeout is long (>1 hour probably), but that's required by
> Internet standards.  The appropriate fix for this is to use aggressive
> keepalive parameters on the connection.  You can set libpq's keepalive
> parameters in the connection string given to dblink.
>
>             regards, tom lane

I seem to have missed those parameters... and the fact that you actually need
keep-alive on both client AND server, not only on the server.
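
To make the "both ends" point concrete, here is a small sketch of what
enabling keepalive looks like at the socket level. Each endpoint configures
its own socket, and those settings only govern how quickly that endpoint
notices a dead peer. In PostgreSQL terms, the client side is covered by the
libpq keepalives* connection parameters and the server side by the
tcp_keepalives_idle, tcp_keepalives_interval, and tcp_keepalives_count
settings. The function name and timing values below are assumptions for
illustration, and the TCP_KEEP* constants are platform-dependent, so the
sketch sets only those that the running Python build exposes.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Turn on TCP keepalive for one endpoint's socket.

    The timing values are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning options are not available on every platform; skip missing ones.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        enable_keepalive(sock)
        enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
        print("keepalive enabled:", enabled)
```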

Thank you!
