BUG #5465: dblink TCP connection hangs blocking translation from being terminated - Mailing list pgsql-bugs

From Valentine Gogichashvili
Subject BUG #5465: dblink TCP connection hangs blocking translation from being terminated
Date
Msg-id 201005190910.o4J9AT3A063957@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5465: dblink TCP connection hangs blocking translation from being terminated  (Magnus Hagander <magnus@hagander.net>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5465
Logged by:          Valentine Gogichashvili
Email address:      valgog@gmail.com
PostgreSQL version: 8.2.1
Operating system:   Red Hat 3.4.6-3 (kernel 2.6.9-42.0.3.ELsmp)
Description:        dblink TCP connection hangs blocking translation from
being terminated
Details:

Hi all,

we have an issue on our productive server. A stored procedure, that uses
dblink to get some data from the remote database hangs not responding to
kill signal and holds several locks on some tables as well as an advisory
lock. So I have this transaction to be completed in order to have a
possibility to operate the database normally.

It was exactly on the time, that the procedure was accessing remote
database, the machine hosting this remote database had a panic attack and
rebooted. But the ESTABLISHED connection is still hanging on the production
database machine:

$ netstat | grep remote_db_host
tcp 0 0 production_db_host:60248 remote_db_host:postgres ESTABLISHED

$ lsof | grep remote_db_host
postgres 1365 postgres 199u IPv4 23003779784 TCP
production_db_host:60248->remote_db_host:postgres (ESTABLISHED)

On the database session list one can see the hanging transaction:

production_db=# select procpid, now() - query_start as running, waiting,
substr(current_query,1,120) as current_query from pg_stat_activity where
current_query not like '%----STATQ-----%' and current_query != '<IDLE>'
order by query_start desc;
 procpid |        running         | waiting |
                                 current_query
---------+------------------------+---------+-------------------------------
----------------------------------------------------------------------------
------------------------------------
    1365 | 2 days 00:17:57.992004 | f       | SELECT * FROM
get_remote_data()

It seems like the dblink is waiting for the connection to be closed or
reseted and also makes the hole transaction hang not processing kill
signals.

Does the dblink TCP connection have any timeout?

How would it be possible to shutdown the DB in case this session process is
not responding to normal kill signals? Will it hinder the database from
shutting down normally? My previous experience with issuing immediate stops
or killing with -9 had been quite catastrophic and I could not start the DB
afterwards. What would you suggest in this case?

With best regards,

-- Valentine Gogichashvili

pgsql-bugs by date:

Previous
From: Craig Ringer
Date:
Subject: Re: BUG #5463: incorrect password when downloading/installing
Next
From: "Michael Enke"
Date:
Subject: BUG #5464: ecpg on 64bit system converts "long long" to "long"