> > You could get a backend's PID using PQbackendPID and then kill it by > calling pg_terminate_backend() to kill the remote backend to automate > scenario of remote backend being killed. >
I already added the test case in v2 patch itself(added one more test case in v3 patch), using the similar approach.
> > > For, one of the Ashutosh's review comments from [3] "the fact that the > > same connection may be used by multiple plan nodes", I tried to have > > few use cases where there exist joins on two foreign tables on the > > same remote server, in a single query, so essentially, the same > > connection was used for multiple plan nodes. In this case we avoid > > retrying for the second GetConnection() request for the second foreign > > table, with the check entry->xact_depth <= 0 , xact_depth after the > > first GetConnection() and the first remote query will become 1 and we > > don't hit the retry logic and seems like we are safe here. Please add > > If I'm missing something here. > > > > Request the community to consider the patch for further review if the > > overall idea seems beneficial. > > I think this idea will be generally useful if your work on dropping > stale connection uses idle_connection_timeout or something like that > on the remote server.
Assuming we use idle_connection_timeout or some other means(as it is not yet finalized, I will try to respond in that mail chain) to drop stale/idle connections from the local backend, I think we have two options 1) deleting that cached entry from the connection cache entirely using disconnect_pg_server() and hash table remove. This frees up some space and we don't have to deal with the connection invalidations and don't have to bother on resetting cached entry's other parameters such as xact_depth, have_prep_stmt etc. 2) or we could just drop the connections using disconnect_pg_server(), retain the hash entry, reset other parameters, and deal with the invalidations. This is like, we maintain unnecessary info in the cached entry, where we actually don't have a connection at all and keep holding some space for cached entry.
IMO, if we go with option 1, then it will be good.
Anyways, this connection retry feature will not have any dependency on the idle_connection_timeout or dropping stale connection feature, because there can be many reasons where remote backends go away/get killed.
If I'm not sidetracking - if we use something like idle_session_timeout [1] on the remote server, this retry feature will be very useful.
> > About the patch. It seems we could just catch the error from > begin_remote_xact() in GetConnection() and retry connection if the > error is "bad connection". Retrying using PQreset() might be better > than calling PQConnect* always. >
Thanks for the comment, it made life easier. Added the patch with the changes. Please take a look at the v3 patch and let me know if still something needs to be done.