Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away - Mailing list pgsql-hackers

From Kasahara Tatsuhito
Subject Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date
Msg-id CAP0=ZV+X_f-T6F+H_dO_QypJMkrn5NGfijHTScPG8CDDOC6tzA@mail.gmail.com
In response to Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
List pgsql-hackers
Hi,

On Wed, Jul 8, 2020 at 9:40 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> One way, we could solve the above problem is that, upon firing the new
> foreign query from local backend using the cached connection,
> (assuming the remote backend that was cached in the local backed got
> killed for some reasons), instead of failing the query in the local
> backend, upon detecting error from the remote backend, we could just
> delete the cached old entry and try getting another connection to
> remote backend, cache it and proceed to submit the query. This has to
> happen only at the beginning of remote xact.
+1.

I think this is a very useful feature.
In an environment that uses connection pooling on the local side, if the
remote server undergoes a failover or switchover, this feature would
prevent unexpected errors in local queries after the remote server
recovers.

I haven't looked at the code in detail yet, but here are some comments.

1. To keep the previous behavior (and performance), how about allowing
   the user to specify whether or not to retry, either as a GUC parameter
   or as a FOREIGN SERVER option?
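
   If such a knob were added, usage might look like the following. Note
   that the names here ("connection_retry", "postgres_fdw.connection_retry")
   are hypothetical, made up just to illustrate the idea; no such option
   or parameter exists yet:

   ===================================================
   -- Hypothetical per-server option (name is illustrative only):
   ALTER SERVER foreign_server OPTIONS (ADD connection_retry 'on');

   -- Or, alternatively, a hypothetical GUC to control it globally:
   SET postgres_fdw.connection_retry = on;
   ===================================================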

2. How about emitting a LOG message when a retry succeeds, so that we can
   tell the retry feature worked and how often retries happen?

> I couldn't think of adding a test case to the existing postgres_fdw
> regression test suite with an automated scenario of the remote backend
> getting killed.

Couldn't you confirm this by adding a test case like the following?
===================================================
BEGIN;
-- Generate a connection to the remote server
SELECT * FROM ft1 LIMIT 1;

-- Retrieve the pid of the postgres_fdw backend and kill it.
-- A more unique application_name (e.g. fdw_retry_check instead of
-- postgres_fdw) could be used to identify the connection.
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend' AND application_name = 'postgres_fdw';

-- COMMIT, so the next query should succeed if connection retry works
COMMIT;
SELECT * FROM ft1 LIMIT 1;
===================================================

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com
