Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date
Msg-id CAExHW5vn_JpBOB8Xr1y=w=W1U9Um6UX-X_GPS2+Vd15vBwNrfA@mail.gmail.com
In response to Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
List pgsql-hackers
On Wed, Jul 8, 2020 at 6:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I couldn't think of adding a test case to the existing postgres_fdw
> regression test suite with an automated scenario of the remote backend
> getting killed.

You could get the remote backend's PID using PQbackendPID() and then
kill it by calling pg_terminate_backend(). That would automate the
scenario of the remote backend being killed.
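
A minimal sketch of the second half of that, assuming a plain libpq
client. The helper name build_terminate_query is made up for
illustration and is not part of libpq or postgres_fdw; it only formats
the statement. The PID would be whatever PQbackendPID() returned for the
connection to be killed, and the statement would be run from a different
session on the same server.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format the SQL that asks a server to terminate
 * one of its backends.
 *
 * backend_pid is expected to be the value PQbackendPID() reported for
 * the connection that should die. The resulting statement has to be
 * sent over a second, separate connection to the same server.
 *
 * Returns 0 on success, -1 if the PID is not positive or the buffer is
 * missing or too small.
 */
int
build_terminate_query(int backend_pid, char *buf, size_t buflen)
{
    int n;

    if (backend_pid <= 0 || buf == NULL || buflen == 0)
        return -1;

    n = snprintf(buf, buflen, "SELECT pg_terminate_backend(%d)", backend_pid);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* output error or truncated statement */

    return 0;
}
```

After the statement has run, the next remote query on the cached
connection should fail, which is the situation the patch wants to
handle.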

>
> I would like to thank Ashutosh Bapat (ashutosh.bapat.oss@gmail.com)
> for the suggestion to fix this and the review of my initial patch
> attached in [2]. I tried to address the review comments provided on my
> initial patch [3].
>
> Regarding one of Ashutosh's review comments from [3], "the fact that the
> same connection may be used by multiple plan nodes": I tried a
> few use cases with joins on two foreign tables on the
> same remote server in a single query, so that the same
> connection was used for multiple plan nodes. In this case we avoid
> retrying on the second GetConnection() request for the second foreign
> table because of the check entry->xact_depth <= 0. After the
> first GetConnection() and the first remote query, xact_depth becomes 1,
> so we don't hit the retry logic, and it seems we are safe here. Please
> let me know if I'm missing something here.
>
> Request the community to consider the patch for further review if the
> overall idea seems beneficial.

I think this idea will be generally useful if your work on dropping
stale connections uses idle_connection_timeout or something like that
on the remote server.

About the patch: it seems we could just catch the error from
begin_remote_xact() in GetConnection() and retry the connection if the
error is "bad connection". Retrying using PQreset() might be better
than always calling PQconnect*.
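
A rough sketch of that retry rule, not of the actual patch. The
connection is modelled by a hypothetical RemoteConn struct instead of a
real PGconn so that the control flow can be read and run on its own;
RemoteConn, try_begin, reset_conn and begin_with_retry are all invented
names. In real code try_begin would correspond to what
begin_remote_xact() does and reset_conn to PQreset().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cached connection entry. */
typedef struct RemoteConn
{
    bool server_up;     /* can the remote server accept connections? */
    bool alive;         /* false once the remote backend has gone away */
    bool reject_begin;  /* simulate an error unrelated to the connection */
    int  xact_depth;    /* 0 means no remote transaction is open */
    int  resets;        /* number of reconnect attempts made */
} RemoteConn;

typedef enum
{
    XACT_OK,
    XACT_BAD_CONNECTION,
    XACT_OTHER_ERROR
} XactResult;

/* Stand-in for starting the remote transaction. */
static XactResult
try_begin(RemoteConn *c)
{
    if (!c->alive)
        return XACT_BAD_CONNECTION;
    if (c->reject_begin)
        return XACT_OTHER_ERROR;
    c->xact_depth = 1;
    return XACT_OK;
}

/* Stand-in for PQreset(): reconnect using the same parameters. */
static void
reset_conn(RemoteConn *c)
{
    c->resets++;
    c->alive = c->server_up;
}

/*
 * Start a remote transaction, retrying once after a reset if, and only
 * if, the first attempt failed because the connection was bad.
 */
bool
begin_with_retry(RemoteConn *c)
{
    XactResult r;

    /*
     * A remote transaction is already open, so there is nothing to
     * begin. Resetting here would silently discard remote transaction
     * state, so no retry is attempted.
     */
    if (c->xact_depth > 0)
        return true;

    r = try_begin(c);
    if (r == XACT_BAD_CONNECTION)
    {
        reset_conn(c);
        r = try_begin(c);       /* single retry; a second failure is final */
    }
    return r == XACT_OK;
}
```

The retry is limited to one attempt and to the "bad connection" case, so
any other error from starting the transaction is still reported as
before.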


>
> [1]  https://www.postgresql.org/message-id/CAExHW5t21B_XPQy_hownm1Qq%3DhMrgOhX%2B8gDj3YEKFvpk%3DVRgw%40mail.gmail.com
> [2]  https://www.postgresql.org/message-id/CALj2ACXp6DQ3iLGx5g%2BLgVtGwC4F6K9WzKQJpyR4FfdydQzC_g%40mail.gmail.com
> [3]  https://www.postgresql.org/message-id/CAExHW5u3Gyv6Q1BEr6zMg0t%2B59e8c4KMfKVrV3Z%3D4UKKjJ19nQ%40mail.gmail.com
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com



--
Best Wishes,
Ashutosh Bapat


