Home > mailing lists

Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date	September 17, 2020 09:44:20
Msg-id	CALj2ACX3hVTge3p4sKQj5+EDPbD7KEhaPS4b2WpLb0BAXWfCdw@mail.gmail.com Whole thread Raw
In response to	Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses	Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
List	pgsql-hackers

Tree view

Thanks for the review comments. I will post a new patch soon
addressing all the comments.

On Tue, Sep 15, 2020 at 2:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> +                       PQreset(entry->conn);
>
> Isn't using PQreset() to reconnect to the foreign server unsafe?
> When new connection is established, some SET commands seem
> to need to be issued like configure_remote_session() does.
> But PQreset() doesn't do that at all.
>

This is a good catch. Thanks, I missed this point. Indeed we need to
set the session params. We can do this in two ways: 1. use
configure_remote_session() after PQreset(). 2. use connect_pg_server()
instead of PQreset() and configure_remote_session(). One problem I see
with the 2nd way is that we will be doing the checks that are being
performed in connect_pg_server() twice, which we would have done for
the first time before retrying.  The required parameters such as
keywords, values, are still in entry->conn structure from the first
attempt, which can safely be used by PQreset(). So, I will go with the
1st way. Thoughts?

>
> Originally when GetConnection() establishes new connection,
> it resets all transient state fields, to be sure all are clean (as the
> comment says). With the patch, even when reconnecting
> the remote server, shouldn't we do the same, for safe?
>

I guess there is no need for us to reset all the transient state
before we do begin_remote_xact() in the 2nd attempt. We retry the
connection only when entry->xact_depth <= 0 i.e. beginning of the
remote txn and the begin_remote_xact() doesn't modify any transient
state if entry->xact_depth <= 0, except for entry->changing_xact_state
= true; all other transient state is intact in entry structure. In the
error case, we will not reach the code after do_sql_command in
begin_remote_xact(). If needed, we can only set
entry->changing_xact_state to false which is set to true before
do_sql_command().

        entry->changing_xact_state = true;
        do_sql_command(entry->conn, sql);
        entry->xact_depth = 1;
        entry->changing_xact_state = false;

>
> +                       PGresult *res = NULL;
> +                       res = PQgetResult(entry->conn);
> +                       PQclear(res);
> Are these really necessary? I was just thinking that's not because
> pgfdw_get_result() and pgfdw_report_error() seem to do that
> already in do_sql_command().
>

If an error occurs in the first attempt, we return from
pgfdw_get_result()'s  if (!PQconsumeInput(conn)) to the catch block we
added and pgfdw_report_error() will never get called. And the below
part of the code is reached only in scenarios as mentioned in the
comments. Removing this might have problems if we receive errors other
than CONNECTION_BAD or for subtxns. We could clear if any result and
just rethrow the error upstream. I think no problem having this code
here.

        else
        {
            /*
             * We are here, due to either some error other than CONNECTION_BAD
             * occurred or connection may have broken during start of a
             * subtransacation. Just, clear off any result, try rethrowing the
             * error, so that it will be caught appropriately.
             */
            PGresult *res = NULL;
            res = PQgetResult(entry->conn);
            PQclear(res);
            PG_RE_THROW();
        }

>
> +               /* Start a new transaction or subtransaction if needed. */
> +               begin_remote_xact(entry);
>
> Even when there is no cached connection and new connection is made,
> then if begin_remote_xact() reports an error, another new connection
> tries to be made again. This behavior looks a bit strange.
>

When there is no cached connection, we try to acquire one, if we
can't, then no error will be thrown to the user, just we retry one
more time. If we get in the 2nd attempt, fine, if not, we will throw
the error to the user. Assume in the 1st attempt the remote server is
unreachable, we may hope to connect in the 2nd attempt. I think there
is no problem here.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: David Rowley
Date: 17 September 2020, 09:43:37
Subject: Re: Force update_process_title=on in crash recovery?

From: Amit Kapila
Date: 17 September 2020, 09:56:47
Subject: Re: Global snapshots

Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away - Mailing list pgsql-hackers

Previous

Next