Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect - Mailing list pgsql-committers

From Fujii Masao
Subject Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect
Date
Msg-id 78d509fe-f87a-7046-672a-f86cb6f6957d@oss.nttdata.com
In response to Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-committers

On 2020/10/07 22:25, Fujii Masao wrote:
> 
> 
> On 2020/10/07 12:54, Fujii Masao wrote:
>>
>>
>> On 2020/10/07 11:13, Michael Paquier wrote:
>>> Hi Fujii-san,
>>>
>>> On Tue, Oct 06, 2020 at 01:52:55AM +0000, Fujii Masao wrote:
>>>> postgres_fdw: reestablish new connection if cached one is detected as broken.
>>>>
>>>> In postgres_fdw, once remote connections are established, they are cached
>>>> and re-used for subsequent queries and transactions. There can be some
>>>> cases where those cached connections are unavailable, for example,
>>>> because the remote server was restarted. In these cases, previously an error was
>>>> reported and the query accessing the remote server failed if a new remote
>>>> transaction failed to start because the cached connection was broken.
>>>>
>>>> This commit improves postgres_fdw so that a new connection is made
>>>> if a broken connection is detected when starting a new remote transaction.
>>>> This is useful to avoid unnecessary query failures when the connection is
>>>> broken but can be reestablished.
>>>
>>> lorikeet is reporting that the test introduced by this commit is
>>> unstable:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2020-10-06%2008%3A28%3A36
>>
>> Thanks for letting me know this!
>>
>>>
>>> Some details:
>>>   BEGIN;
>>>   SELECT 1 FROM ft1 LIMIT 1;
>>> - ?column?
>>> -----------
>>> -        1
>>> -(1 row)
>>> -
>>> +ERROR:  could not receive data from server: Software caused connection abort
>>> +CONTEXT:  remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
>>
>> This error means that a new connection was successfully reestablished
>> after the cached connection was terminated, and then the above connection
>> error occurred when issuing the "START TRANSACTION" command on that
>> new connection. There seem to be no suspicious relevant log messages in the
>> logfile. So I'm not sure yet why this error happened.
>>
>> Per the previous discussion at [1], lorikeet sometimes seems to cause
>> connection-related failures in the regression test. So the cause of the error
>> that we faced today may also be lorikeet itself.
> 
> Since it's not good to keep the buildfarm member red, I will revert
> the commit unless I come up with something even after further
> investigation.
> 
> My current guess is that PQstatus(conn) doesn't indicate
> CONNECTION_BAD when the above error occurs, which
> prevents a new connection from being reestablished because of
> the following check.
> 
> +        if (PQstatus(entry->conn) != CONNECTION_BAD ||
> +            entry->xact_depth > 0 ||
> +            retry_conn)
> +            PG_RE_THROW();

The error message under discussion is reported when recv() fails with
errno=ECONNABORTED. As far as I read the code, pqReadData() marks
the connection as CONNECTION_BAD when errno=ECONNRESET,
but not when errno=ECONNABORTED. So since PQstatus(entry->conn)
doesn't indicate CONNECTION_BAD in the ECONNABORTED case,
the first condition of the above check is true, the error is re-thrown, and
a new connection is not reestablished.

Therefore, the easy fix is to make libpq mark the connection as
CONNECTION_BAD on ECONNABORTED as well, like we do on ECONNRESET.
But is it safe to do that? Thoughts?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


