Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect - Mailing list pgsql-committers

From Fujii Masao
Subject Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect
Date
Msg-id 53c3deb8-666d-fc4c-f0b5-e8c397bc6250@oss.nttdata.com
Whole thread Raw
In response to Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers

On 2020/10/11 9:16, Tom Lane wrote:
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>>> Therefore, the easy fix is to make libpq mark the connection as
>>>> CONNECTION_BAD even in ECONNABORTED, like we do in ECONNRESET.
> 
> So in the wake of commit fe27009cb,

Many thanks for the commit fe27009cb !!


> this is what lorikeet is doing:
> 
> @@ -9028,9 +9028,7 @@
>   CALL terminate_backend_and_wait('fdw_retry_check');
>   SAVEPOINT s;
>   SELECT 1 FROM ft1 LIMIT 1;    -- should fail
> -ERROR:  server closed the connection unexpectedly
> -    This probably means the server terminated abnormally
> -    before or while processing the request.
> +ERROR:  could not receive data from server: Software caused connection abort
>   CONTEXT:  remote SQL command: SAVEPOINT s2
>   COMMIT;
>   -- Clean up
> 
> which is better --- the connection recovery is working --- but
> obviously still not a "pass".
> 
> The only way to make this test pass as-is is to hack libpq so that
> it deems ECONNABORTED to be a server crash, which frankly I do not
> think is a good change.  I don't see any principled reason to think
> that it means that rather than a network connectivity failure.
> 
> What I've done instead as a stopgap is to adjust the test case to be
> insensitive to the exact error message text.

For now I've not come up with better idea than current this fix.


> Maybe there's a better
> way, but I can't think of anything besides having two (or more?)
> expected-output files.  That would be quite unpalatable as things
> stand, though perhaps we could make it tolerable by splitting this
> test off into a second test script.
> 
> Meanwhile, now that I've looked at commit 32a9c0bdf, I'm not very
> happy with it:
> 
> * The control flow seems rather forced.  I think it was designed
> on the assumption that reindenting the existing code is forbidden.
> Why not use an actual loop, instead of a goto?  I also think that
> it's far from obvious that the loop isn't an infinite loop; it
> took me quite a while to glom onto the fact that the test inside the
> PG_CATCH avoids that by checking retry_conn.  Maybe a comment would
> be enough to improve that, but I feel the control logic could stand
> a rethink.

Isn't it simpler and easier-to-read to just reestablish new connection again
in the retry case instead of using a loop because we retry only once?
Patch attached.


> 
> * The CATCH case is completely failing to clean up after itself.
> At the minimum, there has to be a FlushErrorState() there.
> I don't think it'd be terribly hard to drive this code to an
> error-stack-overflow PANIC.

You're right. Sorry I forgot to take into consideration this :(
I fixed this issue in the attached patch.


> 
> * As is so often true of proposed patches in which PG_CATCH is
> thought to be an adequate way to catch an error, this is just
> unbelievably fragile.  It will break, and badly so, if it catches
> an error that is anything other than what it is expecting ...
> and it's not even particularly trying to verify that the error is
> what it's expecting.  It might be okay, or at least a little bit
> harder to break, if it verified that the error's SQLSTATE is
> ERRCODE_CONNECTION_FAILURE.

"PQstatus()==CONNECTION_BAD" was checked for that purpose. But that's not enough?
Anyway, in the patch, I changed the code so that sqlstate==ERRCODE_CONNECTION_FAILURE
is checked.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

pgsql-committers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: pgsql: Create ResultRelInfos later in InitPlan, index them by RT index.
Next
From: Tom Lane
Date:
Subject: Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect