Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away |
Date | |
Msg-id | CALj2ACUAi23vf1WiHNar_LksM9EDOWXcbHCo-fD4Mbr1d=78YQ@mail.gmail.com |
Responses | Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away |
| Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away |
List | pgsql-hackers |
Hi,

Currently, with postgres_fdw remote connections cached in the local backend, a query that uses a cached connection does not check whether the remote backend has been killed or has gone away; it simply tries to submit the query and fails if the remote backend is no longer there. This problem was found during the discussion in [1].

One way to solve this is, when firing a new foreign query from the local backend over a cached connection (assuming the remote backend behind that cached connection was killed for some reason), instead of failing the query in the local backend upon detecting the error from the remote backend, to delete the stale cached entry, try to get another connection to the remote server, cache it, and proceed to submit the query. This has to happen only at the beginning of a remote xact. This way, instead of failing, the query succeeds if the local backend is able to get a new remote connection. (A rough sketch of this shape is included at the end of this mail, after the references.)

Attached is a patch that implements the above retry mechanism. The way I tested the patch:

1. select * from foreign_tbl;
   /* from the local backend - this results in a remote connection being cached in the postgres_fdw connection cache and a remote backend being opened */
2. Kill the remote backend (intentionally), just to simulate the scenario.
3. select * from foreign_tbl;
   /* from the local backend - without the patch this throws "ERROR: server closed the connection unexpectedly". With the patch, the cached connection is tried at the beginning of the remote xact; upon receiving the error from the remote postgres server, instead of aborting the query, the cached entry is deleted, a new connection is obtained and cached if possible, and that connection is used to execute the query, which then succeeds. */

I couldn't think of a way to add a test case to the existing postgres_fdw regression test suite that automates the scenario of the remote backend getting killed.

I would like to thank Ashutosh Bapat (ashutosh.bapat.oss@gmail.com) for the suggestion to fix this and for the review of my initial patch attached in [2]. I tried to address the review comments provided on my initial patch in [3].

Regarding one of Ashutosh's review comments from [3], "the fact that the same connection may be used by multiple plan nodes": I tried a few use cases where a single query joins two foreign tables on the same remote server, so that the same connection is used for multiple plan nodes. In this case we avoid retrying for the second GetConnection() request for the second foreign table because of the check entry->xact_depth <= 0; xact_depth becomes 1 after the first GetConnection() and the first remote query, so we don't hit the retry logic, and we seem to be safe here. Please advise if I'm missing something.

I request the community to consider the patch for further review if the overall idea seems beneficial.

[1] https://www.postgresql.org/message-id/CAExHW5t21B_XPQy_hownm1Qq%3DhMrgOhX%2B8gDj3YEKFvpk%3DVRgw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALj2ACXp6DQ3iLGx5g%2BLgVtGwC4F6K9WzKQJpyR4FfdydQzC_g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAExHW5u3Gyv6Q1BEr6zMg0t%2B59e8c4KMfKVrV3Z%3D4UKKjJ19nQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
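Below is a minimal sketch of the retry shape described above; it is not the attached patch, only an illustration under a few assumptions. It assumes it runs inside postgres_fdw's GetConnection() once the cached ConnCacheEntry has been looked up, that "server" and "user" are the ForeignServer and UserMapping already available there, and it reuses the existing helpers connect_pg_server(), disconnect_pg_server() and begin_remote_xact() from contrib/postgres_fdw/connection.c. A complete patch would also need to restore the memory context and clean up error data the way the surrounding code does.

```c
/*
 * Sketch only, not the attached patch: retry a broken cached connection
 * at the beginning of a remote xact, inside GetConnection().
 */
PG_TRY();
{
    /* Start the remote transaction on the cached connection. */
    begin_remote_xact(entry);
}
PG_CATCH();
{
    /*
     * Retry only when no remote transaction is in progress yet and the
     * connection itself is broken (e.g. the remote backend was killed).
     * A connection that is already inside a remote xact may be shared by
     * multiple plan nodes, so it must not be silently re-established;
     * that is what the entry->xact_depth <= 0 check guards against.
     */
    if (entry->xact_depth <= 0 &&
        PQstatus(entry->conn) == CONNECTION_BAD)
    {
        FlushErrorState();              /* discard the caught error */
        disconnect_pg_server(entry);    /* drop the dead connection */
        entry->conn = connect_pg_server(server, user);
        begin_remote_xact(entry);       /* retry on the fresh connection */
    }
    else
        PG_RE_THROW();
}
PG_END_TRY();
```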