Thread: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Hi,

When a query on a foreign table is executed from a local session using
postgres_fdw, the local postgres backend, as expected, opens a connection
which causes a remote session/backend to be opened on the remote postgres
server for query execution.

One observation is that, even after the query finishes, the remote
session/backend still persists on the remote postgres server. Upon
researching, I found that postgres_fdw caches the remote connections it
makes: each local backend/session keeps them in a per-backend hash table
(created with room for 8 entries and growing as needed). This caching is
useful as it avoids the cost of re-establishing a connection for every
foreign query.
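
For example, the lingering backend can be observed on the remote server with
something like the following (a sketch; it assumes the remote connection was
made with postgres_fdw's default fallback_application_name):

-- run on the remote server after the local session has executed a foreign query
SELECT pid, state, backend_start, query
FROM pg_stat_activity
WHERE application_name = 'postgres_fdw';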

However, there may be situations where a long-lasting local session executes
only a few foreign queries while the rest are local ones; in that scenario
the remote sessions opened by the local session/backend remain idle and eat
into the remote server's connection capacity. The problem gets even worse
(though the use case is a bit contrived) if all max_connections (default 100)
local sessions open remote sessions, with each backend caching several remote
connections.

I propose a new session-level GUC called "enable_connectioncache" (the name
can be changed if it doesn't convey the purpose), with a default value of
true, meaning all remote connections are cached. If set to false, connections
are not cached, and the remote sessions are closed by the local
backend/session at the end of each remote transaction.

Attached are the initial patch (based on commit
9550ea3027aa4f290c998afd8836a927df40b09d) and the test setup.

Another approach to this problem could be automatic clean-up of cache entries
(based on Robert's idea [1]), but letting users decide on caching also seems
reasonable.

Please note that documentation is still pending.

Thoughts?

Test Case:
without patch:
1. Run the query on foreign table
2. Look for the backend/session opened on the remote postgres server; it persists as long as the local session is active.

with patch:
1. SET enable_connectioncache TO false;
2. Run the query on the foreign table
3. Look for the backend/session opened on the remote postgres server; it should not exist.
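
Concretely, the intended flow with the proposed GUC would look like this
(illustrative only; the GUC comes from the attached patch and foreign_tbl is a
placeholder table name):

SET enable_connectioncache TO false;   -- proposed GUC, not in core today
SELECT count(*) FROM foreign_tbl;      -- a remote session is opened for this query
-- at the end of the remote transaction the connection is dropped, so no idle
-- postgres_fdw backend should remain on the remote server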

[1] - https://www.postgresql.org/message-id/CA%2BTgmob_ksTOgmbXhno%2Bk5XXPOK%2B-JYYLoU3MpXuutP4bH7gzA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 22, 2020 at 11:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> When a query on foreign table is executed from a local session using
> postgres_fdw, as expected the local postgres backend opens a
> connection which causes a remote session/backend to be opened on the
> remote postgres server for query execution.
>
> One observation is that, even after the query is finished, the remote
> session/backend still persists on the remote postgres server. Upon
> researching, I found that there is a concept of Connection Caching for
> the remote connections made using postgres_fdw. Local backend/session
> can cache up to 8 different connections per backend. This caching is
> useful as it avoids the cost of reestablishing new connections per
> foreign query.
>
> However, at times, there may be situations where the long lasting
> local sessions may execute very few foreign queries and remaining all
> are local queries, in this scenario, the remote sessions opened by the
> local sessions/backends may not be useful as they remain idle and eat
> up the remote server connections capacity. This problem gets even
> worse(though this use case is a bit imaginary) if all of
> max_connections(default 100 and each backend caching 8 remote
> connections) local sessions open remote sessions and they are cached
> in the local backend.
>
> I propose to have a new session level GUC called
> "enable_connectioncache"(name can be changed if it doesn't correctly
> mean the purpose) with the default value being true which means that
> all the remote connections are cached. If set to false, the
> connections are not cached and so are remote sessions closed by the local backend/session at
> the end of each remote transaction.
>
> Attached the initial patch(based on commit
> 9550ea3027aa4f290c998afd8836a927df40b09d), test setup.

Few comments:

 #backend_flush_after = 0               # measured in pages, 0 disables
-
+#enable_connectioncache = on
This GUC could be placed in the CONNECTIONS AND AUTHENTICATION section.

+
+       /* see if the cache was for postgres_fdw connections and
+          user chose to disable connection caching*/
+       if ((strcmp(hashp->tabname,"postgres_fdw connections") == 0) &&
+               !enable_connectioncache)

Should be changed to PostgreSQL-style commenting, like:
/*
 * See if the cache was for postgres_fdw connections and
 * user chose to disable connection caching.
 */

+       /* if true, fdw connections in a session are cached, else
+          discarded at the end of every remote transaction.
+       */
+       bool        enableconncache;
Should be changed to PostgreSQL-style commenting.

+/* parameter for enabling fdw connection hashing */
+bool   enable_connectioncache = true;
+

Should this be connection caching?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jun 22, 2020 at 11:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Attached the initial patch(based on commit
> 9550ea3027aa4f290c998afd8836a927df40b09d), test setup.
>

make check is failing:
sysviews.out     2020-06-27 07:22:32.162146320 +0530
@@ -73,6 +73,7 @@
               name              | setting
 --------------------------------+---------
  enable_bitmapscan              | on
+ enable_connectioncache         | on

One of the test expected-output files needs to be updated.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



On Sun, Jun 21, 2020 at 10:56 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
When a query on foreign table is executed from a local session using
postgres_fdw, as expected the local postgres backend opens a
connection which causes a remote session/backend to be opened on the
remote postgres server for query execution.

[...]
 
I propose to have a new session level GUC called
"enable_connectioncache"(name can be changed if it doesn't correctly
mean the purpose) with the default value being true which means that
all the remote connections are cached. If set to false, the
connections are not cached and so are remote sessions closed by the local backend/session at
the end of each remote transaction.

[...] 
Thoughts?

Test Case:
without patch:
1. Run the query on foreign table
2. Look for the backend/session opened on the remote postgres server, it exists till the local session remains active.

with patch:
1. SET enable_connectioncache TO false;
2. Run the query on the foreign table
3. Look for the backend/session opened on the remote postgres server, it should not exist.

If this is just going to apply to postgres_fdw, why not have that module provide a function, "disconnect_open_sessions()" or the like, that does this upon user command? I suppose there would be some potential value to having this be set per-user, but that wouldn't preclude the usefulness of a function. And by having a function, the usefulness of the GUC seems reduced. On a related note, is there any entanglement here with the supplied dblink and/or dblink_fdw [1] modules, as they do provide connect and disconnect functions and also leverage postgres_fdw (or dblink_fdw if specified, which brings us back to the question of whether this option should be respected by that FDW)?

Otherwise, I would imagine that having multiple queries execute before wanting to drop the connection would be desirable, so at minimum a test case that does something like:

SELECT count(*) FROM remote.tbl1;
-- connection still open
SET enable_connectioncache TO false;
SELECT count(*) FROM remote.tbl2;
-- now it was closed

Or maybe even better, have the close action happen on a transaction boundary.

And if it doesn't just apply to postgres_fdw (or at least doesn't have to) then the description text should be less specific.

David J.

[1] The only place I see "dblink_fdw" in the documentation is in the dblink module's dblink_connect page.  I would probably modify that page to say:
"It is recommended to use the foreign-data wrapper dblink_fdw (installed by this module) when defining the foreign server." (adding the parenthetical).

Thanks for the responses.

>
> If this is just going to apply to postgres_fdw why not just have that module provide a function
> "disconnect_open_sessions()" or the like that does this upon user command?  I suppose there would be some potential
> value to having this be set per-user but that wouldn't preclude the usefulness of a function.  And by having a function
> the usefulness of the GUC seems reduced.
>

Having module-specific functions to remove cached entries seems like a good
idea. However, users would have to call these functions frequently to clean
up the cached entries in a long-lasting session. That may not always be
possible when the sessions come from an application rather than a psql-like
client, which is the more frequent scenario in customer use cases; users
would then have to change the application code that issues queries to the
postgres server to include these function calls.

Since the server/session configuration typically happens well before the user
application starts to submit actual database queries, a GUC avoids having to
make such function calls mid-session: the GUC is simply set to on if
connections should be cached, or to off if not.

>
> On a related note is there any entanglement here with the supplied dblink and/or dblink_fdw [1] modules as they do
> provide connect and disconnect functions and also leverages postgres_fdw (or dblink_fdw if specified, which brings us
> back to the question of whether this option should be respected by that FDW).
>

I found that dblink also caches connections and provides a function,
dblink_disconnect(), to remove a cached connection by its connection name
(a usage sketch follows below).
IMO, that solution is a bit problematic, as explained in my first response
in this mail.
The postgres_fdw connection cache and the dblink connection cache have no
link at all; please point it out if I'm missing anything here.
But this GUC could probably be extended from a bool to an enum of type
config_enum_entry and used for dblink as well, which keeps it extensible.
Please let me know if this is okay, so that I can code it.
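
For reference, dblink's manual model looks like this (the connection string
and connection name are illustrative; the functions are the ones shipped in
contrib/dblink):

SELECT dblink_connect('myconn', 'host=remotehost dbname=postgres');
SELECT * FROM dblink('myconn', 'SELECT 1') AS t(x int);
SELECT dblink_disconnect('myconn');   -- the cached connection is dropped on demand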

>
> Otherwise, I would imagine that having multiple queries execute before wanting to drop the connection would be
> desirable so at minimum a test case that does something like:
>
> SELECT count(*) FROM remote.tbl1;
> -- connection still open
> SET enable_connectioncache TO false;
> SELECT count(*) FROM remote.tbl2;
> -- now it was closed
>
> Or maybe even better, have the close action happen on a transaction boundary.
>

This is a valid scenario, as the same connection can be used multiple times
within the same transaction. My attached initial patch already covers this
point: the decision to cache or not cache the connection happens at the end
of the main transaction, i.e. in pgfdw_xact_callback(), as sketched below.
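
A sketch of the intended behaviour with the proposed GUC, reusing the table
names from the quoted example:

BEGIN;
SELECT count(*) FROM remote.tbl1;     -- remote connection opened (or reused)
SET enable_connectioncache TO false;
SELECT count(*) FROM remote.tbl2;     -- the same connection is still used here
COMMIT;                               -- dropped in pgfdw_xact_callback() at main xact end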

>
> And if it doesn't just apply to postgres_fdw (or at least doesn't have to) then the description text should be less
> specific.
>

If we agree on a generic GUC for postgres_fdw, dblink and so on, I will
change the description and documentation accordingly.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On Mon, 22 Jun 2020 at 14:56, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> When a query on foreign table is executed from a local session using
> postgres_fdw, as expected the local postgres backend opens a
> connection which causes a remote session/backend to be opened on the
> remote postgres server for query execution.
>
> One observation is that, even after the query is finished, the remote
> session/backend still persists on the remote postgres server. Upon
> researching, I found that there is a concept of Connection Caching for
> the remote connections made using postgres_fdw. Local backend/session
> can cache up to 8 different connections per backend. This caching is
> useful as it avoids the cost of reestablishing new connections per
> foreign query.
>
> However, at times, there may be situations where the long lasting
> local sessions may execute very few foreign queries and remaining all
> are local queries, in this scenario, the remote sessions opened by the
> local sessions/backends may not be useful as they remain idle and eat
> up the remote server connections capacity. This problem gets even
> worse(though this use case is a bit imaginary) if all of
> max_connections(default 100 and each backend caching 8 remote
> connections) local sessions open remote sessions and they are cached
> in the local backend.
>
> I propose to have a new session level GUC called
> "enable_connectioncache"(name can be changed if it doesn't correctly
> mean the purpose) with the default value being true which means that
> all the remote connections are cached. If set to false, the
> connections are not cached and so are remote sessions closed by the local backend/session at
> the end of each remote transaction.

I've not looked at your patch deeply, but if this problem concerns only
postgres_fdw, I think we should improve postgres_fdw rather than add a GUC
to the core. Not all FDW plugins use a connection cache, and postgres_fdw's
connection cache is implemented within postgres_fdw itself, so I think we
should focus on improving postgres_fdw. I also think it's not a good design
for the core to manage connections to remote servers reached via FDW. I
wonder if we can add a postgres_fdw option for this purpose, say
keep_connection [on|off]. That way, we can set it per server so that remote
connections to a particular server don't remain idle.
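
Something like the following, where keep_connection is the proposed (not yet
existing) option and the connection details are purely illustrative:

CREATE SERVER foreign_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'postgres', keep_connection 'off');
-- connections to foreign_srv would then be closed at remote transaction end
-- instead of being cached in the local backend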

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Tue, Jun 30, 2020 at 12:23:28PM +0900, Masahiko Sawada wrote:
> > I propose to have a new session level GUC called
> > "enable_connectioncache"(name can be changed if it doesn't correctly
> > mean the purpose) with the default value being true which means that
> > all the remote connections are cached. If set to false, the
> > connections are not cached and so are remote sessions closed by the local backend/session at
> > the end of each remote transaction.
> 
> I've not looked at your patch deeply but if this problem is talking
> only about postgres_fdw I think we should improve postgres_fdw, not
> adding a GUC to the core. It’s not that all FDW plugins use connection
> cache and postgres_fdw’s connection cache is implemented within
> postgres_fdw, I think we should focus on improving postgres_fdw. I
> also think it’s not a good design that the core manages connections to
> remote servers connected via FDW. I wonder if we can add a
> postgres_fdw option for this purpose, say keep_connection [on|off].
> That way, we can set it per server so that remote connections to the
> particular server don’t remain idle.

I thought we would add a core capability, idle_session_timeout, which
would disconnect idle sessions, and the postgres_fdw would use that.  We
have already had requests for idle_session_timeout, but avoided it
because it seemed better to tell people to monitor pg_stat_activity and
terminate sessions that way, but now that postgres_fdw needs it too,
there might be enough of a requirement to add it.
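
For example (a sketch, assuming the proposed idle_session_timeout GUC exists
on the remote server and is settable per role; fdw_user is an illustrative
role used by the user mapping):

-- on the remote server:
ALTER ROLE fdw_user SET idle_session_timeout = '60s';
-- the remote postgres_fdw backends for that role then exit on their own
-- after sitting idle, with nothing to do on the local (sending) side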

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee






On Tue, Jun 30, 2020 at 8:54 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Mon, 22 Jun 2020 at 14:56, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> When a query on foreign table is executed from a local session using
> postgres_fdw, as expected the local postgres backend opens a
> connection which causes a remote session/backend to be opened on the
> remote postgres server for query execution.
>
> One observation is that, even after the query is finished, the remote
> session/backend still persists on the remote postgres server. Upon
> researching, I found that there is a concept of Connection Caching for
> the remote connections made using postgres_fdw. Local backend/session
> can cache up to 8 different connections per backend. This caching is
> useful as it avoids the cost of reestablishing new connections per
> foreign query.
>
> However, at times, there may be situations where the long lasting
> local sessions may execute very few foreign queries and remaining all
> are local queries, in this scenario, the remote sessions opened by the
> local sessions/backends may not be useful as they remain idle and eat
> up the remote server connections capacity. This problem gets even
> worse(though this use case is a bit imaginary) if all of
> max_connections(default 100 and each backend caching 8 remote
> connections) local sessions open remote sessions and they are cached
> in the local backend.
>
> I propose to have a new session level GUC called
> "enable_connectioncache"(name can be changed if it doesn't correctly
> mean the purpose) with the default value being true which means that
> all the remote connections are cached. If set to false, the
> connections are not cached and so are remote sessions closed by the local backend/session at
> the end of each remote transaction.

I've not looked at your patch deeply but if this problem is talking
only about postgres_fdw I think we should improve postgres_fdw, not
adding a GUC to the core. It’s not that all FDW plugins use connection
cache and postgres_fdw’s connection cache is implemented within
postgres_fdw, I think we should focus on improving postgres_fdw. I
also think it’s not a good design that the core manages connections to
remote servers connected via FDW. I wonder if we can add a
postgres_fdw option for this purpose, say keep_connection [on|off].
That way, we can set it per server so that remote connections to the
particular server don’t remain idle.

 
+1

I have not looked at the implementation, but I agree that the problem here is
with postgres_fdw, so we should try to solve it while keeping the change
limited to postgres_fdw. I liked the idea of passing it as an option on the
FDW connection.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




--
Rushabh Lathia
>
> I've not looked at your patch deeply but if this problem is talking
> only about postgres_fdw I think we should improve postgres_fdw, not
> adding a GUC to the core. It’s not that all FDW plugins use connection
> cache and postgres_fdw’s connection cache is implemented within
> postgres_fdw, I think we should focus on improving postgres_fdw. I
> also think it’s not a good design that the core manages connections to
> remote servers connected via FDW. I wonder if we can add a
> postgres_fdw option for this purpose, say keep_connection [on|off].
> That way, we can set it per server so that remote connections to the
> particular server don’t remain idle.
>

If I understand it correctly, your suggestion is to add a keep_connection
option to be used while defining the server object. IMO, having the
keep_connection option at the server-object level may not serve the purpose
being discussed here. For instance, let's say I create a foreign server in
session 1 with keep_connection on, and I want to use that server object in
session 2 with keep_connection off, in session 3 with keep_connection on,
and so on. One way to change the server's keep_connection option is to alter
the server object, but that's not a good choice, as the change applies at the
system level.

Overall, though the server object is defined in a single session, it is used
in many sessions, so having the option at the per-server level would not be a
good idea.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com





On Wed, Jul 1, 2020 at 2:45 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I've not looked at your patch deeply but if this problem is talking
> only about postgres_fdw I think we should improve postgres_fdw, not
> adding a GUC to the core. It’s not that all FDW plugins use connection
> cache and postgres_fdw’s connection cache is implemented within
> postgres_fdw, I think we should focus on improving postgres_fdw. I
> also think it’s not a good design that the core manages connections to
> remote servers connected via FDW. I wonder if we can add a
> postgres_fdw option for this purpose, say keep_connection [on|off].
> That way, we can set it per server so that remote connections to the
> particular server don’t remain idle.
>

If I understand it correctly, your suggestion is to add
keep_connection option and use that while defining the server object.
IMO having keep_connection option at the server object level may not
serve the purpose being discussed here.
For instance, let's say I create a foreign server in session 1 with
keep_connection on, and I want to use that
server object in session 2 with keep_connection off and session 3 with
keep_connection on and so on.

In my opinion, in such cases, one needs to create two server objects, one
with keep_connection on and one with keep_connection off, and decide which
one is appropriate for the particular session.
 
One way we can change the server's keep_connection option is to alter
the server object, but that's not a good choice,
as we have to alter it at the system level.

Overall, though we define the server object in a single session, it
will be used in multiple sessions, having an
option at the per-server level would not be a good idea.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




--
Rushabh Lathia
>
> I thought we would add a core capability, idle_session_timeout, which
> would disconnect idle sessions, and the postgres_fdw would use that.  We
> have already had requests for idle_session_timeout, but avoided it
> because it seemed better to tell people to monitor pg_stat_activity and
> terminate sessions that way, but now that postgres_fdw needs it too,
> there might be enough of a requirement to add it.
>

If we were to use idle_session_timeout (from the patch at [1]) to make the
remote session go away, without also deleting the corresponding entry from
the local connection cache, then a subsequent foreign query from the local
session would fail with the error below, which may not be the expected
behaviour. (I took the patch from [1], intentionally set idle_session_timeout
to a low value on the remote server, and issued a query on foreign_tbl, which
opened a remote session; after idle_session_timeout the remote session
closed, and I then issued the foreign_tbl query again from the local
session.)

postgres=# SELECT * FROM foreign_tbl;
ERROR: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
postgres=#

Alternatively, if we are thinking of using the idle_session_timeout
infrastructure on the local postgres server to remove cached entries from the
local connection cache, then the question arises: do we intend to use the
same configuration value of idle_session_timeout for the connection cache as
well? Probably not, as we might want different values for the two purposes,
say 2000 seconds for idle_session_timeout and 1000 seconds for connection
cache cleanup.

[1] - https://www.postgresql.org/message-id/763A0689-F189-459E-946F-F0EC4458980B%40hotmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

On Wed, Jul 1, 2020 at 3:33 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>
>
>
> On Wed, Jul 1, 2020 at 2:45 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> >
>> > I've not looked at your patch deeply but if this problem is talking
>> > only about postgres_fdw I think we should improve postgres_fdw, not
>> > adding a GUC to the core. It’s not that all FDW plugins use connection
>> > cache and postgres_fdw’s connection cache is implemented within
>> > postgres_fdw, I think we should focus on improving postgres_fdw. I
>> > also think it’s not a good design that the core manages connections to
>> > remote servers connected via FDW. I wonder if we can add a
>> > postgres_fdw option for this purpose, say keep_connection [on|off].
>> > That way, we can set it per server so that remote connections to the
>> > particular server don’t remain idle.
>> >
>>
>> If I understand it correctly, your suggestion is to add
>> keep_connection option and use that while defining the server object.
>> IMO having keep_connection option at the server object level may not
>> serve the purpose being discussed here.
>> For instance, let's say I create a foreign server in session 1 with
>> keep_connection on, and I want to use that
>> server object in session 2 with keep_connection off and session 3 with
>> keep_connection on and so on.
>
>
> In my opinion, in such cases, one needs to create two server object one with
> keep-connection ON and one with keep-connection off.  And need to decide
> to use appropriate for the particular session.
>
>>
>> One way we can change the server's keep_connection option is to alter
>> the server object, but that's not a good choice,
>> as we have to alter it at the system level.
>>
>> Overall, though we define the server object in a single session, it
>> will be used in multiple sessions, having an
>> option at the per-server level would not be a good idea.
>>
>> With Regards,
>> Bharath Rupireddy.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>>
>
>
> --
> Rushabh Lathia



On Wed, Jul 1, 2020 at 3:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

> If we were to use idle_session_timeout (from patch [1]) for the remote
> session to go off without
> having to delete the corresponding entry from local connection cache and
> after that if we submit foreign query from local session, then below
> error would occur,
> which may not be an expected behaviour. (I took the patch from [1] and
> intentionally set the
> idle_session_timeout to a low value on remote server, issued a
> foreign_tbl query which
> caused remote session to open and after idle_session_timeout , the
> remote session
> closes and now issue the foreign_tbl query from local session)
>
> postgres=# SELECT * FROM foreign_tbl;
> ERROR: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
> postgres=#

This is actually strange. As far as I recall the code (without looking at the
current sources), when a query picks a foreign connection it checks its
state. It's possible that the connection has not been marked bad by the time
you fire the new query. If the problem exists we should probably fix it
anyway, since the backend at the other end of the connection has a higher
chance of being killed while the connection sits idle in the cache.

-- 
Best Wishes,
Ashutosh Bapat



On Wed, 1 Jul 2020 at 18:14, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> >
> > I've not looked at your patch deeply but if this problem is talking
> > only about postgres_fdw I think we should improve postgres_fdw, not
> > adding a GUC to the core. It’s not that all FDW plugins use connection
> > cache and postgres_fdw’s connection cache is implemented within
> > postgres_fdw, I think we should focus on improving postgres_fdw. I
> > also think it’s not a good design that the core manages connections to
> > remote servers connected via FDW. I wonder if we can add a
> > postgres_fdw option for this purpose, say keep_connection [on|off].
> > That way, we can set it per server so that remote connections to the
> > particular server don’t remain idle.
> >
>
> If I understand it correctly, your suggestion is to add
> keep_connection option and use that while defining the server object.
> IMO having keep_connection option at the server object level may not
> serve the purpose being discussed here.
> For instance, let's say I create a foreign server in session 1 with
> keep_connection on, and I want to use that
> server object in session 2 with keep_connection off and session 3 with
> keep_connection on and so on.
> One way we can change the server's keep_connection option is to alter
> the server object, but that's not a good choice,
> as we have to alter it at the system level.

Is there a use case in practice where different backends need different
connection cache settings even though they all connect to the same server?
Since the problem this feature is trying to resolve is not eating up the
remote server's connection capacity, I thought we'd want to disable
connection caching for a particular server, for example one that sets a low
max_connections.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



> > If we were to use idle_session_timeout (from patch [1]) for the remote
> > session to go off without
> > having to delete the corresponding entry from local connection cache and
> > after that if we submit foreign query from local session, then below
> > error would occur,
> > which may not be an expected behaviour. (I took the patch from [1] and
> > intentionally set the
> > idle_session_timeout to a low value on remote server, issued a
> > foreign_tbl query which
> > caused remote session to open and after idle_session_timeout , the
> > remote session
> > closes and now issue the foreign_tbl query from local session)
> >
> > postgres=# SELECT * FROM foreign_tbl;
> > ERROR: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
> > postgres=#
>
> This is actually strange. AFAIR the code, without looking at the
> current code, when a query picks a foreign connection it checks its
> state. It's possible that the connection has not been marked bad by
> the time you fire new query. If the problem exists probably we should
> fix it anyway since the backend at the other end of the connection has
> higher chances of being killed while the connection was sitting idle
> in the cache.
>

Thanks Ashutosh for the suggestion. One way we could solve the above problem:
when firing a new foreign query from the local backend using a cached
connection (assuming the remote backend/session that was cached in the local
backend got killed by some means), instead of failing the query in the local
backend/session, upon detecting the error from the remote backend we could
simply delete the stale cached entry, get another connection to a new remote
backend/session, cache it, and proceed to submit the query. This has to
happen only at the beginning of the remote xact.

This way, instead of failing (as mentioned above, "server closed the
connection unexpectedly"), the query succeeds if the local session is able to
get a new remote backend connection.

I worked on a POC patch to prove the above point; attaching it. Please note
that the patch doesn't contain comments and has some issues, such as adding a
new variable to the PGconn structure and the like.

If the approach makes sense, I can rework the patch properly and probably
open another thread for review and further discussion.

The way I tested the patch:

1. select * from foreign_tbl;
/* from the local session - this results in a remote connection being cached
in the connection cache and a remote backend/session being opened. */
2. kill the remote backend/session
3. select * from foreign_tbl;
/* from the local session - without the patch this throws "ERROR: server
closed the connection unexpectedly". With the patch, the cached connection is
tried at the beginning of the remote xact; upon receiving an error from the
remote postgres server, instead of aborting the query, the cached entry is
deleted and a new connection is attempted; if one is obtained, it is cached
and used to execute the query, and the query succeeds. */
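
(Step 2 can be done on the remote server with something like the following,
terminating only the postgres_fdw backend identified by its default
application_name:)

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'postgres_fdw';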

With Regards,
Bharath Rupireddy.


On Wed, Jul 1, 2020 at 7:13 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 1 Jul 2020 at 18:14, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> >
> > I've not looked at your patch deeply but if this problem is talking
> > only about postgres_fdw I think we should improve postgres_fdw, not
> > adding a GUC to the core. It’s not that all FDW plugins use connection
> > cache and postgres_fdw’s connection cache is implemented within
> > postgres_fdw, I think we should focus on improving postgres_fdw. I
> > also think it’s not a good design that the core manages connections to
> > remote servers connected via FDW. I wonder if we can add a
> > postgres_fdw option for this purpose, say keep_connection [on|off].
> > That way, we can set it per server so that remote connections to the
> > particular server don’t remain idle.
> >
>
> If I understand it correctly, your suggestion is to add
> keep_connection option and use that while defining the server object.
> IMO having keep_connection option at the server object level may not
> serve the purpose being discussed here.
> For instance, let's say I create a foreign server in session 1 with
> keep_connection on, and I want to use that
> server object in session 2 with keep_connection off and session 3 with
> keep_connection on and so on.
> One way we can change the server's keep_connection option is to alter
> the server object, but that's not a good choice,
> as we have to alter it at the system level.

Is there use-case in practice where different backends need to have
different connection cache setting even if all of them connect the
same server? I thought since the problem that this feature is trying
to resolve is not to  eat up the remote server connections capacity by
disabling connection cache, we’d like to disable connection cache to
the particular server, for example, which sets low max_connections.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>> If I understand it correctly, your suggestion is to add
>> keep_connection option and use that while defining the server object.
>> IMO having keep_connection option at the server object level may not
>> serve the purpose being discussed here.
>> For instance, let's say I create a foreign server in session 1 with
>> keep_connection on, and I want to use that
>> server object in session 2 with keep_connection off and session 3 with
>> keep_connection on and so on.
>
> In my opinion, in such cases, one needs to create two server object one with
> keep-connection ON and one with keep-connection off.  And need to decide
> to use appropriate for the particular session.
>

Yes: have two variants of the foreign server, one with keep_connection on
(this can be the default behaviour; even if the user doesn't mention the
option, it can internally be treated as keep_connection on), and, for users
who want no connection caching, another foreign server with all other options
the same but keep_connection off.

This looks okay to me if we want to avoid a core session-level GUC.
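
A sketch of that setup (keep_connection being the proposed option, all names
illustrative):

CREATE SERVER srv_cached FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'postgres', keep_connection 'on');
CREATE SERVER srv_uncached FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'postgres', keep_connection 'off');
-- sessions with frequent foreign queries use srv_cached, rare users use
-- srv_uncached; each server needs its own user mappings and foreign tables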

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



> >
> > If I understand it correctly, your suggestion is to add
> > keep_connection option and use that while defining the server object.
> > IMO having keep_connection option at the server object level may not
> > serve the purpose being discussed here.
> > For instance, let's say I create a foreign server in session 1 with
> > keep_connection on, and I want to use that
> > server object in session 2 with keep_connection off and session 3 with
> > keep_connection on and so on.
> > One way we can change the server's keep_connection option is to alter
> > the server object, but that's not a good choice,
> > as we have to alter it at the system level.
>
> Is there use-case in practice where different backends need to have
> different connection cache setting even if all of them connect the
> same server?

Currently, the connection cache exists at each backend/session level and gets
destroyed on backend/session exit. The same cached connection can be used
until it gets invalidated due to user mapping or server definition changes.

One alternative is a shared-memory-based connection cache instead of a
per-backend cache, but that has its own problems: maintenance, invalidation,
dealing with concurrent use, and so on.

> I thought since the problem that this feature is trying
> to resolve is not to  eat up the remote server connections capacity by
> disabling connection cache, we’d like to disable connection cache to
> the particular server, for example, which sets low max_connections.
>

Currently, the user mapping OID acts as the key of the cache's hash table, so
cache entries are not keyed directly by foreign server id, though each entry
carries some information related to the foreign server.
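
(The mappings that serve as cache keys can be seen with the existing catalog
view, e.g.:

SELECT umid, srvname, usename FROM pg_user_mappings;

so, typically, two local users with their own user mappings for the same
foreign server get two separate cache entries.)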

Just to reiterate, the main idea of this feature is to give the user a way to
choose whether to use connection caching: if a session issues remote queries
very rarely, caching can be disabled; if remote queries are more frequent in
a particular session, caching can be kept on.

In a way, this feature keeps local sessions from eating up remote
connections/sessions by letting users decide (as they know best when to cache
and when not to) whether to cache the remote connections, releasing them
immediately when the local session makes little use of them.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Thu, Jul 2, 2020 at 4:29 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > This is actually strange. AFAIR the code, without looking at the
> > current code, when a query picks a foreign connection it checks its
> > state. It's possible that the connection has not been marked bad by
> > the time you fire new query. If the problem exists probably we should
> > fix it anyway since the backend at the other end of the connection has
> > higher chances of being killed while the connection was sitting idle
> > in the cache.
> >
>
> Thanks Ashutosh for the suggestion. One way, we could solve the above
> problem is that, upon firing the new foreign query from local backend using cached
> connection, (assuming the remote backend/session that was cached in the local backed got
> killed by some means), instead of failing the query in the local backend/session, upon
> detecting error from remote backend, we could just delete the cached old entry and try getting another
> connection to remote backend/session, cache it and proceed to submit the query. This has to happen only at
> the beginning of remote xact.

Yes, I believe that would be good.

>
> This way, instead of failing(as mentioned above " server closed the connection unexpectedly"),
> the query succeeds if the local session is able to get a new remote backend connection.
>

In GetConnection() there's a comment
    /*
     * We don't check the health of cached connection here, because it would
     * require some overhead.  Broken connection will be detected when the
     * connection is actually used.
     */
Possibly this is where you want to check the health of the connection when
it's being used for the first time in a transaction.

> I worked on a POC patch to prove the above point. Attaching the patch.
> Please note that, the patch doesn't contain comments and has some issues like having some new
> variable in PGconn structure and the things like.

I don't think changing the PGconn structure for this is going to help. It's a
libpq construct used by many applications/tools other than postgres_fdw.
Instead you could use ConnCacheEntry for this; see how we track an
invalidated connection and reconnect upon invalidation.

>
> If the approach makes some sense, then I can rework properly on the patch and probably
> can open another thread for the review and other stuff.
>
> The way I tested the patch:
>
> 1. select * from foreign_tbl;
> /*from local session - this results in a
> remote connection being cached in
> the connection cache and
> a remote backend/session is opened.
> */
> 2. kill the remote backend/session
> 3. select * from foreign_tbl;
> /*from local session - without patch
> this throws error "ERROR: server closed the connection unexpectedly"
> with path - try to use
> the cached connection at the beginning of remote xact, upon receiving
> error from remote postgres
> server, instead of aborting the query, delete the cached entry, try to
> get a new connection, if it
> gets, cache it and use that for executing the query, query succeeds.
> */

This will work. Be cognizant of the fact that the same connection may
be used by multiple plan nodes.


--
Best Wishes,
Ashutosh Bapat



On Wed, Jul 1, 2020 at 5:15 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> If I understand it correctly, your suggestion is to add
> keep_connection option and use that while defining the server object.
> IMO having keep_connection option at the server object level may not
> serve the purpose being discussed here.
> For instance, let's say I create a foreign server in session 1 with
> keep_connection on, and I want to use that
> server object in session 2 with keep_connection off and session 3 with
> keep_connection on and so on.
> One way we can change the server's keep_connection option is to alter
> the server object, but that's not a good choice,
> as we have to alter it at the system level.
>
> Overall, though we define the server object in a single session, it
> will be used in multiple sessions, having an
> option at the per-server level would not be a good idea.

You present this here as if it should be a Boolean (on or off) but I
don't see why that should be the case. You can imagine trying to close
connections if they have been idle for a certain length of time, or if
there are more than a certain number of them, rather than (or in
addition to) always/never. Which one is best, and why?

I tend to think this is better as an FDW property rather than a core
facility, but I'm not 100% sure of that and I think it likely depends
somewhat on the answers we choose to the questions in the preceding
paragraph.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



> > If I understand it correctly, your suggestion is to add
> > keep_connection option and use that while defining the server object.
> > IMO having keep_connection option at the server object level may not
> > serve the purpose being discussed here.
> > For instance, let's say I create a foreign server in session 1 with
> > keep_connection on, and I want to use that
> > server object in session 2 with keep_connection off and session 3 with
> > keep_connection on and so on.
> > One way we can change the server's keep_connection option is to alter
> > the server object, but that's not a good choice,
> > as we have to alter it at the system level.
> >
> > Overall, though we define the server object in a single session, it
> > will be used in multiple sessions, having an
> > option at the per-server level would not be a good idea.
>
> You present this here as if it should be a Boolean (on or off) but I
> don't see why that should be the case. You can imagine trying to close
> connections if they have been idle for a certain length of time, or if
> there are more than a certain number of them, rather than (or in
> addition to) always/never. Which one is best, and why?
>
If a cached-connection idle-time property is used (I'm thinking we can define
it per server object), then the local backend would have to close the
connections that have been lying unused for longer than the idle time. To
perform this task, the local backend might have to do it before it goes into
the idle state (as suggested by you in [1]). Please correct me if my
understanding/thinking is wrong here.

If the connection clean-up is to be done by the local backend, then one
concern is: say a local session initially issues a few foreign queries, for
which connections are cached, and then keeps executing only local queries
without ever going idle (this scenario looks rather impractical to me); in
that case we may never clean up the unused cached connections. If this
scenario really is impractical, i.e. we are sure there is a high chance that
the local backend goes idle, then the idea of a per-server-object idle time,
with the local backend cleaning up before it goes idle, looks great to me.

>
> I tend to think this is better as an FDW property rather than a core
> facility, but I'm not 100% sure of that and I think it likely depends
> somewhat on the answers we choose to the questions in the preceding
> paragraph.
>
I completely agree on having it as an FDW property.

[1] - https://www.postgresql.org/message-id/CA%2BTgmob_ksTOgmbXhno%2Bk5XXPOK%2B-JYYLoU3MpXuutP4bH7gzA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Jul 8, 2020 at 9:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> If the cached connection idle time property is used (I'm thinking we
> can define it per server object) then the local backend might have to
> close the connections which are lying unused more than idle time. To
> perform this task, the local backend might have to do it before it
> goes into idle state(as suggested by you in [1]).  Please correct, if
> my understanding/thinking is wrong here.
>
> If the connection clean up is to be done by the local backend, then a
> point can be - let say a local session initially issues few foreign
> queries for which connections are cached, and it keeps executing all
> local queries, without never going to idle mode(I think this scenario
> looks too much impractical to me), then we may never clean the unused
> cached connections. If this scenario is really impractical if we are
> sure that there are high chances that the local backend goes to idle
> mode, then the idea of having per-server-object idle time and letting
> the local backend clean it up before it goes to idle mode looks great
> to me.

If it just did it before going idle, then what about sessions that
haven't reached the timeout at the point when we go idle, but do reach
the timeout later? And how would the FDW get control at the right time
anyway?

> > I tend to think this is better as an FDW property rather than a core
> > facility, but I'm not 100% sure of that and I think it likely depends
> > somewhat on the answers we choose to the questions in the preceding
> > paragraph.
> >
> I completely agree on having it as a FDW property.

Right, but not everyone does. It looks to me like there have been
votes on both sides.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Thanks all for the ideas. There have been various points/approaches
discussed in the entire email chain so far.
I would like to summarize all of them here, so that we can agree on
one of the options and proceed further with this feature.

The issue this feature is trying to solve:
In postgres_fdw, rarely used remote connections lie idle in the (per-backend)
connection cache, and so do the corresponding remote sessions; for
long-lasting local sessions this can unnecessarily eat up connections on the
remote postgres servers.

Approach #1:
A new session-level GUC (proposed name "enable_connectioncache"): when set to
true (the default), remote connections are cached, otherwise not. When set to
false, every time a foreign query is issued a new connection is made at
remote xact begin and dropped from the connection cache at remote xact end.
The GUC applies to all the foreign servers used in the session, so control at
the foreign-server level may not be possible. It may also not be a good idea
to have postgres core controlling a postgres_fdw property.

Approach #2:
A new postgres_fdw function, similar to dblink's dblink_disconnect()
(possibly named postgres_fdw_disconnect_open_connections()). Seems easy, but
users have to call this function regularly to clean up the cached entries.
That may not always be possible: it requires some sort of monitoring and
issuing the new disconnect function from within application code.
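Usage would simply be (the function name is hypothetical, as above):

SELECT postgres_fdw_disconnect_open_connections();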

Approach #3:
A postgres_fdw foreign-server-level option: keep_connection (on/off). When
set to on (the default), entries related to that particular foreign server
are cached, otherwise not. This gives control at the foreign-server level,
which is not possible with a single GUC, and it also addresses the concern
about postgres core solving a postgres_fdw problem. But when the same foreign
server is to be used in multiple sessions with different keep_connection
settings (on/off), a possible solution is to have two foreign-server
definitions for the same remote server, one with keep_connection on and
another with it off, and use the appropriate one; whenever any other foreign
server property/option changes, both versions of the foreign server have to
be maintained.

Approach #4:
A postgres_fdw foreign-server-level option: a connection idle time, i.e. the
amount of idle time for that server's cached entry, after which the cached
entry goes away. The backend, before going idle itself, would probably have
to check the cached entries and see whether any of them has timed out. One
problem: if the backend only does this before going idle, then what about
sessions that haven't reached the timeout at the point when we go idle, but
do reach the timeout later?

I tried to summarize the points concisely; forgive me if I missed anything.

Thoughts?

Credits and thanks to: vignesh C, David G. Johnston, Masahiko Sawada,
Bruce Momjian, Rushabh Lathia, Ashutosh Bapat, Robert Haas.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Tue, Jul 14, 2020 at 6:09 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks all for the ideas. There have been various points/approaches
> discussed in the entire email chain so far.
> I would like to summarize all of them here, so that we can agree on
> one of the options and proceed further with this feature.

In my opinion, approach #2 seems easy to implement and it's hard to
imagine anyone finding much to complain about there, but it's not that
powerful either, because it isn't automatic. Now the other approaches
have to do with the way in which this should be controlled, and I
think there are two separate questions.

1. Should this be controlled by (a) a core GUC, (b) a postgres_fdw
GUC, (c) a postgres_fdw server-level option?
2. Should it be (a) a timeout or (b) a Boolean (keep vs. don't keep)?

With regard to #1, even if we decided on a core GUC, I cannot imagine
that we'd accept enable_connectioncache as a name, because most
enable_whatever GUCs are for the planner, and this is something else.
Also, underscores between some words but not all words is a lousy
convention; let's not do more of that. Apart from those points, I
don't have a strong opinion; other people might. With regard to #2, a
timeout seems a lot more powerful, but also harder to implement
because you'd need some kind of core changes to let the FDW get
control at the proper time. Maybe that's an argument for 2(b), but I
have a bit of a hard time believing that 2(b) will provide a good user
experience. I doubt that most people want to have to decide between
slamming the connection shut even if it's going to be used again
almost immediately and keeping it open until the end of time. Those
are two pretty extreme positions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, Jul 14, 2020 at 03:38:49PM +0530, Bharath Rupireddy wrote:
> Approach #4:
> A postgres_fdw foreign server level option: connection idle time, the
> amount of idle time for that server cached entry, after which the
> cached entry goes away. Probably the backend, before itself going to
> idle, has to be checking the cached entries and see if any of the
> entries has timed out. One problem is that, if the backend just did it
> before going idle, then what about sessions that haven't reached the
> timeout at the point when we go idle, but do reach the timeout later?

Imagine implementing idle_in_session_timeout (which is useful on its
own), and then, when you connect to a foreign postgres_fdw server, you
set idle_in_session_timeout on the foreign side, and it just
disconnects/exits after an idle timeout.  There is nothing the sending
side has to do.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




On Tue, Jul 14, 2020 at 10:28 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Jul 14, 2020 at 03:38:49PM +0530, Bharath Rupireddy wrote:
> > Approach #4:
> > A postgres_fdw foreign server level option: connection idle time, the
> > amount of idle time for that server cached entry, after which the
> > cached entry goes away. Probably the backend, before itself going to
> > idle, has to be checking the cached entries and see if any of the
> > entries has timed out. One problem is that, if the backend just did it
> > before going idle, then what about sessions that haven't reached the
> > timeout at the point when we go idle, but do reach the timeout later?
>
> Imagine implementing idle_in_session_timeout (which is useful on its
> own), and then, when you connect to a foreign postgres_fdw server, you
> set idle_in_session_timeout on the foreign side, and it just
> disconnects/exits after an idle timeout.  There is nothing the sending
> side has to do.
>

Assuming we use idle_in_session_timeout on the remote backends, the remote
sessions will be closed after the timeout, but the locally cached connection
entries still exist and become stale. Subsequent queries that use those
cached connections will fail; of course, such queries can retry the
connection, but only at the beginning of a remote txn, not in the middle of
one, as being discussed in [1]. For instance, in a long-running local txn,
say we use a remote connection at the beginning of the local txn (note that
this opens a remote session and caches its entry in the local connection
cache) and then use the cached connection again only later in the same local
txn; by then idle_in_session_timeout may have fired on the remote backend and
the remote session would have been closed. The long-running local txn will
then fail instead of succeeding. Isn't that a problem here? Please correct me
if I miss anything.

IMHO, we are not fully solving the problem with idle_in_session_timeout on
the remote backends, though it partly addresses the main problem by letting
the remote sessions close by themselves.

[1] - https://www.postgresql.org/message-id/flat/CALj2ACUAi23vf1WiHNar_LksM9EDOWXcbHCo-fD4Mbr1d%3D78YQ%40mail.gmail.com


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Aug 03, 2020 at 04:41:58PM +0530, Bharath Rupireddy wrote:
> IMHO, we are not fully solving the problem with
> idle_in_session_timeout on remote backends though we are addressing
> the main problem partly by letting the remote sessions close by
> themselves.

This patch fails to compile on Windows.  And while skimming through
the patch, I can see that you are including libpq-int.h in a place
other than src/interfaces/libpq/.  This is incorrect, as it should
remain strictly a header internal to libpq.
--
Michael

Attachment
On Tue, Sep 29, 2020 at 11:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Aug 03, 2020 at 04:41:58PM +0530, Bharath Rupireddy wrote:
> > IMHO, we are not fully solving the problem with
> > idle_in_session_timeout on remote backends though we are addressing
> > the main problem partly by letting the remote sessions close by
> > themselves.
>
> This patch fails to compile on Windows.  And while skimming through
> the patch, I can see that you are including libpq-int.h in a place
> different than src/interfaces/libpq/.  This is incorrect as it should
> remain strictly as a header internal to libpq.
>

Unfortunately, we have not arrived at a final solution yet; please
ignore this patch. I will post a new patch once the solution is
finalized. I will move it to the next commitfest if that's okay.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Tue, Sep 29, 2020 at 11:29:45AM +0530, Bharath Rupireddy wrote:
> Unfortunately, we have not arrived at a final solution yet, please
> ignore this patch. I will post a new patch, once the solution is
> finalized. I will move it to the next commit fest if okay.

If you are planning to get that addressed, moving it to next CF is
fine by me.  Thanks for the update!
--
Michael

Attachment
Status update for a commitfest entry.

This thread was inactive for a while, and from the latest messages I see that the patch needs some further work.
So I am moving it to "Waiting on Author".

The new status of this patch is: Waiting on Author

Hi,

On 2020-11-06 18:56, Anastasia Lubennikova wrote:
> Status update for a commitfest entry.
> 
> This thread was inactive for a while and from the latest messages, I
> see that the patch needs some further work.
> So I move it to "Waiting on Author".
> 
> The new status of this patch is: Waiting on Author

I had a look at the initial patch and at the options [1] discussed for 
proceeding with this issue. I agree with Bruce about idle_session_timeout: 
it would be a nice-to-have in-core feature on its own. However, it would 
be a cluster-wide option and would start dropping all idle connections, 
not only foreign ones. So it may not be an option in some cases, for 
example when the same server also handles other load.

Regarding the initial issue, I prefer point #3, i.e. a foreign server 
option. It has a couple of benefits IMO: 1) it can be set separately on a 
per-foreign-server basis, 2) it will live only in the postgres_fdw 
contrib module without any need to touch core. I would only supplement 
this postgres_fdw foreign server option with a GUC, e.g. 
postgres_fdw.keep_connections, so one could easily define such behavior 
for all foreign servers at once or override the server-level option by 
setting this GUC on a per-session basis.

Attached is a small POC patch, which implements this contrib-level 
postgres_fdw.keep_connections GUC. What do you think?

[1] 
https://www.postgresql.org/message-id/CALj2ACUFNydy0uo0JL9A1isHQ9pFe1Fgqa_HVanfG6F8g21nSQ%40mail.gmail.com


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment
Thanks for the interest shown!

On Wed, Nov 18, 2020 at 1:07 AM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
>
> I had a look on the initial patch and discussed options [1] to proceed
> with this issue. I agree with Bruce about idle_session_timeout, it would
> be a nice to have in-core feature on its own. However, this should be a
> cluster-wide option and it will start dropping all idle connection not
> only foreign ones. So it may be not an option for some cases, when the
> same foreign server is used for another load as well.
>

With idle_session_timeout the idle remote backends may go away, which
solves part of our problem. But we also need to clear the corresponding
entry from the local backend's connection cache.

>
> Regarding the initial issue I prefer point #3, i.e. foreign server
> option. It has a couple of benefits IMO: 1) it may be set separately on
> per foreign server basis, 2) it will live only in the postgres_fdw
> contrib without any need to touch core. I would only supplement this
> postgres_fdw foreign server option with a GUC, e.g.
> postgres_fdw.keep_connections, so one could easily define such behavior
> for all foreign servers at once or override server-level option by
> setting this GUC on per session basis.
>

Below is what I have in mind, mostly in line with yours:

a) Have a server-level option (keep_connection true/false, with the
default being true); when set to false, the connection made to this
foreign server is closed and its cached entry is deleted from the
connection cache at the end of the txn in pgfdw_xact_callback (see the
sketch below).
b) Have a postgres_fdw-level GUC postgres_fdw.keep_connections, default
being true. When set to false by the user, the connections that are
used after this are closed and removed from the cache at the end of
their respective txns. If a connection that was cached before the user
set the GUC to false is never used again, then we may not be able to
clear it. We can avoid this problem by recommending that users either
set the GUC to false right after CREATE EXTENSION postgres_fdw; or else
use the function specified in (c).
c) Have a new function that gets defined as part of CREATE EXTENSION
postgres_fdw;, say postgres_fdw_discard_connections(), similar to
dblink's dblink_disconnect(), which discards all the remote
connections and clears the connection cache. We can also accept a
server name as input to postgres_fdw_discard_connections() to discard
selectively.

Thoughts? If okay with the approach, I will start working on the patch.
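
Not the actual patch, just my rough sketch of how (a)/(b) might touch
the end-of-transaction cleanup in pgfdw_xact_callback(); keep_connections
stands for the proposed GUC and the details may well differ:

    /* existing cleanup in pgfdw_xact_callback(), with a proposed addition */
    if (PQstatus(entry->conn) != CONNECTION_OK ||
        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
        entry->changing_xact_state ||
        (entry->xact_depth > 0 && !keep_connections))   /* proposed */
    {
        elog(DEBUG3, "discarding connection %p", entry->conn);
        disconnect_pg_server(entry);
    }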

>
> Attached is a small POC patch, which implements this contrib-level
> postgres_fdw.keep_connections GUC. What do you think?
>

I see two problems with your patch: 1) It just disconnects the remote
connection at the end of the txn if the GUC is set to false, but it
doesn't remove the connection cache entry from ConnectionHash. 2) What
happens if there are some cached connections, the user sets the GUC to
false, and those connections are never used again? Only the new
connections will not be cached; will the existing unused connections
still remain in the connection cache? See (b) above for a solution.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On 2020-11-18 16:39, Bharath Rupireddy wrote:
> Thanks for the interest shown!
> 
> On Wed, Nov 18, 2020 at 1:07 AM Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
>> 
>> Regarding the initial issue I prefer point #3, i.e. foreign server
>> option. It has a couple of benefits IMO: 1) it may be set separately 
>> on
>> per foreign server basis, 2) it will live only in the postgres_fdw
>> contrib without any need to touch core. I would only supplement this
>> postgres_fdw foreign server option with a GUC, e.g.
>> postgres_fdw.keep_connections, so one could easily define such 
>> behavior
>> for all foreign servers at once or override server-level option by
>> setting this GUC on per session basis.
>> 
> 
> Below is what I have in my mind, mostly inline with yours:
> 
> a) Have a server level option (keep_connetion true/false, with the
> default being true), when set to false the connection that's made with
> this foreign server is closed and cached entry from the connection
> cache is deleted at the end of txn in pgfdw_xact_callback.
> b) Have postgres_fdw level GUC postgres_fdw.keep_connections default
> being true. When set to false by the user, the connections, that are
> used after this, are closed and removed from the cache at the end of
> respective txns. If we don't use a connection that was cached prior to
> the user setting the GUC as false,  then we may not be able to clear
> it. We can avoid this problem by recommending users either to set the
> GUC to false right after the CREATE EXTENSION postgres_fdw; or else
> use the function specified in (c).
> c) Have a new function that gets defined as part of CREATE EXTENSION
> postgres_fdw;, say postgres_fdw_discard_connections(), similar to
> dblink's dblink_disconnect(), which discards all the remote
> connections and clears connection cache. And we can also have server
> name as input to postgres_fdw_discard_connections() to discard
> selectively.
> 
> Thoughts? If okay with the approach, I will start working on the patch.
> 

This approach looks solid enough from my perspective to give it a try. I 
would only split it into three separate patches for ease of further 
review.

>> 
>> Attached is a small POC patch, which implements this contrib-level
>> postgres_fdw.keep_connections GUC. What do you think?
>> 
> 
> I see two problems with your patch: 1) It just disconnects the remote
> connection at the end of txn if the GUC is set to false, but it
> doesn't remove the connection cache entry from ConnectionHash.

Yes, but this looks like a valid state for postgres_fdw, and it can get 
into the same state even without my patch. Next time GetConnection() 
will find this cache entry, see that entry->conn is NULL, and 
establish a fresh connection. It is not clear to me right now what 
benefit we would get from also clearing this cache entry, except 
doing it for sanity.
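
For context, this is roughly (and simplified from
contrib/postgres_fdw/connection.c as I read it) what GetConnection()
already does with such an entry:

    /* simplified; names as in the existing source */
    if (entry->conn == NULL)
    {
        ForeignServer *server = GetForeignServer(user->serverid);

        /* reset transaction-related state before connecting */
        entry->xact_depth = 0;
        entry->changing_xact_state = false;
        entry->conn = connect_pg_server(server, user);
    }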

> 2) What
> happens if there are some cached connections, user set the GUC to
> false and not run any foreign queries or not use those connections
> thereafter, so only the new connections will not be cached? Will the
> existing unused connections still remain in the connection cache? See
> (b) above for a solution.
> 

Yes, they will. This could be solved with that additional disconnect 
function as you proposed in c).


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



On Wed, Nov 18, 2020 at 10:32 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
>
> > Below is what I have in my mind, mostly inline with yours:
> >
> > a) Have a server level option (keep_connetion true/false, with the
> > default being true), when set to false the connection that's made with
> > this foreign server is closed and cached entry from the connection
> > cache is deleted at the end of txn in pgfdw_xact_callback.
> > b) Have postgres_fdw level GUC postgres_fdw.keep_connections default
> > being true. When set to false by the user, the connections, that are
> > used after this, are closed and removed from the cache at the end of
> > respective txns. If we don't use a connection that was cached prior to
> > the user setting the GUC as false,  then we may not be able to clear
> > it. We can avoid this problem by recommending users either to set the
> > GUC to false right after the CREATE EXTENSION postgres_fdw; or else
> > use the function specified in (c).
> > c) Have a new function that gets defined as part of CREATE EXTENSION
> > postgres_fdw;, say postgres_fdw_discard_connections(), similar to
> > dblink's dblink_disconnect(), which discards all the remote
> > connections and clears connection cache. And we can also have server
> > name as input to postgres_fdw_discard_connections() to discard
> > selectively.
> >
> > Thoughts? If okay with the approach, I will start working on the patch.
>
> This approach looks solid enough from my perspective to give it a try. I
> would only make it as three separate patches for an ease of further
> review.
>

Thanks! I will make separate patches and post them soon.

>
> >> Attached is a small POC patch, which implements this contrib-level
> >> postgres_fdw.keep_connections GUC. What do you think?
 >
> > I see two problems with your patch: 1) It just disconnects the remote
> > connection at the end of txn if the GUC is set to false, but it
> > doesn't remove the connection cache entry from ConnectionHash.
>
> Yes, and this looks like a valid state for postgres_fdw and it can get
> into the same state even without my patch. Next time GetConnection()
> will find this cache entry, figure out that entry->conn is NULL and
> establish a fresh connection. It is not clear for me right now, what
> benefits we will get from clearing also this cache entry, except just
> doing this for sanity.
>

By clearing the cache entry we get two advantages: 1) we could save a
(small) bit of memory, and 2) we could allow new connections to be
cached, since currently ConnectionHash can have only 8 entries. IMHO,
along with disconnecting, we can also clear off the cache entry.
Thoughts?

>
> > 2) What
> > happens if there are some cached connections, user set the GUC to
> > false and not run any foreign queries or not use those connections
> > thereafter, so only the new connections will not be cached? Will the
> > existing unused connections still remain in the connection cache? See
> > (b) above for a solution.
> >
>
> Yes, they will. This could be solved with that additional disconnect
> function as you proposed in c).
>

Right.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On 2020-11-19 07:11, Bharath Rupireddy wrote:
> On Wed, Nov 18, 2020 at 10:32 PM Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
> 
> Thanks! I will make separate patches and post them soon.
> 
>> 
>> >> Attached is a small POC patch, which implements this contrib-level
>> >> postgres_fdw.keep_connections GUC. What do you think?
>  >
>> > I see two problems with your patch: 1) It just disconnects the remote
>> > connection at the end of txn if the GUC is set to false, but it
>> > doesn't remove the connection cache entry from ConnectionHash.
>> 
>> Yes, and this looks like a valid state for postgres_fdw and it can get
>> into the same state even without my patch. Next time GetConnection()
>> will find this cache entry, figure out that entry->conn is NULL and
>> establish a fresh connection. It is not clear for me right now, what
>> benefits we will get from clearing also this cache entry, except just
>> doing this for sanity.
>> 
> 
> By clearing the cache entry we will have 2 advantages: 1) we could
> save a(small) bit of memory 2) we could allow new connections to be
> cached, currently ConnectionHash can have only 8 entries. IMHO, along
> with disconnecting, we can also clear off the cache entry. Thoughts?
> 

IIUC, 8 is not a hard limit, it is just a starting size. ConnectionHash 
is not a shared-memory hash table, so dynahash can expand it on the fly; 
see, for example, the comment before hash_create():

  * Note: for a shared-memory hashtable, nelem needs to be a pretty good
  * estimate, since we can't expand the table on the fly.  But an unshared
  * hashtable can be expanded on-the-fly, so it's better for nelem to be
  * on the small side and let the table grow if it's exceeded.  An overly
  * large nelem will penalize hash_seq_search speed without buying much.

Also I am not sure that by doing just a HASH_REMOVE you will free any 
memory, since the hash table is already allocated (or expanded) to some 
size. So HASH_REMOVE will only add the removed entry to the freeList, I 
guess.

Anyway, I can hardly imagine bloating of ConnectionHash being a problem 
even in the case when one has thousands of foreign servers all being 
accessed during a single backend's life span.
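
For reference, ConnectionHash is created in
contrib/postgres_fdw/connection.c roughly like this (approximate, from
memory); the 8 is only the initial nelem:

    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(ConnCacheKey);
    ctl.entrysize = sizeof(ConnCacheEntry);
    /* an unshared dynahash table, so it can grow past 8 entries */
    ConnectionHash = hash_create("postgres_fdw connections", 8,
                                 &ctl,
                                 HASH_ELEM | HASH_BLOBS);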


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



On Thu, Nov 19, 2020 at 5:39 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
>
> >
> > By clearing the cache entry we will have 2 advantages: 1) we could
> > save a(small) bit of memory 2) we could allow new connections to be
> > cached, currently ConnectionHash can have only 8 entries. IMHO, along
> > with disconnecting, we can also clear off the cache entry. Thoughts?
> >
>
> IIUC, 8 is not a hard limit, it is just a starting size. ConnectionHash
> is not a shared-memory hash table, so dynahash can expand it on-the-fly
> as follow, for example, from the comment before hash_create():
>

Thanks! Yes this is true. I was wrong earlier. I verified that 8 is
not a hard limit.

>
> Also I am not sure that by doing just a HASH_REMOVE you will free any
> memory, since hash table is already allocated (or expanded) to some
> size. So HASH_REMOVE will only add removed entry to the freeList, I
> guess.
>
> Anyway, I can hardly imagine bloating of ConnectionHash to be a problem
> even in the case, when one has thousands of foreign servers all being
> accessed during a single backend life span.
>

Okay. I will not add the code to remove the entries from the cache.

Here is how I'm splitting it into 4 separate patches:

1. new function and its documentation.
2. GUC and its documentation.
3. server-level option and its documentation.
4. test cases for all of the above patches.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



>
> Here is how I'm making 4 separate patches:
>
> 1. new function and it's documentation.
> 2. GUC and it's documentation.
> 3. server level option and it's documentation.
> 4. test cases for all of the above patches.
>

Hi, I'm attaching the patches here. Note that, though the code changes
for this feature are small, I divided them up as separate patches to
make review easy.

v1-0001-postgres_fdw-function-to-discard-cached-connections.patch
This patch adds a new function that gets defined as part of CREATE
EXTENSION postgres_fdw;. When called with a foreign server name,
postgres_fdw_disconnect() discards the connections associated with
that server. When called without any argument, it discards all the
existing cached connections. (A rough C skeleton is sketched below,
after the patch list.)

v1-0002-postgres_fdw-add-keep_connections-GUC-to-not-cache-connections.patch
This patch adds a new GUC, postgres_fdw.keep_connections, default being
on; when set to off, no remote connections are cached by the local
session.

v1-0003-postgres_fdw-server-level-option-keep_connection.patch
This patch adds a new server-level option, keep_connection, default
being on; when set to off, the local session doesn't cache the
connections associated with the foreign server.

v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
This patch adds the tests and documentation related to this feature.

Please review the patches.
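
To make the shape of the 0001 function concrete, here is a hypothetical
C-level skeleton (disconnect_cached_connections() and its signature are
my assumption here, not necessarily what the patch does; error handling
and the no-argument variant are omitted):

    PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);

    Datum
    postgres_fdw_disconnect(PG_FUNCTION_ARGS)
    {
        char          *servername = text_to_cstring(PG_GETARG_TEXT_PP(0));
        ForeignServer *server = GetForeignServerByName(servername, true);
        bool           result = false;

        /* discard cached connections belonging to this server, if any */
        if (server != NULL)
            result = disconnect_cached_connections(server->serverid);

        PG_RETURN_BOOL(result);
    }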

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
Hi,

On 2020-11-23 09:48, Bharath Rupireddy wrote:
>> 
>> Here is how I'm making 4 separate patches:
>> 
>> 1. new function and it's documentation.
>> 2. GUC and it's documentation.
>> 3. server level option and it's documentation.
>> 4. test cases for all of the above patches.
>> 
> 
> Hi, I'm attaching the patches here. Note that, though the code changes
> for this feature are small, I divided them up as separate patches to
> make review easy.
> 
> v1-0001-postgres_fdw-function-to-discard-cached-connections.patch
> 

This patch looks pretty straightforward to me, but there are some 
things to be addressed IMO:

+        server = GetForeignServerByName(servername, true);
+
+        if (server != NULL)
+        {

Yes, you return false if no server was found, but IMO it is worth 
throwing an error in this case, as, for example, dblink does in 
dblink_disconnect().

+ result = disconnect_cached_connections(FOREIGNSERVEROID,
+     hashvalue,
+     false);

+        if (all || (!all && cacheid == FOREIGNSERVEROID &&
+            entry->server_hashvalue == hashvalue))
+        {
+            if (entry->conn != NULL &&
+                !all && cacheid == FOREIGNSERVEROID &&
+                entry->server_hashvalue == hashvalue)

These conditions look bulky to me. First, you pass FOREIGNSERVEROID to 
disconnect_cached_connections(), but actually it just duplicates the 
'all' flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; 
when it is '-1', then 'all == true'. That is all there is: there are 
only two calls of disconnect_cached_connections(). That way, it seems 
that we should keep only the 'all' flag, at least for now, shouldn't we?

Second, I think that we should just rewrite this if statement in order 
to simplify it and make it more readable, e.g.:

    if ((all || entry->server_hashvalue == hashvalue) &&
        entry->conn != NULL)
    {
        disconnect_pg_server(entry);
        result = true;
    }

+    if (all)
+    {
+        hash_destroy(ConnectionHash);
+        ConnectionHash = NULL;
+        result = true;
+    }

Also, I am still not sure that it is a good idea to destroy the whole 
cache even in 'all' case, but maybe others will have a different 
opinion.

> 
> v1-0002-postgres_fdw-add-keep_connections-GUC-to-not-cache-connections.patch
> 

+            entry->changing_xact_state) ||
+            (entry->used_in_current_xact &&
+            !keep_connections))

I am not sure, but I think that instead of adding this additional flag 
to the ConnCacheEntry structure we can look at entry->xact_depth and use 
a local:

bool used_in_current_xact = entry->xact_depth > 0;

for exactly the same purpose. Since we set entry->xact_depth to zero at 
the end of the xact, it was used in the current xact if it is not zero. 
It is set to 1 by begin_remote_xact(), called from GetConnection(), so 
everything seems to be fine.

Otherwise, both patches seem to be working as expected. I am going to 
have a look at the last two patches a bit later.

Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



Thanks for the review comments.

On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
>
> > v1-0001-postgres_fdw-function-to-discard-cached-connections.patch
>
> This patch looks pretty straightforward for me, but there are some
> things to be addressed IMO:
>
> +               server = GetForeignServerByName(servername, true);
> +
> +               if (server != NULL)
> +               {
>
> Yes, you return a false if no server was found, but for me it worth
> throwing an error in this case as, for example, dblink does in the
> dblink_disconnect().
>

dblink_disconnect() "Returns status, which is always OK (since any
error causes the function to throw an error instead of returning)."
This behaviour doesn't seem okay to me.

Since we throw true/false, I would prefer to throw a warning(with a
reason) while returning false over an error.
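
Something along these lines, i.e. for the non-existing server case
(hypothetical message wording):

    ereport(WARNING,
            (errmsg("no foreign server with name \"%s\" exists", servername)));
    PG_RETURN_BOOL(false);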

>
> + result = disconnect_cached_connections(FOREIGNSERVEROID,
> +        hashvalue,
> +        false);
>
> +               if (all || (!all && cacheid == FOREIGNSERVEROID &&
> +                       entry->server_hashvalue == hashvalue))
> +               {
> +                       if (entry->conn != NULL &&
> +                               !all && cacheid == FOREIGNSERVEROID &&
> +                               entry->server_hashvalue == hashvalue)
>
> These conditions look bulky for me. First, you pass FOREIGNSERVEROID to
> disconnect_cached_connections(), but actually it just duplicates 'all'
> flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; when it
> is '-1', then 'all == true'. That is all, there are only two calls of
> disconnect_cached_connections(). That way, it seems that we should keep
> only 'all' flag at least for now, doesn't it?
>

I added cacheid as an argument to disconnect_cached_connections() for
reusability. Say someone wants to use it with a user mapping; then
they can pass cacheid USERMAPPINGOID and the hash value of the user
mapping. A check cacheid == USERMAPPINGOID && entry->mapping_hashvalue
== hashvalue can then be added to disconnect_cached_connections().

>
> Second, I think that we should just rewrite this if statement in order
> to simplify it and make more readable, e.g.:
>
>         if ((all || entry->server_hashvalue == hashvalue) &&
>                 entry->conn != NULL)
>         {
>                 disconnect_pg_server(entry);
>                 result = true;
>         }
>

Yeah. I will add a cacheid check and change it to the following:

         if ((all || (cacheid == FOREIGNSERVEROID &&
                      entry->server_hashvalue == hashvalue)) &&
                 entry->conn != NULL)
         {
                 disconnect_pg_server(entry);
                 result = true;
         }

>
> +       if (all)
> +       {
> +               hash_destroy(ConnectionHash);
> +               ConnectionHash = NULL;
> +               result = true;
> +       }
>
> Also, I am still not sure that it is a good idea to destroy the whole
> cache even in 'all' case, but maybe others will have a different
> opinion.
>

I think we should. When we disconnect all the connections, there is no
point in keeping the connection cache hash data structure around. If
required, it gets created again at the next foreign server usage in the
same session. Also, hash_destroy() frees up the memory context, unlike
hash_search with HASH_REMOVE, so we can save a bit of memory.

> >
> > v1-0002-postgres_fdw-add-keep_connections-GUC-to-not-cache-connections.patch
> >
>
> +                       entry->changing_xact_state) ||
> +                       (entry->used_in_current_xact &&
> +                       !keep_connections))
>
> I am not sure, but I think, that instead of adding this additional flag
> into ConnCacheEntry structure we can look on entry->xact_depth and use
> local:
>
> bool used_in_current_xact = entry->xact_depth > 0;
>
> for exactly the same purpose. Since we set entry->xact_depth to zero at
> the end of xact, then it was used if it is not zero. It is set to 1 by
> begin_remote_xact() called by GetConnection(), so everything seems to be
> fine.
>

I missed this. Thanks, we can use the local variable as you suggested.
I will change it.

>
> Otherwise, both patches seem to be working as expected. I am going to
> have a look on the last two patches a bit later.
>

Thanks. I will work on the comments so far and post updated patches soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On 2020-11-24 06:52, Bharath Rupireddy wrote:
> Thanks for the review comments.
> 
> On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
>> 
>> > v1-0001-postgres_fdw-function-to-discard-cached-connections.patch
>> 
>> This patch looks pretty straightforward for me, but there are some
>> things to be addressed IMO:
>> 
>> +               server = GetForeignServerByName(servername, true);
>> +
>> +               if (server != NULL)
>> +               {
>> 
>> Yes, you return a false if no server was found, but for me it worth
>> throwing an error in this case as, for example, dblink does in the
>> dblink_disconnect().
>> 
> 
> dblink_disconnect() "Returns status, which is always OK (since any
> error causes the function to throw an error instead of returning)."
> This behaviour doesn't seem okay to me.
> 
> Since we throw true/false, I would prefer to throw a warning(with a
> reason) while returning false over an error.
> 

I thought about something a bit more sophisticated:

1) Return 'true' if there were open connections and we successfully 
closed them.
2) Return 'false' in the no-op case, i.e. there were no open 
connections.
3) Raise an error if something went wrong. And the non-existing server 
case belongs to this last category, IMO.

That looks like a semantically correct behavior, but let us wait for any 
other opinion.

> 
>> 
>> + result = disconnect_cached_connections(FOREIGNSERVEROID,
>> +        hashvalue,
>> +        false);
>> 
>> +               if (all || (!all && cacheid == FOREIGNSERVEROID &&
>> +                       entry->server_hashvalue == hashvalue))
>> +               {
>> +                       if (entry->conn != NULL &&
>> +                               !all && cacheid == FOREIGNSERVEROID &&
>> +                               entry->server_hashvalue == hashvalue)
>> 
>> These conditions look bulky for me. First, you pass FOREIGNSERVEROID 
>> to
>> disconnect_cached_connections(), but actually it just duplicates 'all'
>> flag, since when it is 'FOREIGNSERVEROID', then 'all == false'; when 
>> it
>> is '-1', then 'all == true'. That is all, there are only two calls of
>> disconnect_cached_connections(). That way, it seems that we should 
>> keep
>> only 'all' flag at least for now, doesn't it?
>> 
> 
> I added cachid as an argument to disconnect_cached_connections() for
> reusability. Say, someone wants to use it with a user mapping then
> they can pass cacheid USERMAPPINGOID, hash value of user mapping. The
> cacheid == USERMAPPINGOID && entry->mapping_hashvalue == hashvalue can
> be added to disconnect_cached_connections().
> 

Yeah, I have got your point and the motivation to add this argument, but 
how can we use it? To disconnect all connections belonging to some 
specific user mapping? But any user mapping is hard bound to some foreign 
server, AFAIK, so we can pass the serverid-based hash in that case.

In the case of pgfdw_inval_callback() this argument makes sense, since 
syscache callbacks work that way, but here I can hardly imagine a case 
where we would use it. Thus, it still looks like a premature complication 
to me, since we do not have plans to use it, do we? Anyway, everything 
seems to be working fine, so it is up to you whether to keep this 
additional argument.

> 
> v1-0003-postgres_fdw-server-level-option-keep_connection.patch
> This patch adds a new server level option, keep_connection, default
> being on, when set to off, the local session doesn't cache the
> connections associated with the foreign server.
> 

This patch looks good to me, except one note:

              (entry->used_in_current_xact &&
-            !keep_connections))
+            (!keep_connections || !entry->keep_connection)))
          {

Following this logic:

1) If keep_connections == true, then per-server keep_connection has a 
*higher* priority, so one can disable caching of a single foreign 
server.

2) But if keep_connections == false, then it works like a global 
switch-off regardless of the per-server keep_connection settings, i.e. 
they have a *lower* priority.

It looks fine to me, at least I cannot propose anything better, but 
maybe it should be documented in 0004?

> 
> v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
> This patch adds the tests and documentation related to this feature.
> 

I have not read all the text thoroughly, but this caught my eye:

+   A GUC, <varname>postgres_fdw.keep_connections</varname>, default being
+   <literal>on</literal>, when set to <literal>off</literal>, the local session

I think that the GUC acronym is widely used only in the source code; the 
Postgres docs tend not to use it at all, apart from the acronyms list 
and a couple of 'GUC parameters' usages. And it is never used in a 
singular form there, so I think it should rather be:

A configuration parameter, 
<varname>postgres_fdw.keep_connections</varname>, default being...

+     <para>
+      Note that when <varname>postgres_fdw.keep_connections</varname> is set to
+      off, <filename>postgres_fdw</filename> discards either the connections
+      that are made previously and will be used by the local session or the
+      connections that will be made newly. But the connections that are made
+      previously and kept, but not used after this parameter is set to off, are
+      not discarded. To discard them, use
+      <function>postgres_fdw_disconnect</function> function.
+     </para>

The whole paragraph is really difficult to follow. It could be something 
like this:

      <para>
       Note that setting <varname>postgres_fdw.keep_connections</varname> to
       off does not discard any previously made and still open connections
       immediately. They will be closed only at the end of a future
       transaction that operated on them.

       To close all connections immediately, use the
       <function>postgres_fdw_disconnect</function> function.
      </para>


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



On Wed, Nov 25, 2020 at 2:43 AM Alexey Kondratov <a.kondratov@postgrespro.ru> wrote:

A quick thought here.

Would it make sense to add a hook in the DISCARD ALL implementation that postgres_fdw can register for?

There's precedent here, since DISCARD ALL already has the same effect as SELECT pg_advisory_unlock_all(); amongst other things.




On Wed, Nov 25, 2020 at 7:24 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
>
> A quick thought here.
>
> Would it make sense to add a hook in the DISCARD ALL implementation that postgres_fdw can register for?
>
> There's precedent here, since DISCARD ALL already has the same effect as SELECT pg_advisory_unlock_all(); amongst
> other things.
>

IIUC, then it is like a core (server) function doing some work for the
postgres_fdw module. Earlier in the discussion, one point raised was
that it's better not to have core handle something related to
postgres_fdw. This is the reason we have come up with a postgres_fdw
specific function and a GUC, which get defined when the extension is
created. Similarly, dblink also has its own bunch of functions, one of
which is dblink_disconnect().

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On 2020-11-25 06:17, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 7:24 AM Craig Ringer
> <craig.ringer@enterprisedb.com> wrote:
>> 
>> A quick thought here.
>> 
>> Would it make sense to add a hook in the DISCARD ALL implementation 
>> that postgres_fdw can register for?
>> 
>> There's precedent here, since DISCARD ALL already has the same effect 
>> as SELECT pg_advisory_unlock_all(); amongst other things.
>> 
> 
> IIUC, then it is like a core(server) function doing some work for the
> postgres_fdw module. Earlier in the discussion, one point raised was
> that it's better not to have core handling something related to
> postgres_fdw. This is the reason we have come up with postgres_fdw
> specific function and a GUC, which get defined when extension is
> created. Similarly, dblink also has it's own bunch of functions one
> among them is dblink_disconnect().
> 

If I have got Craig correctly, he points out that we already have a 
DISCARD ALL statement, which is processed by DiscardAll(), and it 
releases internal resources known from the core perspective. That way, 
we could introduce a general purpose hook, DiscardAll_hook(), so 
postgres_fdw can make use of it to clean up its own resources 
(connections in our context) if needed. In other words, it is not like a 
core function doing some work for the postgres_fdw module, but rather a 
callback/hook that postgres_fdw is able to register to do some 
additional work.

It could be a good replacement for 0001, but wouldn't it already be 
overkill to drop all local caches along with remote connections? I mean 
that it would be a nice-to-have hook from the extensibility perspective, 
but postgres_fdw_disconnect() still makes sense, since it does a very 
narrow and specific job.
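
Purely as a sketch of what such a hook might look like (no such hook
exists in core today; all names below are hypothetical):

    /* hypothetical declaration, somewhere like commands/discard.h */
    typedef void (*DiscardAll_hook_type) (bool isTopLevel);
    extern PGDLLIMPORT DiscardAll_hook_type DiscardAll_hook;

    /* hypothetical call site at the end of DiscardAll() in core */
    if (DiscardAll_hook)
        (*DiscardAll_hook) (isTopLevel);

    /* hypothetical registration from postgres_fdw's _PG_init() */
    DiscardAll_hook = pgfdw_discard_all_connections;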


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



On Wed, Nov 25, 2020 at 12:13 AM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
>
> 1) Return 'true' if there were open connections and we successfully
> closed them.
> 2) Return 'false' in the no-op case, i.e. there were no open
> connections.
> 3) Rise an error if something went wrong. And non-existing server case
> belongs to this last category, IMO.
>

Done this way.

>
> I am not sure, but I think, that instead of adding this additional flag
> into ConnCacheEntry structure we can look on entry->xact_depth and use
> local:
>
> bool used_in_current_xact = entry->xact_depth > 0;
>
> for exactly the same purpose. Since we set entry->xact_depth to zero at
> the end of xact, then it was used if it is not zero. It is set to 1 by
> begin_remote_xact() called by GetConnection(), so everything seems to be
> fine.
>

Done.

>
> In the case of pgfdw_inval_callback() this argument makes sense, since
> syscache callbacks work that way, but here I can hardly imagine a case
> where we can use it. Thus, it still looks as a preliminary complication
> for me, since we do not have plans to use it, do we? Anyway, everything
> seems to be working fine, so it is up to you to keep this additional
> argument.
>

Removed the cacheid variable.

>
> Following this logic:
>
> 1) If keep_connections == true, then per-server keep_connection has a
> *higher* priority, so one can disable caching of a single foreign
> server.
>
> 2) But if keep_connections == false, then it works like a global switch
> off indifferently of per-server keep_connection's, i.e. they have a
> *lower* priority.
>
> It looks fine for me, at least I cannot propose anything better, but
> maybe it should be documented in 0004?
>

Done.

>
> I think that GUC acronym is used widely only in the source code and
> Postgres docs tend to do not use it at all, except from acronyms list
> and a couple of 'GUC parameters' collocation usage. And it never used in
> a singular form there, so I think that it should be rather:
>
> A configuration parameter,
> <varname>postgres_fdw.keep_connections</varname>, default being...
>

Done.

>
> The whole paragraph is really difficult to follow. It could be something
> like that:
>
>       <para>
>        Note that setting <varname>postgres_fdw.keep_connections</varname>
> to
>        off does not discard any previously made and still open
> connections immediately.
>        They will be closed only at the end of a future transaction, which
> operated on them.
>
>        To close all connections immediately use
>        <function>postgres_fdw_disconnect</function> function.
>       </para>
>

Done.

Attaching the v2 patch set. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2020/11/27 11:12, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 12:13 AM Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
>>
>> 1) Return 'true' if there were open connections and we successfully
>> closed them.
>> 2) Return 'false' in the no-op case, i.e. there were no open
>> connections.
>> 3) Rise an error if something went wrong. And non-existing server case
>> belongs to this last category, IMO.
>>
> 
> Done this way.
> 
>>
>> I am not sure, but I think, that instead of adding this additional flag
>> into ConnCacheEntry structure we can look on entry->xact_depth and use
>> local:
>>
>> bool used_in_current_xact = entry->xact_depth > 0;
>>
>> for exactly the same purpose. Since we set entry->xact_depth to zero at
>> the end of xact, then it was used if it is not zero. It is set to 1 by
>> begin_remote_xact() called by GetConnection(), so everything seems to be
>> fine.
>>
> 
> Done.
> 
>>
>> In the case of pgfdw_inval_callback() this argument makes sense, since
>> syscache callbacks work that way, but here I can hardly imagine a case
>> where we can use it. Thus, it still looks as a preliminary complication
>> for me, since we do not have plans to use it, do we? Anyway, everything
>> seems to be working fine, so it is up to you to keep this additional
>> argument.
>>
> 
> Removed the cacheid variable.
> 
>>
>> Following this logic:
>>
>> 1) If keep_connections == true, then per-server keep_connection has a
>> *higher* priority, so one can disable caching of a single foreign
>> server.
>>
>> 2) But if keep_connections == false, then it works like a global switch
>> off indifferently of per-server keep_connection's, i.e. they have a
>> *lower* priority.
>>
>> It looks fine for me, at least I cannot propose anything better, but
>> maybe it should be documented in 0004?
>>
> 
> Done.
> 
>>
>> I think that GUC acronym is used widely only in the source code and
>> Postgres docs tend to do not use it at all, except from acronyms list
>> and a couple of 'GUC parameters' collocation usage. And it never used in
>> a singular form there, so I think that it should be rather:
>>
>> A configuration parameter,
>> <varname>postgres_fdw.keep_connections</varname>, default being...
>>
> 
> Done.
> 
>>
>> The whole paragraph is really difficult to follow. It could be something
>> like that:
>>
>>        <para>
>>         Note that setting <varname>postgres_fdw.keep_connections</varname>
>> to
>>         off does not discard any previously made and still open
>> connections immediately.
>>         They will be closed only at the end of a future transaction, which
>> operated on them.
>>
>>         To close all connections immediately use
>>         <function>postgres_fdw_disconnect</function> function.
>>        </para>
>>
> 
> Done.
> 
> Attaching the v2 patch set. Please review it further.

Regarding the 0001 patch, shouldn't we add a function that returns
the information of cached connections, like dblink_get_connections(),
together with the 0001 patch? Otherwise it's not easy for users to
see how many cached connections there are and to determine whether to
disconnect them or not. Sorry if this was already discussed before.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Dec 4, 2020 at 11:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> > Attaching the v2 patch set. Please review it further.
>
> Regarding the 0001 patch, we should add the function that returns
> the information of cached connections like dblink_get_connections(),
> together with 0001 patch? Otherwise it's not easy for users to
> see how many cached connections are and determine whether to
> disconnect them or not. Sorry if this was already discussed before.
>

Thanks for bringing this up. This is exactly what I was thinking a few
days back. Say a new function postgres_fdw_get_connections(), which
returns an array of server names whose connections exist in the
cache. Without this function, the user may not know how many
connections this backend has until they check manually on the remote
server.

Thoughts? If okay, I can code the function in the 0001 patch.
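
Roughly, the lookup such a function would need could look like this (a
sketch only; it assumes, as the existing code suggests, that entry->key
is the user mapping OID, and it omits the set-returning/array-building
boilerplate):

    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;

    hash_seq_init(&scan, ConnectionHash);
    while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)) != NULL)
    {
        HeapTuple      umaptup;
        Form_pg_user_mapping umform;
        ForeignServer *server;

        if (entry->conn == NULL)    /* no open connection for this entry */
            continue;

        /* entry->key is the user mapping OID, as in pgfdw_inval_callback() */
        umaptup = SearchSysCache1(USERMAPPINGOID,
                                  ObjectIdGetDatum(entry->key));
        if (!HeapTupleIsValid(umaptup))
            continue;               /* e.g. the user mapping was dropped */

        umform = (Form_pg_user_mapping) GETSTRUCT(umaptup);
        server = GetForeignServer(umform->umserver);
        ReleaseSysCache(umaptup);

        /* server->servername is what gets returned to the user */
    }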

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Dec 4, 2020 at 1:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Dec 4, 2020 at 11:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >
> > > Attaching the v2 patch set. Please review it further.
> >
> > Regarding the 0001 patch, we should add the function that returns
> > the information of cached connections like dblink_get_connections(),
> > together with 0001 patch? Otherwise it's not easy for users to
> > see how many cached connections are and determine whether to
> > disconnect them or not. Sorry if this was already discussed before.
> >
>
> Thanks for bringing this up. Exactly this is what I was thinking a few
> days back. Say the new function postgres_fdw_get_connections() which
> can return an array of server names whose connections exist in the
> cache. Without this function, the user may not know how many
> connections this backend has until he checks it manually on the remote
> server.
>
> Thoughts? If okay, I can code the function in the 0001 patch.
>

Added a new function postgres_fdw_get_connections() into the 0001
patch, which returns a list of server names for which an open and
active connection exists.

Attaching the v3 patch set, please review it further.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2020/12/04 20:15, Bharath Rupireddy wrote:
> On Fri, Dec 4, 2020 at 1:46 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Fri, Dec 4, 2020 at 11:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>
>>>> Attaching the v2 patch set. Please review it further.
>>>
>>> Regarding the 0001 patch, we should add the function that returns
>>> the information of cached connections like dblink_get_connections(),
>>> together with 0001 patch? Otherwise it's not easy for users to
>>> see how many cached connections are and determine whether to
>>> disconnect them or not. Sorry if this was already discussed before.
>>>
>>
>> Thanks for bringing this up. Exactly this is what I was thinking a few
>> days back. Say the new function postgres_fdw_get_connections() which
>> can return an array of server names whose connections exist in the
>> cache. Without this function, the user may not know how many
>> connections this backend has until he checks it manually on the remote
>> server.
>>
>> Thoughts? If okay, I can code the function in the 0001 patch.
>>
> 
> Added a new function postgres_fdw_get_connections() into 0001 patch,

Thanks!


> which returns a list of server names for which there exists an
> existing open and active connection.
> 
> Attaching v3 patch set, please review it further.

I started reviewing 0001 patch.

IMO the 0001 patch should be self-contained so that we can commit it first. That is, I think that it's better to
move the documents and tests for the functions the 0001 patch adds from 0004 to 0001.

Since 0001 introduces new user-visible functions into postgres_fdw, shouldn't the version of postgres_fdw be increased?

Similar code to get the server name from a cached connection entry also exists in
pgfdw_reject_incomplete_xact_state_change(). I'm tempted to make a "common" function for that code and use it both in
postgres_fdw_get_connections() and pgfdw_reject_incomplete_xact_state_change(), to simplify the code.

+            /* We only look for active and open remote connections. */
+            if (entry->invalidated || !entry->conn)
+                continue;

Shouldn't we return even an invalidated entry, because it still has a cached connection?
Also this makes me wonder if we should return both the server name and a boolean flag indicating whether it's invalidated
or not. If so, users can easily find the invalidated connection entries and disconnect them, because there is no need to
keep an invalidated connection.
 

+    if (all)
+    {
+        hash_destroy(ConnectionHash);
+        ConnectionHash = NULL;
+        result = true;
+    }

Could you tell me why ConnectionHash needs to be destroyed?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Wed, Dec 9, 2020 at 4:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> I started reviewing 0001 patch.
>

Thanks!

> IMO the 0001 patch should be self-contained so that we can commit it first. That is, I think that it's better to
> move the documents and tests for the functions the 0001 patch adds from 0004 to 0001.
>

+1. I will make each patch self-contained in the next version which I
plan to submit soon.

> Since 0001 introduces new user-visible functions into postgres_fdw, the version of postgres_fdw should be increased?
>

Yeah, looks like we should do that; dblink did the same when it
introduced new functions. In case the new functions are not required
by anyone, they can choose to go back to 1.0.

Should we make the new version 1.1 or 2.0? I prefer to make it 1.1,
as we are just adding a little functionality on top of 1.0. I will
change the default_version from 1.0 to 1.1 and add a new
postgres_fdw--1.1.sql file.

If okay, I will make these changes in the 0001 patch.

> Similar code to get the server name from a cached connection entry also exists in
> pgfdw_reject_incomplete_xact_state_change(). I'm tempted to make a "common" function for that code and use it both in
> postgres_fdw_get_connections() and pgfdw_reject_incomplete_xact_state_change(), to simplify the code.
>

+1. I will move the server name finding code to a new function, say
char *pgfdw_get_server_name(ConnCacheEntry *entry);

> +                       /* We only look for active and open remote connections. */
> +                       if (entry->invalidated || !entry->conn)
> +                               continue;
>
> We should return even invalidated entry because it has still cached connection?
>

I checked this point earlier: for invalidated connections, the tuple
returned from the sys cache is also invalid and the following error
will be thrown, so we can not get the server name for that user
mapping. The cache entries too would have been invalidated after the
connection is marked as invalid in pgfdw_inval_callback().

umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
if (!HeapTupleIsValid(umaptup))
      elog(ERROR, "cache lookup failed for user mapping with OID %u",
entry->key);

Can we reload the sys cache entries of USERMAPPINGOID (if there is a
way) for invalid connections in our new function and then do the
lookup? If not, another way could be to store the associated server
name or oid in the ConnCacheEntry. Currently we store the user mapping
oid (in key), its hash value (in mapping_hashvalue) and the foreign
server oid's hash value (in server_hashvalue). If we have the foreign
server oid, then we can just look up the server name, but I'm not quite
sure whether we hit the same issue, i.e. invalid tuples, when the entry
gets invalidated (via pgfdw_inval_callback) due to some change in the
foreign server options. (A rough sketch of this alternative is below.)

IMHO, we can simply choose to show all the active, valid connections. Thoughts?
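
For illustration only, the alternative of remembering the server OID in
the cache entry might look like this (the serverid field is
hypothetical; the other fields mirror the existing struct):

    typedef struct ConnCacheEntry
    {
        ConnCacheKey key;          /* hash key (user mapping OID), as today */
        PGconn      *conn;         /* connection to the foreign server, or NULL */
        Oid          serverid;     /* proposed: foreign server OID, remembered
                                    * when the connection is first established */
        /* ... existing fields such as xact_depth, changing_xact_state,
         * invalidated, server_hashvalue, mapping_hashvalue ... */
    } ConnCacheEntry;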

> Also this makes me wonder if we should return both the server name and boolean flag indicating whether it's
invalidatedor not. If so, users can easily find the invalidated connection entry and disconnect it because there is no
needto keep invalidated connection. 
>

Currently we are returning a list of foreign server names for which
active connections exist. If we somehow address the above mentioned
problem for invalid connections and choose to show them as well, then
what should our output look like? Is it something like preparing a
list of pairs (servername, validflag)?

> +       if (all)
> +       {
> +               hash_destroy(ConnectionHash);
> +               ConnectionHash = NULL;
> +               result = true;
> +       }
>
> Could you tell me why ConnectionHash needs to be destroyed?
>

Say a session has made hundreds of different foreign server
connections, and users want to disconnect all of them with the new
function and don't want any further foreign connections in that
session; they can do that. But then why keep the cache just lying
around and holding that many entries? Instead we can destroy the cache
and, if necessary, it will be allocated again on the next foreign
server connection.

IMHO, it is better to destroy the cache in the disconnect-all case,
hoping to save memory, on the assumption that (next time, if required)
the cache allocation doesn't take much time. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > +                       /* We only look for active and open remote connections. */
> > +                       if (entry->invalidated || !entry->conn)
> > +                               continue;
> >
> > We should return even invalidated entry because it has still cached connection?
> >
>
> I checked this point earlier, for invalidated connections, the tuple
> returned from the cache is also invalid and the following error will
> be thrown. So, we can not get the server name for that user mapping.
> Cache entries too would have been invalidated after the connection is
> marked as invalid in pgfdw_inval_callback().
>
> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
> if (!HeapTupleIsValid(umaptup))
>       elog(ERROR, "cache lookup failed for user mapping with OID %u",
> entry->key);
>

I further checked on returning invalidated connections in the output
of the function. Actually, the reason I'm seeing a null tuple from sys
cache (and hence the error "cache lookup failed for user mapping with
OID xxxx") for an invalidated connection is that the user mapping
(with OID entry->key that exists in the cache) is getting dropped, so
the sys cache returns null tuple. The use case is as follows:

1) Create a server, role, and user mapping of the role with the server
2) Run a foreign table query, so that the connection related to the
server gets cached
3) Issue DROP OWNED BY for the role created, since the user mapping is
dependent on that role, it gets dropped from the catalogue table and
an invalidation message will be pending to clear the sys cache
associated with that user mapping.
4) Now, if I run select * from postgres_fdw_get_connections() or, for
that matter, any query, then at the beginning of the txn
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
gets called and marks the cached entry as invalidated. Remember, the
reason for this invalidation message is that the user mapping with the
OID entry->key was dropped in 3). Later in
postgres_fdw_get_connections(), when we search the sys cache with
entry->key for that invalidated connection, a null tuple is returned
since the user mapping has been dropped from the system.

If we were to show invalidated connections in the output of
postgres_fdw_get_connections(), we can ignore the entry and continue
further if the user mapping sys cache search returns null tuple:

umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));

/* the user mapping has been dropped; skip this invalidated entry */
if (!HeapTupleIsValid(umaptup))
   continue;

Thoughts?

> > Also this makes me wonder if we should return both the server name and boolean flag indicating whether it's
> > invalidated or not. If so, users can easily find the invalidated connection entry and disconnect it because there
> > is no need to keep invalidated connection.
> >
>
> Currently we are returning a list of foreing server names with whom
> there exist active connections. If we somehow address the above
> mentioned problem for invalid connections and choose to show them as
> well, then how should our output look like? Is it something like we
> prepare a list of pairs (servername, validflag)?

If we agree on the above point, we can output something like:
(myserver1, valid), (myserver2, valid), (myserver3, invalid),
(myserver4, valid) ....

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2020/12/11 19:16, Bharath Rupireddy wrote:
> On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>> +                       /* We only look for active and open remote connections. */
>>> +                       if (entry->invalidated || !entry->conn)
>>> +                               continue;
>>>
>>> We should return even invalidated entry because it has still cached connection?
>>>
>>
>> I checked this point earlier, for invalidated connections, the tuple
>> returned from the cache is also invalid and the following error will
>> be thrown. So, we can not get the server name for that user mapping.
>> Cache entries too would have been invalidated after the connection is
>> marked as invalid in pgfdw_inval_callback().
>>
>> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
>> if (!HeapTupleIsValid(umaptup))
>>        elog(ERROR, "cache lookup failed for user mapping with OID %u",
>> entry->key);
>>
> 
> I further checked on returning invalidated connections in the output
> of the function. Actually, the reason I'm seeing a null tuple from sys
> cache (and hence the error "cache lookup failed for user mapping with
> OID xxxx") for an invalidated connection is that the user mapping
> (with OID entry->key that exists in the cache) is getting dropped, so
> the sys cache returns null tuple. The use case is as follows:
> 
> 1) Create a server, role, and user mapping of the role with the server
> 2) Run a foreign table query, so that the connection related to the
> server gets cached
> 3) Issue DROP OWNED BY for the role created, since the user mapping is
> dependent on that role, it gets dropped from the catalogue table and
> an invalidation message will be pending to clear the sys cache
> associated with that user mapping.
> 4) Now, if I do select * from postgres_fdw_get_connections() or for
> that matter any query, at the beginning the txn
> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> gets called and marks the cached entry as invalidated. Remember the
> reason for this invalidation message is that the user mapping with the
> OID entry->key is dropped from 3). Later in
> postgres_fdw_get_connections(), when we search the sys cache with
> entry->key for that invalidated connection, since the user mapping is
> dropped from the system, null tuple is returned.

Thanks for the analysis! This means that the cached connection invalidated by drop of server or user mapping will not
be closed even by the subsequent access to the foreign server and will remain until the backend exits. Right? If so,
this seems like a connection-leak bug, at least for me.... Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2020/12/10 10:44, Bharath Rupireddy wrote:
> On Wed, Dec 9, 2020 at 4:49 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> I started reviewing 0001 patch.
>>
> 
> Thanks!
> 
>> IMO the 0001 patch should be self-contained so that we can commit it at first. That is, I think that it's better to
>> move the documents and tests for the functions 0001 patch adds from 0004 to 0001.
>>
> 
> +1. I will make each patch self-contained in the next version which I
> plan to submit soon.
> 
>> Since 0001 introduces new user-visible functions into postgres_fdw, the version of postgres_fdw should be
>> increased?
>>
> 
> Yeah looks like we should do that, dblink has done that when it
> introduced new functions. In case the new functions are not required
> for anyone, they can choose to go back to 1.0.
> 
> Should we make the new version as 1.1 or 2.0? I prefer to make it 1.1
> as we are just adding few functionality over 1.0. I will change the
> default_version from 1.0 to the 1.1 and add a new
> postgres_fdw--1.1.sql file.

+1


> 
> If okay, I will make changes to 0001 patch.
> 
>> The similar code to get the server name from cached connection entry exists also in
>> pgfdw_reject_incomplete_xact_state_change(). I'm tempted to make the "common" function for that code and use it both
>> in postgres_fdw_get_connections() and pgfdw_reject_incomplete_xact_state_change(), to simplify the code.
>>
> 
> +1. I will move the server name finding code to a new function, say
> char *pgfdw_get_server_name(ConnCacheEntry *entry);
> 
>> +                       /* We only look for active and open remote connections. */
>> +                       if (entry->invalidated || !entry->conn)
>> +                               continue;
>>
>> We should return even invalidated entry because it has still cached connection?
>>
> 
> I checked this point earlier, for invalidated connections, the tuple
> returned from the cache is also invalid and the following error will
> be thrown. So, we can not get the server name for that user mapping.
> Cache entries too would have been invalidated after the connection is
> marked as invalid in pgfdw_inval_callback().
> 
> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
> if (!HeapTupleIsValid(umaptup))
>        elog(ERROR, "cache lookup failed for user mapping with OID %u",
> entry->key);
> 
> Can we reload the sys cache entries of USERMAPPINGOID (if there is a
> way) for invalid connections in our new function and then do a look
> up? If not, another way could be storing the associated server name or
> oid in the ConnCacheEntry. Currently we store user mapping oid(in
> key), its hash value(in mapping_hashvalue) and foreign server oid's
> hash value (in server_hashvalue). If we have the foreign server oid,
> then we can just look up for the server name, but I'm not quite sure
> whether we get the same issue i.e. invalid tuples when the entry gets
> invalided (via pgfdw_inval_callback) due to some change in foreign
> server options.
> 
> IMHO, we can simply choose to show all the active, valid connections. Thoughts?
> 
>> Also this makes me wonder if we should return both the server name and boolean flag indicating whether it's
>> invalidated or not. If so, users can easily find the invalidated connection entry and disconnect it because there is
>> no need to keep invalidated connection.
>>
> 
> Currently we are returning a list of foreing server names with whom
> there exist active connections. If we somehow address the above
> mentioned problem for invalid connections and choose to show them as
> well, then how should our output look like? Is it something like we
> prepare a list of pairs (servername, validflag)?
> 
>> +       if (all)
>> +       {
>> +               hash_destroy(ConnectionHash);
>> +               ConnectionHash = NULL;
>> +               result = true;
>> +       }
>>
>> Could you tell me why ConnectionHash needs to be destroyed?
>>
> 
> Say, in a session there are hundreds of different foreign server
> connections made and if users want to disconnect all of them with the
> new function and don't want any further foreign connections in that
> session, they can do it. But then why keep the cache just lying around
> and holding those many entries? Instead we can destroy the cache and
> if necessary it will be allocated later on next foreign server
> connections.
> 
> IMHO, it is better to destroy the cache in case of disconnect all,
> hoping to save memory, thinking that (next time if required) the cache
> allocation doesn't take much time. Thoughts?

Ok, but why is ConnectionHash destroyed only when "all" is true? Even when "all" is false, for example, the following
query can disconnect all the cached connections. Even in this case, i.e., whenever there are no cached connections,
ConnectionHash should be destroyed?

     SELECT postgres_fdw_disconnect(srvname) FROM pg_foreign_server ;

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2020/12/11 19:16, Bharath Rupireddy wrote:
> > On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>> +                       /* We only look for active and open remote connections. */
> >>> +                       if (entry->invalidated || !entry->conn)
> >>> +                               continue;
> >>>
> >>> We should return even invalidated entry because it has still cached connection?
> >>>
> >>
> >> I checked this point earlier, for invalidated connections, the tuple
> >> returned from the cache is also invalid and the following error will
> >> be thrown. So, we can not get the server name for that user mapping.
> >> Cache entries too would have been invalidated after the connection is
> >> marked as invalid in pgfdw_inval_callback().
> >>
> >> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
> >> if (!HeapTupleIsValid(umaptup))
> >>        elog(ERROR, "cache lookup failed for user mapping with OID %u",
> >> entry->key);
> >>
> >
> > I further checked on returning invalidated connections in the output
> > of the function. Actually, the reason I'm seeing a null tuple from sys
> > cache (and hence the error "cache lookup failed for user mapping with
> > OID xxxx") for an invalidated connection is that the user mapping
> > (with OID entry->key that exists in the cache) is getting dropped, so
> > the sys cache returns null tuple. The use case is as follows:
> >
> > 1) Create a server, role, and user mapping of the role with the server
> > 2) Run a foreign table query, so that the connection related to the
> > server gets cached
> > 3) Issue DROP OWNED BY for the role created, since the user mapping is
> > dependent on that role, it gets dropped from the catalogue table and
> > an invalidation message will be pending to clear the sys cache
> > associated with that user mapping.
> > 4) Now, if I do select * from postgres_fdw_get_connections() or for
> > that matter any query, at the beginning the txn
> > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> > gets called and marks the cached entry as invalidated. Remember the
> > reason for this invalidation message is that the user mapping with the
> > OID entry->key is dropped from 3). Later in
> > postgres_fdw_get_connections(), when we search the sys cache with
> > entry->key for that invalidated connection, since the user mapping is
> > dropped from the system, null tuple is returned.
>
> Thanks for the analysis! This means that the cached connection invalidated by drop of server or user mapping will not
> be closed even by the subsequent access to the foreign server and will remain until the backend exits. Right?

It will first be marked as invalidated via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
and on the next use of that connection, the invalidated connection is
disconnected and reconnected:

    if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
    {
        elog(DEBUG3, "closing connection %p for option changes to take effect",
             entry->conn);
        disconnect_pg_server(entry);
    }

> If so, this seems like a connection-leak bug, at least for me.... Thought?
>

It's not a leak. The comment before pgfdw_inval_callback() [1]
explains why we can not immediately close/disconnect the connections
in pgfdw_inval_callback() after marking them as invalidated.

Here is the scenario of how, in the midst of a txn, we get invalidation
messages (AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
happens):

1) select from a foreign table with server1, usermapping1 in session1
2) begin a top txn in session1 and run a few foreign queries that open up
sub txns internally. Meanwhile, alter/drop server1/usermapping1 in
session2; then at the start of each sub txn we also get to process the
invalidation messages via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
So, if we disconnect right after marking the entry invalidated in
pgfdw_inval_callback, that's a problem since we are in a sub txn under
a top txn.

I don't think we can do anything here to disconnect the connections
right after the invalidation happens. Thoughts?
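
For illustration, a minimal way to hit that path (object names are made
up) looks roughly like this:

    -- session 1
    BEGIN;
    SELECT count(*) FROM ft1;       -- connection to server1 cached, xact_depth > 0

    -- session 2, meanwhile
    ALTER SERVER server1 OPTIONS (ADD fetch_size '200');

    -- session 1 again: entering the subtransaction processes the pending
    -- invalidation message and pgfdw_inval_callback() marks the entry as
    -- invalidated, but the connection is still pinned by the open transaction.
    SAVEPOINT s1;
    SELECT count(*) FROM ft1;
    COMMIT;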


[1]
/*
 * Connection invalidation callback function
 *
 * After a change to a pg_foreign_server or pg_user_mapping catalog entry,
 * mark connections depending on that entry as needing to be remade.
 * We can't immediately destroy them, since they might be in the midst of
 * a transaction, but we'll remake them at the next opportunity.
 *
 * Although most cache invalidation callbacks blow away all the related stuff
 * regardless of the given hashvalue, connections are expensive enough that
 * it's worth trying to avoid that.
 *
 * NB: We could avoid unnecessary disconnection more strictly by examining
 * individual option values, but it seems too much effort for the gain.
 */
static void
pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Dec 11, 2020 at 3:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> If we were to show invalidated connections in the output of
> postgres_fdw_get_connections(), we can ignore the entry and continue
> further if the user mapping sys cache search returns null tuple:
>
> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
>
> if (!HeapTupleIsValid(umaptup))
>    continue;

Any thoughts here?

> > > Also this makes me wonder if we should return both the server name and boolean flag indicating whether it's
> > > invalidated or not. If so, users can easily find the invalidated connection entry and disconnect it because there
> > > is no need to keep invalidated connection.
> > >
> >
> > Currently we are returning a list of foreing server names with whom
> > there exist active connections. If we somehow address the above
> > mentioned problem for invalid connections and choose to show them as
> > well, then how should our output look like? Is it something like we
> > prepare a list of pairs (servername, validflag)?
>
> If agreed on above point, we can output something like: (myserver1,
> valid), (myserver2, valid), (myserver3, invalid), (myserver4, valid)

And any thoughts here, on the output format?

In case we agree on the above output format, one funny thing that could
occur is that if some hypothetical person has "valid" or "invalid" as
their foreign server names, they will have difficulty reading the
output: (valid, valid), (valid, invalid), (invalid, valid), (invalid,
invalid).

Or should it be something like pairs of (server_name, true/false)?
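
For instance, assuming a two-column SRF (the column names here are just
placeholders), the result would read unambiguously even with such server
names:

    postgres=# SELECT * FROM postgres_fdw_get_connections();
     server_name | valid
    -------------+-------
     valid       | t
     invalid     | f
    (2 rows)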

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2020/12/12 3:01, Bharath Rupireddy wrote:
> On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2020/12/11 19:16, Bharath Rupireddy wrote:
>>> On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
>>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>>> +                       /* We only look for active and open remote connections. */
>>>>> +                       if (entry->invalidated || !entry->conn)
>>>>> +                               continue;
>>>>>
>>>>> We should return even invalidated entry because it has still cached connection?
>>>>>
>>>>
>>>> I checked this point earlier, for invalidated connections, the tuple
>>>> returned from the cache is also invalid and the following error will
>>>> be thrown. So, we can not get the server name for that user mapping.
>>>> Cache entries too would have been invalidated after the connection is
>>>> marked as invalid in pgfdw_inval_callback().
>>>>
>>>> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
>>>> if (!HeapTupleIsValid(umaptup))
>>>>         elog(ERROR, "cache lookup failed for user mapping with OID %u",
>>>> entry->key);
>>>>
>>>
>>> I further checked on returning invalidated connections in the output
>>> of the function. Actually, the reason I'm seeing a null tuple from sys
>>> cache (and hence the error "cache lookup failed for user mapping with
>>> OID xxxx") for an invalidated connection is that the user mapping
>>> (with OID entry->key that exists in the cache) is getting dropped, so
>>> the sys cache returns null tuple. The use case is as follows:
>>>
>>> 1) Create a server, role, and user mapping of the role with the server
>>> 2) Run a foreign table query, so that the connection related to the
>>> server gets cached
>>> 3) Issue DROP OWNED BY for the role created, since the user mapping is
>>> dependent on that role, it gets dropped from the catalogue table and
>>> an invalidation message will be pending to clear the sys cache
>>> associated with that user mapping.
>>> 4) Now, if I do select * from postgres_fdw_get_connections() or for
>>> that matter any query, at the beginning the txn
>>> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
>>> gets called and marks the cached entry as invalidated. Remember the
>>> reason for this invalidation message is that the user mapping with the
>>> OID entry->key is dropped from 3). Later in
>>> postgres_fdw_get_connections(), when we search the sys cache with
>>> entry->key for that invalidated connection, since the user mapping is
>>> dropped from the system, null tuple is returned.
>>
>> Thanks for the analysis! This means that the cached connection invalidated by drop of server or user mapping will
>> not be closed even by the subsequent access to the foreign server and will remain until the backend exits. Right?
> 
> It will be first marked as invalidated via
> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
> and on the next use of that connection invalidated connections are
> disconnected and reconnected.

I was thinking that in the case of drop of user mapping or server, hash_search(ConnectionHash) in GetConnection()
cannot find the cached connection entry invalidated by that drop, because "user->umid" used as hash key is changed. So I
was thinking that that invalidated connection will not be closed nor reconnected.


> 
>      if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
>      {
>          elog(DEBUG3, "closing connection %p for option changes to take effect",
>               entry->conn);
>          disconnect_pg_server(entry);
>      }
> 
>> If so, this seems like a connection-leak bug, at least for me.... Thought?
>>
> 
> It's not a leak. The comment before pgfdw_inval_callback() [1]
> explains why we can not immediately close/disconnect the connections
> in pgfdw_inval_callback() after marking them as invalidated.

*If* the invalidated connection cannot be closed immediately even in the case of drop of server or user mapping, we can
defer it to the subsequent call to GetConnection(). That is, GetConnection() closes not only the target invalidated
connection but also all the other invalidated connections. Of course, invalidated connections will remain until the
subsequent GetConnection() is called, though.
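
A rough sketch of that idea (the helper name and its placement are my
assumptions, not the actual patch) would be to sweep the cache at the
top of GetConnection() and drop every invalidated connection that is not
pinned by an open (sub)transaction:

static void
disconnect_stale_connections(void)
{
    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;

    if (ConnectionHash == NULL)
        return;

    hash_seq_init(&scan, ConnectionHash);
    while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
    {
        /* only invalidated entries that still hold a connection and are
         * not in use by the current (sub)transaction */
        if (entry->conn != NULL &&
            entry->invalidated &&
            entry->xact_depth == 0)
        {
            elog(DEBUG3, "discarding invalidated connection %p", entry->conn);
            disconnect_pg_server(entry);
        }
    }
}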
 


> 
> Here is the scenario how in the midst of a txn we get invalidation
> messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> happens):
> 
> 1) select from foreign table with server1, usermapping1 in session1
> 2) begin a top txn in session1, run a few foreign queries that open up
> sub txns internally. meanwhile alter/drop server1/usermapping1 in
> session2, then at each start of sub txn also we get to process the
> invalidation messages via
> AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
> So, if we disconnect right after marking invalidated in
> pgfdw_inval_callback, that's a problem since we are in a sub txn under
> a top txn.

Maybe. But what is the actual problem here?

OTOH, if a cached connection should not be closed in the middle of a transaction, postgres_fdw_disconnect() also should be
disallowed to be executed during a transaction?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in GetConnection() cannot find the cached connection entry invalidated by that drop. Because "user->umid" used as hash key is changed. So I was thinking that that invalidated connection will not be closed nor reconnected.
>

You are right in saying that the connection leaks.

Use case 1:
1) Run foreign query in session1 with server1, user mapping1
2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by other sessions
3) Run foreign query again in session1. At the start of the txn, the cached entry gets invalidated via pgfdw_inval_callback(). Whatever the type of foreign query (select, update, explain, delete, insert, analyze etc.), upon the next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails (with ERROR:  user mapping not found for "XXXX") since user mapping1 has been dropped in session2, and the query fails before reaching GetConnection(), where the connections associated with invalidated entries would have got disconnected.

So, the connection associated with the invalidated entry remains until the local session exits, which is a problem to solve.
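
Using the same object names as in [1] below, use case 1 can be
reproduced roughly like this (illustrative only):

    -- session 1
    SELECT * FROM f1;              -- connection to loopback1 gets cached

    -- session 2
    DROP USER MAPPING FOR bharath SERVER loopback1;

    -- session 1
    SELECT * FROM f1;              -- ERROR:  user mapping not found for "bharath"
    -- yet the remote backend opened for loopback1 is still alive and
    -- stays around until this local session exits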

Use case 2:
1) Run foreign query in session1 with server1, user mapping1
2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE, then the dependent user mapping1 and foreign tables get dropped too [1].
3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via pgfdw_inval_callback(), it fails because there is no foreign table and user mapping1.

But, note that the connection remains open in session1, which is again a problem to solve.

To solve the above connection leak problem, it looks like the right place to close all the invalid connections is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session (by then all the sub txns would also have finished). Note that if there are too many invalidated entries, then one of the following txns has to bear running this extra code, but that's better than having leaked connections. Thoughts? If okay, I can code a separate patch.

static void
pgfdw_xact_callback(XactEvent event, void *arg)
{
    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;
     /* HERE WE CAN LOOK FOR ALL INVALIDATED ENTRIES AND DISCONNECT THEM*/
    /* Quick exit if no connections were touched in this transaction. */
    if (!xact_got_connection)
        return;

And we can also extend postgres_fdw_disconnect() something like.

postgres_fdw_disconnect(bool invalid_only) --> default for invalid_only is false, which disconnects all connections; when invalid_only is set to true, it disconnects only invalid connections.
postgres_fdw_disconnect('server_name') --> disconnects connections associated with the specified foreign server

Having said this, I'm not in favour of the invalid_only flag, because if we choose to change the code in pgfdw_xact_callback to solve the connection leak problem, we may not need this invalid_only flag at all: at the end of the txn (even for txns in which the queries fail with an error, pgfdw_xact_callback gets called), all the existing invalid connections get disconnected. Thoughts?

[1]
postgres=# drop server loopback1 ;
ERROR:  cannot drop server loopback1 because other objects depend on it
DETAIL:  user mapping for bharath on server loopback1 depends on server loopback1
foreign table f1 depends on server loopback1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

postgres=# drop server loopback1 CASCADE ;
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to user mapping for bharath on server loopback1
drop cascades to foreign table f1
DROP SERVER
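
As an aside, the lingering remote session left behind in these cases can
be spotted on the remote server with something like the query below
(postgres_fdw sets fallback_application_name to 'postgres_fdw', so this
works unless application_name is overridden in the server options):

    SELECT pid, state, backend_start
    FROM pg_stat_activity
    WHERE application_name = 'postgres_fdw';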

> >      if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
> >      {
> >          elog(DEBUG3, "closing connection %p for option changes to take effect",
> >               entry->conn);
> >          disconnect_pg_server(entry);
> >      }
> >
> >> If so, this seems like a connection-leak bug, at least for me.... Thought?
> >>
> >
> > It's not a leak. The comment before pgfdw_inval_callback() [1]
> > explains why we can not immediately close/disconnect the connections
> > in pgfdw_inval_callback() after marking them as invalidated.
>
> *If* invalidated connection cannot be close immediately even in the case of drop of server or user mapping, we can defer it to the subsequent call to GetConnection(). That is, GetConnection() closes not only the target invalidated connection but also the other all invalidated connections. Of course, invalidated connections will remain until subsequent GetConnection() is called, though.
>

I think my detailed response to the above comment clarifies this.

> > Here is the scenario how in the midst of a txn we get invalidation
> > messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> > happens):
> >
> > 1) select from foreign table with server1, usermapping1 in session1
> > 2) begin a top txn in session1, run a few foreign queries that open up
> > sub txns internally. meanwhile alter/drop server1/usermapping1 in
> > session2, then at each start of sub txn also we get to process the
> > invalidation messages via
> > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
> > So, if we disconnect right after marking invalidated in
> > pgfdw_inval_callback, that's a problem since we are in a sub txn under
> > a top txn.
>
> Maybe. But what is the actual problem here?
>
> OTOH, if cached connection should not be close in the middle of transaction, postgres_fdw_disconnect() also should be disallowed to be executed during transaction?

+1. Yeah that makes sense. We can avoid closing the connection if (entry->xact_depth > 0). I will modify it in disconnect_cached_connections().

> >> Could you tell me why ConnectionHash needs to be destroyed?
> >
> > Say, in a session there are hundreds of different foreign server
> > connections made and if users want to disconnect all of them with the
> > new function and don't want any further foreign connections in that
> > session, they can do it. But then why keep the cache just lying around
> > and holding those many entries? Instead we can destroy the cache and
> > if necessary it will be allocated later on next foreign server
> > connections.
> >
> > IMHO, it is better to destroy the cache in case of disconnect all,
> > hoping to save memory, thinking that (next time if required) the cache
> > allocation doesn't take much time. Thoughts?
>
> Ok, but why is ConnectionHash destroyed only when "all" is true? Even when "all" is false, for example, the following query can disconnect all the cached connections. Even in this case, i.e., whenever there are no cached connections, ConnectionHash should be destroyed?
>
>      SELECT postgres_fdw_disconnect(srvname) FROM pg_foreign_server ;

+1. I can check all the cache entries to see if there are any active connections, in the same loop where I try to find the cache entry for the given foreign server; if none exists, then I destroy the cache. Thoughts?
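
A possible shape for that (purely a sketch; the function name comes from
the discussion above, while the serverid field and other details are
assumptions):

static bool
disconnect_cached_connections(Oid serverid, bool all)
{
    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;
    bool        result = false;
    bool        conn_exists = false;

    if (ConnectionHash == NULL)
        return false;

    hash_seq_init(&scan, ConnectionHash);
    while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
    {
        if (entry->conn == NULL)
            continue;

        /* leave connections that are part of an ongoing (sub)transaction */
        if (entry->xact_depth > 0)
        {
            conn_exists = true;
            continue;
        }

        if (all || entry->serverid == serverid)   /* serverid: assumed field */
        {
            disconnect_pg_server(entry);          /* also sets entry->conn = NULL */
            result = true;
        }
        else
            conn_exists = true;
    }

    /* if no entry holds a connection any more, drop the hash itself */
    if (!conn_exists)
    {
        hash_destroy(ConnectionHash);
        ConnectionHash = NULL;
    }

    return result;
}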


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

On 2020/12/12 15:05, Bharath Rupireddy wrote:
> On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>  > I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in GetConnection()
>  > cannot find the cached connection entry invalidated by that drop. Because "user->umid" used as hash key is changed. So I
>  > was thinking that that invalidated connection will not be closed nor reconnected.
>  >
> 
> You are right in saying that the connection leaks.
> 
> Use case 1:
> 1) Run foreign query in session1 with server1, user mapping1
> 2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by other
> sessions
> 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
> pgfdw_inval_callback(). Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze
> etc.), upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails (with ERROR:  user mapping not
> found for "XXXX") since the user mapping1 has been dropped in session2 and the query will also fail before reaching
> GetConnection() where the connections associated with invalidated entries would have got disconnected.
> 
> So, the connection associated with invalidated entry would remain until the local session exits which is a problem to
> solve.
> 
> Use case 2:
> 1) Run foreign query in session1 with server1, user mapping1
> 2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE, then
> the dependent user mapping1 and foreign tables get dropped too [1].
> 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
> pgfdw_inval_callback(), it fails because there is no foreign table and user mapping1.
> 
> But, note that the connection remains open in session1, which is again a problem to solve.
> 
> To solve the above connection leak problem, it looks like the right place to close all the invalid connections is
> pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session (by then all the
> sub txns also would have been finished). Note that if there are too many invalidated entries, then one of the following
> txn has to bear running this extra code, but that's okay than having leaked connections. Thoughts? If okay, I can code a
> separate patch.

Thanks for the further analysis! Sounds good. Also +1 for making it a separate patch. Maybe only this patch needs to be
back-patched.


> static void
> pgfdw_xact_callback(XactEvent event, void *arg)
> {
>      HASH_SEQ_STATUS scan;
>      ConnCacheEntry *entry;
>      /* HERE WE CAN LOOK FOR ALL INVALIDATED ENTRIES AND DISCONNECT THEM */

This may cause the connection to be closed before sending the COMMIT TRANSACTION command to the foreign server, i.e., the
connection is closed in the middle of the transaction. So as you explained before, we should avoid that? If my
understanding is right, probably the connection should be closed after the COMMIT TRANSACTION command is sent to the foreign
server. What about changing the following code in pgfdw_xact_callback() so that it closes the connection even when it's
marked as invalidated?

        if (PQstatus(entry->conn) != CONNECTION_OK ||
            PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
            entry->changing_xact_state)
        {
            elog(DEBUG3, "discarding connection %p", entry->conn);
            disconnect_pg_server(entry);
        }


>      /* Quick exit if no connections were touched in this transaction. */
>      if (!xact_got_connection)
>          return;
> 
> And we can also extend postgres_fdw_disconnect() something like.
> 
> postgres_fdw_disconnect(bool invalid_only) --> default for invalid_only false. disconnects all connections. when
> invalid_only is set to true then disconnects only invalid connections.
> postgres_fdw_disconnect('server_name') --> disconnections connections associated with the specified foreign server
> 
> Having said this, I'm not in favour of invalid_only flag, because if we choose to change the code in
> pgfdw_xact_callback to solve connection leak problem, we may not need this invalid_only flag at all, because at the end
> of txn (even for the txns in which the queries fail with error, pgfdw_xact_callback gets called), all the existing
> invalid connections get disconnected. Thoughts?

+1 not to have invalid_only flag. On the other hand, I think that postgres_fdw_get_connections() should return all the
cached connections including invalidated ones. Otherwise, the number of connections observed via
postgres_fdw_get_connections() may be different from the number of connections actually established, which would be
confusing to users. BTW, even after fixing the connection-leak issue, postgres_fdw_get_connections() may see invalidated
cached connections when it's called during the transaction.


> 
> [1]
> postgres=# drop server loopback1 ;
> ERROR:  cannot drop server loopback1 because other objects depend on it
> DETAIL:  user mapping for bharath on server loopback1 depends on server loopback1
> foreign table f1 depends on server loopback1
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.
> 
> postgres=# drop server loopback1 CASCADE ;
> NOTICE:  drop cascades to 2 other objects
> DETAIL:  drop cascades to user mapping for bharath on server loopback1
> drop cascades to foreign table f1
> DROP SERVER
> 
>  > >      if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
>  > >      {
>  > >          elog(DEBUG3, "closing connection %p for option changes to take effect",
>  > >               entry->conn);
>  > >          disconnect_pg_server(entry);
>  > >      }
>  > >
>  > >> If so, this seems like a connection-leak bug, at least for me.... Thought?
>  > >>
>  > >
>  > > It's not a leak. The comment before pgfdw_inval_callback() [1]
>  > > explains why we can not immediately close/disconnect the connections
>  > > in pgfdw_inval_callback() after marking them as invalidated.
>  >
>  > *If* invalidated connection cannot be close immediately even in the case of drop of server or user mapping, we can
>  > defer it to the subsequent call to GetConnection(). That is, GetConnection() closes not only the target invalidated
>  > connection but also the other all invalidated connections. Of course, invalidated connections will remain until
>  > subsequent GetConnection() is called, though.
>  >
> 
> I think my detailed response to the above comment clarifies this.
> 
>  > > Here is the scenario how in the midst of a txn we get invalidation
>  > > messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
>  > > happens):
>  > >
>  > > 1) select from foreign table with server1, usermapping1 in session1
>  > > 2) begin a top txn in session1, run a few foreign queries that open up
>  > > sub txns internally. meanwhile alter/drop server1/usermapping1 in
>  > > session2, then at each start of sub txn also we get to process the
>  > > invalidation messages via
>  > > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
>  > > So, if we disconnect right after marking invalidated in
>  > > pgfdw_inval_callback, that's a problem since we are in a sub txn under
>  > > a top txn.
>  >
>  > Maybe. But what is the actual problem here?
>  >
>  > OTOH, if cached connection should not be close in the middle of transaction, postgres_fdw_disconnect() also should
>  > be disallowed to be executed during transaction?
> 
> +1. Yeah that makes sense. We can avoid closing the connection if (entry->xact_depth > 0). I will modify it in
> disconnect_cached_connections().
> 
>  > >> Could you tell me why ConnectionHash needs to be destroyed?
>  > >
>  > > Say, in a session there are hundreds of different foreign server
>  > > connections made and if users want to disconnect all of them with the
>  > > new function and don't want any further foreign connections in that
>  > > session, they can do it. But then why keep the cache just lying around
>  > > and holding those many entries? Instead we can destroy the cache and
>  > > if necessary it will be allocated later on next foreign server
>  > > connections.
>  > >
>  > > IMHO, it is better to destroy the cache in case of disconnect all,
>  > > hoping to save memory, thinking that (next time if required) the cache
>  > > allocation doesn't take much time. Thoughts?
>  >
>  > Ok, but why is ConnectionHash destroyed only when "all" is true? Even when "all" is false, for example, the
>  > following query can disconnect all the cached connections. Even in this case, i.e., whenever there are no cached
>  > connections, ConnectionHash should be destroyed?
>  >
>  >      SELECT postgres_fdw_disconnect(srvname) FROM pg_foreign_server ;
> 
> +1. I can check all the cache entries to see if there are any active connections, in the same loop where I try to
> find the cache entry for the given foreign server,  if none exists, then I destroy the cache. Thoughts?

+1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Dec 14, 2020 at 9:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/12/12 15:05, Bharath Rupireddy wrote:
> > On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
> >  > I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in GetConnection() cannot find the cached connection entry invalidated by that drop. Because "user->umid" used as hash key is changed. So I was thinking that that invalidated connection will not be closed nor reconnected.
> >  >
> >
> > You are right in saying that the connection leaks.
> >
> > Use case 1:
> > 1) Run foreign query in session1 with server1, user mapping1
> > 2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by other sessions
> > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via pgfdw_inval_callback(). Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze etc.), upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails(with ERROR:  user mapping not found for "XXXX") since the user mapping1 has been dropped in session2 and the query will also fail before reaching GetConnection() where the connections associated with invalidated entries would have got disconnected.
> >
> > So, the connection associated with invalidated entry would remain until the local session exits which is a problem to solve.
> >
> > Use case 2:
> > 1) Run foreign query in session1 with server1, user mapping1
> > 2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE, then the dependent user mapping1 and foreign tables get dropped too [1].
> > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via pgfdw_inval_callback(), it fails because there is no foreign table and user mapping1.
> >
> > But, note that the connection remains open in session1, which is again a problem to solve.
> >
> > To solve the above connection leak problem, it looks like the right place to close all the invalid connections is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session(by then all the sub txns also would have been finished). Note that if there are too many invalidated entries, then one of the following txn has to bear running this extra code, but that's okay than having leaked connections. Thoughts? If okay, I can code a separate patch.
>
> Thanks for further analysis! Sounds good. Also +1 for making it as separate patch. Maybe only this patch needs to be back-patched.

Thanks. Yeah, once we agree on the fix, +1 to back-patch. Shall I start a separate thread for the connection leak issue and patch, so that others might weigh in with different thoughts?

> > static void
> > pgfdw_xact_callback(XactEvent event, void *arg)
> > {
> >      HASH_SEQ_STATUS scan;
> >      ConnCacheEntry *entry;
> > *     /* HERE WE CAN LOOK FOR ALL INVALIDATED ENTRIES AND DISCONNECT THEM*/*
>
> This may cause the connection to be closed before sending COMMIT TRANSACTION command to the foreign server, i.e., the connection is closed in the middle of the transaction. So as you explained before, we should avoid that? If this my understanding is right, probably the connection should be closed after COMMIT TRANSACTION command is sent to the foreign server. What about changing the following code in pgfdw_xact_callback() so that it closes the connection even when it's marked as invalidated?

You are right! I'm posting what I have in my mind for fixing this connection leak problem.

/* tracks whether any work is needed in callback functions */
static bool xact_got_connection = false;
/* tracks whether there exists at least one invalid connection in the connection cache */
static bool invalid_connections_exist = false;

static void
pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
{
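    /* (the existing hash_seq_search() loop over ConnectionHash is elided here;
     * the check below runs on each entry inside that loop) */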

        /* hashvalue == 0 means a cache reset, must clear all state */
        if (hashvalue == 0 ||
            (cacheid == FOREIGNSERVEROID &&
             entry->server_hashvalue == hashvalue) ||
            (cacheid == USERMAPPINGOID &&
             entry->mapping_hashvalue == hashvalue))
        {
            entry->invalidated = true;
            invalid_connections_exist = true;
    }

static void
pgfdw_xact_callback(XactEvent event, void *arg)
{
    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;

    /* Quick exit if no connections were touched in this transaction or there are no invalid connections in the cache. */
    if (!xact_got_connection && !invalid_connections_exist)
        return;
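
    /* (the existing per-entry hash_seq_search() loop over ConnectionHash is
     * elided here; the connection-discard check below runs inside it) */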

        /*
         * If the connection isn't in a good idle state, discard it to
         * recover. Next GetConnection will open a new connection.
         */
        if (PQstatus(entry->conn) != CONNECTION_OK ||
            PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
            entry->changing_xact_state ||
            entry->invalidated)
        {
            elog(DEBUG3, "discarding connection %p", entry->conn);
            disconnect_pg_server(entry);
        }

/*
* Regardless of the event type, we can now mark ourselves as out of the
* transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
* this saves a useless scan of the hashtable during COMMIT or PREPARE.)
*/
xact_got_connection = false;

/* We are done with closing all the invalidated connections so reset. */
invalid_connections_exist = false;
}

> > And we can also extend postgres_fdw_disconnect() something like.
> >
> > postgres_fdw_disconnect(bool invalid_only) --> default for invalid_only false. disconnects all connections. when invalid_only is set to true then disconnects only invalid connections.
> > postgres_fdw_disconnect('server_name') --> disconnections connections associated with the specified foreign server
> >
> > Having said this, I'm not in favour of invalid_only flag, because if we choose to change the code in pgfdw_xact_callback to solve connection leak problem, we may not need this invalid_only flag at all, because at the end of txn (even for the txns in which the queries fail with error, pgfdw_xact_callback gets called), all the existing invalid connections get disconnected. Thoughts?
>
> +1 not to have invalid_only flag. On the other hand, I think that postgres_fdw_get_connections() should return all the cached connections including invalidated ones. Otherwise, the number of connections observed via postgres_fdw_get_connections() may be different from the number of connections actually established, and which would be confusing to users.
>

If postgres_fdw_get_connections() has to return invalidated connections, I have a few things mentioned in [1] to be clarified. Thoughts? Please have a look at the below comment before we decide whether or not to show the invalid entries.

[1] - https://www.postgresql.org/message-id/CALj2ACUv%3DArQXs0U9PM3YXKCeSzJ1KxRokDY0g_0aGy--kDScA%40mail.gmail.com

> BTW, even after fixing the connection-leak issue, postgres_fdw_get_connections() may see invalidated cached connections when it's called during the transaction.

We will not output an invalidated entry if it has no active connection [2], so if we fix the connection leak issue with the above discussed fix, i.e. closing all the invalidated connections at the end of the next xact, there is less chance that we will output invalidated entries in the postgres_fdw_get_connections() output. The only case where we may show invalidated connections (which still have an active connection, entry->conn) in the postgres_fdw_get_connections() output is as follows:

1) say we have few cached active connections exists in session 1
2) drop the user mapping (in another session) associated with any of the cached connections to make that entry invalid
3) run select * from postgres_fdw_get_connections(); in session 1. At the start of the xact, the invalidation message gets processed and the corresponding entry gets marked as invalid. If we allow invalid connections (those that have entry->conn) to show up in the output, then we show them in the result of the query. At the end of the xact, we close these invalid connections; in this case, the user might think that they still have invalid connections active.

If the query run in 3) is not postgres_fdw_get_connections() but something else, then postgres_fdw_get_connections() will never get to show the invalid connections, as they would already have been closed.

IMO, it is better not to show the invalid connections in the postgres_fdw_get_connections() output, if we fix the connection leak issue with the above discussed fix, i.e. closing all the invalidated connections at the end of the next xact.

[2]
+Datum
+postgres_fdw_get_connections(PG_FUNCTION_ARGS)
+{
+    ArrayBuildState *astate = NULL;
+
+    if (ConnectionHash)
+    {
+        HASH_SEQ_STATUS scan;
+        ConnCacheEntry *entry;
+
+        hash_seq_init(&scan, ConnectionHash);
+        while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+        {
+            Form_pg_user_mapping umap;
+            HeapTuple umaptup;
+            Form_pg_foreign_server fsrv;
+            HeapTuple fsrvtup;
+
+            /* We only look for active and open remote connections. */
+            if (!entry->conn)
+                continue;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


On 2020/12/14 14:36, Bharath Rupireddy wrote:
> On Mon, Dec 14, 2020 at 9:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>  > On 2020/12/12 15:05, Bharath Rupireddy wrote:
>  > > On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>  > >  > I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in
>  > >  > GetConnection() cannot find the cached connection entry invalidated by that drop. Because "user->umid" used as hash key
>  > >  > is changed. So I was thinking that that invalidated connection will not be closed nor reconnected.
>  > >  >
>  > >
>  > > You are right in saying that the connection leaks.
>  > >
>  > > Use case 1:
>  > > 1) Run foreign query in session1 with server1, user mapping1
>  > > 2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by
>  > > other sessions
>  > > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
>  > > pgfdw_inval_callback(). Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze
>  > > etc.), upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails (with ERROR:  user mapping not
>  > > found for "XXXX") since the user mapping1 has been dropped in session2 and the query will also fail before reaching
>  > > GetConnection() where the connections associated with invalidated entries would have got disconnected.
>  > >
>  > > So, the connection associated with invalidated entry would remain until the local session exits which is a
>  > > problem to solve.
>  > >
>  > > Use case 2:
>  > > 1) Run foreign query in session1 with server1, user mapping1
>  > > 2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE,
>  > > then the dependent user mapping1 and foreign tables get dropped too [1].
>  > > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via
>  > > pgfdw_inval_callback(), it fails because there is no foreign table and user mapping1.
>  > >
>  > > But, note that the connection remains open in session1, which is again a problem to solve.
>  > >
>  > > To solve the above connection leak problem, it looks like the right place to close all the invalid connections
>  > > is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session (by then all
>  > > the sub txns also would have been finished). Note that if there are too many invalidated entries, then one of the
>  > > following txn has to bear running this extra code, but that's okay than having leaked connections. Thoughts? If okay, I
>  > > can code a separate patch.
>  >
>  > Thanks for further analysis! Sounds good. Also +1 for making it as separate patch. Maybe only this patch needs to
>  > be back-patched.
> 
> Thanks. Yeah once agreed on the fix, +1 to back patch. Shall I start a separate thread for connection leak issue and
> patch, so that others might have different thoughts??

Yes, of course!


> 
>  > > static void
>  > > pgfdw_xact_callback(XactEvent event, void *arg)
>  > > {
>  > >      HASH_SEQ_STATUS scan;
>  > >      ConnCacheEntry *entry;
>  > > *     /* HERE WE CAN LOOK FOR ALL INVALIDATED ENTRIES AND DISCONNECT THEM*/*
>  >
>  > This may cause the connection to be closed before sending COMMIT TRANSACTION command to the foreign server, i.e.,
theconnection is closed in the middle of the transaction. So as you explained before, we should avoid that? If this my
understandingis right, probably the connection should be closed after COMMIT TRANSACTION command is sent to the foreign
server.What about changing the following code in pgfdw_xact_callback() so that it closes the connection even when it's
markedas invalidated?
 
> 
> You are right! I'm posting what I have in my mind for fixing this connection leak problem.
> 
> /* tracks whether any work is needed in callback functions */
> static bool xact_got_connection = false;
> /* tracks whether there exists at least one invalid connection in the connection cache */
> *static bool invalid_connections_exist = false;*
> 
> static void
> pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
> {
> 
>          /* hashvalue == 0 means a cache reset, must clear all state */
>          if (hashvalue == 0 ||
>              (cacheid == FOREIGNSERVEROID &&
>               entry->server_hashvalue == hashvalue) ||
>              (cacheid == USERMAPPINGOID &&
>               entry->mapping_hashvalue == hashvalue))
>          {
>              entry->invalidated = true;
> *            invalid_connections_exist = true;*
>      }
> 
> static void
> pgfdw_xact_callback(XactEvent event, void *arg)
> {
>      HASH_SEQ_STATUS scan;
>      ConnCacheEntry *entry;
> 
>      /* Quick exit if no connections were touched in this transaction or there are no invalid connections in the
>      cache. */
>      if (!xact_got_connection && !invalid_connections_exist)
>          return;
> 
>          /*
>           * If the connection isn't in a good idle state, discard it to
>           * recover. Next GetConnection will open a new connection.
>           */
>          if (PQstatus(entry->conn) != CONNECTION_OK ||
>              PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>              entry->changing_xact_state ||
> *            entry->invalidated)*
>          {
>              elog(DEBUG3, "discarding connection %p", entry->conn);
>              disconnect_pg_server(entry);
>          }
> 
> /*
> * Regardless of the event type, we can now mark ourselves as out of the
> * transaction.  (Note: if we are here during PRE_COMMIT or PRE_PREPARE,
> * this saves a useless scan of the hashtable during COMMIT or PREPARE.)
> */
> xact_got_connection = false;
> 
> /* We are done with closing all the invalidated connections so reset. */
> *invalid_connections_exist = false;*
> }
> 
>  > > And we can also extend postgres_fdw_disconnect() something like.
>  > >
>  > > postgres_fdw_disconnect(bool invalid_only) --> default for invalid_only false. disconnects all connections. when
>  > > invalid_only is set to true then disconnects only invalid connections.
>  > > postgres_fdw_disconnect('server_name') --> disconnections connections associated with the specified foreign
>  > > server
>  > >
>  > > Having said this, I'm not in favour of invalid_only flag, because if we choose to change the code in
>  > > pgfdw_xact_callback to solve connection leak problem, we may not need this invalid_only flag at all, because at the end
>  > > of txn (even for the txns in which the queries fail with error, pgfdw_xact_callback gets called), all the existing
>  > > invalid connections get disconnected. Thoughts?
>  >
>  > +1 not to have invalid_only flag. On the other hand, I think that postgres_fdw_get_connections() should return all
>  > the cached connections including invalidated ones. Otherwise, the number of connections observed via
>  > postgres_fdw_get_connections() may be different from the number of connections actually established, and which would be
>  > confusing to users.
>  >
> 
> If postgres_fdw_get_connections() has to return invalidated connections, I have few things mentioned in [1] to be
> clarified. Thoughts? Please have a look at the below comment before we decide to show up the invalid entries or not.
> 
> [1] - https://www.postgresql.org/message-id/CALj2ACUv%3DArQXs0U9PM3YXKCeSzJ1KxRokDY0g_0aGy--kDScA%40mail.gmail.com

I was thinking to display the records having the columns for server name and boolean flag indicating whether it's
invalidated or not. But I'm not sure if this is the best design for now. Probably we should revisit this after
determining how to fix the connection-leak issue.


> 
>  > BTW, even after fixing the connection-leak issue, postgres_fdw_get_connections() may see invalidated cached
>  > connections when it's called during the transaction.
> 
> We will not output if the invalidated entry has no active connection[2], so if we fix the connection leak issue with
> the above discussed fix i.e closing all the invalidated connections at the end of next xact, there are less chances that
> we will output invalidated entries in the postgres_fdw_get_connections() output. Only case we may show up invalidated
> connections (which have active connections entry->conn) in the postgres_fdw_get_connections() output is as follows:
> 
> 1) say we have few cached active connections exists in session 1
> 2) drop the user mapping (in another session) associated with any of the cached connections to make that entry
> invalid
> 3) run select * from postgres_fdw_get_connections(); in session 1.  At the start of the xact, the invalidation
> message gets processed and the corresponding entry gets marked as invalid. If we allow invalid connections (that have
> entry->conn) to show up in the output, then we show them in the result of the query. At the end of xact, we close these
> invalid connections, in this case, user might think that he still have invalid connections active.

What about the case where the transaction started at the above 1) in session 1, and postgres_fdw_get_connections() in
the above 3) is called within that transaction in session 1? In this case, postgres_fdw_get_connections() can return
even invalidated connections?
 


> 
> If the query run in 3) is not postgres_fdw_get_connections() but something else, then postgres_fdw_get_connections() will never get to show invalid connections, as those connections would have already been closed.
 
> 
> IMO, better not to show the invalid connections in the postgres_fdw_get_connections() output, if we fix the connection leak issue with the above discussed fix i.e. closing all the invalidated connections at the end of the next xact.
 
> 
> [2]
> +Datum
> +postgres_fdw_get_connections(PG_FUNCTION_ARGS)
> +{
> +    ArrayBuildState *astate = NULL;
> +
> +    if (ConnectionHash)
> +    {
> +        HASH_SEQ_STATUS scan;
> +        ConnCacheEntry *entry;
> +
> +        hash_seq_init(&scan, ConnectionHash);
> +        while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
> +        {
> +            Form_pg_user_mapping umap;
> +            HeapTuple umaptup;
> +            Form_pg_foreign_server fsrv;
> +            HeapTuple fsrvtup;
> +
> +            /* We only look for active and open remote connections. */
> +            if (!entry->conn)
> +                continue;
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Dec 14, 2020 at 8:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > We will not output an invalidated entry if it has no active connection[2], so if we fix the connection leak issue with the above discussed fix i.e. closing all the invalidated connections at the end of the next xact, there is less chance that we will output invalidated entries in the postgres_fdw_get_connections() output. The only case where we may show invalidated connections (which have active connections, entry->conn) in the postgres_fdw_get_connections() output is as follows:
> >
> > 1) say we have a few cached active connections in session 1
> > 2) drop the user mapping (in another session) associated with any of the cached connections to make that entry invalid
> > 3) run select * from postgres_fdw_get_connections(); in session 1.  At the start of the xact, the invalidation message gets processed and the corresponding entry gets marked as invalid. If we allow invalid connections (that have entry->conn) to show up in the output, then we show them in the result of the query. At the end of the xact, we close these invalid connections, in which case the user might think that they still have invalid connections active.
>
> What about the case where the transaction started at the above 1) in session 1, and postgres_fdw_get_connections() in the above 3) is called within that transaction in session 1? In this case, postgres_fdw_get_connections() can return even invalidated connections?

In that case, since the user mapping would have been dropped in
another session and we are in the middle of a txn in session 1, the
entries would not get marked as invalid until the invalidation message
gets processed by session 1, which may happen if session 1
opens a sub txn. If not, then for postgres_fdw_get_connections() the
entries will still appear active, as they would not have been marked as
invalid yet, and postgres_fdw_get_connections() would return them in
the output.
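
For illustration only, a minimal sketch of that case (the server, table
and mapping names are just examples):

-- session 1
BEGIN;
SELECT 1 FROM ft1 LIMIT 1;   -- connection to server1 gets cached, xact_depth becomes 1

-- session 2
DROP USER MAPPING FOR CURRENT_USER SERVER server1;

-- session 1, same transaction: the invalidation message has not been
-- processed yet, so the cached entry still looks valid here
SELECT * FROM postgres_fdw_get_connections();
COMMIT;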

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Dec 14, 2020, 9:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 14, 2020 at 8:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > > We will not output an invalidated entry if it has no active connection[2], so if we fix the connection leak issue with the above discussed fix i.e. closing all the invalidated connections at the end of the next xact, there is less chance that we will output invalidated entries in the postgres_fdw_get_connections() output. The only case where we may show invalidated connections (which have active connections, entry->conn) in the postgres_fdw_get_connections() output is as follows:
> > >
> > > 1) say we have a few cached active connections in session 1
> > > 2) drop the user mapping (in another session) associated with any of the cached connections to make that entry invalid
> > > 3) run select * from postgres_fdw_get_connections(); in session 1.  At the start of the xact, the invalidation message gets processed and the corresponding entry gets marked as invalid. If we allow invalid connections (that have entry->conn) to show up in the output, then we show them in the result of the query. At the end of the xact, we close these invalid connections, in which case the user might think that they still have invalid connections active.
> >
> > What about the case where the transaction started at the above 1) in session 1, and postgres_fdw_get_connections() in the above 3) is called within that transaction in session 1? In this case, postgres_fdw_get_connections() can return even invalidated connections?
>
> In that case, since the user mapping would have been dropped in
> another session and we are in the middle of a txn in session 1, the
> entries would not get marked as invalid until the invalidation message
> gets processed by the session 1 which may happen if the session 1
> opens a sub txn, if not then for postgres_fdw_get_connections() the
> entries will still be active as they would not have been marked as
> invalid yet and postgres_fdw_get_connections() would return them in
> the output.

One more point for the above scenario: if the user mapping is dropped
in another session, then the cache lookup for that entry in
postgres_fdw_get_connections() returns a null tuple; in that case I plan
not to throw an error, but just to skip that entry and continue. But if
the user mapping is not dropped in another session but altered, then
postgres_fdw_get_connections() can still show that connection in the output.
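
For illustration only, a sketch of the altered (not dropped) mapping case,
assuming a superuser session and that the mapping already has a password
option (object names are examples):

-- session 2: alter, but do not drop, the user mapping used by session 1
ALTER USER MAPPING FOR CURRENT_USER SERVER server1 OPTIONS (SET password 'newsecret');

-- session 1: the cached entry gets invalidated, but the catalog lookup
-- still succeeds, so postgres_fdw_get_connections() can still show it
SELECT * FROM postgres_fdw_get_connections();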

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2020/12/15 1:40, Bharath Rupireddy wrote:
> On Mon, Dec 14, 2020, 9:47 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Mon, Dec 14, 2020 at 8:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>> We will not output an invalidated entry if it has no active connection[2], so if we fix the connection leak issue with the above discussed fix i.e. closing all the invalidated connections at the end of the next xact, there is less chance that we will output invalidated entries in the postgres_fdw_get_connections() output. The only case where we may show invalidated connections (which have active connections, entry->conn) in the postgres_fdw_get_connections() output is as follows:
>>>>
>>>> 1) say we have a few cached active connections in session 1
>>>> 2) drop the user mapping (in another session) associated with any of the cached connections to make that entry invalid
>>>> 3) run select * from postgres_fdw_get_connections(); in session 1.  At the start of the xact, the invalidation message gets processed and the corresponding entry gets marked as invalid. If we allow invalid connections (that have entry->conn) to show up in the output, then we show them in the result of the query. At the end of the xact, we close these invalid connections, in which case the user might think that they still have invalid connections active.
 
>>>
>>> What about the case where the transaction started at the above 1) in session 1, and postgres_fdw_get_connections() in the above 3) is called within that transaction in session 1? In this case, postgres_fdw_get_connections() can return even invalidated connections?
 
>>
>> In that case, since the user mapping would have been dropped in
>> another session and we are in the middle of a txn in session 1, the
>> entries would not get marked as invalid until the invalidation message
>> gets processed by the session 1 which may happen if the session 1

Yes, and this can also happen via other commands, for example, CREATE TABLE.


>> opens a sub txn, if not then for postgres_fdw_get_connections() the
>> entries will still be active as they would not have been marked as
>> invalid yet and postgres_fdw_get_connections() would return them in
>> the output.
> 
> One more point for the above scenario: if the user mapping is dropped
> in another session, then cache lookup for that entry in the
> postgres_fdw_get_connections() returns a null tuple which I plan to
> not throw an error, but just to skip in that case and continue. But if
> the user mapping is not dropped in another session but altered, then
> postgres_fdw_get_connections() still can show that in the output.

Yes, so *if* we really want to return even connections invalidated by a drop of
the user mapping, the cached connection entry may need to store not only the
user mapping id but also the server id so that we can get the server name without
the user mapping entry.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Dec 14, 2020 at 11:00 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> > One more point for the above scenario: if the user mapping is dropped
> > in another session, then cache lookup for that entry in the
> > postgres_fdw_get_connections() returns a null tuple which I plan to
> > not throw an error, but just to skip in that case and continue. But if
> > the user mapping is not dropped in another session but altered, then
> > postgres_fdw_get_connections() still can show that in the output.
>
> Yes, so *if* we really want to return even connection invalidated by drop of
> user mapping, the cached connection entry may need to store not only
> user mapping id but also server id so that we can get the server name without
> user mapping entry.

We can do that, but what happens if the foreign server itself gets
dropped with the cascade option in another session? The use case is as
follows:

1) Run a foreign query in session 1 with server 1, user mapping 1
2) Try to drop foreign server 1, then we would not be allowed to do so
because of the dependency; if we use CASCADE, then the dependent user
mapping 1 and foreign tables get dropped too.
3) Run postgres_fdw_get_connections(); at the start of the txn, the
cached entry gets invalidated via pgfdw_inval_callback() and we try to
use the stored server id of the invalid entry (for which the foreign
server would have been dropped) and look it up in the sys catalogs, so
again a null tuple is returned.
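
For illustration only, a minimal sketch of that use case (server and
table names are placeholders):

-- session 1
SELECT 1 FROM ft1 LIMIT 1;    -- caches a connection for server 1 / user mapping 1

-- session 2
DROP SERVER server1 CASCADE;  -- also drops user mapping 1 and the foreign tables

-- session 1: the cached entry gets invalidated via pgfdw_inval_callback(),
-- and the lookup by the stored server id finds no catalog tuple
SELECT * FROM postgres_fdw_get_connections();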

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Dec 14, 2020 at 8:03 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2020/12/14 14:36, Bharath Rupireddy wrote:
> > On Mon, Dec 14, 2020 at 9:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >  > On 2020/12/12 15:05, Bharath Rupireddy wrote:
> >  > > On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >  > >  > I was thinking that in the case of drop of user mapping or server, hash_search(ConnectionHash) in GetConnection() cannot find the cached connection entry invalidated by that drop, because "user->umid" used as the hash key is changed. So I was thinking that the invalidated connection will not be closed nor reconnected.
> >  > >  >
> >  > >
> >  > > You are right in saying that the connection leaks.
> >  > >
> >  > > Use case 1:
> >  > > 1) Run foreign query in session1 with server1, user mapping1
> >  > > 2) Drop user mapping1 in another session2; the invalidation message gets logged and will have to be processed by other sessions
> >  > > 3) Run a foreign query again in session1; at the start of the txn, the cached entry gets invalidated via pgfdw_inval_callback(). Whatever the type of the foreign query (select, update, explain, delete, insert, analyze etc.), upon the next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails (with ERROR:  user mapping not found for "XXXX") since user mapping1 has been dropped in session2, and the query also fails before reaching GetConnection(), where the connections associated with invalidated entries would have got disconnected.
> >  > >
> >  > > So, the connection associated with the invalidated entry would remain until the local session exits, which is a problem to solve.
> >  > >
> >  > > Use case 2:
> >  > > 1) Run foreign query in session1 with server1, user mapping1
> >  > > 2) Try to drop foreign server1; then we would not be allowed to do so because of the dependency. If we use CASCADE, then the dependent user mapping1 and foreign tables get dropped too [1].
> >  > > 3) Run a foreign query again in session1; at the start of the txn, the cached entry gets invalidated via pgfdw_inval_callback(), and the query fails because there is no foreign table and no user mapping1.
> >  > >
> >  > > But, note that the connection remains open in session1, which is again a problem to solve.
> >  > >
> >  > > To solve the above connection leak problem, it looks like the right place to close all the invalid connections is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session (by then all the sub txns would also have finished). Note that if there are too many invalidated entries, then one of the following txns has to bear running this extra code, but that's better than having leaked connections. Thoughts? If okay, I can code a separate patch.
> >  >
> >  > Thanks for further analysis! Sounds good. Also +1 for making it a separate patch. Maybe only this patch needs to be back-patched.
> >
> > Thanks. Yeah, once agreed on the fix, +1 to back-patch. Shall I start a separate thread for the connection leak issue and patch, so that others can share their thoughts?
>
> Yes, of course!

Thanks. I posted the patch in a separate thread[1] for fixing the
connection leak problem.

[1] - https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLvA%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Dec 14, 2020 at 11:13 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Dec 14, 2020 at 11:00 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
> > > One more point for the above scenario: if the user mapping is dropped
> > > in another session, then cache lookup for that entry in the
> > > postgres_fdw_get_connections() returns a null tuple which I plan to
> > > not throw an error, but just to skip in that case and continue. But if
> > > the user mapping is not dropped in another session but altered, then
> > > postgres_fdw_get_connections() still can show that in the output.
> >
> > Yes, so *if* we really want to return even connection invalidated by drop of
> > user mapping, the cached connection entry may need to store not only
> > user mapping id but also server id so that we can get the server name without
> > user mapping entry.
>
> We can do that, but what happens if the foreign server itself get
> dropped with cascade option in another session, use case is as
> follows:
>
> 1) Run a foreign query in session 1 with server 1, user mapping 1
> 2) Try to drop foreign server 1, then we would not be allowed to do so
> because of dependency, if we use CASCADE, then the dependent user
> mapping 1 and foreign tables get dropped too.
> 3) Run the postgres_fdw_get_connections(), at the start of txn, the
> cached entry gets invalidated via pgfdw_inval_callback() and we try to
> use the stored server id of the invalid entry (for which the foreign
> server would have been dropped) and lookup in sys catalogues, so again
> a null tuple is returned.

Hi,

Any further thoughts on this would be really helpful.

The discussion here is on the point of whether to show the invalidated
connections in the output of the new postgres_fdw_get_connections()
function. If we were to show them, then because of the solution we
proposed for the connection leak problem in [1], will the invalidated
entries be shown every time?

[1] -
https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLvA%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Hi

> Discussion here is on the point - whether to show up the invalidated
> connections in the output of the new postgres_fdw_get_connections()
> function? If we were to show, then because of the solution we proposed for
> the connection leak problem in [1], will the invalidated entries be shown
> every time?

IMO, we introduced the function postgres_fdw_get_connections to decide
whether there are too many connections and whether we should disconnect them.

If the user decides to disconnect, we have two cases:
1. The user decides to disconnect one of them.
   I think it's ok for the user to disconnect an invalidated connection, so we'd better list the invalidated connections.

2. The user decides to disconnect all of them. In this case,
   it seems postgres_fdw_disconnect will disconnect both invalidated and valid connections,
   and we should let the user realize which connections they are disconnecting, so we should list the invalidated connections.

Based on the above two cases, personally, I think we can list the invalidated connections.

-----
I took a look into the patch, and have a little issue:

+bool disconnect_cached_connections(uint32 hashvalue, bool all)
+    if (all)
+    {
+        hash_destroy(ConnectionHash);
+        ConnectionHash = NULL;
+        result = true;
+    }

If disconnect_cached_connections is called to disconnect all the connections, 
should we reset the 'xact_got_connection' flag ?


> [1] -
> https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLv
> A%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com

The patch about connection leak looks good to me.
And I have the same question about the new 'have_invalid_connections' flag:
if we disconnect all the connections, should we reset the flag?

Best regards,
houzj




On Thu, Dec 17, 2020 at 5:38 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Discussion here is on the point - whether to show up the invalidated
> > connections in the output of the new postgres_fdw_get_connections()
> > function? If we were to show, then because of the solution we proposed for
> > the connection leak problem in [1], will the invalidated entries be shown
> > every time?
>
> IMO, we introduced the function postgres_fdw_get_connections to decide
> whether there are too many connections and whether we should disconnect them.
>
> If the user decides to disconnect, we have two cases:
> 1. The user decides to disconnect one of them.
>    I think it's ok for the user to disconnect an invalidated connection, so we'd better list the invalidated connections.
>
> 2. The user decides to disconnect all of them. In this case,
>    it seems postgres_fdw_disconnect will disconnect both invalidated and valid connections,
>    and we should let the user realize which connections they are disconnecting, so we should list the invalidated connections.
>
> Based on the above two cases, personally, I think we can list the invalidated connections.

I will do that. So, the output will have a list of pairs like
(server_name, true/false), where true/false indicates a valid/invalid connection.

> -----
> I took a look into the patch, and have a little issue:
>
> +bool disconnect_cached_connections(uint32 hashvalue, bool all)
> +       if (all)
> +       {
> +               hash_destroy(ConnectionHash);
> +               ConnectionHash = NULL;
> +               result = true;
> +       }
>
> If disconnect_cached_connections is called to disconnect all the connections,
> should we reset the 'xact_got_connection' flag ?

I think we must allow postgres_fdw_disconnect() to disconnect the
particular/all connections only when the corresponding entries have no
open txns, i.e. the connections are not being used in the current txn;
otherwise not. We may end up closing/disconnecting a connection that's
still in use, because entry->xact_depth can even go above 1 for sub
txns. See use case [1].

+        if ((all || entry->server_hashvalue == hashvalue) &&
entry->xact_depth == 0 &&
+            entry->conn)
+        {
+            disconnect_pg_server(entry);
+            result = true;
+        }

Thoughts?

And to reset the 'xact_got_connection' flag: I think we should reset
it only when we close all the connections i.e. when all the
connections are at entry->xact_depth = 0, otherwise not. Same for
have_invalid_connections flag as well.

[1] -
BEGIN;
SELECT 1 FROM ft1 LIMIT 1;  --> server 1 entry->xact_depth is 1
SAVEPOINT s;
SELECT 1 FROM ft1 LIMIT 1; --> entry->xact_depth becomes 2
SELECT postgres_fdw_disconnect()/postgres_fdw_disconnect('server 1');
--> I think we should not close the connection as its txn is still
open.
COMMIT;

> > [1] -
> > https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLv
> > A%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com
>
> The patch about connection leak looks good to me.
> And I have a same issue about the new 'have_invalid_connections' flag,
> If we disconnect all the connections, should we reset the flag ?

Yes as mentioned in the above comment.

Thanks for reviewing the connection leak patch. It would be good if the
review comments for that patch are provided separately in
that thread. I added it to the commitfest -
https://commitfest.postgresql.org/31/2882/.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On 2020-12-17 18:02, Bharath Rupireddy wrote:
> On Thu, Dec 17, 2020 at 5:38 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> 
> wrote:
>> I took a look into the patch, and have a little issue:
>> 
>> +bool disconnect_cached_connections(uint32 hashvalue, bool all)
>> +       if (all)
>> +       {
>> +               hash_destroy(ConnectionHash);
>> +               ConnectionHash = NULL;
>> +               result = true;
>> +       }
>> 
>> If disconnect_cached_connections is called to disconnect all the 
>> connections,
>> should we reset the 'xact_got_connection' flag ?
> 
> I think we must allow postgres_fdw_disconnect() to disconnect the
> particular/all connections only when the corresponding entries have no
> open txns, i.e. the connections are not being used in the current txn;
> otherwise not. We may end up closing/disconnecting a connection that's
> still in use, because entry->xact_depth can even go above 1 for sub
> txns. See use case [1].
> 
> +        if ((all || entry->server_hashvalue == hashvalue) &&
> entry->xact_depth == 0 &&
> +            entry->conn)
> +        {
> +            disconnect_pg_server(entry);
> +            result = true;
> +        }
> 
> Thoughts?
> 

I think that you are right. Actually, I was thinking about a much
simpler solution to this problem --- just restrict
postgres_fdw_disconnect() to run only *outside* of an explicit transaction
block. This should protect everyone from closing connections that are
still in use, but it seems to be a bit more restrictive than what you propose.

Just a thought: if we start closing fdw connections in an open xact
block, we could:

1) Close a couple of them.
2) Find one with xact_depth > 0 and error out.
3) End up in a mixed state: some of the connections were closed, but some
of them were not, and this cannot be rolled back with the xact.

In other words, I have some doubts about allowing a function that is
non-transactional by nature to be called in a transaction block.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



On Thu, Dec 17, 2020 at 10:32 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
> On 2020-12-17 18:02, Bharath Rupireddy wrote:
> > On Thu, Dec 17, 2020 at 5:38 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
> > wrote:
> >> I took a look into the patch, and have a little issue:
> >>
> >> +bool disconnect_cached_connections(uint32 hashvalue, bool all)
> >> +       if (all)
> >> +       {
> >> +               hash_destroy(ConnectionHash);
> >> +               ConnectionHash = NULL;
> >> +               result = true;
> >> +       }
> >>
> >> If disconnect_cached_connections is called to disconnect all the
> >> connections,
> >> should we reset the 'xact_got_connection' flag ?
> >
> > I think we must allow postgres_fdw_disconnect() to disconnect the
> > particular/all connections only when the corresponding entries have no
> > open txns, i.e. the connections are not being used in the current txn;
> > otherwise not. We may end up closing/disconnecting a connection that's
> > still in use, because entry->xact_depth can even go above 1 for sub
> > txns. See use case [1].
> >
> > +        if ((all || entry->server_hashvalue == hashvalue) &&
> > entry->xact_depth == 0 &&
> > +            entry->conn)
> > +        {
> > +            disconnect_pg_server(entry);
> > +            result = true;
> > +        }
> >
> > Thoughts?
> >
>
> I think that you are right. Actually, I was thinking about much more
> simple solution to this problem --- just restrict
> postgres_fdw_disconnect() to run only *outside* of explicit transaction
> block. This should protect everyone from closing its underlying
> connections, but seems to be a bit more restrictive than you propose.

I agree that it's restrictive from a usability point of view. I think
checking entry->xact_depth == 0 should be enough to protect from closing
any connections that are currently in use.

Say the user has called postgres_fdw_disconnect('myserver1'); if that
connection is currently in use in the xact, then we can return false, or
even go further and issue a warning along with the false. Also, if
postgres_fdw_disconnect() is called to close all connections and any
one of the connections is currently in use in the xact, then we can
return: true and a warning if at least one connection is closed, or
false and a warning if all the connections are in use.

The warning message can be something like - for the first case -
"could not close the server connection as it is in use" and for the
second case - "could not close some of the connections as they are in
use".

Thoughts?

> Just thought, that if we start closing fdw connections in the open xact
> block:
>
> 1) Close a couple of them.
> 2) Found one with xact_depth > 0 and error out.
> 3) End up in the mixed state: some of connections were closed, but some
> them not, and it cannot be rolled back with the xact.

We don't error out, but we may issue a warning (if the above proposal
is agreed upon) and return false, but definitely not an error.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Hi,

I'm posting a v4-0001 patch for the new functions
postgres_fdw_get_connections() and postgres_fdw_disconnect().  In this
patch, I tried to address the review comments provided upthread.

At a high level, the changes include:
1) Storing the foreign server id in the cache entry, which helps to
fetch the server name associated with it easily.
2) postgres_fdw_get_connections now returns, for each open connection,
the server name and true or false to indicate whether it's valid.
3) postgres_fdw_get_connections can issue a warning when the cache
lookup for the server name returns null, i.e. the foreign server has
been dropped. Please see the comments before postgres_fdw_get_connections
for the situations in which this is possible.
4) postgres_fdw_disconnect('myserver') disconnects the open connection
only when it's not being used in the current xact. If it's in use, then
false is returned and a warning is issued.
5) postgres_fdw_disconnect() disconnects all the connections only when
they are not being used in the current xact. If at least one connection
is in use, it issues a warning, and it returns true if at least one
open connection gets closed, otherwise false. If no connections have
been made yet or the connection cache is empty, false is also returned.
6) postgres_fdw_disconnect can discard the entire cache if there is no
active connection.
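
For illustration only, a minimal sketch of the intended usage under this
patch (server names and the exact output format are placeholders, not final):

SELECT postgres_fdw_get_connections();
-- e.g. {"(loopback1, true)","(loopback2, false)"}

SELECT postgres_fdw_disconnect('loopback1');
-- true if the connection was closed; false plus a warning if it is still in use

SELECT postgres_fdw_disconnect();
-- tries to close all cached connections not used in the current xact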

Thoughts?

Below things are still pending which I plan to post new patches after
the v4-0001 is reviewed:
1) changing the version of postgres_fdw--1.0.sql to postgres_fdw--1.1.sql
2) 0002 and 0003 patches having keep_connections GUC and
keep_connection server level option.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On 2020-12-30 09:10, Bharath Rupireddy wrote:
> Hi,
> 
> I'm posting a v4-0001 patch for the new functions
> postgres_fdw_get_connections() and postgres_fdw_disconnect().  In this
> patch, I tried to address the review comments provided upthread.
> 
> Thoughts?
> 

I still have some doubts about whether it is worth allowing 
postgres_fdw_disconnect() to be called in an explicit transaction block, 
since it adds a lot of things to care about and check for. But otherwise 
the current logic looks solid.

+                 errdetail("Such connections get closed either in the next use or 
at the end of the current transaction.")
+                 : errdetail("Such connection gets closed either in the next use or 
at the end of the current transaction.")));

Does it really have a chance to get closed on the next use? If the foreign 
server is dropped then the user mapping should be dropped as well (either 
with CASCADE or manually), but we do need the user mapping for a local cache 
lookup. That way, if I understand all the discussion up-thread 
correctly, we can only close such connections at the end of the xact, can't we?

+ * This function returns false if the cache doesn't exist.
+ * When the cache exists:

I think that this will be corrected later by pg_indent, but still: in 
this comment section the following points 1) and 2) have a different 
combination of tabs and spaces.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment
On Wed, Dec 30, 2020 at 5:20 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
>
> On 2020-12-30 09:10, Bharath Rupireddy wrote:
> > Hi,
> >
> > I'm posting a v4-0001 patch for the new functions
> > postgres_fdw_get_connections() and postgres_fdw_disconnect().  In this
> > patch, I tried to address the review comments provided upthread.
> >
> > Thoughts?
> >
>
> I still have some doubts that it is worth of allowing to call
> postgres_fdw_disconnect() in the explicit transaction block, since it
> adds a lot of things to care and check for. But otherwise current logic
> looks solid.
>
> +                                errdetail("Such connections get closed either in the next use or
> at the end of the current transaction.")
> +                                : errdetail("Such connection gets closed either in the next use or
> at the end of the current transaction.")));
>
> Does it really have a chance to get closed on the next use? If foreign
> server is dropped then user mapping should be dropped as well (either
> with CASCADE or manually), but we do need user mapping for a local cache
> lookup. That way, if I understand all the discussion up-thread
> correctly, we can only close such connections at the end of xact, do we?

The next use of such a connection, i.e. the next query on a foreign
table whose server has been dropped, fails because of the cascade that
drops the user mapping and the foreign table as well. At the start of
that query such a connection will be marked as invalidated because the
xact_depth of that connection is > 0, and when the failure happens,
the txn gets aborted, due to which pgfdw_xact_callback gets called and
in that callback the connection gets closed. To make it more clear,
please have a look at the scenarios [1].

I still feel the detailed message "Such connections get closed either
in the next use or at the end of the current transaction" is
appropriate. Please have a closer look at the possible use cases [1].

And IMO, anyone dropping a foreign server inside an explicit txn block
in which the foreign server was used is an extremely rare case, so still
showing this message and allowing postgres_fdw_disconnect() in an
explicit txn block is useful. For all other cases
postgres_fdw_disconnect behaves as expected.

Thoughts?

[1]
case 1:
BEGIN;
SELECT 1 FROM f1 LIMIT 1;  --> xact_depth becomes 1
DROP SERVER loopback1 CASCADE; --> drop cascades to the user mapping
and the foreign table and the connection gets invalidated in
pgfdw_inval_callback because xact_depth is 1
SELECT 1 FROM f1 LIMIT 1; --> since the failure occurs for this query
and txn is aborted, the connection gets closed in pgfdw_xact_callback.
SELECT * FROM postgres_fdw_get_connections(); --> txn was aborted
SELECT * FROM postgres_fdw_disconnect(); --> txn was aborted
COMMIT;

case 2:
BEGIN;
SELECT 1 FROM f1 LIMIT 1;  --> xact_depth becomes 1
DROP SERVER loopback1 CASCADE; --> drop cascades to the user mapping
and the foreign table and the connection gets invalidated in
pgfdw_inval_callback because xact_depth is 1
SELECT * FROM postgres_fdw_get_connections(); --> shows the above
warning because foreign server name can not be fetched
SELECT * FROM postgres_fdw_disconnect(); --> the connection can not be
closed here as well because xact_depth is 1, then it issues a warning
"cannot close any connection because they are still in use"
COMMIT; --> finally the connection gets closed here in pgfdw_xact_callback.

case 3:
SELECT 1 FROM f1 LIMIT 1;
BEGIN;
DROP SERVER loopback1 CASCADE; --> drop cascades to the user mapping
and the foreign table and the connection gets closed in
pgfdw_inval_callback because xact_depth is 0
SELECT 1 FROM f1 LIMIT 1; --> since the failure occurs for this query
and the connection was closed previously then the txn gets aborted
SELECT * FROM postgres_fdw_get_connections(); --> txn was aborted
SELECT * FROM postgres_fdw_disconnect(); --> txn was aborted
COMMIT;

> + * This function returns false if the cache doesn't exist.
> + * When the cache exists:
>
> I think that this will be corrected later by pg_indent, but still. In
> this comment section following points 1) and 2) have a different
> combination of tabs/spaces.

I can change that in the next version.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On 2020-12-30 17:59, Bharath Rupireddy wrote:
> On Wed, Dec 30, 2020 at 5:20 PM Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
>> 
>> On 2020-12-30 09:10, Bharath Rupireddy wrote:
>> I still have some doubts that it is worth of allowing to call
>> postgres_fdw_disconnect() in the explicit transaction block, since it
>> adds a lot of things to care and check for. But otherwise current 
>> logic
>> looks solid.
>> 
>> +                                errdetail("Such connections get 
>> closed either in the next use or
>> at the end of the current transaction.")
>> +                                : errdetail("Such connection gets 
>> closed either in the next use or
>> at the end of the current transaction.")));
>> 
>> Does it really have a chance to get closed on the next use? If foreign
>> server is dropped then user mapping should be dropped as well (either
>> with CASCADE or manually), but we do need user mapping for a local 
>> cache
>> lookup. That way, if I understand all the discussion up-thread
>> correctly, we can only close such connections at the end of xact, do 
>> we?
> 
> The next use of such a connection, i.e. the next query on a foreign
> table whose server has been dropped, fails because of the cascade that
> drops the user mapping and the foreign table as well. At the start of
> that query such a connection will be marked as invalidated because the
> xact_depth of that connection is > 0, and when the failure happens,
> the txn gets aborted, due to which pgfdw_xact_callback gets called and
> in that callback the connection gets closed. To make it more clear,
> please have a look at the scenarios [1].
> 

In my understanding 'connection gets closed either in the next use' 
means that the connection will be closed the next time someone tries to use 
it, i.e. GetConnection() will be called and it closes this connection 
because of a bad state. However, if the foreign server is dropped, 
GetConnection() cannot look up the connection because it needs a user 
mapping oid as the key.

I had a look at your scenarios. IIUC, under **next use** you mean a 
select attempt on a table belonging to the same foreign server, which 
leads to a transaction abort, and the connection gets closed in the xact 
callback. Sorry, maybe I am missing something, but this just confirms 
that such connections only get closed in the xact callback (taking into 
account your recently committed patch [1]), so 'next use' looks 
misleading.

[1] 
https://www.postgresql.org/message-id/8b2aa1aa-c638-12a8-cb56-ea0f0a5019cf%40oss.nttdata.com


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



On Wed, Dec 30, 2020 at 11:11 PM Alexey Kondratov
<a.kondratov@postgrespro.ru> wrote:
> On 2020-12-30 17:59, Bharath Rupireddy wrote:
> > On Wed, Dec 30, 2020 at 5:20 PM Alexey Kondratov
> > <a.kondratov@postgrespro.ru> wrote:
> >>
> >> On 2020-12-30 09:10, Bharath Rupireddy wrote:
> >> I still have some doubts that it is worth of allowing to call
> >> postgres_fdw_disconnect() in the explicit transaction block, since it
> >> adds a lot of things to care and check for. But otherwise current
> >> logic
> >> looks solid.
> >>
> >> +                                errdetail("Such connections get
> >> closed either in the next use or
> >> at the end of the current transaction.")
> >> +                                : errdetail("Such connection gets
> >> closed either in the next use or
> >> at the end of the current transaction.")));
> >>
> >> Does it really have a chance to get closed on the next use? If foreign
> >> server is dropped then user mapping should be dropped as well (either
> >> with CASCADE or manually), but we do need user mapping for a local
> >> cache
> >> lookup. That way, if I understand all the discussion up-thread
> >> correctly, we can only close such connections at the end of xact, do
> >> we?
> >
> > The next use of such a connection, i.e. the next query on a foreign
> > table whose server has been dropped, fails because of the cascade that
> > drops the user mapping and the foreign table as well. At the start of
> > that query such a connection will be marked as invalidated because the
> > xact_depth of that connection is > 0, and when the failure happens,
> > the txn gets aborted, due to which pgfdw_xact_callback gets called and
> > in that callback the connection gets closed. To make it more clear,
> > please have a look at the scenarios [1].
> >
>
> In my understanding 'connection gets closed either in the next use'
> means that connection will be closed next time someone will try to use
> it, i.e. GetConnection() will be called and it closes this connection
> because of a bad state. However, if foreign server is dropped
> GetConnection() cannot lookup the connection because it needs a user
> mapping oid as a key.

Right. We don't reach GetConnection(). The lookup in either
GetForeignTable() or GetUserMapping() or GetForeignServer() fails (and
so does the query), depending on which one gets called first.

> I had a look on your scenarios. IIUC, under **next use** you mean a
> select attempt from a table belonging to the same foreign server, which
> leads to a transaction abort and connection gets closed in the xact
> callback. Sorry, maybe I am missing something, but this just confirms
> that such connections only get closed in the xact callback (taking into
> account your recently committed patch [1]), so 'next use' looks
> misleading.
>
> [1]
> https://www.postgresql.org/message-id/8b2aa1aa-c638-12a8-cb56-ea0f0a5019cf%40oss.nttdata.com

Right. I meant the "next use" as a select attempt on a foreign table
of that foreign server. If no select query is run, then that connection
gets closed at the end of the current txn. Yes, internally such a
connection gets closed in pgfdw_xact_callback.

If the errdetail("Such connections get closed either in the next use
or at the end of the current transaction.") looks confusing, how about

1) errdetail("Such connection gets discarded while closing the remote
transaction.")/errdetail("Such connections get discarded while closing
the remote transaction.")
2) errdetail("Such connection is discarded at the end of remote
transaction.")/errdetail("Such connections are discarded at the end of
remote transaction.")

I prefer 2)  Thoughts?

Because we already print a message in pgfdw_xact_callback -
elog(DEBUG3, "closing remote transaction on connection %p"

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Thu, Dec 31, 2020 at 8:29 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Right. I meant the "next use" as the select attempt on a foreign table
> with that foreign server. If no select query is run, then at the end
> of the current txn that connection gets closed. Yes internally such
> connection gets closed in pgfdw_xact_callback.
>
> If the errdetail("Such connections get closed either in the next use
> or at the end of the current transaction.") looks confusing, how about
>
> 1) errdetail("Such connection gets discarded while closing the remote
> transaction.")/errdetail("Such connections get discarded while closing
> the remote transaction.")
> 2) errdetail("Such connection is discarded at the end of remote
> transaction.")/errdetail("Such connections are discarded at the end of
> remote transaction.")
>
> I prefer 2)  Thoughts?
>
> Because we already print a message in pgfdw_xact_callback -
> elog(DEBUG3, "closing remote transaction on connection %p"

I changed the message to "Such connection is discarded at the end of
remote transaction.".

I'm attaching the v5 patch set, i.e. all the patches: 0001 (for the new
functions), 0002 (for the GUC) and 0003 (for the server level option). I
have also made the changes for increasing the version of
postgres_fdw--1.0.sql from 1.0 to 1.1.

I have no open points from my end. Please consider the v5 patch set
for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
Hi, Bharath:

Happy new year.

+       appendStringInfo(&buf, "(%s, %s)", server->servername,
+                        entry->invalidated ? "false" : "true");

Is it better to use 'invalidated' than 'false' in the string ?

For the first if block of postgres_fdw_disconnect():

+        * Check if the connection associated with the given foreign server is
+        * in use i.e. entry->xact_depth > 0. Since we can not close it, so
+        * error out.
+        */
+       if (is_in_use)
+           ereport(WARNING,

since is_in_use is only set in the if (server) block, I think the above warning can be moved into that block.

Cheers

On Fri, Jan 1, 2021 at 2:04 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 31, 2020 at 8:29 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Right. I meant the "next use" as the select attempt on a foreign table
> with that foreign server. If no select query is run, then at the end
> of the current txn that connection gets closed. Yes internally such
> connection gets closed in pgfdw_xact_callback.
>
> If the errdetail("Such connections get closed either in the next use
> or at the end of the current transaction.") looks confusing, how about
>
> 1) errdetail("Such connection gets discarded while closing the remote
> transaction.")/errdetail("Such connections get discarded while closing
> the remote transaction.")
> 2) errdetail("Such connection is discarded at the end of remote
> transaction.")/errdetail("Such connections are discarded at the end of
> remote transaction.")
>
> I prefer 2)  Thoughts?
>
> Because we already print a message in pgfdw_xact_callback -
> elog(DEBUG3, "closing remote transaction on connection %p"

I changed the message to "Such connection is discarded at the end of
remote transaction.".

I'm attaching v5 patch set i.e. all the patches 0001 ( for new
functions), 0002 ( for GUC) and 0003 (for server level option). I have
also made the changes for increasing the version of
postgres_fdw--1.0.sql from 1.0 to 1.1.

I have no open points from my end. Please consider the v5 patch set
for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks for taking a look at the patches.

On Fri, Jan 1, 2021 at 9:35 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> Happy new year.
>
> +       appendStringInfo(&buf, "(%s, %s)", server->servername,
> +                        entry->invalidated ? "false" : "true");
>
> Is it better to use 'invalidated' than 'false' in the string ?

This point was discussed earlier in [1] and [2], but the agreement was
on having true/false [2] because of a simple reason specified in [1]:
when some users have foreign servers named "invalid" or "valid", it is
difficult to interpret which part of the output is which. With
true/false, it's easier. IMO, let's keep the true/false as is,
since it's also suggested in [2].

[1] - https://www.postgresql.org/message-id/CALj2ACUv%3DArQXs0U9PM3YXKCeSzJ1KxRokDY0g_0aGy--kDScA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/6da38393-6ae5-4d87-2690-11c932123403%40oss.nttdata.com

> For the first if block of postgres_fdw_disconnect():
>
> +        * Check if the connection associated with the given foreign server is
> +        * in use i.e. entry->xact_depth > 0. Since we can not close it, so
> +        * error out.
> +        */
> +       if (is_in_use)
> +           ereport(WARNING,
>
> since is_in_use is only set in the if (server) block, I think the above warning can be moved into that block.

Modified that a bit. Since we error out when no server object is
found, there is no need to keep the code in an else part. We could
save on some indentation:

+        if (!server)
+            ereport(ERROR,
+                    (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+                     errmsg("foreign server \"%s\" does not exist",
servername)));
+
+        hashvalue = GetSysCacheHashValue1(FOREIGNSERVEROID,
+                                          ObjectIdGetDatum(server->serverid));
+        result = disconnect_cached_connections(hashvalue, false, &is_in_use);
+
+        /*
+         * Check if the connection associated with the given foreign server is
+         * in use i.e. entry->xact_depth > 0. Since we can not close it, so
+         * error out.
+         */
+        if (is_in_use)
+            ereport(WARNING,
+                    (errmsg("cannot close the connection because it
is still in use")));

Attaching v6 patch set. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Sat, Jan 2, 2021 at 10:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks for taking a look at the patches.
>
> On Fri, Jan 1, 2021 at 9:35 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> > Happy new year.
> >
> > +       appendStringInfo(&buf, "(%s, %s)", server->servername,
> > +                        entry->invalidated ? "false" : "true");
> >
> > Is it better to use 'invalidated' than 'false' in the string ?
>
> This point was earlier discussed in [1] and [2], but the agreement was
> on having true/false [2] because of a simple reason specified in [1],
> that is when some users have foreign server names as invalid or valid,
> then the output is difficult to interpret which one is what. With
> having true/false, it's easier. IMO, let's keep the true/false as is,
> since it's also suggested in [2].
>
> [1] - https://www.postgresql.org/message-id/CALj2ACUv%3DArQXs0U9PM3YXKCeSzJ1KxRokDY0g_0aGy--kDScA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/6da38393-6ae5-4d87-2690-11c932123403%40oss.nttdata.com
>
> > For the first if block of postgres_fdw_disconnect():
> >
> > +        * Check if the connection associated with the given foreign server is
> > +        * in use i.e. entry->xact_depth > 0. Since we can not close it, so
> > +        * error out.
> > +        */
> > +       if (is_in_use)
> > +           ereport(WARNING,
> >
> > since is_in_use is only set in the if (server) block, I think the above warning can be moved into that block.
>
> Modified that a bit. Since we error out when no server object is
> found, then no need of keeping the code in else part. We could save on
> some indentation
>
> +        if (!server)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
> +                     errmsg("foreign server \"%s\" does not exist",
> servername)));
> +
> +        hashvalue = GetSysCacheHashValue1(FOREIGNSERVEROID,
> +                                          ObjectIdGetDatum(server->serverid));
> +        result = disconnect_cached_connections(hashvalue, false, &is_in_use);
> +
> +        /*
> +         * Check if the connection associated with the given foreign server is
> +         * in use i.e. entry->xact_depth > 0. Since we can not close it, so
> +         * error out.
> +         */
> +        if (is_in_use)
> +            ereport(WARNING,
> +                    (errmsg("cannot close the connection because it
> is still in use")));
>
> Attaching v6 patch set. Please have a look.

I'm sorry for the mess. I missed adding the new files into the v6-0001
patch. Please ignore the v6 patch set and consider the v7 patch set for
further review. Note that the 0002 and 0003 patches have no difference
from the v5 patch set.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Sat, Jan 2, 2021 at 11:19 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm sorry for the mess. I missed adding the new files into the v6-0001
> patch. Please ignore the v6 patch set and consider the v7 patch set for
> further review. Note that 0002 and 0003 patches have no difference
> from v5 patch set.

It seems like the cf bot was failing on the v7 patches. On Linux, it
failed while building the documentation in the 0001 patch; I corrected
that. On FreeBSD, it failed in one of the test cases I added; since it
was unstable, I corrected it now.

Attaching v8 patch set. Hopefully, cf bot will be happy with v8.

Please consider the v8 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/05 16:56, Bharath Rupireddy wrote:
> On Sat, Jan 2, 2021 at 11:19 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> I'm sorry for the mess. I missed adding the new files into the v6-0001
>> patch. Please ignore the v6 patch set and consider the v7 patch set for
>> further review. Note that 0002 and 0003 patches have no difference
>> from v5 patch set.
> 
> It seems like cf bot was failing on v7 patches. On Linux, it fails
> while building documentation in 0001 patch, I corrected that.  On
> FreeBSD, it fails in one of the test cases I added, since it was
> unstable, I corrected it now.
> 
> Attaching v8 patch set. Hopefully, cf bot will be happy with v8.
> 
> Please consider the v8 patch set for further review.

Thanks for the patch!

-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql

Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
we can run the following?

     CREATE EXTENSION postgres_fdw VERSION "1.0";
     ALTER EXTENSION postgres_fdw UPDATE TO "1.1";


+<sect2>
+  <title>Functions</title>

The documentation format for the functions should be consistent with
that in other contrib modules like pgstattuple?


+   When called in the local session, it returns an array with each element as a
+   pair of the foreign server names of all the open connections that are
+   previously made to the foreign servers and <literal>true</literal> or
+   <literal>false</literal> to show whether or not the connection is valid.

We thought that the information about whether the connection is valid or
not was useful to, for example, identify and explicitly close the long-living
invalid connections because they were useless. But thanks to the recent
bug fix for the connection leak issue, is that information no longer
so helpful for us? False is returned only when the connection is used in
this local transaction but is marked as invalidated. In that case the
connection cannot be explicitly closed because it's used in this transaction;
it will be closed at the end of the transaction. Thoughts?


I guess that you made postgres_fdw_get_connections() return an array
because the similar function dblink_get_connections() does that. But
isn't it more convenient to make it return a set of records?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Thu, Jan 7, 2021 at 9:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/05 16:56, Bharath Rupireddy wrote:
> > Attaching v8 patch set. Hopefully, cf bot will be happy with v8.
> >
> > Please consider the v8 patch set for further review.
> -DATA = postgres_fdw--1.0.sql
> +DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql
>
> Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
> we can run the followings?
>
>      CREATE EXTENSION postgres_fdw VERSION "1.0";
>      ALTER EXTENSION postgres_fdw UPDATE TO "1.1";

Yes, we can. In that case, to use the new functions users have to
update postgres_fdw to 1.1. Do we then need to mention in the
documentation that, to make use of the new functions, users should
update postgres_fdw to version 1.1?

With the above change, the contents of postgres_fdw--1.0.sql remain as
is and in postgres_fdw--1.0--1.1.sql we will have only the new
function statements:

/* contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql */

-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION postgres_fdw UPDATE TO '1.1'" to load this
file. \quit

CREATE FUNCTION postgres_fdw_get_connections ()
RETURNS text[]
AS 'MODULE_PATHNAME','postgres_fdw_get_connections'
LANGUAGE C STRICT PARALLEL RESTRICTED;

CREATE FUNCTION postgres_fdw_disconnect ()
RETURNS bool
AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
LANGUAGE C STRICT PARALLEL RESTRICTED;

CREATE FUNCTION postgres_fdw_disconnect (text)
RETURNS bool
AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
LANGUAGE C STRICT PARALLEL RESTRICTED;

> +<sect2>
> +  <title>Functions</title>
>
> The document format for functions should be consistent with
> that in other contrib module like pgstattuple?

pgstattuple has so many columns to show in its output that they have a
table listing all the output columns and their types. The new functions
introduced here have only one input or none, and one output. I think we
don't need a table listing the input and output names and types.

IMO, we can have something similar to what pg_visibility_map has for
its functions, and also an example for each function showing how it can
be used. Thoughts?

> +   When called in the local session, it returns an array with each element as a
> +   pair of the foreign server names of all the open connections that are
> +   previously made to the foreign servers and <literal>true</literal> or
> +   <literal>false</literal> to show whether or not the connection is valid.
>
> We thought that the information about whether the connection is valid or
> not was useful to, for example, identify and close explicitly the long-living
> invalid connections because they were useless. But thanks to the recent
> bug fix for connection leak issue, that information would be no longer
> so helpful for us? False is returned only when the connection is used in
> this local transaction but it's marked as invalidated. In this case that
> connection cannot be explicitly closed because it's used in this transaction.
> It will be closed at the end of transaction. Thought?

Yes, a connection's validity can be false only when the connection gets
invalidated and postgres_fdw_get_connections is called within an
explicit txn i.e. begin; commit;. In an implicit txn, since the
invalidated connections get closed either during the invalidation
callback or at the end of the txn, postgres_fdw_get_connections will
always show valid connections. Having said that, I still feel we need
the true/false for valid/invalid in the output of
postgres_fdw_get_connections, otherwise we would miss giving
connection validity information to the user in the narrow use case
of an explicit txn. If required, we can issue a warning whenever we see
an invalid connection saying "invalid connections are discarded at the
end of remote transaction". Thoughts?
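
To make the narrow explicit-txn case concrete, a rough illustration
(assuming a hypothetical foreign server "loopback" with a foreign
table "ft1" on it):

BEGIN;
SELECT count(*) FROM ft1;            -- opens and caches a connection to loopback
ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
                                     -- invalidates the cached connection
SELECT postgres_fdw_get_connections();
                                     -- would report the loopback connection as invalid
COMMIT;                              -- the invalidated connection is discarded here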

> I guess that you made postgres_fdw_get_connections() return the array
> because the similar function dblink_get_connections() does that. But
> isn't it more convenient to make that return the set of records?

Yes, for postgres_fdw_get_connections we can return a set of records
of (server_name, valid). To do so, I can refer to dblink_get_pkey.
Thoughts?
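
If we go the set-of-records route, a rough sketch of what the 1.1
definition could look like (just an illustration, not a final
signature):

CREATE FUNCTION postgres_fdw_get_connections (OUT server_name text, OUT valid boolean)
RETURNS SETOF record
AS 'MODULE_PATHNAME','postgres_fdw_get_connections'
LANGUAGE C STRICT PARALLEL RESTRICTED;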

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/07 17:21, Bharath Rupireddy wrote:
> On Thu, Jan 7, 2021 at 9:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/05 16:56, Bharath Rupireddy wrote:
>>> Attaching v8 patch set. Hopefully, cf bot will be happy with v8.
>>>
>>> Please consider the v8 patch set for further review.
>> -DATA = postgres_fdw--1.0.sql
>> +DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql
>>
>> Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
>> we can run the followings?
>>
>>       CREATE EXTENSION postgres_fdw VERSION "1.0";
>>       ALTER EXTENSION postgres_fdw UPDATE TO "1.1";
> 
> Yes we can. In that case, to use the new functions users have to
> update postgres_fdw to 1.1, in that case, do we need to mention in the
> documentation that to make use of the new functions, update
> postgres_fdw to version 1.1?

But since postgres_fdw.control indicates that the default version is 1.1,
"CREATE EXTENSION postgres_fdw" installs v1.1. So basically the users
don't need to update postgres_fdw from v1.0 to v1.1. Only the users of
v1.0 need to update that to v1.1 to use new functions. No?


> 
> With the above change, the contents of postgres_fdw--1.0.sql remain as
> is and in postgres_fdw--1.0--1.1.sql we will have only the new
> function statements:

Yes.


> 
> /* contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql */
> 
> -- complain if script is sourced in psql, rather than via ALTER EXTENSION
> \echo Use "ALTER EXTENSION postgres_fdw UPDATE TO '1.1'" to load this
> file. \quit
> 
> CREATE FUNCTION postgres_fdw_get_connections ()
> RETURNS text[]
> AS 'MODULE_PATHNAME','postgres_fdw_get_connections'
> LANGUAGE C STRICT PARALLEL RESTRICTED;
> 
> CREATE FUNCTION postgres_fdw_disconnect ()
> RETURNS bool
> AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
> LANGUAGE C STRICT PARALLEL RESTRICTED;
> 
> CREATE FUNCTION postgres_fdw_disconnect (text)
> RETURNS bool
> AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
> LANGUAGE C STRICT PARALLEL RESTRICTED;
> 
>> +<sect2>
>> +  <title>Functions</title>
>>
>> The document format for functions should be consistent with
>> that in other contrib module like pgstattuple?
> 
> pgstattuple has so many columns to show up in output because of that
> they have a table listing all the output columns and their types. The
> new functions introduced here have only one or none input and an
> output. I think, we don't need a table listing the input and output
> names and types.
> 
> IMO, we can have something similar to what pg_visibility_map has for
> functions, and also an example for each function showing how it can be
> used. Thoughts?

Sounds good.


> 
>> +   When called in the local session, it returns an array with each element as a
>> +   pair of the foreign server names of all the open connections that are
>> +   previously made to the foreign servers and <literal>true</literal> or
>> +   <literal>false</literal> to show whether or not the connection is valid.
>>
>> We thought that the information about whether the connection is valid or
>> not was useful to, for example, identify and close explicitly the long-living
>> invalid connections because they were useless. But thanks to the recent
>> bug fix for connection leak issue, that information would be no longer
>> so helpful for us? False is returned only when the connection is used in
>> this local transaction but it's marked as invalidated. In this case that
>> connection cannot be explicitly closed because it's used in this transaction.
>> It will be closed at the end of transaction. Thought?
> 
> Yes, connection's validity can be false only when the connection gets
> invalidated and postgres_fdw_get_connections is called within an
> explicit txn i.e. begin; commit;. In implicit txn, since the
> invalidated connections get closed either during invalidation callback
> or at the end of txn, postgres_fdw_get_connections will always show
> valid connections. Having said that, I still feel we need the
> true/false for valid/invalid in the output of the
> postgres_fdw_get_connections, otherwise we might miss giving
> connection validity information to the user in a very narrow use case
> of explicit txn.

Understood. I withdraw my suggestion and am fine to display
valid/invalid information.


> If required, we can issue a warning whenever we see
> an invalid connection saying "invalid connections are
> discarded at the end of remote transaction". Thoughts?

IMO it's overkill to emit such a warning message because that
situation is a normal one. OTOH, it seems worth documenting that.


> 
>> I guess that you made postgres_fdw_get_connections() return the array
>> because the similar function dblink_get_connections() does that. But
>> isn't it more convenient to make that return the set of records?
> 
> Yes, for postgres_fdw_get_connections we can return a set of records
> of (server_name, valid). To do so, I can refer to dblink_get_pkey.
> Thoughts?

Yes.

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 8, 2021 at 7:29 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/07 17:21, Bharath Rupireddy wrote:
> > On Thu, Jan 7, 2021 at 9:49 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> On 2021/01/05 16:56, Bharath Rupireddy wrote:
> >>> Attaching v8 patch set. Hopefully, cf bot will be happy with v8.
> >>>
> >>> Please consider the v8 patch set for further review.
> >> -DATA = postgres_fdw--1.0.sql
> >> +DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql
> >>
> >> Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
> >> we can run the followings?
> >>
> >>       CREATE EXTENSION postgres_fdw VERSION "1.0";
> >>       ALTER EXTENSION postgres_fdw UPDATE TO "1.1";
> >
> > Yes we can. In that case, to use the new functions users have to
> > update postgres_fdw to 1.1, in that case, do we need to mention in the
> > documentation that to make use of the new functions, update
> > postgres_fdw to version 1.1?
>
> But since postgres_fdw.control indicates that the default version is 1.1,
> "CREATE EXTENSION postgres_fdw" installs v1.1. So basically the users
> don't need to update postgres_fdw from v1.0 to v1.1. Only the users of
> v1.0 need to update that to v1.1 to use new functions. No?

It works this way:
scenario 1:
1) create extension postgres_fdw;   --> this was run before our feature,
i.e. with default_version 1.0
2) after the feature, i.e. with default_version 1.1, users can run alter
extension postgres_fdw update to "1.1"; which gets the new functions
from postgres_fdw--1.0--1.1.sql.

scenario 2:
1) create extension postgres_fdw;   --> this is run after our feature,
i.e. with default_version 1.1, so the new functions are installed by
create extension itself; there is no need to run alter extension update
to get them. See the example below.
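
For example (illustrative; assuming default_version in
postgres_fdw.control is bumped to '1.1'):

-- scenario 1: an existing 1.0 installation picks up the new functions via an upgrade
ALTER EXTENSION postgres_fdw UPDATE TO '1.1';

-- scenario 2: a fresh install gets 1.1, and the new functions, directly
CREATE EXTENSION postgres_fdw;

-- either way, the installed version can be checked with
SELECT extversion FROM pg_extension WHERE extname = 'postgres_fdw';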

I will make the changes and post a new patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 8, 2021 at 9:55 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I will make the changes and post a new patch set soon.

Attaching v9 patch set that has addressed the review comments on the
disconnect function returning setof records, documentation changes,
and postgres_fdw--1.0--1.1.sql changes.

Please consider the v9 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/09 10:12, Bharath Rupireddy wrote:
> On Fri, Jan 8, 2021 at 9:55 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> I will make the changes and post a new patch set soon.
> 
> Attaching v9 patch set that has addressed the review comments on the
> disconnect function returning setof records, documentation changes,
> and postgres_fdw--1.0-1.1.sql changes.
> 
> Please consider the v9 patch set for further review.

Thanks for updating the patch! I reviewed only 0001 patch.

+    /*
+     * Quick exit if the cache has been destroyed in
+     * disconnect_cached_connections.
+     */
+    if (!ConnectionHash)
+        return;

This code is not necessary at least in pgfdw_xact_callback() and
pgfdw_subxact_callback()? Because those functions check
"if (!xact_got_connection)" before checking the above condition.

-    if (!HeapTupleIsValid(tup))
-        elog(ERROR, "cache lookup failed for user mapping %u", entry->key);
-    umform = (Form_pg_user_mapping) GETSTRUCT(tup);
-    server = GetForeignServer(umform->umserver);
-    ReleaseSysCache(tup);
+    server = GetForeignServer(entry->serverid);

What about applying only the change about serverid, as a separate patch at
first? This change itself is helpful to get rid of error "cache lookup failed"
in pgfdw_reject_incomplete_xact_state_change(). Patch attached.

+        server = GetForeignServerExtended(entry->serverid, true);

Since the type of second argument in GetForeignServerExtended() is bits16,
it's invalid to specify "true" there?

+    if (no_server_conn_cnt > 0)
+    {
+        ereport(WARNING,
+                (errmsg_plural("found an active connection for which the foreign server would have been dropped",
+                               "found some active connections for which the foreign servers would have been dropped",
+                               no_server_conn_cnt),
+                 no_server_conn_cnt > 1 ?
+                 errdetail("Such connections are discarded at the end of remote transaction.")
+                 : errdetail("Such connection is discarded at the end of remote transaction.")));

At least for me, I like returning such connections with "NULL" in server_name
column and "false" in valid column, rather than emitting a warning. Because
which would enable us to count the number of actual foreign connections
easily by using SQL, for example.

+     * During the first call, we initialize the function context, get the list
+     * of active connections using get_connections and store this in the
+     * function's memory context so that it can live multiple calls.
+     */
+    if (SRF_IS_FIRSTCALL())

I guess that you used value-per-call mode to make the function return
a set result since you referred to dblink_get_pkey(). But isn't it better to
use materialize mode like dblink_get_notify() does rather than
value-per-call because this function returns not so many records? ISTM
that we can simplify postgres_fdw_get_connections() by using materialize mode.

+        hash_destroy(ConnectionHash);
+        ConnectionHash = NULL;

If GetConnection() is called after ConnectionHash is destroyed,
it initializes the hashtable and registers some callback functions again
even though the same functions have already been registered. This causes
the same function to be registered as a callback more than once. This is
a bug.

+CREATE FUNCTION postgres_fdw_disconnect ()

Do we really want postgres_fdw_disconnect() with no argument?
IMO postgres_fdw_disconnect() with the server name specified is enough.
But I'd like to hear the opinion about that.


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
On Thu, Jan 14, 2021 at 3:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> -       if (!HeapTupleIsValid(tup))
> -               elog(ERROR, "cache lookup failed for user mapping %u", entry->key);
> -       umform = (Form_pg_user_mapping) GETSTRUCT(tup);
> -       server = GetForeignServer(umform->umserver);
> -       ReleaseSysCache(tup);
> +       server = GetForeignServer(entry->serverid);
>
> What about applying only the change about serverid, as a separate patch at
> first? This change itself is helpful to get rid of error "cache lookup failed"
> in pgfdw_reject_incomplete_xact_state_change(). Patch attached.

Right, we can get rid of the "cache lookup failed for user mapping"
error, and storing the server oid in the cache entry is also helpful
for the new functions we are going to introduce.

serverid_v1.patch looks good to me. Both make check and make
check-world pass on my system.

I will respond to other comments soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/14 20:36, Bharath Rupireddy wrote:
> On Thu, Jan 14, 2021 at 3:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> -       if (!HeapTupleIsValid(tup))
>> -               elog(ERROR, "cache lookup failed for user mapping %u", entry->key);
>> -       umform = (Form_pg_user_mapping) GETSTRUCT(tup);
>> -       server = GetForeignServer(umform->umserver);
>> -       ReleaseSysCache(tup);
>> +       server = GetForeignServer(entry->serverid);
>>
>> What about applying only the change about serverid, as a separate patch at
>> first? This change itself is helpful to get rid of error "cache lookup failed"
>> in pgfdw_reject_incomplete_xact_state_change(). Patch attached.
> 
> Right, we can get rid of the "cache lookup failed for user mapping"
> error and also storing server oid in the cache entry is helpful for
> the new functions we are going to introduce.
> 
> serverid_v1.patch looks good to me. Both make check and make
> check-world passes on my system.

Thanks for the check! I pushed the patch.
  
> I will respond to other comments soon.

Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Thu, Jan 14, 2021 at 3:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/09 10:12, Bharath Rupireddy wrote:
> > On Fri, Jan 8, 2021 at 9:55 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >> I will make the changes and post a new patch set soon.
> >
> > Attaching v9 patch set that has addressed the review comments on the
> > disconnect function returning setof records, documentation changes,
> > and postgres_fdw--1.0-1.1.sql changes.
> >
> > Please consider the v9 patch set for further review.
>
> Thanks for updating the patch! I reviewed only 0001 patch.
>
> +       /*
> +        * Quick exit if the cache has been destroyed in
> +        * disconnect_cached_connections.
> +        */
> +       if (!ConnectionHash)
> +               return;
>
> This code is not necessary at least in pgfdw_xact_callback() and
> pgfdw_subxact_callback()? Because those functions check
> "if (!xact_got_connection)" before checking the above condition.

Yes, if xact_got_connection is true, then ConnectionHash wouldn't have
been cleaned up in disconnect_cached_connections. +1 to remove that in
pgfdw_xact_callback and pgfdw_subxact_callback. But we need that check
in pgfdw_inval_callback, because we may reach there after
ConnectionHash is destroyed and set to NULL in
disconnect_cached_connections.

> +               server = GetForeignServerExtended(entry->serverid, true);
>
> Since the type of second argument in GetForeignServerExtended() is bits16,
> it's invalid to specify "true" there?

Yeah. I will change it to be something like below:
bits16        flags = FSV_MISSING_OK;
server = GetForeignServerExtended(entry->serverid, flags);

> +       if (no_server_conn_cnt > 0)
> +       {
> +               ereport(WARNING,
> +                               (errmsg_plural("found an active connection for which the foreign server would have been dropped",
> +                                                          "found some active connections for which the foreign servers would have been dropped",
> +                                                          no_server_conn_cnt),
> +                                no_server_conn_cnt > 1 ?
> +                                errdetail("Such connections are discarded at the end of remote transaction.")
> +                                : errdetail("Such connection is discarded at the end of remote transaction.")));
>
> At least for me, I like returning such connections with "NULL" in server_name
> column and "false" in valid column, rather than emitting a warning. Because
> which would enable us to count the number of actual foreign connections
> easily by using SQL, for example.

+1. I was also of a similar opinion initially. I will change this.

> +        * During the first call, we initialize the function context, get the list
> +        * of active connections using get_connections and store this in the
> +        * function's memory context so that it can live multiple calls.
> +        */
> +       if (SRF_IS_FIRSTCALL())
>
> I guess that you used value-per-call mode to make the function return
> a set result since you refered to dblink_get_pkey(). But isn't it better to
> use materialize mode like dblink_get_notify() does rather than
> value-per-call because this function returns not so many records? ISTM
> that we can simplify postgres_fdw_get_connections() by using materialize mode.

Yeah, +1. I will change it to use materialize mode.

> +               hash_destroy(ConnectionHash);
> +               ConnectionHash = NULL;
>
> If GetConnection() is called after ConnectionHash is destroyed,
> it initialize the hashtable and registers some callback functions again
> even though the same function have already been registered. This causes
> same function to be registered as a callback more than once. This is
> a bug.

Yeah, we will register the same callbacks many times. I'm thinking of
having something like below:

static bool conn_cache_destroyed = false;

    if (!active_conn_exists)
    {
        hash_destroy(ConnectionHash);
        ConnectionHash = NULL;
        conn_cache_destroyed = true;
    }

        /*
         * Register callback functions that manage connection cleanup. This
         * should be done just once in each backend. We don't register the
         * callbacks again, if the connection cache is destroyed at least once
         * in the backend.
         */
        if (!conn_cache_destroyed)
        {
            RegisterXactCallback(pgfdw_xact_callback, NULL);
            RegisterSubXactCallback(pgfdw_subxact_callback, NULL);
            CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
                                          pgfdw_inval_callback, (Datum) 0);
            CacheRegisterSyscacheCallback(USERMAPPINGOID,
                                          pgfdw_inval_callback, (Datum) 0);
        }

Thoughts?

> +CREATE FUNCTION postgres_fdw_disconnect ()
>
> Do we really want postgres_fdw_disconnect() with no argument?
> IMO postgres_fdw_disconnect() with the server name specified is enough.
> But I'd like to hear the opinion about that.

IMO, we should have that. Though it's a bit of an impractical use case,
if we have many connections that are not being used and want to
disconnect them all at once, this function will be useful.
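
For instance, with the two forms in the current patch set (illustrative
server name):

SELECT postgres_fdw_disconnect('loopback');   -- close the cached connection to one server
SELECT postgres_fdw_disconnect();             -- close all cached connections that are not in use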

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Sat, Jan 16, 2021 at 10:36 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Please consider the v9 patch set for further review.
> >
> > Thanks for updating the patch! I reviewed only 0001 patch.

I addressed the review comments and attached v10 patch set. Please
consider it for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
Hi,

This patch introduces new function postgres_fdw_disconnect() when
called with a foreign server name discards the associated
connections with the server name.

I think the following would read better:

This patch introduces a new function postgres_fdw_disconnect(). When
called with a foreign server name, it discards the associated
connections with the server.

Please note the removal of the 'name' at the end - connection is with server, not server name.

+       if (is_in_use)
+           ereport(WARNING,
+                   (errmsg("cannot close the connection because it is still in use")));

It would be better to include servername in the message.

+               ereport(WARNING,
+                       (errmsg("cannot close all connections because some of them are still in use")));

I think showing the number of active connections would be more informative.
This can be achieved by changing active_conn_exists from bool to int (named active_conns, e.g.):

+       if (entry->conn && !active_conn_exists)
+           active_conn_exists = true;

Instead of setting the bool value, active_conns can be incremented.

Cheers

On Sat, Jan 16, 2021 at 11:39 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Sat, Jan 16, 2021 at 10:36 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Please consider the v9 patch set for further review.
> >
> > Thanks for updating the patch! I reviewed only 0001 patch.

I addressed the review comments and attached v10 patch set. Please
consider it for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Jan 17, 2021 at 11:30 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> This patch introduces new function postgres_fdw_disconnect() when
> called with a foreign server name discards the associated
> connections with the server name.
>
> I think the following would read better:
>
> This patch introduces a new function postgres_fdw_disconnect(). When
> called with a foreign server name, it discards the associated
> connections with the server.

Thanks. I corrected the commit message.

> Please note the removal of the 'name' at the end - connection is with server, not server name.
>
> +       if (is_in_use)
> +           ereport(WARNING,
> +                   (errmsg("cannot close the connection because it is still in use")));
>
> It would be better to include servername in the message.

The user would have provided the server name in
postgres_fdw_disconnect('myserver'), so I don't think we need to emit
the warning with the server name again. The existing warning seems fine.

> +               ereport(WARNING,
> +                       (errmsg("cannot close all connections because some of them are still in use")));
>
> I think showing the number of active connections would be more informative.
> This can be achieved by changing active_conn_exists from bool to int (named active_conns, e.g.):
>
> +       if (entry->conn && !active_conn_exists)
> +           active_conn_exists = true;
>
> Instead of setting the bool value, active_conns can be incremented.

IMO, the number of active connections is not that informative, because
users cannot do anything with it. What would actually be more
informative is to list all the server names for which connections are
still active, instead of the warning "cannot close all connections
because some of them are still in use". Having said that, I feel it's
overkill to do that for now. If required, we can enhance the warnings
in the future. Thoughts?

Attaching v11 patch set, with changes only in 0001. The changes are
commit message correction and moved the warning related code to
disconnect_cached_connections from postgres_fdw_disconnect.

Please review v11 further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/18 12:33, Bharath Rupireddy wrote:
> On Sun, Jan 17, 2021 at 11:30 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>> This patch introduces new function postgres_fdw_disconnect() when
>> called with a foreign server name discards the associated
>> connections with the server name.
>>
>> I think the following would read better:
>>
>> This patch introduces a new function postgres_fdw_disconnect(). When
>> called with a foreign server name, it discards the associated
>> connections with the server.
> 
> Thanks. I corrected the commit message.
> 
>> Please note the removal of the 'name' at the end - connection is with server, not server name.
>>
>> +       if (is_in_use)
>> +           ereport(WARNING,
>> +                   (errmsg("cannot close the connection because it is still in use")));
>>
>> It would be better to include servername in the message.
> 
> User would have provided the servername in
> postgres_fdw_disconnect('myserver'), I don't think we need to emit the
> warning again with the servername. The existing warning seems fine.
> 
>> +               ereport(WARNING,
>> +                       (errmsg("cannot close all connections because some of them are still in use")));
>>
>> I think showing the number of active connections would be more informative.
>> This can be achieved by changing active_conn_exists from bool to int (named active_conns, e.g.):
>>
>> +       if (entry->conn && !active_conn_exists)
>> +           active_conn_exists = true;
>>
>> Instead of setting the bool value, active_conns can be incremented.
> 
> IMO, the number of active connections is not informative, because
> users can not do anything with them. What's actually more informative
> would be to list all the server names for which the connections are
> active, instead of the warning - "cannot close all connections because
> some of them are still in use". Having said that, I feel like it's an
> overkill for now to do that. If required, we can enhance the warnings
> in future. Thoughts?
> 
> Attaching v11 patch set, with changes only in 0001. The changes are
> commit message correction and moved the warning related code to
> disconnect_cached_connections from postgres_fdw_disconnect.
> 
> Please review v11 further.

Thanks for updating the patch!

The patch for postgres_fdw_get_connections() basically looks good to me.
So at first I'd like to push it. Attached is the patch that I extracted
postgres_fdw_get_connections() part from 0001 patch and tweaked.
Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
On Mon, Jan 18, 2021 at 9:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Please review v11 further.
>
> Thanks for updating the patch!
>
> The patch for postgres_fdw_get_connections() basically looks good to me.
> So at first I'd like to push it. Attached is the patch that I extracted
> postgres_fdw_get_connections() part from 0001 patch and tweaked.
> Thought?

Thanks.

We need to create loopback3 with a public user mapping, otherwise
the test might become unstable as shown below. Note that loopback and
loopback2 are not dropped in the test, so there is no problem with them.

 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 DROP SERVER loopback3 CASCADE;
 NOTICE:  drop cascades to 2 other objects
-DETAIL:  drop cascades to user mapping for postgres on server loopback3
+DETAIL:  drop cascades to user mapping for bharath on server loopback3
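
In other words, the test setup would look something like this (a
sketch; connection options elided):

CREATE SERVER loopback3 FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR public SERVER loopback3;
-- DROP SERVER loopback3 CASCADE then always reports
-- "user mapping for public on server loopback3",
-- independent of the role running the regression tests.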

Attaching v2 patch for postgres_fdw_get_connections. Please have a look.

I will post patches for the other function postgres_fdw_disconnect,
GUC and server level option later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
> We need to create the loopback3 with user mapping public, otherwise the
> test might become unstable as shown below. Note that loopback and
> loopback2 are not dropped in the test, so no problem with them.
> 
>  ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');  DROP
> SERVER loopback3 CASCADE;
>  NOTICE:  drop cascades to 2 other objects
> -DETAIL:  drop cascades to user mapping for postgres on server loopback3
> +DETAIL:  drop cascades to user mapping for bharath on server loopback3
> 
> Attaching v2 patch for postgres_fdw_get_connections. Please have a look.
Hi

I have a comment for the doc about postgres_fdw_get_connections.

+    <term><function>postgres_fdw_get_connections(OUT server_name text, OUT valid boolean) returns setof
record</function></term>
+    <listitem>
+     <para>
+      This function returns the foreign server names of all the open
+      connections that <filename>postgres_fdw</filename> established from
+      the local session to the foreign servers. It also returns whether
+      each connection is valid or not. <literal>false</literal> is returned
+      if the foreign server connection is used in the current local
+      transaction but its foreign server or user mapping is changed or
+      dropped, and then such invalid connection will be closed at
+      the end of that transaction. <literal>true</literal> is returned
+      otherwise. If there are no open connections, no record is returned.
+      Example usage of the function:

The doc does not seem to mention the case where the function returns NULL in server_name.
Users may be a little confused about why NULL was returned.
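
For example, the case I have in mind is roughly this (hypothetical
object names):

BEGIN;
SELECT count(*) FROM ft1;                      -- uses the cached connection to loopback
DROP SERVER loopback CASCADE;                  -- drops the foreign table and user mapping too
SELECT * FROM postgres_fdw_get_connections();  -- server_name is NULL, valid is false
COMMIT;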

Best regards,
houzj




On 2021/01/18 13:46, Bharath Rupireddy wrote:
> On Mon, Jan 18, 2021 at 9:38 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Please review v11 further.
>>
>> Thanks for updating the patch!
>>
>> The patch for postgres_fdw_get_connections() basically looks good to me.
>> So at first I'd like to push it. Attached is the patch that I extracted
>> postgres_fdw_get_connections() part from 0001 patch and tweaked.
>> Thought?
> 
> Thanks.
> 
> We need to create the loopback3 with user mapping public, otherwise
> the test might become unstable as shown below. Note that loopback and
> loopback2 are not dropped in the test, so no problem with them.
> 
>   ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
>   DROP SERVER loopback3 CASCADE;
>   NOTICE:  drop cascades to 2 other objects
> -DETAIL:  drop cascades to user mapping for postgres on server loopback3
> +DETAIL:  drop cascades to user mapping for bharath on server loopback3
> 
> Attaching v2 patch for postgres_fdw_get_connections. Please have a look.

Thanks! You're right. I pushed the v2 patch.


> I will post patches for the other function postgres_fdw_disconnect,
> GUC and server level option later.

Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/01/18 15:02, Hou, Zhijie wrote:
>> We need to create the loopback3 with user mapping public, otherwise the
>> test might become unstable as shown below. Note that loopback and
>> loopback2 are not dropped in the test, so no problem with them.
>>
>>   ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');  DROP
>> SERVER loopback3 CASCADE;
>>   NOTICE:  drop cascades to 2 other objects
>> -DETAIL:  drop cascades to user mapping for postgres on server loopback3
>> +DETAIL:  drop cascades to user mapping for bharath on server loopback3
>>
>> Attaching v2 patch for postgres_fdw_get_connections. Please have a look.
> Hi
> 
> I have a comment for the doc about postgres_fdw_get_connections.
> 
> +    <term><function>postgres_fdw_get_connections(OUT server_name text, OUT valid boolean) returns setof
record</function></term>
> +    <listitem>
> +     <para>
> +      This function returns the foreign server names of all the open
> +      connections that <filename>postgres_fdw</filename> established from
> +      the local session to the foreign servers. It also returns whether
> +      each connection is valid or not. <literal>false</literal> is returned
> +      if the foreign server connection is used in the current local
> +      transaction but its foreign server or user mapping is changed or
> +      dropped, and then such invalid connection will be closed at
> +      the end of that transaction. <literal>true</literal> is returned
> +      otherwise. If there are no open connections, no record is returned.
> +      Example usage of the function:
> 
> The doc seems does not memtion the case when the function returns NULL in server_name.
> Users may be a little confused about why NULL was returned.

Yes, so what about adding

     (Note that the returned server name of invalid connection is NULL if its server is dropped)

into the following (just after "dropped")?

+      if the foreign server connection is used in the current local
+      transaction but its foreign server or user mapping is changed or
+      dropped

Or better description?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Jan 18, 2021 at 11:58 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/01/18 15:02, Hou, Zhijie wrote:
> >> We need to create the loopback3 with user mapping public, otherwise the
> >> test might become unstable as shown below. Note that loopback and
> >> loopback2 are not dropped in the test, so no problem with them.
> >>
> >>   ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');  DROP
> >> SERVER loopback3 CASCADE;
> >>   NOTICE:  drop cascades to 2 other objects
> >> -DETAIL:  drop cascades to user mapping for postgres on server loopback3
> >> +DETAIL:  drop cascades to user mapping for bharath on server loopback3
> >>
> >> Attaching v2 patch for postgres_fdw_get_connections. Please have a look.
> > Hi
> >
> > I have a comment for the doc about postgres_fdw_get_connections.
> >
> > +    <term><function>postgres_fdw_get_connections(OUT server_name text, OUT valid boolean) returns setof
record</function></term>
> > +    <listitem>
> > +     <para>
> > +      This function returns the foreign server names of all the open
> > +      connections that <filename>postgres_fdw</filename> established from
> > +      the local session to the foreign servers. It also returns whether
> > +      each connection is valid or not. <literal>false</literal> is returned
> > +      if the foreign server connection is used in the current local
> > +      transaction but its foreign server or user mapping is changed or
> > +      dropped, and then such invalid connection will be closed at
> > +      the end of that transaction. <literal>true</literal> is returned
> > +      otherwise. If there are no open connections, no record is returned.
> > +      Example usage of the function:
> >
> > The doc seems does not memtion the case when the function returns NULL in server_name.
> > Users may be a little confused about why NULL was returned.
>
> Yes, so what about adding
>
>      (Note that the returned server name of invalid connection is NULL if its server is dropped)
>
> into the following (just after "dropped")?
>
> +      if the foreign server connection is used in the current local
> +      transaction but its foreign server or user mapping is changed or
> +      dropped
>
> Or better description?

+1 to add it after "dropped (Note ........)", how about as follows
with slight changes?

dropped (Note that server name of an invalid connection can be NULL if
the server is dropped), and then such .....

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/18 15:37, Bharath Rupireddy wrote:
> On Mon, Jan 18, 2021 at 11:58 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2021/01/18 15:02, Hou, Zhijie wrote:
>>>> We need to create the loopback3 with user mapping public, otherwise the
>>>> test might become unstable as shown below. Note that loopback and
>>>> loopback2 are not dropped in the test, so no problem with them.
>>>>
>>>>    ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');  DROP
>>>> SERVER loopback3 CASCADE;
>>>>    NOTICE:  drop cascades to 2 other objects
>>>> -DETAIL:  drop cascades to user mapping for postgres on server loopback3
>>>> +DETAIL:  drop cascades to user mapping for bharath on server loopback3
>>>>
>>>> Attaching v2 patch for postgres_fdw_get_connections. Please have a look.
>>> Hi
>>>
>>> I have a comment for the doc about postgres_fdw_get_connections.
>>>
>>> +    <term><function>postgres_fdw_get_connections(OUT server_name text, OUT valid boolean) returns setof
record</function></term>
>>> +    <listitem>
>>> +     <para>
>>> +      This function returns the foreign server names of all the open
>>> +      connections that <filename>postgres_fdw</filename> established from
>>> +      the local session to the foreign servers. It also returns whether
>>> +      each connection is valid or not. <literal>false</literal> is returned
>>> +      if the foreign server connection is used in the current local
>>> +      transaction but its foreign server or user mapping is changed or
>>> +      dropped, and then such invalid connection will be closed at
>>> +      the end of that transaction. <literal>true</literal> is returned
>>> +      otherwise. If there are no open connections, no record is returned.
>>> +      Example usage of the function:
>>>
>>> The doc seems does not memtion the case when the function returns NULL in server_name.
>>> Users may be a little confused about why NULL was returned.
>>
>> Yes, so what about adding
>>
>>       (Note that the returned server name of invalid connection is NULL if its server is dropped)
>>
>> into the following (just after "dropped")?
>>
>> +      if the foreign server connection is used in the current local
>> +      transaction but its foreign server or user mapping is changed or
>> +      dropped
>>
>> Or better description?
> 
> +1 to add it after "dropped (Note ........)", how about as follows
> with slight changes?
> 
> dropped (Note that server name of an invalid connection can be NULL if
> the server is dropped), and then such .....

Yes, I like this one. One question: is "should" or "is" better than
"can" in this case, given that the server name of an invalid connection
is always NULL when its server is dropped?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Jan 18, 2021 at 6:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > +1 to add it after "dropped (Note ........)", how about as follows
> > with slight changes?
> >
> > dropped (Note that server name of an invalid connection can be NULL if
> > the server is dropped), and then such .....
>
> Yes, I like this one. One question is; "should" or "is" is better than
> "can" in this case because the server name of invalid connection is
> always NULL when its server is dropped?

I think "dropped (Note that server name of an invalid connection will
be NULL if the server is dropped), and then such ....."

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jan 18, 2021 at 11:44 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> > I will post patches for the other function postgres_fdw_disconnect,
> > GUC and server level option later.
>
> Thanks!

Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
function, 0002 is for keep_connections GUC and 0003 is for
keep_connection server level option.

Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/18 23:14, Bharath Rupireddy wrote:
> On Mon, Jan 18, 2021 at 11:44 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>> I will post patches for the other function postgres_fdw_disconnect,
>>> GUC and server level option later.
>>
>> Thanks!
> 
> Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
> function, 0002 is for keep_connections GUC and 0003 is for
> keep_connection server level option.

Thanks!

> 
> Please review it further.

+        server = GetForeignServerByName(servername, true);
+
+        if (!server)
+            ereport(ERROR,
+                    (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+                     errmsg("foreign server \"%s\" does not exist", servername)));

ISTM we can simplify this code as follows.

     server = GetForeignServerByName(servername, false);


+    hash_seq_init(&scan, ConnectionHash);
+    while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))

When the server name is specified, even if its connection is successfully
closed, postgres_fdw_disconnect() scans through all the entries to check
whether there are active connections. But if "result" is true and
active_conn_exists is true, we can get out of this loop to avoid unnecessary
scans.


+    /*
+     * Destroy the cache if we discarded all active connections i.e. if there
+     * is no single active connection, which we can know while scanning the
+     * cached entries in the above loop. Destroying the cache is better than to
+     * keep it in the memory with all inactive entries in it to save some
+     * memory. Cache can get initialized on the subsequent queries to foreign
+     * server.

How much memory is assumed to be saved by destroying the cache in
many cases? I'm not sure if it's really worth destroying the cache to save
the memory.


+      a warning is issued and <literal>false</literal> is returned. <literal>false</literal>
+      is returned when there are no open connections. When there are some open
+      connections, but there is no connection for the given foreign server,
+      then <literal>false</literal> is returned. When no foreign server exists
+      with the given name, an error is emitted. Example usage of the function:

When a non-existent server name is specified, postgres_fdw_disconnect()
emits an error if there is at least one open connection, but just returns
false otherwise. At least for me, this behavior looks inconsistent and strange.
In that case, IMO the function should always emit an error.

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/01/18 22:03, Bharath Rupireddy wrote:
> On Mon, Jan 18, 2021 at 6:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> +1 to add it after "dropped (Note ........)", how about as follows
>>> with slight changes?
>>>
>>> dropped (Note that server name of an invalid connection can be NULL if
>>> the server is dropped), and then such .....
>>
>> Yes, I like this one. One question is; "should" or "is" is better than
>> "can" in this case because the server name of invalid connection is
>> always NULL when its server is dropped?
> 
> I think "dropped (Note that server name of an invalid connection will
> be NULL if the server is dropped), and then such ....."

Sounds good to me. So patch attached.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
> >>> +1 to add it after "dropped (Note ........)", how about as follows
> >>> with slight changes?
> >>>
> >>> dropped (Note that server name of an invalid connection can be NULL
> >>> if the server is dropped), and then such .....
> >>
> >> Yes, I like this one. One question is; "should" or "is" is better
> >> than "can" in this case because the server name of invalid connection
> >> is always NULL when its server is dropped?
> >
> > I think "dropped (Note that server name of an invalid connection will
> > be NULL if the server is dropped), and then such ....."
> 
> Sounds good to me. So patch attached.

+1

Best regards,
houzj



On Mon, Jan 18, 2021 at 9:11 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
> > function, 0002 is for keep_connections GUC and 0003 is for
> > keep_connection server level option.
>
> Thanks!
>
> >
> > Please review it further.
>
> +               server = GetForeignServerByName(servername, true);
> +
> +               if (!server)
> +                       ereport(ERROR,
> +                                       (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
> +                                        errmsg("foreign server \"%s\" does not exist", servername)));
>
> ISTM we can simplify this code as follows.
>
>      server = GetForeignServerByName(servername, false);

Done.

> +       hash_seq_init(&scan, ConnectionHash);
> +       while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
>
> When the server name is specified, even if its connection is successfully
> closed, postgres_fdw_disconnect() scans through all the entries to check
> whether there are active connections. But if "result" is true and
> active_conn_exists is true, we can get out of this loop to avoid unnecessary
> scans.

My initial thought was that it's possible to have two entries with the
same foreign server name but with different user mappings; it looks
like that's not possible. I tried associating a foreign server with two
different user mappings [1]; the cache entry initially gets associated
with the user mapping that comes first in pg_user_mappings, and if that
user mapping is dropped the cache entry gets invalidated, so the second
user mapping is used next time.

Since there's no way we can have two cache entries with the same
foreign server name, we can get out of the loop when we find the cache
entry matching the given server. I made the changes.

[1]
postgres=# select * from pg_user_mappings ;
 umid  | srvid |  srvname  | umuser | usename | umoptions
-------+-------+-----------+--------+---------+-----------
 16395 | 16394 | loopback1 |     10 | bharath |    -----> cache entry
is initially made with this user mapping.
 16399 | 16394 | loopback1 |      0 | public  |       -----> if the
above user mapping is dropped, then the cache entry is made with this
user mapping.
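
For reference, the setup behind [1] was roughly (illustrative;
connection options elided):

CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR bharath SERVER loopback1;   -- role-specific mapping
CREATE USER MAPPING FOR public SERVER loopback1;    -- fallback mapping
SELECT * FROM pg_user_mappings WHERE srvname = 'loopback1';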

> +       /*
> +        * Destroy the cache if we discarded all active connections i.e. if there
> +        * is no single active connection, which we can know while scanning the
> +        * cached entries in the above loop. Destroying the cache is better than to
> +        * keep it in the memory with all inactive entries in it to save some
> +        * memory. Cache can get initialized on the subsequent queries to foreign
> +        * server.
>
> How much memory is assumed to be saved by destroying the cache in
> many cases? I'm not sure if it's really worth destroying the cache to save
> the memory.

I removed the cache-destroying code; if somebody complains in the
future (after the feature is committed), we can revisit it then.

> +      a warning is issued and <literal>false</literal> is returned. <literal>false</literal>
> +      is returned when there are no open connections. When there are some open
> +      connections, but there is no connection for the given foreign server,
> +      then <literal>false</literal> is returned. When no foreign server exists
> +      with the given name, an error is emitted. Example usage of the function:
>
> When a non-existent server name is specified, postgres_fdw_disconnect()
> emits an error if there is at least one open connection, but just returns
> false otherwise. At least for me, this behavior looks inconsitent and strange.
> In that case, IMO the function always should emit an error.

Done.
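
With that, calling the function with an unknown server name would
behave like this (illustrative; the message comes from
GetForeignServerByName()):

SELECT postgres_fdw_disconnect('nonexistent');
-- ERROR:  server "nonexistent" does not exist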

Attaching v13 patch set, please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/19 9:53, Hou, Zhijie wrote:
>>>>> +1 to add it after "dropped (Note ........)", how about as follows
>>>>> with slight changes?
>>>>>
>>>>> dropped (Note that server name of an invalid connection can be NULL
>>>>> if the server is dropped), and then such .....
>>>>
>>>> Yes, I like this one. One question is; "should" or "is" is better
>>>> than "can" in this case because the server name of invalid connection
>>>> is always NULL when its server is dropped?
>>>
>>> I think "dropped (Note that server name of an invalid connection will
>>> be NULL if the server is dropped), and then such ....."
>>
>> Sounds good to me. So patch attached.
> 
> +1

Thanks! I pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/01/19 12:09, Bharath Rupireddy wrote:
> On Mon, Jan 18, 2021 at 9:11 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
>>> function, 0002 is for keep_connections GUC and 0003 is for
>>> keep_connection server level option.
>>
>> Thanks!
>>
>>>
>>> Please review it further.
>>
>> +               server = GetForeignServerByName(servername, true);
>> +
>> +               if (!server)
>> +                       ereport(ERROR,
>> +                                       (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
>> +                                        errmsg("foreign server \"%s\" does not exist", servername)));
>>
>> ISTM we can simplify this code as follows.
>>
>>       server = GetForeignServerByName(servername, false);
> 
> Done.
> 
>> +       hash_seq_init(&scan, ConnectionHash);
>> +       while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
>>
>> When the server name is specified, even if its connection is successfully
>> closed, postgres_fdw_disconnect() scans through all the entries to check
>> whether there are active connections. But if "result" is true and
>> active_conn_exists is true, we can get out of this loop to avoid unnecessary
>> scans.
> 
> My initial thought was that it's possible to have two entries with the
> same foreign server name but with different user mappings, looks like
> it's not possible. I tried associating a foreign server with two
> different user mappings [1], then the cache entry is getting
> associated initially with the user mapping that comes first in the
> pg_user_mappings, if this user mapping is dropped then the cache entry
> gets invalidated, so next time the second user mapping is used.
> 
> Since there's no way we can have two cache entries with the same
> foreign server name, we can get out of the loop when we find the cache
> entry match with the given server. I made the changes.

So, furthermore, we can use hash_search() to find the target cached
connection, instead of using hash_seq_search(), when the server name
is given. This would simplify the code a bit more? Of course,
hash_seq_search() is necessary when closing all the connections, though.


> 
> [1]
> postgres=# select * from pg_user_mappings ;
>   umid  | srvid |  srvname  | umuser | usename | umoptions
> -------+-------+-----------+--------+---------+-----------
>   16395 | 16394 | loopback1 |     10 | bharath |    -----> cache entry
> is initially made with this user mapping.
>   16399 | 16394 | loopback1 |      0 | public  |       -----> if the
> above user mapping is dropped, then the cache entry is made with this
> user mapping.
> 
>> +       /*
>> +        * Destroy the cache if we discarded all active connections i.e. if there
>> +        * is no single active connection, which we can know while scanning the
>> +        * cached entries in the above loop. Destroying the cache is better than to
>> +        * keep it in the memory with all inactive entries in it to save some
>> +        * memory. Cache can get initialized on the subsequent queries to foreign
>> +        * server.
>>
>> How much memory is assumed to be saved by destroying the cache in
>> many cases? I'm not sure if it's really worth destroying the cache to save
>> the memory.
> 
> I removed the cache destroying code, if somebody complains in
> future(after the feature commit), we can really revisit then.
> 
>> +      a warning is issued and <literal>false</literal> is returned. <literal>false</literal>
>> +      is returned when there are no open connections. When there are some open
>> +      connections, but there is no connection for the given foreign server,
>> +      then <literal>false</literal> is returned. When no foreign server exists
>> +      with the given name, an error is emitted. Example usage of the function:
>>
>> When a non-existent server name is specified, postgres_fdw_disconnect()
>> emits an error if there is at least one open connection, but just returns
>> false otherwise. At least for me, this behavior looks inconsitent and strange.
>> In that case, IMO the function always should emit an error.
> 
> Done.
> 
> Attaching v13 patch set, please review it further.

Thanks!

+ *    2) If no input argument is provided, then it tries to disconnect all the
+ *       connections.

I'm concerned that users can easily forget to specify the argument and
accidentally discard all the connections. So, IMO, to alleviate this situation,
what about changing the function name (only when closing all the connections)
to something like postgres_fdw_disconnect_all(), as we have
pg_advisory_unlock_all() alongside pg_advisory_unlock()?

+            if (result)
+            {
+                /* We closed at least one connection, others are in use. */
+                ereport(WARNING,
+                        (errmsg("cannot close all connections because some of them are still in use")));
+            }

Sorry if this was already discussed upthread. Isn't it more helpful to
emit a warning for every connection that fails to be closed? For example,

WARNING:  cannot close connection for server "loopback1" because it is still in use
WARNING:  cannot close connection for server "loopback2" because it is still in use
WARNING:  cannot close connection for server "loopback3" because it is still in use
...

This enables us to identify easily which server connections cannot be
closed for now.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Wed, Jan 20, 2021 at 11:53 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> So, furthermore, we can use hash_search() to find the target cached
> connection, instead of using hash_seq_search(), when the server name
> is given. This would simplify the code a bit more? Of course,
> hash_seq_search() is necessary when closing all the connections, though.

Note that the cache entry key is the user mapping oid, and to use
hash_search() we need that oid. In postgres_fdw_disconnect we can get
the server oid, and we could also get the user mapping oid using
GetUserMapping(), but that requires GetUserId()/CurrentUserId as an
input, and I suspect we would have problems if CurrentUserId changes
when the current user in the session changes. Also, a user mapping may
be dropped while the connection still exists because it's in use, in
which case GetUserMapping() fails with a cache lookup error.

And yes, disconnecting all connections requires hash_seq_search().

Keeping the above in mind, I feel we can do hash_seq_search(), as we do
currently, even when the server name is given as input. This way, we
don't need to bother much about the above points.

Thoughts?

> + *     2) If no input argument is provided, then it tries to disconnect all the
> + *        connections.
>
> I'm concerned that users can easily forget to specify the argument and
> accidentally discard all the connections. So, IMO, to alleviate this situation,
> what about changing the function name (only when closing all the connections)
> to something postgres_fdw_disconnect_all(), like we have
> pg_advisory_unlock_all() against pg_advisory_unlock()?

+1. We will then have two functions: postgres_fdw_disconnect(server name)
and postgres_fdw_disconnect_all().
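
So the usage would look something like this (a sketch):

SELECT postgres_fdw_disconnect('loopback1');   -- close the cached connection to one server
SELECT postgres_fdw_disconnect_all();          -- close every cached connection that is not in use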

> +                       if (result)
> +                       {
> +                               /* We closed at least one connection, others are in use. */
> +                               ereport(WARNING,
> +                                               (errmsg("cannot close all connections because some of them are still in use")));
> +                       }
>
> Sorry if this was already discussed upthread. Isn't it more helpful to
> emit a warning for every connections that fail to be closed? For example,
>
> WARNING:  cannot close connection for server "loopback1" because it is still in use
> WARNING:  cannot close connection for server "loopback2" because it is still in use
> WARNING:  cannot close connection for server "loopback3" because it is still in use
> ...
>
> This enables us to identify easily which server connections cannot be
> closed for now.

+1. It looks like pg_advisory_unlock does that. Given that still-in-use
connections are possible only in explicit txns, we might not have many
of them in real-world use cases, so I'm okay with changing it that way.

I will address all these comments and post an updated patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/20 17:41, Bharath Rupireddy wrote:
> On Wed, Jan 20, 2021 at 11:53 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> So, furthermore, we can use hash_search() to find the target cached
>> connection, instead of using hash_seq_search(), when the server name
>> is given. This would simplify the code a bit more? Of course,
>> hash_seq_search() is necessary when closing all the connections, though.
> 
> Note that the cache entry key is user mapping oid and to use
> hash_search() we need the user mapping oid. But in
> postgres_fdw_disconnect we can get server oid and we can also get user
> mapping id using GetUserMapping, but it requires
> GetUserId()/CurrentUserId as an input, I doubt we will have problems
> if CurrentUserId is changed somehow with the change of current user in
> the session. And user mapping may be dropped but still the connection
> can exist if it's in use, in that case GetUserMapping fails in cache
> lookup.
> 
> And yes, disconnecting all connections requires hash_seq_search().
> 
> Keeping above in mind, I feel we can do hash_seq_search(), as we do
> currently, even when the server name is given as input. This way, we
> don't need to bother much on the above points.
> 
> Thoughts?

Thanks for explaining this! You're right. I'd withdraw my suggestion.

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Wed, Jan 20, 2021 at 3:24 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Keeping above in mind, I feel we can do hash_seq_search(), as we do
> > currently, even when the server name is given as input. This way, we
> > don't need to bother much on the above points.
> >
> > Thoughts?
>
> Thanks for explaining this! You're right. I'd withdraw my suggestion.

Attaching v14 patch set with review comments addressed. Please review
it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/20 19:17, Bharath Rupireddy wrote:
> On Wed, Jan 20, 2021 at 3:24 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Keeping above in mind, I feel we can do hash_seq_search(), as we do
>>> currently, even when the server name is given as input. This way, we
>>> don't need to bother much on the above points.
>>>
>>> Thoughts?
>>
>> Thanks for explaining this! You're right. I'd withdraw my suggestion.
> 
> Attaching v14 patch set with review comments addressed. Please review
> it further.

Thanks for updating the patch!

+ * It checks if the cache has a connection for the given foreign server that's
+ * not being used within current transaction, then returns true. If the
+ * connection is in use, then it emits a warning and returns false.

The comment also should mention the case where no open connection
for the given server is found? What about rewriting this to the following?

---------------------
If the cached connection for the given foreign server is found and has not
been used within current transaction yet, close the connection and return
true. Even when it's found, if it's already used, keep the connection, emit
a warning and return false. If it's not found, return false.
---------------------

+ * It returns true, if it closes at least one connection, otherwise false.
+ *
+ * It returns false, if the cache doesn't exit.

The above second comment looks redundant.

+    if (ConnectionHash)
+        result = disconnect_cached_connections(0, true);

Isn't it smarter to make disconnect_cached_connections() check
ConnectionHash and return false if it's NULL? If we do that, we can
simplify the code of postgres_fdw_disconnect() and _all().
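
Something like this at the top of disconnect_cached_connections() is what
I mean (just a sketch):

    /* Nothing to close if the connection cache hasn't been built yet. */
    if (!ConnectionHash)
        return false;

    hash_seq_init(&scan, ConnectionHash);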

+ * current transaction are disconnected. Otherwise, the unused entries with the
+ * given hashvalue are disconnected.

In the above second comment, a singular form should be used instead?
Because there must be no multiple entries with the given hashvalue.

+                server = GetForeignServer(entry->serverid);

This seems to cause an error "cache lookup failed"
if postgres_fdw_disconnect_all() is called when there is
a connection in use but its server is dropped. To avoid this error,
GetForeignServerExtended() with FSV_MISSING_OK should be used
instead, like postgres_fdw_get_connections() does?
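
That is, something like the following in the warning path (sketch only;
the exact message wording is up to you):

            ForeignServer *server;

            server = GetForeignServerExtended(entry->serverid, FSV_MISSING_OK);

            if (server == NULL)
            {
                /*
                 * Server already dropped (e.g. in the current explicit
                 * transaction); warn without the server name rather than
                 * failing with "cache lookup failed".
                 */
                ereport(WARNING,
                        (errmsg("cannot close connection because it is still in use")));
            }
            else
                ereport(WARNING,
                        (errmsg("cannot close connection for server \"%s\" because it is still in use",
                                server->servername)));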

+            if (entry->server_hashvalue == hashvalue &&
+                (entry->xact_depth > 0 || result))
+            {
+                hash_seq_term(&scan);
+                break;

entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?

+-- Closes loopback connection, returns true and issues a warning as loopback2
+-- connection is still in use and can not be closed.
+SELECT * FROM postgres_fdw_disconnect_all();
+WARNING:  cannot close connection for server "loopback2" because it is still in use
+ postgres_fdw_disconnect_all
+-----------------------------
+ t
+(1 row)

After the above test, isn't it better to call postgres_fdw_get_connections()
to check that loopback is not output?

+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close connection for server "loopback2" because it is still in use

Just in case, please let me confirm that the order of these warning
messages is always stable?

+   <varlistentry>
+    <term><function>postgres_fdw_disconnect(IN servername text) returns boolean</function></term>

I think that "IN" of "IN servername text" is not necessary.

I'd like to replace "servername" with "server_name" because
postgres_fdw_get_connections() uses "server_name" as the output
column name.

+    <listitem>
+     <para>
+      When called in local session with foreign server name as input, it
+      discards the unused open connection previously made to the foreign server
+      and returns <literal>true</literal>.

"unused open connection" sounds confusing to me. What about the following?

---------------------
This function discards the open connection that postgres_fdw established
from the local session to the foreign server with the given name if it's not
used in the current local transaction yet, and then returns true. If it's
already used, the function doesn't discard the connection, emits
a warning and then returns false. If there is no open connection to
the given foreign server, false is returned. If no foreign server with
the given name is found, an error is emitted. Example usage of the function:
---------------------

+postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');

"SELECT postgres_fdw_disconnect('loopback1')" is more common?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> + * It checks if the cache has a connection for the given foreign server that's
> + * not being used within current transaction, then returns true. If the
> + * connection is in use, then it emits a warning and returns false.
>
> The comment also should mention the case where no open connection
> for the given server is found? What about rewriting this to the following?
>
> ---------------------
> If the cached connection for the given foreign server is found and has not
> been used within current transaction yet, close the connection and return
> true. Even when it's found, if it's already used, keep the connection, emit
> a warning and return false. If it's not found, return false.
> ---------------------

Done.

> + * It returns true, if it closes at least one connection, otherwise false.
> + *
> + * It returns false, if the cache doesn't exit.
>
> The above second comment looks redundant.

Yes. "otherwise false" means it.

> +       if (ConnectionHash)
> +               result = disconnect_cached_connections(0, true);
>
> Isn't it smarter to make disconnect_cached_connections() check
> ConnectionHash and return false if it's NULL? If we do that, we can
> simplify the code of postgres_fdw_disconnect() and _all().

Done.

> + * current transaction are disconnected. Otherwise, the unused entries with the
> + * given hashvalue are disconnected.
>
> In the above second comment, a singular form should be used instead?
> Because there must be no multiple entries with the given hashvalue.

Rephrased the function comment a bit. I mentioned _disconnect and
_disconnect_all in the comment, since we have already said enough about
what each of those two functions does.

+/*
+ * Workhorse to disconnect cached connections.
+ *
+ * This function disconnects either all unused connections when called from
+ * postgres_fdw_disconnect_all or a given foreign server unused connection when
+ * called from postgres_fdw_disconnect.
+ *
+ * This function returns true if at least one connection is disconnected,
+ * otherwise false.
+ */

> +                               server = GetForeignServer(entry->serverid);
>
> This seems to cause an error "cache lookup failed"
> if postgres_fdw_disconnect_all() is called when there is
> a connection in use but its server is dropped. To avoid this error,
> GetForeignServerExtended() with FSV_MISSING_OK should be used
> instead, like postgres_fdw_get_connections() does?

+1. So I changed it to GetForeignServerExtended() and added an assertion
on the invalidation flag, just like postgres_fdw_get_connections(). I also
added a test case for this; for this case alone we now emit a slightly
different warning, namely (errmsg("cannot close dropped server connection
because it is still in use"))). This warning looks okay because we cannot
show a server name in the output, and we know that this rare case can only
occur when someone drops the server in an explicit transaction.

> +                       if (entry->server_hashvalue == hashvalue &&
> +                               (entry->xact_depth > 0 || result))
> +                       {
> +                               hash_seq_term(&scan);
> +                               break;
>
> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
> specifies 0 as hashvalue, ISTM that the above condition can be true
> unexpectedly. Can we replace this condition with just "if (!all)"?

I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0 as
a hash value. I also have not seen code comparing a hash value with the
expectation that a real entry can have the value 0; for instance, see the
usual invalidation-callback pattern "if (hashvalue == 0 ||
riinfo->oidHashValue == hashvalue)", where 0 means "all entries". A rough
sketch of that convention is below.
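
(Sketch along the lines of pgfdw_inval_callback, reproduced roughly from
memory; hashvalue == 0 means a full cache reset, i.e. "match everything".)

    static void
    pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
    {
        HASH_SEQ_STATUS scan;
        ConnCacheEntry *entry;

        hash_seq_init(&scan, ConnectionHash);
        while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
        {
            if (entry->conn == NULL)
                continue;

            /* hashvalue == 0 means a cache reset, i.e. "match everything" */
            if (hashvalue == 0 ||
                (cacheid == FOREIGNSERVEROID &&
                 entry->server_hashvalue == hashvalue) ||
                (cacheid == USERMAPPINGOID &&
                 entry->mapping_hashvalue == hashvalue))
                entry->invalidated = true;
        }
    }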

Having just "if (!all)", as in the snippet below, doesn't suffice there,
because we might call hash_seq_term() when some connection other than the
given foreign server's connection is in use. Our intention is to call
hash_seq_term() only when the given server is found and its connection is
either in use or has just been closed.

            if (!all && (entry->xact_depth > 0 || result))
            {
                hash_seq_term(&scan);
                break;
            }

Given the above points, the existing check looks good to me.

> +-- Closes loopback connection, returns true and issues a warning as loopback2
> +-- connection is still in use and can not be closed.
> +SELECT * FROM postgres_fdw_disconnect_all();
> +WARNING:  cannot close connection for server "loopback2" because it is still in use
> + postgres_fdw_disconnect_all
> +-----------------------------
> + t
> +(1 row)
>
> After the above test, isn't it better to call postgres_fdw_get_connections()
> to check that loopback is not output?

+1.

> +WARNING:  cannot close connection for server "loopback" because it is still in use
> +WARNING:  cannot close connection for server "loopback2" because it is still in use
>
> Just in the case please let me confirm that the order of these warning
> messages is always stable?

I think the order of the above warnings depends on the order in which the
connections are stored in the cache, since that is the order in which we
emit the warnings. It looks like new cached connections are always stored
at the beginning of the cache, so the warnings show up in that order, i.e.
from new entries to old entries. I think it's stable, and I didn't see the
cfbot complaining about it on v14.

> +   <varlistentry>
> +    <term><function>postgres_fdw_disconnect(IN servername text) returns boolean</function></term>
>
> I think that "IN" of "IN servername text" is not necessary.

Done.

> I'd like to replace "servername" with "server_name" because
> postgres_fdw_get_connections() uses "server_name" as the output
> column name.

Done.

> +    <listitem>
> +     <para>
> +      When called in local session with foreign server name as input, it
> +      discards the unused open connection previously made to the foreign server
> +      and returns <literal>true</literal>.
>
> "unused open connection" sounds confusing to me. What about the following?
>
> ---------------------
> This function discards the open connection that postgres_fdw established
> from the local session to the foreign server with the given name if it's not
> used in the current local transaction yet, and then returns true. If it's
> already used, the function doesn't discard the connection, emits
> a warning and then returns false. If there is no open connection to
> the given foreign server, false is returned. If no foreign server with
> the given name is found, an error is emitted. Example usage of the function:
> ---------------------

Done.

> +postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');
>
> "SELECT postgres_fdw_disconnect('loopback1')" is more common?

Done.

Attaching v15 patch set. Please consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/21 12:00, Bharath Rupireddy wrote:
> On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> + * It checks if the cache has a connection for the given foreign server that's
>> + * not being used within current transaction, then returns true. If the
>> + * connection is in use, then it emits a warning and returns false.
>>
>> The comment also should mention the case where no open connection
>> for the given server is found? What about rewriting this to the following?
>>
>> ---------------------
>> If the cached connection for the given foreign server is found and has not
>> been used within current transaction yet, close the connection and return
>> true. Even when it's found, if it's already used, keep the connection, emit
>> a warning and return false. If it's not found, return false.
>> ---------------------
> 
> Done.
> 
>> + * It returns true, if it closes at least one connection, otherwise false.
>> + *
>> + * It returns false, if the cache doesn't exit.
>>
>> The above second comment looks redundant.
> 
> Yes. "otherwise false" means it.
> 
>> +       if (ConnectionHash)
>> +               result = disconnect_cached_connections(0, true);
>>
>> Isn't it smarter to make disconnect_cached_connections() check
>> ConnectionHash and return false if it's NULL? If we do that, we can
>> simplify the code of postgres_fdw_disconnect() and _all().
> 
> Done.
> 
>> + * current transaction are disconnected. Otherwise, the unused entries with the
>> + * given hashvalue are disconnected.
>>
>> In the above second comment, a singular form should be used instead?
>> Because there must be no multiple entries with the given hashvalue.
> 
> Rephrased the function comment a bit. Mentioned the _disconnect and
> _disconnect_all in comments because we have said enough what each of
> those two functions do.
> 
> +/*
> + * Workhorse to disconnect cached connections.
> + *
> + * This function disconnects either all unused connections when called from
> + * postgres_fdw_disconnect_all or a given foreign server unused connection when
> + * called from postgres_fdw_disconnect.
> + *
> + * This function returns true if at least one connection is disconnected,
> + * otherwise false.
> + */
> 
>> +                               server = GetForeignServer(entry->serverid);
>>
>> This seems to cause an error "cache lookup failed"
>> if postgres_fdw_disconnect_all() is called when there is
>> a connection in use but its server is dropped. To avoid this error,
>> GetForeignServerExtended() with FSV_MISSING_OK should be used
>> instead, like postgres_fdw_get_connections() does?
> 
> +1.  So, I changed it to GetForeignServerExtended, added an assertion
> for invalidation  just like postgres_fdw_get_connections. I also added
> a test case for this, we now emit a slightly different warning for
> this case alone that is (errmsg("cannot close dropped server
> connection because it is still in use")));. This warning looks okay as
> we cannot show any other server name in the ouput and we know that
> this rare case can exist when someone drops the server in an explicit
> transaction.
> 
>> +                       if (entry->server_hashvalue == hashvalue &&
>> +                               (entry->xact_depth > 0 || result))
>> +                       {
>> +                               hash_seq_term(&scan);
>> +                               break;
>>
>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>> specifies 0 as hashvalue, ISTM that the above condition can be true
>> unexpectedly. Can we replace this condition with just "if (!all)"?
> 
> I don't think so entry->server_hashvalue can be zero, because
> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> as hash value. I have not seen someone comparing hashvalue with an
> expectation that it has 0 value, for instance see if (hashvalue == 0
> || riinfo->oidHashValue == hashvalue).
> 
>   Having if(!all) something like below there doesn't suffice because we
> might call hash_seq_term, when some connection other than the given
> foreign server connection is in use.

No because we check the following condition before reaching that code. No?

+        if ((all || entry->server_hashvalue == hashvalue) &&


I was thinking that "(entry->xact_depth > 0 || result))" condition is not
necessary because "result" is set to true when xact_depth <= 0 and that
condition always indicates true.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



> Attaching v15 patch set. Please consider it for further review.

Hi

I have some comments for the 0001 patch

In v15-0001-postgres_fdw-function-to-discard-cached-connecti

1.
+      If there is no open connection to the given foreign server, <literal>false</literal>
+      is returned. If no foreign server with the given name is found, an error

Do you think it's better to add some test cases about:
    call postgres_fdw_disconnect and postgres_fdw_disconnect_all when there is no open connection to the given foreign server

2.
+            /*
+             * For the given server, if we closed connection or it is still in
+             * use, then no need of scanning the cache further.
+             */
+            if (entry->server_hashvalue == hashvalue &&
+                (entry->xact_depth > 0 || result))
+            {
+                hash_seq_term(&scan);
+                break;
+            }

If I am not wrong, is the following condition always true ?
    (entry->xact_depth > 0 || result)

Best regards,
houzj




On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
 > >> +                       if (entry->server_hashvalue == hashvalue &&
> >> +                               (entry->xact_depth > 0 || result))
> >> +                       {
> >> +                               hash_seq_term(&scan);
> >> +                               break;
> >>
> >> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
> >> specifies 0 as hashvalue, ISTM that the above condition can be true
> >> unexpectedly. Can we replace this condition with just "if (!all)"?
> >
> > I don't think so entry->server_hashvalue can be zero, because
> > GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> > as hash value. I have not seen someone comparing hashvalue with an
> > expectation that it has 0 value, for instance see if (hashvalue == 0
> > || riinfo->oidHashValue == hashvalue).
> >
> >   Having if(!all) something like below there doesn't suffice because we
> > might call hash_seq_term, when some connection other than the given
> > foreign server connection is in use.
>
> No because we check the following condition before reaching that code. No?
>
> +               if ((all || entry->server_hashvalue == hashvalue) &&
>
>
> I was thinking that "(entry->xact_depth > 0 || result))" condition is not
> necessary because "result" is set to true when xact_depth <= 0 and that
> condition always indicates true.

I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?

    while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
    {
        bool    can_terminate_scan = false;

        /*
         * Either disconnect given or all the active and not in use cached
         * connections.
         */
        if ((all || entry->server_hashvalue == hashvalue) &&
             entry->conn)
        {
            /* We cannot close connection that's in use, so issue a warning. */
            if (entry->xact_depth > 0)
            {
                ForeignServer *server;

                if (!all)
                    can_terminate_scan = true;

                server = GetForeignServerExtended(entry->serverid,
                                                  FSV_MISSING_OK);

                if (!server)
                {
                    /*
                     * If the server has been dropped in the current explicit
                     * transaction, then this entry would have been invalidated
                     * in pgfdw_inval_callback at the end of drop sever
                     * command. Note that this connection would not have been
                     * closed in pgfdw_inval_callback because it is still being
                     * used in the current explicit transaction. So, assert
                     * that here.
                     */
                    Assert(entry->invalidated);

                    ereport(WARNING,
                            (errmsg("cannot close dropped server connection because it is still in use")));
                }
                else
                    ereport(WARNING,
                            (errmsg("cannot close connection for server \"%s\" because it is still in use",
                                    server->servername)));
            }
            else
            {
                elog(DEBUG3, "discarding connection %p", entry->conn);
                disconnect_pg_server(entry);
                result = true;

                if (!all)
                    can_terminate_scan = true;
            }

            /*
             * For the given server, if we closed connection or it is still in
             * use, then no need of scanning the cache further.
             */
            if (can_terminate_scan)
            {
                hash_seq_term(&scan);
                break;
            }
        }
    }

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Thu, Jan 21, 2021 at 11:15 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> > Attaching v15 patch set. Please consider it for further review.
>
> Hi
>
> I have some comments for the 0001 patch
>
> In v15-0001-postgres_fdw-function-to-discard-cached-connecti
>
> 1.
> +      If there is no open connection to the given foreign server, <literal>false</literal>
> +      is returned. If no foreign server with the given name is found, an error
>
> Do you think it's better add some testcases about:
>         call postgres_fdw_disconnect and postgres_fdw_disconnect_all when there is no open connection to the given
foreignserver
 

Do you mean a test case where foreign server exists but
postgres_fdw_disconnect() returns false because there's no connection
for that server?

> 2.
> +                       /*
> +                        * For the given server, if we closed connection or it is still in
> +                        * use, then no need of scanning the cache further.
> +                        */
> +                       if (entry->server_hashvalue == hashvalue &&
> +                               (entry->xact_depth > 0 || result))
> +                       {
> +                               hash_seq_term(&scan);
> +                               break;
> +                       }
>
> If I am not wrong, is the following condition always true ?
>         (entry->xact_depth > 0 || result)

It's not always true. But it does seem too confusing; please have a look
at the upthread suggestion to replace it with a can_terminate_scan
boolean.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



> > > Attaching v15 patch set. Please consider it for further review.
> >
> > Hi
> >
> > I have some comments for the 0001 patch
> >
> > In v15-0001-postgres_fdw-function-to-discard-cached-connecti
> >
> > 1.
> > +      If there is no open connection to the given foreign server,
> <literal>false</literal>
> > +      is returned. If no foreign server with the given name is found,
> > + an error
> >
> > Do you think it's better add some testcases about:
> >         call postgres_fdw_disconnect and postgres_fdw_disconnect_all
> > when there is no open connection to the given foreign server
> 
> Do you mean a test case where foreign server exists but
> postgres_fdw_disconnect() returns false because there's no connection for
> that server?


Yes, I read this from the doc, so I think it's better to test this.




> > 2.
> > +                       /*
> > +                        * For the given server, if we closed connection
> or it is still in
> > +                        * use, then no need of scanning the cache
> further.
> > +                        */
> > +                       if (entry->server_hashvalue == hashvalue &&
> > +                               (entry->xact_depth > 0 || result))
> > +                       {
> > +                               hash_seq_term(&scan);
> > +                               break;
> > +                       }
> >
> > If I am not wrong, is the following condition always true ?
> >         (entry->xact_depth > 0 || result)
> 
> It's not always true. But it seems like it's too confusing, please have
> a look at the upthread suggestion to change this with can_terminate_scan
> boolean.

Thanks for the reminder, I will look at that.



Best regards,
houzj




On 2021/01/21 14:46, Bharath Rupireddy wrote:
> On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>   > >> +                       if (entry->server_hashvalue == hashvalue &&
>>>> +                               (entry->xact_depth > 0 || result))
>>>> +                       {
>>>> +                               hash_seq_term(&scan);
>>>> +                               break;
>>>>
>>>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>>>> specifies 0 as hashvalue, ISTM that the above condition can be true
>>>> unexpectedly. Can we replace this condition with just "if (!all)"?
>>>
>>> I don't think so entry->server_hashvalue can be zero, because
>>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
>>> as hash value. I have not seen someone comparing hashvalue with an
>>> expectation that it has 0 value, for instance see if (hashvalue == 0
>>> || riinfo->oidHashValue == hashvalue).
>>>
>>>    Having if(!all) something like below there doesn't suffice because we
>>> might call hash_seq_term, when some connection other than the given
>>> foreign server connection is in use.
>>
>> No because we check the following condition before reaching that code. No?
>>
>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>
>>
>> I was thinking that "(entry->xact_depth > 0 || result))" condition is not
>> necessary because "result" is set to true when xact_depth <= 0 and that
>> condition always indicates true.
> 
> I think that condition is too confusing. How about having a boolean
> can_terminate_scan like below?

Thanks for thinking this. But at least for me, "if (!all)" looks not so confusing.
And the comment seems to explain why we can end the scan.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Thu, Jan 21, 2021 at 12:17 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/21 14:46, Bharath Rupireddy wrote:
> > On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >   > >> +                       if (entry->server_hashvalue == hashvalue &&
> >>>> +                               (entry->xact_depth > 0 || result))
> >>>> +                       {
> >>>> +                               hash_seq_term(&scan);
> >>>> +                               break;
> >>>>
> >>>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
> >>>> specifies 0 as hashvalue, ISTM that the above condition can be true
> >>>> unexpectedly. Can we replace this condition with just "if (!all)"?
> >>>
> >>> I don't think so entry->server_hashvalue can be zero, because
> >>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> >>> as hash value. I have not seen someone comparing hashvalue with an
> >>> expectation that it has 0 value, for instance see if (hashvalue == 0
> >>> || riinfo->oidHashValue == hashvalue).
> >>>
> >>>    Having if(!all) something like below there doesn't suffice because we
> >>> might call hash_seq_term, when some connection other than the given
> >>> foreign server connection is in use.
> >>
> >> No because we check the following condition before reaching that code. No?
> >>
> >> +               if ((all || entry->server_hashvalue == hashvalue) &&
> >>
> >>
> >> I was thinking that "(entry->xact_depth > 0 || result))" condition is not
> >> necessary because "result" is set to true when xact_depth <= 0 and that
> >> condition always indicates true.
> >
> > I think that condition is too confusing. How about having a boolean
> > can_terminate_scan like below?
>
> Thanks for thinking this. But at least for me, "if (!all)" looks not so confusing.
> And the comment seems to explain why we can end the scan.

May I know if it's okay to have the boolean can_terminate_scan as shown in [1]?

[1] -
https://www.postgresql.org/message-id/flat/CALj2ACVx0%2BiOsrAA-wXbo3RLAKqUoNvvEd7foJ0vLwOdu8XjXw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/21 16:16, Bharath Rupireddy wrote:
> On Thu, Jan 21, 2021 at 12:17 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/21 14:46, Bharath Rupireddy wrote:
>>> On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>>    > >> +                       if (entry->server_hashvalue == hashvalue &&
>>>>>> +                               (entry->xact_depth > 0 || result))
>>>>>> +                       {
>>>>>> +                               hash_seq_term(&scan);
>>>>>> +                               break;
>>>>>>
>>>>>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>>>>>> specifies 0 as hashvalue, ISTM that the above condition can be true
>>>>>> unexpectedly. Can we replace this condition with just "if (!all)"?
>>>>>
>>>>> I don't think so entry->server_hashvalue can be zero, because
>>>>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
>>>>> as hash value. I have not seen someone comparing hashvalue with an
>>>>> expectation that it has 0 value, for instance see if (hashvalue == 0
>>>>> || riinfo->oidHashValue == hashvalue).
>>>>>
>>>>>     Having if(!all) something like below there doesn't suffice because we
>>>>> might call hash_seq_term, when some connection other than the given
>>>>> foreign server connection is in use.
>>>>
>>>> No because we check the following condition before reaching that code. No?
>>>>
>>>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>>>
>>>>
>>>> I was thinking that "(entry->xact_depth > 0 || result))" condition is not
>>>> necessary because "result" is set to true when xact_depth <= 0 and that
>>>> condition always indicates true.
>>>
>>> I think that condition is too confusing. How about having a boolean
>>> can_terminate_scan like below?
>>
>> Thanks for thinking this. But at least for me, "if (!all)" looks not so confusing.
>> And the comment seems to explain why we can end the scan.
> 
> May I know if it's okay to have the boolean can_terminate_scan as shown in [1]?

My opinion is to check "!all", but if others prefer using such boolean flag,
I'd withdraw my opinion.

+        if ((all || entry->server_hashvalue == hashvalue) &&

What about making disconnect_cached_connections() accept serverid instead
of hashvalue, and perform the above comparison based on serverid? That is,
I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
of hashvalue.
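
Concretely, something like this (sketch only):

    /* In postgres_fdw_disconnect(): resolve the name once, pass the OID. */
    server = GetForeignServerByName(servername, false);
    result = disconnect_cached_connections(server->serverid, false);

    /* ... and in disconnect_cached_connections(), compare OIDs directly: */
    if (all || entry->serverid == serverid)
    {
        /* close the connection if it is unused, otherwise warn */
    }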

+        if ((all || entry->server_hashvalue == hashvalue) &&
+             entry->conn)

I think that it's better to make the check of "entry->conn" independent
like other functions in postgres_fdw/connection.c. What about adding
the following check before the above?

        /* Ignore cache entry if no open connection right now */
        if (entry->conn == NULL)
            continue;

+                    /*
+                     * If the server has been dropped in the current explicit
+                     * transaction, then this entry would have been invalidated
+                     * in pgfdw_inval_callback at the end of drop sever
+                     * command. Note that this connection would not have been
+                     * closed in pgfdw_inval_callback because it is still being
+                     * used in the current explicit transaction. So, assert
+                     * that here.
+                     */
+                    Assert(entry->invalidated);

As this comment explains, even when the connection is used in the transaction,
its server can be dropped in the same transaction. The connection can remain
until the end of transaction even though its server has been already dropped.
I'm now wondering if this behavior itself is problematic and should be forbidden.
Of course, this is separate topic from this patch, though..

BTW, my just idea for that is;
1. change postgres_fdw_get_connections() return also serverid and xact_depth.
2. make postgres_fdw define the event trigger on DROP SERVER command so that
     an error is thrown if the connection to the server is still in use.
     The event trigger function uses postgres_fdw_get_connections() to check
     if the server connection is still in use or not.

I'm not sure if this just idea is really feasible or not, though...

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> My opinion is to check "!all", but if others prefer using such boolean flag,
> I'd withdraw my opinion.

I'm really sorry, actually if (!all) is enough there, my earlier
understanding was wrong.

> +               if ((all || entry->server_hashvalue == hashvalue) &&
>
> What about making disconnect_cached_connections() accept serverid instead
> of hashvalue, and perform the above comparison based on serverid? That is,
> I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
> simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
> of hashvalue.

That's a good idea. I missed this point. Thanks.

> +               if ((all || entry->server_hashvalue == hashvalue) &&
> +                        entry->conn)
>
> I think that it's better to make the check of "entry->conn" independent
> like other functions in postgres_fdw/connection.c. What about adding
> the following check before the above?
>
>                 /* Ignore cache entry if no open connection right now */
>                 if (entry->conn == NULL)
>                         continue;

Done.

> +                                       /*
> +                                        * If the server has been dropped in the current explicit
> +                                        * transaction, then this entry would have been invalidated
> +                                        * in pgfdw_inval_callback at the end of drop sever
> +                                        * command. Note that this connection would not have been
> +                                        * closed in pgfdw_inval_callback because it is still being
> +                                        * used in the current explicit transaction. So, assert
> +                                        * that here.
> +                                        */
> +                                       Assert(entry->invalidated);
>
> As this comment explains, even when the connection is used in the transaction,
> its server can be dropped in the same transaction. The connection can remain
> until the end of transaction even though its server has been already dropped.
> I'm now wondering if this behavior itself is problematic and should be forbid.
> Of course, this is separate topic from this patch, though..
>
> BTW, my just idea for that is;
> 1. change postgres_fdw_get_connections() return also serverid and xact_depth.
> 2. make postgres_fdw define the event trigger on DROP SERVER command so that
>      an error is thrown if the connection to the server is still in use.
>      The event trigger function uses postgres_fdw_get_connections() to check
>      if the server connection is still in use or not.
>
> I'm not sure if this just idea is really feasible or not, though...

I'm not quite sure we can create such a dependency, i.e. blocking "drop
foreign server" when at least one session has an in-use cached connection
on it. What if a user wants to drop a server from one session while other
sessions, one after the other, keep having in-use connections to that
server (though this use case sounds impractical)? Will the drop of the
server ever succeed? Since we can have hundreds of sessions in a
real-world postgres environment, I don't know whether it's a good idea to
create such a dependency.

As you suggested, this point can be discussed in a separate thread and
if any of the approaches proposed by you above is finalized we can
extend postgres_fdw_get_connections anytime.

Thoughts?

Attaching the v16 patch set, addressing the above review comments. I also
added the test case suggested upthread: postgres_fdw_disconnect() with an
existing server name returns false when the cache doesn't have an active
connection for that server.

Please review the v16 patch set further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/22 1:17, Bharath Rupireddy wrote:
> On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> My opinion is to check "!all", but if others prefer using such boolean flag,
>> I'd withdraw my opinion.
> 
> I'm really sorry, actually if (!all) is enough there, my earlier
> understanding was wrong.
> 
>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>
>> What about making disconnect_cached_connections() accept serverid instead
>> of hashvalue, and perform the above comparison based on serverid? That is,
>> I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
>> simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
>> of hashvalue.
> 
> That's a good idea. I missed this point. Thanks.
> 
>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>> +                        entry->conn)
>>
>> I think that it's better to make the check of "entry->conn" independent
>> like other functions in postgres_fdw/connection.c. What about adding
>> the following check before the above?
>>
>>                  /* Ignore cache entry if no open connection right now */
>>                  if (entry->conn == NULL)
>>                          continue;
> 
> Done.
> 
>> +                                       /*
>> +                                        * If the server has been dropped in the current explicit
>> +                                        * transaction, then this entry would have been invalidated
>> +                                        * in pgfdw_inval_callback at the end of drop sever
>> +                                        * command. Note that this connection would not have been
>> +                                        * closed in pgfdw_inval_callback because it is still being
>> +                                        * used in the current explicit transaction. So, assert
>> +                                        * that here.
>> +                                        */
>> +                                       Assert(entry->invalidated);
>>
>> As this comment explains, even when the connection is used in the transaction,
>> its server can be dropped in the same transaction. The connection can remain
>> until the end of transaction even though its server has been already dropped.
>> I'm now wondering if this behavior itself is problematic and should be forbid.
>> Of course, this is separate topic from this patch, though..
>>
>> BTW, my just idea for that is;
>> 1. change postgres_fdw_get_connections() return also serverid and xact_depth.
>> 2. make postgres_fdw define the event trigger on DROP SERVER command so that
>>       an error is thrown if the connection to the server is still in use.
>>       The event trigger function uses postgres_fdw_get_connections() to check
>>       if the server connection is still in use or not.
>>
>> I'm not sure if this just idea is really feasible or not, though...
> 
> I'm not quite sure if we can create such a dependency i.e. blocking
> "drop foreign server" when at least one session has an in use cached
> connection on it?

Maybe my explanation was not clear... I was thinking to prevent the server
whose connection is used *within the current transaction* from being
dropped. IOW, I was thinking to forbid the drop of the server if the
xact_depth of its connection is greater than zero. So one session can drop
the server even when its connection is open in another session, as long as
it's not used within the transaction (i.e., xact_depth == 0).

BTW, for now, if the connection is used within the transaction, other
sessions cannot drop the corresponding server because the transaction
holds locks on the relations that depend on the server. Only the session
running that transaction can drop the server. This can cause the issue
under discussion.

So, my idea is to disallow even the session running the transaction from
dropping the server. This means that no session can drop the server while
its connection is used within the transaction (xact_depth > 0).


> What if a user wants to drop a server from one
> session, all other sessions one after the other keep having in-use
> connections related to that server, (though this use case sounds
> impractical) will the drop server ever be successful? Since we can
> have hundreds of sessions in real world postgres environment, I don't
> know if it's a good idea to create such dependency.
> 
> As you suggested, this point can be discussed in a separate thread and
> if any of the approaches proposed by you above is finalized we can
> extend postgres_fdw_get_connections anytime.
> 
> Thoughts?

I will consider more before starting separate discussion!


> 
> Attaching v16 patch set, addressing above review comments and also
> added a test case suggested upthread that postgres_fdw_disconnect()
> with existing server name returns false that is when the cache doesn't
> have active connection.
> 
> Please review the v16 patch set further.

Thanks! Will review that later.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/01/22 3:29, Fujii Masao wrote:
> 
> 
> On 2021/01/22 1:17, Bharath Rupireddy wrote:
>> On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> My opinion is to check "!all", but if others prefer using such boolean flag,
>>> I'd withdraw my opinion.
>>
>> I'm really sorry, actually if (!all) is enough there, my earlier
>> understanding was wrong.
>>
>>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>>
>>> What about making disconnect_cached_connections() accept serverid instead
>>> of hashvalue, and perform the above comparison based on serverid? That is,
>>> I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
>>> simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
>>> of hashvalue.
>>
>> That's a good idea. I missed this point. Thanks.
>>
>>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>> +                        entry->conn)
>>>
>>> I think that it's better to make the check of "entry->conn" independent
>>> like other functions in postgres_fdw/connection.c. What about adding
>>> the following check before the above?
>>>
>>>                  /* Ignore cache entry if no open connection right now */
>>>                  if (entry->conn == NULL)
>>>                          continue;
>>
>> Done.
>>
>>> +                                       /*
>>> +                                        * If the server has been dropped in the current explicit
>>> +                                        * transaction, then this entry would have been invalidated
>>> +                                        * in pgfdw_inval_callback at the end of drop sever
>>> +                                        * command. Note that this connection would not have been
>>> +                                        * closed in pgfdw_inval_callback because it is still being
>>> +                                        * used in the current explicit transaction. So, assert
>>> +                                        * that here.
>>> +                                        */
>>> +                                       Assert(entry->invalidated);
>>>
>>> As this comment explains, even when the connection is used in the transaction,
>>> its server can be dropped in the same transaction. The connection can remain
>>> until the end of transaction even though its server has been already dropped.
>>> I'm now wondering if this behavior itself is problematic and should be forbid.
>>> Of course, this is separate topic from this patch, though..
>>>
>>> BTW, my just idea for that is;
>>> 1. change postgres_fdw_get_connections() return also serverid and xact_depth.
>>> 2. make postgres_fdw define the event trigger on DROP SERVER command so that
>>>       an error is thrown if the connection to the server is still in use.
>>>       The event trigger function uses postgres_fdw_get_connections() to check
>>>       if the server connection is still in use or not.
>>>
>>> I'm not sure if this just idea is really feasible or not, though...
>>
>> I'm not quite sure if we can create such a dependency i.e. blocking
>> "drop foreign server" when at least one session has an in use cached
>> connection on it?
> 
> Maybe my explanation was not clear... I was thinking to prevent the server
> whose connection is used *within the current transaction* from being
> dropped. IOW, I was thinking to forbid the drop of the server if the
> xact_depth of its connection is greater than zero. So one session can drop
> the server even when its connection is open in another session, as long as
> it's not used within the transaction (i.e., xact_depth == 0).
> 
> BTW, for now, if the connection is used within the transaction, other
> sessions cannot drop the corresponding server because the transaction
> holds locks on the relations that depend on the server. Only the session
> running that transaction can drop the server. This can cause the issue
> under discussion.
> 
> So, my idea is to disallow even the session running the transaction from
> dropping the server. This means that no session can drop the server while
> its connection is used within the transaction (xact_depth > 0).
> 
> 
>> What if a user wants to drop a server from one
>> session, all other sessions one after the other keep having in-use
>> connections related to that server, (though this use case sounds
>> impractical) will the drop server ever be successful? Since we can
>> have hundreds of sessions in real world postgres environment, I don't
>> know if it's a good idea to create such dependency.
>>
>> As you suggested, this point can be discussed in a separate thread and
>> if any of the approaches proposed by you above is finalized we can
>> extend postgres_fdw_get_connections anytime.
>>
>> Thoughts?
> 
> I will consider more before starting separate discussion!
> 
> 
>>
>> Attaching v16 patch set, addressing above review comments and also
>> added a test case suggested upthread that postgres_fdw_disconnect()
>> with existing server name returns false that is when the cache doesn't
>> have active connection.
>>
>> Please review the v16 patch set further.
> 
> Thanks! Will review that later.

+            /*
+             * For the given server, if we closed connection or it is still in
+             * use, then no need of scanning the cache further. We do this
+             * because the cache can not have multiple cache entries for a
+             * single foreign server.
+             */

On second thought, ISTM that single foreign server can have multiple cache
entries. For example,

CREATE ROLE foo1 SUPERUSER;
CREATE ROLE foo2 SUPERUSER;
CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5432');
CREATE USER MAPPING FOR foo1 SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR foo2 SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t (i int);
CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't');
SET SESSION AUTHORIZATION foo1;
SELECT * FROM ft;
SET SESSION AUTHORIZATION foo2;
SELECT * FROM ft;


Then you can see there are multiple open connections for the same server
as follows. So we need to scan all the entries even when the serverid is
specified.

SELECT * FROM postgres_fdw_get_connections();

  server_name | valid
-------------+-------
  loopback    | t
  loopback    | t
(2 rows)


This means that user (even non-superuser) can disconnect the connection
established by another user (superuser), by using postgres_fdw_disconnect_all().
Is this really OK?


+        if (all || (OidIsValid(serverid) && entry->serverid == serverid))
+        {

I don't think that "OidIsValid(serverid)" condition is necessary here.
But you're just concerned about the case where the caller mistakenly
specifies invalid oid and all=false? One idea to avoid that inconsistent
combination of inputs is to change disconnect_cached_connections()
as follows.

-disconnect_cached_connections(Oid serverid, bool all)
+disconnect_cached_connections(Oid serverid)
  {
      HASH_SEQ_STATUS    scan;
      ConnCacheEntry    *entry;
+    bool    all = !OidIsValid(serverid);


+                     * in pgfdw_inval_callback at the end of drop sever

Typo: "sever" should be "server".


+-- ===================================================================
+-- test postgres_fdw_disconnect function
+-- ===================================================================

This regression test is placed at the end of test file. But isn't it better
to place that just after the regression test "test connection invalidation
  cases" because they are related?


+    <screen>
+postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');
+ postgres_fdw_disconnect

The tag <screen> should start from the beginning.

As I commented upthread, what about replacing the example query with
"SELECT postgres_fdw_disconnect('loopback1');" because it's more common?


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 22, 2021 at 6:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> Please review the v16 patch set further.
> >
> > Thanks! Will review that later.
>
> +                       /*
> +                        * For the given server, if we closed connection or it is still in
> +                        * use, then no need of scanning the cache further. We do this
> +                        * because the cache can not have multiple cache entries for a
> +                        * single foreign server.
> +                        */
>
> On second thought, ISTM that single foreign server can have multiple cache
> entries. For example,
>
> CREATE ROLE foo1 SUPERUSER;
> CREATE ROLE foo2 SUPERUSER;
> CREATE EXTENSION postgres_fdw;
> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5432');
> CREATE USER MAPPING FOR foo1 SERVER loopback OPTIONS (user 'postgres');
> CREATE USER MAPPING FOR foo2 SERVER loopback OPTIONS (user 'postgres');
> CREATE TABLE t (i int);
> CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't');
> SET SESSION AUTHORIZATION foo1;
> SELECT * FROM ft;
> SET SESSION AUTHORIZATION foo2;
> SELECT * FROM ft;
>
>
> Then you can see there are multiple open connections for the same server
> as follows. So we need to scan all the entries even when the serverid is
> specified.
>
> SELECT * FROM postgres_fdw_get_connections();
>
>   server_name | valid
> -------------+-------
>   loopback    | t
>   loopback    | t
> (2 rows)

This is a great finding. Thanks a lot. I will remove
hash_seq_term(&scan); in disconnect_cached_connections and add this as
a test case for postgres_fdw_get_connections function, just to show
there can be multiple connections with a single server name.
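
So the named-server path will simply keep scanning; roughly (sketch only):

    while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
    {
        if (entry->conn == NULL)
            continue;

        if (!all && entry->serverid != serverid)
            continue;

        /*
         * Close the connection if it is unused, otherwise warn.  Note: no
         * hash_seq_term()/break here even for a named server, because the
         * same server can have one cache entry per user mapping.
         */
    }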

> This means that user (even non-superuser) can disconnect the connection
> established by another user (superuser), by using postgres_fdw_disconnect_all().
> Is this really OK?

Yeah, connections can be discarded by non-superusers using
postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given that a
non-superuser requires a password to access foreign tables [1], IMO a
non-superuser changing something related to a superuser makes no sense at
all. If that's okay, we can have a check in disconnect_cached_connections,
something like below:

+static bool
+disconnect_cached_connections(Oid serverid)
+{
+    HASH_SEQ_STATUS    scan;
+    ConnCacheEntry    *entry;
+    bool    all = !OidIsValid(serverid);
+    bool    result = false;
+
+    if (!superuser())
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be superuser to discard open connections")));
+
+    if (!ConnectionHash)

Having said that, it looks like dblink_disconnect doesn't perform
superuser checks.

Thoughts?

[1]
SELECT * FROM ft1_nopw LIMIT 1;
ERROR:  password is required
DETAIL:  Non-superusers must provide a password in the user mapping.

> +               if (all || (OidIsValid(serverid) && entry->serverid == serverid))
> +               {
>
> I don't think that "OidIsValid(serverid)" condition is necessary here.
> But you're just concerned about the case where the caller mistakenly
> specifies invalid oid and all=false? One idea to avoid that inconsistent
> combination of inputs is to change disconnect_cached_connections()
> as follows.
>
> -disconnect_cached_connections(Oid serverid, bool all)
> +disconnect_cached_connections(Oid serverid)
>   {
>         HASH_SEQ_STATUS scan;
>         ConnCacheEntry  *entry;
> +       bool    all = !OidIsValid(serverid);

+1. Will change it.

> +                                        * in pgfdw_inval_callback at the end of drop sever
>
> Typo: "sever" should be "server".

+1. Will change it.

> +-- ===================================================================
> +-- test postgres_fdw_disconnect function
> +-- ===================================================================
>
> This regression test is placed at the end of test file. But isn't it better
> to place that just after the regression test "test connection invalidation
>   cases" because they are related?

+1. Will change it.

> +    <screen>
> +postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');
> + postgres_fdw_disconnect
>
> The tag <screen> should start from the beginning.

+1. Will change it.

> As I commented upthread, what about replacing the example query with
> "SELECT postgres_fdw_disconnect('loopback1');" because it's more common?

Sorry, I forgot to check that in the documentation earlier. +1. Will change it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/23 13:40, Bharath Rupireddy wrote:
> On Fri, Jan 22, 2021 at 6:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>> Please review the v16 patch set further.
>>>
>>> Thanks! Will review that later.
>>
>> +                       /*
>> +                        * For the given server, if we closed connection or it is still in
>> +                        * use, then no need of scanning the cache further. We do this
>> +                        * because the cache can not have multiple cache entries for a
>> +                        * single foreign server.
>> +                        */
>>
>> On second thought, ISTM that single foreign server can have multiple cache
>> entries. For example,
>>
>> CREATE ROLE foo1 SUPERUSER;
>> CREATE ROLE foo2 SUPERUSER;
>> CREATE EXTENSION postgres_fdw;
>> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5432');
>> CREATE USER MAPPING FOR foo1 SERVER loopback OPTIONS (user 'postgres');
>> CREATE USER MAPPING FOR foo2 SERVER loopback OPTIONS (user 'postgres');
>> CREATE TABLE t (i int);
>> CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't');
>> SET SESSION AUTHORIZATION foo1;
>> SELECT * FROM ft;
>> SET SESSION AUTHORIZATION foo2;
>> SELECT * FROM ft;
>>
>>
>> Then you can see there are multiple open connections for the same server
>> as follows. So we need to scan all the entries even when the serverid is
>> specified.
>>
>> SELECT * FROM postgres_fdw_get_connections();
>>
>>    server_name | valid
>> -------------+-------
>>    loopback    | t
>>    loopback    | t
>> (2 rows)
> 
> This is a great finding. Thanks a lot. I will remove
> hash_seq_term(&scan); in disconnect_cached_connections and add this as
> a test case for postgres_fdw_get_connections function, just to show
> there can be multiple connections with a single server name.
> 
>> This means that user (even non-superuser) can disconnect the connection
>> established by another user (superuser), by using postgres_fdw_disconnect_all().
>> Is this really OK?
> 
> Yeah, connections can be discarded by non-super users using
> postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
> fact that a non-super user requires a password to access foreign
> tables [1], IMO a non-super user changing something related to a super
> user makes no sense at all. If okay, we can have a check in
> disconnect_cached_connections something like below:

Also like pg_terminate_backend(), we should disallow non-superuser to disconnect the connections established by other
non-superuser if the requesting user is not a member of the other? Or that's overkill because the target to discard is
just a connection and it can be established again if necessary?
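
For reference, a rough sketch of the existing pg_terminate_backend() rule that is being
used as the model here (regress_nosuper and regress_other are hypothetical roles):

SET SESSION AUTHORIZATION regress_nosuper;   -- hypothetical non-superuser
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE usename = 'regress_other';             -- hypothetical other role
-- allowed only if regress_nosuper is a member of regress_other or of
-- pg_signal_backend; superuser-owned backends can only be terminated by superusers
RESET SESSION AUTHORIZATION;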
 

For now I'm thinking that it might be better to add the restriction like pg_terminate_backend() at first and relax that
later if possible. But I'd like to hear more opinions about this.


> 
> +static bool
> +disconnect_cached_connections(Oid serverid)
> +{
> +    HASH_SEQ_STATUS    scan;
> +    ConnCacheEntry    *entry;
> +    bool    all = !OidIsValid(serverid);
> +    bool    result = false;
> +
> +    if (!superuser())
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 errmsg("must be superuser to discard open connections")));
> +
> +    if (!ConnectionHash)
> 
> Having said that, it looks like dblink_disconnect doesn't perform
> superuser checks.

Also non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) seems to be able to run SQL using the dblink
connection established by superuser. If we didn't think that this is a problem, we also might not need to care about
the issue even for postgres_fdw.


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Jan 25, 2021 at 1:20 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Yeah, connections can be discarded by non-super users using
> > postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
> > fact that a non-super user requires a password to access foreign
> > tables [1], IMO a non-super user changing something related to a super
> > user makes no sense at all. If okay, we can have a check in
> > disconnect_cached_connections something like below:
>
> Also like pg_terminate_backend(), we should disallow non-superuser to disconnect the connections established by other
> non-superuser if the requesting user is not a member of the other? Or that's overkill because the target to discard is
> just a connection and it can be established again if necessary?

Yes, if required backends can establish the connection again. But my
worry is this - a non-super user disconnecting all or a given
connection created by a super user?

> For now I'm thinking that it might be better to add the restriction like pg_terminate_backend() at first and relax that
> later if possible. But I'd like to hear more opinions about this.

I agree. If required we can lift it later, once we get the users using
these functions? Maybe we can have a comment near superchecks in
disconnect_cached_connections saying, we can lift this in future?

Do you want me to add these checks like in pg_signal_backend?

    /* Only allow superusers to signal superuser-owned backends. */
    if (superuser_arg(proc->roleId) && !superuser())
        return SIGNAL_BACKEND_NOSUPERUSER;

    /* Users can signal backends they have role membership in. */
    if (!has_privs_of_role(GetUserId(), proc->roleId) &&
        !has_privs_of_role(GetUserId(), DEFAULT_ROLE_SIGNAL_BACKENDID))
        return SIGNAL_BACKEND_NOPERMISSION;

or is only the check below enough?

+    /* Non-super users are not allowed to disconnect cached connections. */
+    if (!superuser())
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be superuser to discard open connections")));

> > +static bool
> > +disconnect_cached_connections(Oid serverid)
> > +{
> > +    HASH_SEQ_STATUS    scan;
> > +    ConnCacheEntry    *entry;
> > +    bool    all = !OidIsValid(serverid);
> > +    bool    result = false;
> > +
> > +    if (!superuser())
> > +        ereport(ERROR,
> > +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +                 errmsg("must be superuser to discard open connections")));
> > +
> > +    if (!ConnectionHash)
> >
> > Having said that, it looks like dblink_disconnect doesn't perform
> > superuser checks.
>
> Also non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) seems to be able to run SQL using the dblink
> connection established by superuser. If we didn't think that this is a problem, we also might not need to care about
> the issue even for postgres_fdw.

IMO, we can have superuser checks for the new postgres_fdw functions for now.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/25 18:13, Bharath Rupireddy wrote:
> On Mon, Jan 25, 2021 at 1:20 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Yeah, connections can be discarded by non-super users using
>>> postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
>>> fact that a non-super user requires a password to access foreign
>>> tables [1], IMO a non-super user changing something related to a super
>>> user makes no sense at all. If okay, we can have a check in
>>> disconnect_cached_connections something like below:
>>
>> Also like pg_terminate_backend(), we should disallow non-superuser to disconnect the connections established by
>> other non-superuser if the requesting user is not a member of the other? Or that's overkill because the target to
>> discard is just a connection and it can be established again if necessary?
> 
> Yes, if required backends can establish the connection again. But my
> worry is this - a non-super user disconnecting all or a given
> connection created by a super user?

Yes, I was also worried about that. But I found that there are other similar cases, for example,

- a cursor that superuser declared can be closed by non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) in the
same session.

- a prepared statement that superuser created can be deallocated by non-superuser in the same session.

This makes me think that it's OK even for non-superuser to disconnect the connections established by superuser in the
same session. For now I've not found any real security issue by doing that yet. Thought? Am I missing something?
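
For reference, a minimal psql sketch of that analogy (regress_nosuper is a hypothetical
non-superuser role, not something from the patch):

-- as a superuser
CREATE ROLE regress_nosuper LOGIN;
BEGIN;
DECLARE c CURSOR FOR SELECT 1;
PREPARE p AS SELECT 1;
SET SESSION AUTHORIZATION regress_nosuper;
CLOSE c;        -- works: cursors are session-local, there is no ownership check
DEALLOCATE p;   -- likewise for prepared statements
RESET SESSION AUTHORIZATION;
COMMIT;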
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Yes, if required backends can establish the connection again. But my
> > worry is this - a non-super user disconnecting all or a given
> > connection created by a super user?
>
> Yes, I was also worried about that. But I found that there are other similar cases, for example,
>
> - a cursor that superuser declared can be closed by non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) in
> the same session.

> - a prepared statement that superuser created can be deallocated by non-superuser in the same session.
>
> This makes me think that it's OK even for non-superuser to disconnect the connections established by superuser in the
> same session. For now I've not found any real security issue by doing that yet. Thought? Am I missing something?

Oh, and added to that list is dblink_disconnect(). I don't know
whether there's any security risk if we allow non-superusers to
discard the super users connections. In this case, the super users
will just have to re make the connection.

> > For now I'm thinking that it might be better to add the restriction like pg_terminate_backend() at first and relax
> > that later if possible. But I'd like to hear more opinions about this.
>
> I agree. If required we can lift it later, once we get the users using
> these functions? Maybe we can have a comment near superchecks in
> disconnect_cached_connections saying, we can lift this in future?

Maybe we can do the opposite of the above that is not doing any
superuser checks in disconnect functions for now, and later if some
users complain we can add it? We can leave a comment there that "As of
now we don't see any security risks if a non-super user disconnects
the connections made by super users. If required, non-supers can be
disallowed to disconnect the connections" ?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/25 19:28, Bharath Rupireddy wrote:
> On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Yes, if required backends can establish the connection again. But my
>>> worry is this - a non-super user disconnecting all or a given
>>> connection created by a super user?
>>
>> Yes, I was also worried about that. But I found that there are other similar cases, for example,
>>
>> - a cursor that superuser declared can be closed by non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) in
>> the same session.

>> - a prepared statement that superuser created can be deallocated by non-superuser in the same session.
>>
>> This makes me think that it's OK even for non-superuser to disconnect the connections established by superuser in
>> the same session. For now I've not found any real security issue by doing that yet. Thought? Am I missing something?
> 
> Oh, and added to that list is dblink_disconnect(). I don't know
> whether there's any security risk if we allow non-superusers to
> discard the super users connections.

I guess that's ok because superuser and nonsuperuser are running in the same session. That is, since this is the case
where superuser switches to nonsuperuser intentionally, interactions between them are also intentional.

OTOH, if nonsuperuser in one session can affect superuser in another session that way, that would be problematic. So,
for example, for now pg_stat_activity disallows nonsuperuser to see the query that superuser in another session is
running.


> In this case, the super users
> will just have to re make the connection.
> 
>>> For now I'm thinking that it might be better to add the restriction like pg_terminate_backend() at first and relax
>>> that later if possible. But I'd like to hear more opinions about this.
>>
>> I agree. If required we can lift it later, once we get the users using
>> these functions? Maybe we can have a comment near superchecks in
>> disconnect_cached_connections saying, we can lift this in future?
> 
> Maybe we can do the opposite of the above that is not doing any
> superuser checks in disconnect functions for now, and later if some
> users complain we can add it?

+1

> We can leave a comment there that "As of
> now we don't see any security risks if a non-super user disconnects
> the connections made by super users. If required, non-supers can be
> disallowed to disconnect the connections" ?

Yes. Also we should note that that's ok because they are in the same session.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Jan 25, 2021 at 7:20 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/25 19:28, Bharath Rupireddy wrote:
> > On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>> Yes, if required backends can establish the connection again. But my
> >>> worry is this - a non-super user disconnecting all or a given
> >>> connection created by a super user?
> >>
> >> Yes, I was also worried about that. But I found that there are other similar cases, for example,
> >>
> >> - a cursor that superuser declared can be closed by non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION)
> >> in the same session.

> >> - a prepared statement that superuser created can be deallocated by non-superuser in the same session.
> >>
> >> This makes me think that it's OK even for non-superuser to disconnect the connections established by superuser in
> >> the same session. For now I've not found any real security issue by doing that yet. Thought? Am I missing something?
> >
> > Oh, and added to that list is dblink_disconnect(). I don't know
> > whether there's any security risk if we allow non-superusers to
> > discard the super users connections.
>
> I guess that's ok because superuser and nonsuperuser are running in the same session. That is, since this is the case
> where superuser switches to nonsuperuser intentionally, interactions between them are also intentional.
>
> OTOH, if nonsuperuser in one session can affect superuser in another session that way, that would be problematic.
> So, for example, for now pg_stat_activity disallows nonsuperuser to see the query that superuser in another session
> is running.

Hmm, that makes sense.

> > In this case, the super users
> > will just have to re make the connection.
> >
> >>> For now I'm thinking that it might be better to add the restriction like pg_terminate_backend() at first and relax
> >>> that later if possible. But I'd like to hear more opinions about this.
> >>
> >> I agree. If required we can lift it later, once we get the users using
> >> these functions? Maybe we can have a comment near superchecks in
> >> disconnect_cached_connections saying, we can lift this in future?
> >
> > Maybe we can do the opposite of the above that is not doing any
> > superuser checks in disconnect functions for now, and later if some
> > users complain we can add it?
>
> +1

Thanks, will send the updated patch set soon.

> > We can leave a comment there that "As of
> > now we don't see any security risks if a non-super user disconnects
> > the connections made by super users. If required, non-supers can be
> > disallowed to disconnect the connections" ?
>
> Yes. Also we should note that that's ok because they are in the same session.

I will add this comment in disconnect_cached_connections so that we
don't lose track of it.

I will provide the updated patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jan 25, 2021 at 7:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I will provide the updated patch set soon.

Attaching v17 patch set, please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/26 0:12, Bharath Rupireddy wrote:
> On Mon, Jan 25, 2021 at 7:28 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> I will provide the updated patch set soon.
> 
> Attaching v17 patch set, please review it further.

Thanks for updating the patch!

Attached is the tweaked version of the patch. I didn't change any logic,
but I updated some comments and docs. Also I added the regression test
to check that postgres_fdw_disconnect() closes multiple connections.
Barring any objection, I will commit this version.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
On Tue, Jan 26, 2021 at 12:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> > Attaching v17 patch set, please review it further.
>
> Thanks for updating the patch!
>
> Attached is the tweaked version of the patch. I didn't change any logic,
> but I updated some comments and docs. Also I added the regression test
> to check that postgres_fdw_disconnect() closes multiple connections.
> Barring any objection, I will commit this version.

Thanks. The patch LGTM, except for a few typos:
1) in the commit message "a warning messsage is emitted." it's
"message" not "messsage".
2) in the documentation "+   a user mapping, the correspoinding
connections are closed." it's "corresponding" not "correspoinding".

I will post "keep_connections" GUC and "keep_connection" server level
option patches later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/26 12:08, Bharath Rupireddy wrote:
> On Tue, Jan 26, 2021 at 12:38 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>> Attaching v17 patch set, please review it further.
>>
>> Thanks for updating the patch!
>>
>> Attached is the tweaked version of the patch. I didn't change any logic,
>> but I updated some comments and docs. Also I added the regression test
>> to check that postgres_fdw_disconnect() closes multiple connections.
>> Barring any objection, I will commit this version.
> 
> Thanks. The patch LGTM, except for a few typos:
> 1) in the commit message "a warning messsage is emitted." it's
> "message" not "messsage".
> 2) in the documentation "+   a user mapping, the correspoinding
> connections are closed." it's "corresponding" not "correspoinding".

Thanks for the review! I fixed them and pushed the patch!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> Thanks for the review! I fixed them and pushed the patch!

Buildfarm is very not happy ...

            regards, tom lane




On 2021/01/26 16:05, Tom Lane wrote:
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>> Thanks for the review! I fixed them and pushed the patch!
> 
> Buildfarm is very not happy ...

Yes.... I'm investigating that.

  -- Return false as connections are still in use, warnings are issued.
  SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in use
  WARNING:  cannot close connection for server "loopback2" because it is still in use
+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
when doing the above test.
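
A minimal sketch of that idea (not necessarily the exact change that gets pushed):

-- The ordering of the WARNINGs is what varies; the test only needs the
-- function's return value, so hide everything below ERROR.
SET client_min_messages TO 'ERROR';
SELECT postgres_fdw_disconnect_all();
RESET client_min_messages;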

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/26 16:05, Tom Lane wrote:
> > Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> >> Thanks for the review! I fixed them and pushed the patch!
> >
> > Buildfarm is very not happy ...
>
> Yes.... I'm investigating that.
>
>   -- Return false as connections are still in use, warnings are issued.
>   SELECT postgres_fdw_disconnect_all();
> -WARNING:  cannot close dropped server connection because it is still in use
> -WARNING:  cannot close connection for server "loopback" because it is still in use
>   WARNING:  cannot close connection for server "loopback2" because it is still in use
> +WARNING:  cannot close connection for server "loopback" because it is still in use
> +WARNING:  cannot close dropped server connection because it is still in use
>
> The cause of the regression test failure is that the order of warning messages
> is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
> when doing the above test.

Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/26 16:33, Bharath Rupireddy wrote:
> On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/26 16:05, Tom Lane wrote:
>>> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>>> Thanks for the review! I fixed them and pushed the patch!
>>>
>>> Buildfarm is very not happy ...
>>
>> Yes.... I'm investigating that.
>>
>>    -- Return false as connections are still in use, warnings are issued.
>>    SELECT postgres_fdw_disconnect_all();
>> -WARNING:  cannot close dropped server connection because it is still in use
>> -WARNING:  cannot close connection for server "loopback" because it is still in use
>>    WARNING:  cannot close connection for server "loopback2" because it is still in use
>> +WARNING:  cannot close connection for server "loopback" because it is still in use
>> +WARNING:  cannot close dropped server connection because it is still in use
>>
>> The cause of the regression test failure is that the order of warning messages
>> is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
>> when doing the above test.
> 
> Looks like we do suppress warnings/notices by setting
> client_min_messages to ERROR/WARNING. For instance, "suppress warning
> that depends on wal_level" and  "Suppress NOTICE messages when
> users/groups don't exist".

Yes, so I pushed that change to stabilize the regression test.
Let's keep checking how the results of buildfarm members are changed.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/01/26 16:39, Fujii Masao wrote:
> 
> 
> On 2021/01/26 16:33, Bharath Rupireddy wrote:
>> On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> On 2021/01/26 16:05, Tom Lane wrote:
>>>> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>>>> Thanks for the review! I fixed them and pushed the patch!
>>>>
>>>> Buildfarm is very not happy ...
>>>
>>> Yes.... I'm investigating that.
>>>
>>>    -- Return false as connections are still in use, warnings are issued.
>>>    SELECT postgres_fdw_disconnect_all();
>>> -WARNING:  cannot close dropped server connection because it is still in use
>>> -WARNING:  cannot close connection for server "loopback" because it is still in use
>>>    WARNING:  cannot close connection for server "loopback2" because it is still in use
>>> +WARNING:  cannot close connection for server "loopback" because it is still in use
>>> +WARNING:  cannot close dropped server connection because it is still in use
>>>
>>> The cause of the regression test failure is that the order of warning messages
>>> is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
>>> when doing the above test.
>>
>> Looks like we do suppress warnings/notices by setting
>> client_min_messages to ERROR/WARNING. For instance, "suppress warning
>> that depends on wal_level" and  "Suppress NOTICE messages when
>> users/groups don't exist".
> 
> Yes, so I pushed that change to stabilize the regression test.
> Let's keep checking how the results of buildfarm members are changed.

+WARNING:  roles created by regression test cases should have names starting with "regress_"
  CREATE ROLE multi_conn_user2 SUPERUSER;
+WARNING:  roles created by regression test cases should have names starting with "regress_"

Hmm... another failure happened.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Tue, Jan 26, 2021 at 1:27 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > Yes, so I pushed that change to stabilize the regression test.
> > Let's keep checking how the results of buildfarm members are changed.

Sorry, I'm unfamiliar with checking the system status on the build
farm website - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
I'm trying to figure that out.

> +WARNING:  roles created by regression test cases should have names starting with "regress_"
>   CREATE ROLE multi_conn_user2 SUPERUSER;
> +WARNING:  roles created by regression test cases should have names starting with "regress_"
>
> Hmm... another failure happened.

My bad. I should have caught that earlier. I will take care in future.

Attaching a patch to fix it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/26 17:07, Bharath Rupireddy wrote:
> On Tue, Jan 26, 2021 at 1:27 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Yes, so I pushed that change to stabilize the regression test.
>>> Let's keep checking how the results of buildfarm members are changed.
> 
> Sorry, I'm unfamiliar with checking the system status on the build
> farm website - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
> I'm trying to figure that out.
> 
>> +WARNING:  roles created by regression test cases should have names starting with "regress_"
>>    CREATE ROLE multi_conn_user2 SUPERUSER;
>> +WARNING:  roles created by regression test cases should have names starting with "regress_"
>>
>> Hmm... another failure happened.
> 
> My bad. I should have caught that earlier. I will take care in future.
> 
> Attaching a patch to fix it.

Thanks for the patch! I also created that patch, confirmed that the test
successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
and pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Thanks for the patch! I also created that patch, confirmed that the test
> successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
> and pushed the patch.

Thanks a lot!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Tue, Jan 26, 2021 at 8:38 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I will post "keep_connections" GUC and "keep_connection" server level
> option patches later.

Attaching v19 patch set for "keep_connections" GUC and
"keep_connection" server level option. Please review them further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> Thanks for the patch! I also created that patch, confirmed that the test
>> successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
>> and pushed the patch.

> Thanks a lot!

Seems you're not out of the woods yet:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40

This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
telling us is that the patch's behavior is unstable in the face
of unexpected cache flushes.

            regards, tom lane



On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> > On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> Thanks for the patch! I also created that patch, confirmed that the test
> >> successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
> >> and pushed the patch.
>
> > Thanks a lot!
>
> Seems you're not out of the woods yet:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>
> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> telling us is that the patch's behavior is unstable in the face
> of unexpected cache flushes.

Thanks a lot! It looks like the syscache invalidation messages are
generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
which pgfdw_inval_callback gets called many times in which the cached
entries are marked as invalid and closed if they are not used in the
txn. The new function postgres_fdw_get_connections outputs the
information of the cached connections such as name if the connection
is still open and their validity. Hence the output of the
postgres_fdw_get_connections became unstable in the buildfarm member.

I will further analyze making tests stable, meanwhile any suggestions
are welcome.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
>> telling us is that the patch's behavior is unstable in the face
>> of unexpected cache flushes.

> Thanks a lot! It looks like the syscache invalidation messages are
> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> which pgfdw_inval_callback gets called many times in which the cached
> entries are marked as invalid and closed if they are not used in the
> txn. The new function postgres_fdw_get_connections outputs the
> information of the cached connections such as name if the connection
> is still open and their validity. Hence the output of the
> postgres_fdw_get_connections became unstable in the buildfarm member.
> I will further analyze making tests stable, meanwhile any suggestions
> are welcome.

I do not think you should regard this as "we need to hack the test
to make it stable".  I think you should regard this as "this is a
bug".  A cache flush should not cause user-visible state changes.
In particular, the above analysis implies that you think a cache
flush is equivalent to end-of-transaction, which it absolutely
is not.

Also, now that I've looked at pgfdw_inval_callback, it scares
the heck out of me.  Actually disconnecting a connection during
a cache inval callback seems quite unsafe --- what if that happens
while we're using the connection?

I fear this patch needs to be reverted and redesigned.

            regards, tom lane




On 2021/01/29 11:09, Tom Lane wrote:
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
>>> telling us is that the patch's behavior is unstable in the face
>>> of unexpected cache flushes.
> 
>> Thanks a lot! It looks like the syscache invalidation messages are
>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
>> which pgfdw_inval_callback gets called many times in which the cached
>> entries are marked as invalid and closed if they are not used in the
>> txn. The new function postgres_fdw_get_connections outputs the
>> information of the cached connections such as name if the connection
>> is still open and their validity. Hence the output of the
>> postgres_fdw_get_connections became unstable in the buildfarm member.
>> I will further analyze making tests stable, meanwhile any suggestions
>> are welcome.
> 
> I do not think you should regard this as "we need to hack the test
> to make it stable".  I think you should regard this as "this is a
> bug".  A cache flush should not cause user-visible state changes.
> In particular, the above analysis implies that you think a cache
> flush is equivalent to end-of-transaction, which it absolutely
> is not.
> 
> Also, now that I've looked at pgfdw_inval_callback, it scares
> the heck out of me.  Actually disconnecting a connection during
> a cache inval callback seems quite unsafe --- what if that happens
> while we're using the connection?

If the connection is still used in the transaction, pgfdw_inval_callback()
marks it as invalidated and doesn't close it. So I was not thinking that
this is so unsafe.

The disconnection code in pgfdw_inval_callback() was added in commit
e3ebcca843 to fix connection leak issue, and it's back-patched. If this
change is really unsafe, we need to revert it immediately at least from back
branches because the next minor release is scheduled soon.

BTW, even if we change pgfdw_inval_callback() so that it doesn't close
the connection at all, ISTM that the results of postgres_fdw_get_connections()
would not be stable because entry->invalidated would vary based on
whether CLOBBER_CACHE_ALWAYS is used or not.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/29 11:09, Tom Lane wrote:
> > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> >> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> >>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> >>> telling us is that the patch's behavior is unstable in the face
> >>> of unexpected cache flushes.
> >
> >> Thanks a lot! It looks like the syscache invalidation messages are
> >> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> >> which pgfdw_inval_callback gets called many times in which the cached
> >> entries are marked as invalid and closed if they are not used in the
> >> txn. The new function postgres_fdw_get_connections outputs the
> >> information of the cached connections such as name if the connection
> >> is still open and their validity. Hence the output of the
> >> postgres_fdw_get_connections became unstable in the buildfarm member.
> >> I will further analyze making tests stable, meanwhile any suggestions
> >> are welcome.
> >
> > I do not think you should regard this as "we need to hack the test
> > to make it stable".  I think you should regard this as "this is a
> > bug".  A cache flush should not cause user-visible state changes.
> > In particular, the above analysis implies that you think a cache
> > flush is equivalent to end-of-transaction, which it absolutely
> > is not.
> >
> > Also, now that I've looked at pgfdw_inval_callback, it scares
> > the heck out of me.  Actually disconnecting a connection during
> > a cache inval callback seems quite unsafe --- what if that happens
> > while we're using the connection?
>
> If the connection is still used in the transaction, pgfdw_inval_callback()
> marks it as invalidated and doesn't close it. So I was not thinking that
> this is so unsafe.
>
> The disconnection code in pgfdw_inval_callback() was added in commit
> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> change is really unsafe, we need to revert it immediately at least from back
> branches because the next minor release is scheduled soon.

I think we can remove disconnect_pg_server in pgfdw_inval_callback and
make entries only invalidated. Anyways, those connections can get
closed at the end of main txn in pgfdw_xact_callback. Thoughts?

If okay, I can make a patch for this.

> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> the connection at all, ISTM that the results of postgres_fdw_get_connections()
> would not be stable because entry->invalidated would vary based on
> whether CLOBBER_CACHE_ALWAYS is used or not.

Yes, after the above change (removing disconnect_pg_server in
pgfdw_inval_callback), our tests don't get stable because
postgres_fdw_get_connections shows the valid state of the connections.
I think we can change postgres_fdw_get_connections so that it only
shows the active connections server name but not valid state. Because,
the valid state is something dependent on the internal state change
and is not consistent with the user expectation but we are exposing it
to the user.  Thoughts?

If okay, I can work on the patch for this.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 29, 2021 at 10:42 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Also, now that I've looked at pgfdw_inval_callback, it scares
> > > the heck out of me.  Actually disconnecting a connection during
> > > a cache inval callback seems quite unsafe --- what if that happens
> > > while we're using the connection?
> >
> > If the connection is still used in the transaction, pgfdw_inval_callback()
> > marks it as invalidated and doesn't close it. So I was not thinking that
> > this is so unsafe.
> >
> > The disconnection code in pgfdw_inval_callback() was added in commit
> > e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> > change is really unsafe, we need to revert it immediately at least from back
> > branches because the next minor release is scheduled soon.
>
> I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> make entries only invalidated. Anyways, those connections can get
> closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>
> If okay, I can make a patch for this.

Attaching a patch for this, which can be back patched.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/29 14:12, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/29 11:09, Tom Lane wrote:
>>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
>>>>> telling us is that the patch's behavior is unstable in the face
>>>>> of unexpected cache flushes.
>>>
>>>> Thanks a lot! It looks like the syscache invalidation messages are
>>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
>>>> which pgfdw_inval_callback gets called many times in which the cached
>>>> entries are marked as invalid and closed if they are not used in the
>>>> txn. The new function postgres_fdw_get_connections outputs the
>>>> information of the cached connections such as name if the connection
>>>> is still open and their validity. Hence the output of the
>>>> postgres_fdw_get_connections became unstable in the buildfarm member.
>>>> I will further analyze making tests stable, meanwhile any suggestions
>>>> are welcome.
>>>
>>> I do not think you should regard this as "we need to hack the test
>>> to make it stable".  I think you should regard this as "this is a
>>> bug".  A cache flush should not cause user-visible state changes.
>>> In particular, the above analysis implies that you think a cache
>>> flush is equivalent to end-of-transaction, which it absolutely
>>> is not.
>>>
>>> Also, now that I've looked at pgfdw_inval_callback, it scares
>>> the heck out of me.  Actually disconnecting a connection during
>>> a cache inval callback seems quite unsafe --- what if that happens
>>> while we're using the connection?
>>
>> If the connection is still used in the transaction, pgfdw_inval_callback()
>> marks it as invalidated and doesn't close it. So I was not thinking that
>> this is so unsafe.
>>
>> The disconnection code in pgfdw_inval_callback() was added in commit
>> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
>> change is really unsafe, we need to revert it immediately at least from back
>> branches because the next minor release is scheduled soon.
> 
> I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> make entries only invalidated. Anyways, those connections can get
> closed at the end of main txn in pgfdw_xact_callback. Thoughts?

But this revives the connection leak issue. So isn't it better to
do that after we confirm that the current code is really unsafe?

> 
> If okay, I can make a patch for this.
> 
>> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
>> the connection at all, ISTM that the results of postgres_fdw_get_connections()
>> would not be stable because entry->invalidated would vary based on
>> whether CLOBBER_CACHE_ALWAYS is used or not.
> 
> Yes, after the above change (removing disconnect_pg_server in
> pgfdw_inval_callback), our tests don't get stable because
> postgres_fdw_get_connections shows the valid state of the connections.
> I think we can change postgres_fdw_get_connections so that it only
> shows the active connections server name but not valid state. Because,
> the valid state is something dependent on the internal state change
> and is not consistent with the user expectation but we are exposing it
> to the user.  Thoughts?

I don't think that's enough because even the following simple
queries return different results, depending on whether
CLOBBER_CACHE_ALWAYS is used or not.

     SELECT * FROM ft6;  -- ft6 is the foreign table
     SELECT server_name FROM postgres_fdw_get_connections();

When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
returns no records because the connection is marked as invalidated,
and then closed at xact callback in SELECT query. Otherwise,
postgres_fdw_get_connections() returns at least one connection that
was established in the SELECT query.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/29 14:12, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >> On 2021/01/29 11:09, Tom Lane wrote:
> >>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> >>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> >>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> >>>>> telling us is that the patch's behavior is unstable in the face
> >>>>> of unexpected cache flushes.
> >>>
> >>>> Thanks a lot! It looks like the syscache invalidation messages are
> >>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> >>>> which pgfdw_inval_callback gets called many times in which the cached
> >>>> entries are marked as invalid and closed if they are not used in the
> >>>> txn. The new function postgres_fdw_get_connections outputs the
> >>>> information of the cached connections such as name if the connection
> >>>> is still open and their validity. Hence the output of the
> >>>> postgres_fdw_get_connections became unstable in the buildfarm member.
> >>>> I will further analyze making tests stable, meanwhile any suggestions
> >>>> are welcome.
> >>>
> >>> I do not think you should regard this as "we need to hack the test
> >>> to make it stable".  I think you should regard this as "this is a
> >>> bug".  A cache flush should not cause user-visible state changes.
> >>> In particular, the above analysis implies that you think a cache
> >>> flush is equivalent to end-of-transaction, which it absolutely
> >>> is not.
> >>>
> >>> Also, now that I've looked at pgfdw_inval_callback, it scares
> >>> the heck out of me.  Actually disconnecting a connection during
> >>> a cache inval callback seems quite unsafe --- what if that happens
> >>> while we're using the connection?
> >>
> >> If the connection is still used in the transaction, pgfdw_inval_callback()
> >> marks it as invalidated and doesn't close it. So I was not thinking that
> >> this is so unsafe.
> >>
> >> The disconnection code in pgfdw_inval_callback() was added in commit
> >> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> >> change is really unsafe, we need to revert it immediately at least from back
> >> branches because the next minor release is scheduled soon.
> >
> > I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> > make entries only invalidated. Anyways, those connections can get
> > closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>
> But this revives the connection leak issue. So isn't it better to
> do that after we confirm that the current code is really unsafe?

IMO, connections will not leak, because the invalidated connections
eventually will get closed in pgfdw_xact_callback at the main txn end.

IIRC, when we were finding a way to close the invalidated connections
so that they don't leak, we had two options:

1) let those connections (whether currently being used in the xact or
not) get marked invalidated in pgfdw_inval_callback and closed in
pgfdw_xact_callback at the main txn end as shown below

        if (PQstatus(entry->conn) != CONNECTION_OK ||
            PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
            entry->changing_xact_state ||
            entry->invalidated)    ----> by adding this
        {
            elog(DEBUG3, "discarding connection %p", entry->conn);
            disconnect_pg_server(entry);
        }

2) close the unused connections right away in pgfdw_inval_callback
instead of marking them invalidated. Mark used connections as
invalidated in pgfdw_inval_callback and close them in
pgfdw_xact_callback at the main txn end.

We went with option (2) because we thought this would ease some burden
on pgfdw_xact_callback closing a lot of invalid connections at once.

Hope that's fine.

I will respond to postgres_fdw_get_connections issue separately.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 29, 2021 at 11:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
> > On 2021/01/29 14:12, Bharath Rupireddy wrote:
> > > On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> > > <masao.fujii@oss.nttdata.com> wrote:
> > >> On 2021/01/29 11:09, Tom Lane wrote:
> > >>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> > >>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> > >>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> > >>>>> telling us is that the patch's behavior is unstable in the face
> > >>>>> of unexpected cache flushes.
> > >>>
> > >>>> Thanks a lot! It looks like the syscache invalidation messages are
> > >>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> > >>>> which pgfdw_inval_callback gets called many times in which the cached
> > >>>> entries are marked as invalid and closed if they are not used in the
> > >>>> txn. The new function postgres_fdw_get_connections outputs the
> > >>>> information of the cached connections such as name if the connection
> > >>>> is still open and their validity. Hence the output of the
> > >>>> postgres_fdw_get_connections became unstable in the buildfarm member.
> > >>>> I will further analyze making tests stable, meanwhile any suggestions
> > >>>> are welcome.
> > >>>
> > >>> I do not think you should regard this as "we need to hack the test
> > >>> to make it stable".  I think you should regard this as "this is a
> > >>> bug".  A cache flush should not cause user-visible state changes.
> > >>> In particular, the above analysis implies that you think a cache
> > >>> flush is equivalent to end-of-transaction, which it absolutely
> > >>> is not.
> > >>>
> > >>> Also, now that I've looked at pgfdw_inval_callback, it scares
> > >>> the heck out of me.  Actually disconnecting a connection during
> > >>> a cache inval callback seems quite unsafe --- what if that happens
> > >>> while we're using the connection?
> > >>
> > >> If the connection is still used in the transaction, pgfdw_inval_callback()
> > >> marks it as invalidated and doesn't close it. So I was not thinking that
> > >> this is so unsafe.
> > >>
> > >> The disconnection code in pgfdw_inval_callback() was added in commit
> > >> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> > >> change is really unsafe, we need to revert it immediately at least from back
> > >> branches because the next minor release is scheduled soon.
> > >
> > > I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> > > make entries only invalidated. Anyways, those connections can get
> > > closed at the end of main txn in pgfdw_xact_callback. Thoughts?
> >
> > But this revives the connection leak issue. So isn't it better to
> > do that after we confirm that the current code is really unsafe?
>
> IMO, connections will not leak, because the invalidated connections
> eventually will get closed in pgfdw_xact_callback at the main txn end.
>
> IIRC, when we were finding a way to close the invalidated connections
> so that they don't leak, we had two options:
>
> 1) let those connections (whether currently being used in the xact or
> not) get marked invalidated in pgfdw_inval_callback and closed in
> pgfdw_xact_callback at the main txn end as shown below
>
>         if (PQstatus(entry->conn) != CONNECTION_OK ||
>             PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>             entry->changing_xact_state ||
>             entry->invalidated)    ----> by adding this
>         {
>             elog(DEBUG3, "discarding connection %p", entry->conn);
>             disconnect_pg_server(entry);
>         }
>
> 2) close the unused connections right away in pgfdw_inval_callback
> instead of marking them invalidated. Mark used connections as
> invalidated in pgfdw_inval_callback and close them in
> pgfdw_xact_callback at the main txn end.
>
> We went with option (2) because we thought this would ease some burden
> on pgfdw_xact_callback closing a lot of invalid connections at once.

Also, note that the original patch for the connection leak issue just
does option (1), see [1]. But in [2] and [3], we chose option (2).

I feel we can go for option (1), with the patch attached in [1], i.e.
setting have_invalid_connections whenever any connection gets
invalidated, so that we don't exit early from pgfdw_xact_callback and
the invalidated connections get closed properly. Thoughts?

static void
pgfdw_xact_callback(XactEvent event, void *arg)
{
    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;

    /* Quick exit if no connections were touched in this transaction. */
    if (!xact_got_connection)
        return;

[1] https://www.postgresql.org/message-id/CALj2ACVNcGH_6qLY-4_tXz8JLvA%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/f57dd9c3-0664-5f4c-41f0-0713047ae7b7%40oss.nttdata.com
[3] https://www.postgresql.org/message-id/CALj2ACVNjV1%2B72f3nVCngC7RsGSiGXZQ2mAzYx_Dij7oJpV8iA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> >> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> >> the connection at all, ISTM that the results of postgres_fdw_get_connections()
> >> would not be stable because entry->invalidated would vary based on
> >> whether CLOBBER_CACHE_ALWAYS is used or not.
> >
> > Yes, after the above change (removing disconnect_pg_server in
> > pgfdw_inval_callback), our tests don't get stable because
> > postgres_fdw_get_connections shows the valid state of the connections.
> > I think we can change postgres_fdw_get_connections so that it only
> > shows the active connections server name but not valid state. Because,
> > the valid state is something dependent on the internal state change
> > and is not consistent with the user expectation but we are exposing it
> > to the user.  Thoughts?
>
> I don't think that's enough because even the following simple
> queries return different results, depending on whether
> CLOBBER_CACHE_ALWAYS is used or not.
>
>      SELECT * FROM ft6;  -- ft6 is the foreign table
>      SELECT server_name FROM postgres_fdw_get_connections();
>
> When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
> returns no records because the connection is marked as invalidated,
> and then closed at xact callback in SELECT query. Otherwise,
> postgres_fdw_get_connections() returns at least one connection that
> was established in the SELECT query.

Right. In that case, after changing postgres_fdw_get_connections() so
that it doesn't output the valid state of the connections at all, we
can have all the new function test cases inside an explicit txn block.
So even if the clobber cache invalidates the connections, they don't
get closed until the end of main xact, the tests will be stable.
Thoughts?
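
A rough sketch of what such a test block could look like, assuming the ft1/loopback
objects from the existing regression setup:

BEGIN;
SELECT 1 FROM ft1 LIMIT 1;  -- establishes (or reuses) a cached connection
-- Even if a cache flush marks the entry as invalidated here, it is not
-- closed until the transaction ends, so the row below stays visible.
SELECT server_name FROM postgres_fdw_get_connections() ORDER BY 1;
COMMIT;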

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/29 14:53, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>>> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
>>>> the connection at all, ISTM that the results of postgres_fdw_get_connections()
>>>> would not be stable because entry->invalidated would vary based on
>>>> whether CLOBBER_CACHE_ALWAYS is used or not.
>>>
>>> Yes, after the above change (removing disconnect_pg_server in
>>> pgfdw_inval_callback), our tests don't get stable because
>>> postgres_fdw_get_connections shows the valid state of the connections.
>>> I think we can change postgres_fdw_get_connections so that it only
>>> shows the active connections server name but not valid state. Because,
>>> the valid state is something dependent on the internal state change
>>> and is not consistent with the user expectation but we are exposing it
>>> to the user.  Thoughts?
>>
>> I don't think that's enough because even the following simple
>> queries return different results, depending on whether
>> CLOBBER_CACHE_ALWAYS is used or not.
>>
>>       SELECT * FROM ft6;  -- ft6 is the foreign table
>>       SELECT server_name FROM postgres_fdw_get_connections();
>>
>> When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
>> returns no records because the connection is marked as invalidated,
>> and then closed at xact callback in SELECT query. Otherwise,
>> postgres_fdw_get_connections() returns at least one connection that
>> was established in the SELECT query.
> 
> Right. In that case, after changing postgres_fdw_get_connections() so
> that it doesn't output the valid state of the connections at all, we

You're thinking of getting rid of the "valid" column? Or of hiding it from the test query
(e.g., SELECT server_name from postgres_fdw_get_connections())?

> can have all the new function test cases inside an explicit txn block.
> So even if the clobber cache invalidates the connections, they don't
> get closed until the end of main xact, so the tests will be stable.
> Thoughts?

Also if there are cached connections before starting that transaction,
they should be closed or established again before executing
postgres_fdw_get_connections(). Otherwise, those connections are
returned from postgres_fdw_get_connections() when
CLOBBER_CACHE_ALWAYS is not used, but not when it's used.
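
For example, the test could first discard whatever is already cached so that the result
set is deterministic (again assuming the regression-test objects):

SELECT 1 FROM postgres_fdw_disconnect_all();  -- ignore the return value, which
                                              -- depends on what was cached before
BEGIN;
SELECT 1 FROM ft1 LIMIT 1;
SELECT server_name FROM postgres_fdw_get_connections() ORDER BY 1;
COMMIT;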

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 29, 2021 at 11:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/29 14:53, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >>>> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> >>>> the connection at all, ISTM that the results of postgres_fdw_get_connections()
> >>>> would not be stable because entry->invalidated would vary based on
> >>>> whether CLOBBER_CACHE_ALWAYS is used or not.
> >>>
> >>> Yes, after the above change (removing disconnect_pg_server in
> >>> pgfdw_inval_callback), our tests don't get stable because
> >>> postgres_fdw_get_connections shows the valid state of the connections.
> >>> I think we can change postgres_fdw_get_connections so that it only
> >>> shows the active connections server name but not valid state. Because,
> >>> the valid state is something dependent on the internal state change
> >>> and is not consistent with the user expectation but we are exposing it
> >>> to the user.  Thoughts?
> >>
> >> I don't think that's enough because even the following simple
> >> queries return the different results, depending on whether
> >> CLOBBER_CACHE_ALWAYS is used or not.
> >>
> >>       SELECT * FROM ft6;  -- ft6 is the foreign table
> >>       SELECT server_name FROM postgres_fdw_get_connections();
> >>
> >> When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
> >> returns no records because the connection is marked as invalidated,
> >> and then closed at xact callback in SELECT query. Otherwise,
> >> postgres_fdw_get_connections() returns at least one connection that
> >> was established in the SELECT query.
> >
> > Right. In that case, after changing postgres_fdw_get_connections() so
> > that it doesn't output the valid state of the connections at all, we
>
> You're thinking to get rid of "valid" column? Or hide it from the test query
> (e.g., SELECT server_name from postgres_fdw_get_connections())?

I'm thinking we can get rid of the "valid" column from the
postgres_fdw_get_connections() function itself, not just from the tests.
It seems we are exposing some internal state (whether the connection is
valid or not) that can change because of internal events. Also, with the
existing postgres_fdw_get_connections(), "valid" will always be true if
the user calls postgres_fdw_get_connections() outside an explicit xact
block; it can become false only when it's used in an explicit txn block.
So the valid column may not be very useful for the user.
Thoughts?

> > can have all the new function test cases inside an explicit txn block.
> > So even if the clobber cache invalidates the connections, they don't
> > get closed until the end of main xact, the tests will be stable.
> > Thoughts?
>
> Also if there are cached connections before starting that transaction,
> they should be closed or established again before executing
> postgres_fdw_get_connections(). Otherwise, those connections are
> returned from postgres_fdw_get_connections() when
> CLOBBER_CACHE_ALWAYS is not used, but not when it's used.

Yes, we need to move the test to a place where the cache wouldn't have
been initialized yet, i.e., where no foreign connection has been made
yet in the session.

ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
\det+

<<<<<<<<<<<<MAY BE HERE>>>>>>>>>>>>

-- Test that alteration of server options causes reconnection
-- Remote's errors might be non-English, so hide them to ensure stable results
\set VERBOSITY terse
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work
ALTER SERVER loopback OPTIONS (SET dbname 'no such database');
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should fail
DO $d$

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/29 14:46, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 11:08 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> On 2021/01/29 14:12, Bharath Rupireddy wrote:
>>>> On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>> On 2021/01/29 11:09, Tom Lane wrote:
>>>>>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>>>>>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>>>>>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
>>>>>>>> telling us is that the patch's behavior is unstable in the face
>>>>>>>> of unexpected cache flushes.
>>>>>>
>>>>>>> Thanks a lot! It looks like the syscache invalidation messages are
>>>>>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
>>>>>>> which pgfdw_inval_callback gets called many times in which the cached
>>>>>>> entries are marked as invalid and closed if they are not used in the
>>>>>>> txn. The new function postgres_fdw_get_connections outputs the
>>>>>>> information of the cached connections such as name if the connection
>>>>>>> is still open and their validity. Hence the output of the
>>>>>>> postgres_fdw_get_connections became unstable in the buildfarm member.
>>>>>>> I will further analyze making tests stable, meanwhile any suggestions
>>>>>>> are welcome.
>>>>>>
>>>>>> I do not think you should regard this as "we need to hack the test
>>>>>> to make it stable".  I think you should regard this as "this is a
>>>>>> bug".  A cache flush should not cause user-visible state changes.
>>>>>> In particular, the above analysis implies that you think a cache
>>>>>> flush is equivalent to end-of-transaction, which it absolutely
>>>>>> is not.
>>>>>>
>>>>>> Also, now that I've looked at pgfdw_inval_callback, it scares
>>>>>> the heck out of me.  Actually disconnecting a connection during
>>>>>> a cache inval callback seems quite unsafe --- what if that happens
>>>>>> while we're using the connection?
>>>>>
>>>>> If the connection is still used in the transaction, pgfdw_inval_callback()
>>>>> marks it as invalidated and doesn't close it. So I was not thinking that
>>>>> this is so unsafe.
>>>>>
>>>>> The disconnection code in pgfdw_inval_callback() was added in commit
>>>>> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
>>>>> change is really unsafe, we need to revert it immediately at least from back
>>>>> branches because the next minor release is scheduled soon.
>>>>
>>>> I think we can remove disconnect_pg_server in pgfdw_inval_callback and
>>>> make entries only invalidated. Anyways, those connections can get
>>>> closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>>>
>>> But this revives the connection leak issue. So isn't it better to
>>> to do that after we confirm that the current code is really unsafe?
>>
>> IMO, connections will not leak, because the invalidated connections
>> eventually will get closed in pgfdw_xact_callback at the main txn end.
>>
>> IIRC, when we were finding a way to close the invalidated connections
>> so that they don't leaked, we had two options:
>>
>> 1) let those connections (whether currently being used in the xact or
>> not) get marked invalidated in pgfdw_inval_callback and closed in
>> pgfdw_xact_callback at the main txn end as shown below
>>
>>          if (PQstatus(entry->conn) != CONNECTION_OK ||
>>              PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>>              entry->changing_xact_state ||
>>              entry->invalidated).   ----> by adding this
>>          {
>>              elog(DEBUG3, "discarding connection %p", entry->conn);
>>              disconnect_pg_server(entry);
>>          }
>>
>> 2) close the unused connections right away in pgfdw_inval_callback
>> instead of marking them invalidated. Mark used connections as
>> invalidated in pgfdw_inval_callback and close them in
>> pgfdw_xact_callback at the main txn end.
>>
>> We went with option (2) because we thought this would ease some burden
>> on pgfdw_xact_callback closing a lot of invalid connections at once.
> 
> Also, see the original patch for the connection leak issue just does
> option (1), see [1]. But in [2] and [3], we chose option (2).
> 
> I feel, we can go for option (1), with the patch attached in [1] i.e.
> having have_invalid_connections whenever any connection gets invalided
> so that we don't quickly exit in pgfdw_xact_callback and the
> invalidated connections get closed properly. Thoughts?

Before going for (1) or something else, I'd like to understand what the
actual issue with (2), i.e., with the current code, is. Otherwise other
approaches might have the same issue.


Regarding (1), if I understand correctly, even when the transaction
doesn't use foreign tables at all, it would need to scan the connection
cache entries. I was hoping to avoid this. I also guess that this doesn't
work with at least the postgres_fdw 2PC patch that Sawada-san is proposing,
because with that patch the commit/rollback callback is performed only
for the connections used in the transaction.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> >> IIRC, when we were finding a way to close the invalidated connections
> >> so that they don't leaked, we had two options:
> >>
> >> 1) let those connections (whether currently being used in the xact or
> >> not) get marked invalidated in pgfdw_inval_callback and closed in
> >> pgfdw_xact_callback at the main txn end as shown below
> >>
> >>          if (PQstatus(entry->conn) != CONNECTION_OK ||
> >>              PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >>              entry->changing_xact_state ||
> >>              entry->invalidated).   ----> by adding this
> >>          {
> >>              elog(DEBUG3, "discarding connection %p", entry->conn);
> >>              disconnect_pg_server(entry);
> >>          }
> >>
> >> 2) close the unused connections right away in pgfdw_inval_callback
> >> instead of marking them invalidated. Mark used connections as
> >> invalidated in pgfdw_inval_callback and close them in
> >> pgfdw_xact_callback at the main txn end.
> >>
> >> We went with option (2) because we thought this would ease some burden
> >> on pgfdw_xact_callback closing a lot of invalid connections at once.
> >
> > Also, see the original patch for the connection leak issue just does
> > option (1), see [1]. But in [2] and [3], we chose option (2).
> >
> > I feel, we can go for option (1), with the patch attached in [1] i.e.
> > having have_invalid_connections whenever any connection gets invalided
> > so that we don't quickly exit in pgfdw_xact_callback and the
> > invalidated connections get closed properly. Thoughts?
>
> Before going for (1) or something, I'd like to understand what the actual
> issue of (2), i.e., the current code is. Otherwise other approaches might
> have the same issue.

The problem with option (2) is that, because of CLOBBER_CACHE_ALWAYS,
pgfdw_inval_callback gets called many times and the connections that
are not used, i.e. xact_depth == 0, get disconnected there, so we
don't see consistent results for the postgres_fdw_get_connections
test cases. If the connections are being used within the xact, then
the valid column for those connections is shown as false, again making
the postgres_fdw_get_connections output inconsistent. This is what
happened on the buildfarm member with the CLOBBER_CACHE_ALWAYS build.

So if we go with option (1), get rid of the valid state from the
postgres_fdw_get_connections test, and have the test cases inside an
explicit xact block at the beginning of the postgres_fdw.sql test
file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
if this is the correct way.

> Regarding (1), as far as I understand correctly, even when the transaction
> doesn't use foreign tables at all, it needs to scan the connection cache
> entries if necessary. I was thinking to avoid this. I guess that this doesn't
> work with at least the postgres_fdw 2PC patch that Sawada-san is proposing
> because with the patch the commit/rollback callback is performed only
> for the connections used in the transaction.

Do you mean that pgfdw_xact_callback will not get called when the xact
uses no foreign server connection, or that pgfdw_xact_callback gets
called but exits quickly from it? I'm not sure what the 2PC patch
does.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/29 15:44, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>>> IIRC, when we were finding a way to close the invalidated connections
>>>> so that they don't leaked, we had two options:
>>>>
>>>> 1) let those connections (whether currently being used in the xact or
>>>> not) get marked invalidated in pgfdw_inval_callback and closed in
>>>> pgfdw_xact_callback at the main txn end as shown below
>>>>
>>>>           if (PQstatus(entry->conn) != CONNECTION_OK ||
>>>>               PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>>>>               entry->changing_xact_state ||
>>>>               entry->invalidated).   ----> by adding this
>>>>           {
>>>>               elog(DEBUG3, "discarding connection %p", entry->conn);
>>>>               disconnect_pg_server(entry);
>>>>           }
>>>>
>>>> 2) close the unused connections right away in pgfdw_inval_callback
>>>> instead of marking them invalidated. Mark used connections as
>>>> invalidated in pgfdw_inval_callback and close them in
>>>> pgfdw_xact_callback at the main txn end.
>>>>
>>>> We went with option (2) because we thought this would ease some burden
>>>> on pgfdw_xact_callback closing a lot of invalid connections at once.
>>>
>>> Also, see the original patch for the connection leak issue just does
>>> option (1), see [1]. But in [2] and [3], we chose option (2).
>>>
>>> I feel, we can go for option (1), with the patch attached in [1] i.e.
>>> having have_invalid_connections whenever any connection gets invalided
>>> so that we don't quickly exit in pgfdw_xact_callback and the
>>> invalidated connections get closed properly. Thoughts?
>>
>> Before going for (1) or something, I'd like to understand what the actual
>> issue of (2), i.e., the current code is. Otherwise other approaches might
>> have the same issue.
> 
> The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
> pgfdw_inval_callback is getting called many times and the connections
> that are not used i..e xact_depth == 0, are getting disconnected
> there, so we are not seeing the consistent results for
> postgres_fdw_get_connectionstest cases. If the connections are being
> used within the xact, then the valid option for those connections are
> being shown as false again making postgres_fdw_get_connections output
> inconsistent. This is what happened on the build farm member with
> CLOBBER_CACHE_ALWAYS build.

But if the issue is only the inconsistency of the test results,
can't we go with option (2)? Even with (2), we can make the test
stable by removing the "valid" column and executing
postgres_fdw_get_connections() within the transaction?

> 
> So if we go with option (1), get rid of valid state from
> postgres_fdw_get_connectionstest and having the test cases inside an
> explicit xact block at the beginning of the postgres_fdw.sql test
> file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
> if this is the correct way.
> 
>> Regarding (1), as far as I understand correctly, even when the transaction
>> doesn't use foreign tables at all, it needs to scan the connection cache
>> entries if necessary. I was thinking to avoid this. I guess that this doesn't
>> work with at least the postgres_fdw 2PC patch that Sawada-san is proposing
>> because with the patch the commit/rollback callback is performed only
>> for the connections used in the transaction.
> 
> You mean to say, pgfdw_xact_callback will not get called when the xact
> uses no foreign server connection or is it that pgfdw_xact_callback
> gets called but exits quickly from it? I'm not sure what the 2PC patch
> does.

Maybe it's a chance to review the patch! ;P

BTW, his patch tries to add new callback interfaces for commit/rollback of
foreign transactions, and makes postgres_fdw use them instead of the
XactCallback. Those new interfaces are executed only when
the transaction has started foreign transactions.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 29, 2021 at 12:36 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/29 15:44, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >>>> IIRC, when we were finding a way to close the invalidated connections
> >>>> so that they don't leaked, we had two options:
> >>>>
> >>>> 1) let those connections (whether currently being used in the xact or
> >>>> not) get marked invalidated in pgfdw_inval_callback and closed in
> >>>> pgfdw_xact_callback at the main txn end as shown below
> >>>>
> >>>>           if (PQstatus(entry->conn) != CONNECTION_OK ||
> >>>>               PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >>>>               entry->changing_xact_state ||
> >>>>               entry->invalidated).   ----> by adding this
> >>>>           {
> >>>>               elog(DEBUG3, "discarding connection %p", entry->conn);
> >>>>               disconnect_pg_server(entry);
> >>>>           }
> >>>>
> >>>> 2) close the unused connections right away in pgfdw_inval_callback
> >>>> instead of marking them invalidated. Mark used connections as
> >>>> invalidated in pgfdw_inval_callback and close them in
> >>>> pgfdw_xact_callback at the main txn end.
> >>>>
> >>>> We went with option (2) because we thought this would ease some burden
> >>>> on pgfdw_xact_callback closing a lot of invalid connections at once.
> >>>
> >>> Also, see the original patch for the connection leak issue just does
> >>> option (1), see [1]. But in [2] and [3], we chose option (2).
> >>>
> >>> I feel, we can go for option (1), with the patch attached in [1] i.e.
> >>> having have_invalid_connections whenever any connection gets invalided
> >>> so that we don't quickly exit in pgfdw_xact_callback and the
> >>> invalidated connections get closed properly. Thoughts?
> >>
> >> Before going for (1) or something, I'd like to understand what the actual
> >> issue of (2), i.e., the current code is. Otherwise other approaches might
> >> have the same issue.
> >
> > The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
> > pgfdw_inval_callback is getting called many times and the connections
> > that are not used i..e xact_depth == 0, are getting disconnected
> > there, so we are not seeing the consistent results for
> > postgres_fdw_get_connectionstest cases. If the connections are being
> > used within the xact, then the valid option for those connections are
> > being shown as false again making postgres_fdw_get_connections output
> > inconsistent. This is what happened on the build farm member with
> > CLOBBER_CACHE_ALWAYS build.
>
> But if the issue is only the inconsistency of test results,
> we can go with the option (2)? Even with (2), we can make the test
> stable by removing "valid" column and executing
> postgres_fdw_get_connections() within the transaction?

Hmmm, and we should have the tests at the start of the postgres_fdw.sql
file, even before we make any foreign server connections.

If that's okay, I can prepare the patch and run it with a clobber cache build locally.

> >
> > So if we go with option (1), get rid of valid state from
> > postgres_fdw_get_connectionstest and having the test cases inside an
> > explicit xact block at the beginning of the postgres_fdw.sql test
> > file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
> > if this is the correct way.
> >
> >> Regarding (1), as far as I understand correctly, even when the transaction
> >> doesn't use foreign tables at all, it needs to scan the connection cache
> >> entries if necessary. I was thinking to avoid this. I guess that this doesn't
> >> work with at least the postgres_fdw 2PC patch that Sawada-san is proposing
> >> because with the patch the commit/rollback callback is performed only
> >> for the connections used in the transaction.
> >
> > You mean to say, pgfdw_xact_callback will not get called when the xact
> > uses no foreign server connection or is it that pgfdw_xact_callback
> > gets called but exits quickly from it? I'm not sure what the 2PC patch
> > does.
>
> Maybe it's chance to review the patch! ;P
>
> BTW his patch tries to add new callback interfaces for commit/rollback of
> foreign transactions, and make postgres_fdw use them instead of
> XactCallback. And those new interfaces are executed only when
> the transaction has started the foreign transactions.

IMHO, it's better to keep it as a separate discussion. I will try to
review that patch later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/29 16:12, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 12:36 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/29 15:44, Bharath Rupireddy wrote:
>>> On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>>> IIRC, when we were finding a way to close the invalidated connections
>>>>>> so that they don't leaked, we had two options:
>>>>>>
>>>>>> 1) let those connections (whether currently being used in the xact or
>>>>>> not) get marked invalidated in pgfdw_inval_callback and closed in
>>>>>> pgfdw_xact_callback at the main txn end as shown below
>>>>>>
>>>>>>            if (PQstatus(entry->conn) != CONNECTION_OK ||
>>>>>>                PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>>>>>>                entry->changing_xact_state ||
>>>>>>                entry->invalidated).   ----> by adding this
>>>>>>            {
>>>>>>                elog(DEBUG3, "discarding connection %p", entry->conn);
>>>>>>                disconnect_pg_server(entry);
>>>>>>            }
>>>>>>
>>>>>> 2) close the unused connections right away in pgfdw_inval_callback
>>>>>> instead of marking them invalidated. Mark used connections as
>>>>>> invalidated in pgfdw_inval_callback and close them in
>>>>>> pgfdw_xact_callback at the main txn end.
>>>>>>
>>>>>> We went with option (2) because we thought this would ease some burden
>>>>>> on pgfdw_xact_callback closing a lot of invalid connections at once.
>>>>>
>>>>> Also, see the original patch for the connection leak issue just does
>>>>> option (1), see [1]. But in [2] and [3], we chose option (2).
>>>>>
>>>>> I feel, we can go for option (1), with the patch attached in [1] i.e.
>>>>> having have_invalid_connections whenever any connection gets invalided
>>>>> so that we don't quickly exit in pgfdw_xact_callback and the
>>>>> invalidated connections get closed properly. Thoughts?
>>>>
>>>> Before going for (1) or something, I'd like to understand what the actual
>>>> issue of (2), i.e., the current code is. Otherwise other approaches might
>>>> have the same issue.
>>>
>>> The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
>>> pgfdw_inval_callback is getting called many times and the connections
>>> that are not used i..e xact_depth == 0, are getting disconnected
>>> there, so we are not seeing the consistent results for
>>> postgres_fdw_get_connectionstest cases. If the connections are being
>>> used within the xact, then the valid option for those connections are
>>> being shown as false again making postgres_fdw_get_connections output
>>> inconsistent. This is what happened on the build farm member with
>>> CLOBBER_CACHE_ALWAYS build.
>>
>> But if the issue is only the inconsistency of test results,
>> we can go with the option (2)? Even with (2), we can make the test
>> stable by removing "valid" column and executing
>> postgres_fdw_get_connections() within the transaction?
> 
> Hmmm, and we should have the tests at the start of the file
> postgres_fdw.sql before even we make any foreign server connections.

We don't need to move the test if we always call postgres_fdw_disconnect_all() just before starting a new transaction
and calling postgres_fdw_get_connections() as follows?

SELECT 1 FROM postgres_fdw_disconnect_all();
BEGIN;
...
SELECT * FROM postgres_fdw_get_connections();
...


> 
> If okay, I can prepare the patch and run with clobber cache build locally.

Many thanks!


> 
>>>
>>> So if we go with option (1), get rid of valid state from
>>> postgres_fdw_get_connectionstest and having the test cases inside an
>>> explicit xact block at the beginning of the postgres_fdw.sql test
>>> file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
>>> if this is the correct way.
>>>
>>>> Regarding (1), as far as I understand correctly, even when the transaction
>>>> doesn't use foreign tables at all, it needs to scan the connection cache
>>>> entries if necessary. I was thinking to avoid this. I guess that this doesn't
>>>> work with at least the postgres_fdw 2PC patch that Sawada-san is proposing
>>>> because with the patch the commit/rollback callback is performed only
>>>> for the connections used in the transaction.
>>>
>>> You mean to say, pgfdw_xact_callback will not get called when the xact
>>> uses no foreign server connection or is it that pgfdw_xact_callback
>>> gets called but exits quickly from it? I'm not sure what the 2PC patch
>>> does.
>>
>> Maybe it's chance to review the patch! ;P
>>
>> BTW his patch tries to add new callback interfaces for commit/rollback of
>> foreign transactions, and make postgres_fdw use them instead of
>> XactCallback. And those new interfaces are executed only when
>> the transaction has started the foreign transactions.
> 
> IMHO, it's better to keep it as a separate discussion.

Yes, of course!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Jan 29, 2021 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> But if the issue is only the inconsistency of test results,
> >> we can go with the option (2)? Even with (2), we can make the test
> >> stable by removing "valid" column and executing
> >> postgres_fdw_get_connections() within the transaction?
> >
> > Hmmm, and we should have the tests at the start of the file
> > postgres_fdw.sql before even we make any foreign server connections.
>
> We don't need to move the test if we always call postgres_fdw_disconnect_all() just before starting new transaction
> and calling postgres_fdw_get_connections() as follows?
>
> SELECT 1 FROM postgres_fdw_disconnect_all();
> BEGIN;
> ...
> SELECT * FROM postgres_fdw_get_connections();
> ...

Yes, that works, but we cannot show true/false for the
postgres_fdw_disconnect_all output.

I will post the patch soon. Thanks a lot.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Fri, Jan 29, 2021 at 1:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 29, 2021 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > >> But if the issue is only the inconsistency of test results,
> > >> we can go with the option (2)? Even with (2), we can make the test
> > >> stable by removing "valid" column and executing
> > >> postgres_fdw_get_connections() within the transaction?
> > >
> > > Hmmm, and we should have the tests at the start of the file
> > > postgres_fdw.sql before even we make any foreign server connections.
> >
> > We don't need to move the test if we always call postgres_fdw_disconnect_all() just before starting new transaction
> > and calling postgres_fdw_get_connections() as follows?
> >
> > SELECT 1 FROM postgres_fdw_disconnect_all();
> > BEGIN;
> > ...
> > SELECT * FROM postgres_fdw_get_connections();
> > ...
>
> Yes, that works, but we cannot show true/false for the
> postgres_fdw_disconnect_all output.
>
> I will post the patch soon. Thanks a lot.

Attaching a patch that has the following changes: 1) postgres_fdw_get_connections
now returns only the set of active connections' server names, not their
valid state. 2) The functions postgres_fdw_get_connections,
postgres_fdw_disconnect and postgres_fdw_disconnect_all are now tested
within an explicit xact block; this way the tests are stable even with
clobber-cache-always builds.

I tested the patch here on my development system with the
-DCLOBBER_CACHE_ALWAYS configuration; the tests look consistent.

Please review the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/01/29 19:45, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 1:24 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Fri, Jan 29, 2021 at 1:17 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>> But if the issue is only the inconsistency of test results,
>>>>> we can go with the option (2)? Even with (2), we can make the test
>>>>> stable by removing "valid" column and executing
>>>>> postgres_fdw_get_connections() within the transaction?
>>>>
>>>> Hmmm, and we should have the tests at the start of the file
>>>> postgres_fdw.sql before even we make any foreign server connections.
>>>
>>> We don't need to move the test if we always call postgres_fdw_disconnect_all() just before starting new transaction
>>> and calling postgres_fdw_get_connections() as follows?
>>>
>>> SELECT 1 FROM postgres_fdw_disconnect_all();
>>> BEGIN;
>>> ...
>>> SELECT * FROM postgres_fdw_get_connections();
>>> ...
>>
>> Yes, that works, but we cannot show true/false for the
>> postgres_fdw_disconnect_all output.
>>
>> I will post the patch soon. Thanks a lot.
> 
> Attaching a patch that has following changes: 1) Now,
> postgres_fdw_get_connections will only return set of active
> connections server names not their valid state 2) The functions
> postgres_fdw_get_connections, postgres_fdw_disconnect and
> postgres_fdw_disconnect_all are now being tested within an explicit
> xact block, this way the tests are more stable even with clobber cache
> always builds.
> 
> I tested the patch here on my development system with
> -DCLOBBER_CACHE_ALWAYS configuration, the tests look consistent.
> 
> Please review the patch.

Thanks for the patch!

--- Return false as loopback2 connectin is closed already.
-SELECT postgres_fdw_disconnect('loopback2');
- postgres_fdw_disconnect
--------------------------
- f
-(1 row)
-
--- Return an error as there is no foreign server with given name.
-SELECT postgres_fdw_disconnect('unknownserver');
-ERROR:  server "unknownserver" does not exist

Why do we need to remove these? These seem to work fine even in
CLOBBER_CACHE_ALWAYS.

+            /*
+             * It doesn't make sense to show this entry in the output with a
+             * NULL server_name as it will be closed at the xact end.
+             */
+            continue;

-1 with this change because I still think that it's more useful to list
all the open connections.

This makes me think that more discussion is necessary before
changing the interface of postgres_fdw_get_connections(). On the other
hand, we should address the issue ASAP to make the buildfarm member green again.
So at first I'd like to push only the change to the regression test.
Patch attached. I tested it both with CLOBBER_CACHE_ALWAYS set and unset,
and the results were stable.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
On Sat, Jan 30, 2021 at 12:14 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> +                       /*
> +                        * It doesn't make sense to show this entry in the output with a
> +                        * NULL server_name as it will be closed at the xact end.
> +                        */
> +                       continue;
>
> -1 with this change because I still think that it's more useful to list
> all the open connections.

If postgres_fdw_get_connections doesn't have a "valid" column, then I
thought it's better not to show a NULL server_name in the output. Do you
think we need to output some fixed string for such connections,
like "<unknown server>" or "<server doesn't exist>" or "<dropped
server>" or "<server information not available>"? I'm not sure whether
we are allowed to have fixed strings as column output.

> This makes me think that more discussion would be necessary before
> changing the interface of postgres_fdw_get_connections(). On the other
> hand, we should address the issue ASAP to make the buildfarm member fine.
> So at first I'd like to push only the change of regression test.
> Patch attached. I tested it both with CLOBBER_CACHE_ALWAYS set and unset,
> and the results were stable.

Thanks, the postgres_fdw.patch looks good to me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/01/30 9:28, Bharath Rupireddy wrote:
> On Sat, Jan 30, 2021 at 12:14 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> +                       /*
>> +                        * It doesn't make sense to show this entry in the output with a
>> +                        * NULL server_name as it will be closed at the xact end.
>> +                        */
>> +                       continue;
>>
>> -1 with this change because I still think that it's more useful to list
>> all the open connections.
> 
> If postgres_fdw_get_connections doesn't have a "valid" column, then I
> thought it's better not showing server_name NULL in the output.

Or, if we don't have a strong reason to remove the "valid" column,
is the current design enough?


> Do you
> think that we need to output some fixed strings for such connections
> like "<unknown server>" or "<server doesn't exist>" or "<dropped
> server>" or "<server information not available>"? I'm not sure whether
> we are allowed to have fixed strings as column output.
> 
>> This makes me think that more discussion would be necessary before
>> changing the interface of postgres_fdw_get_connections(). On the other
>> hand, we should address the issue ASAP to make the buildfarm member fine.
>> So at first I'd like to push only the change of regression test.
>> Patch attached. I tested it both with CLOBBER_CACHE_ALWAYS set and unset,
>> and the results were stable.
> 
> Thanks, the postgres_fdw.patch looks good to me.

Thanks for checking the patch! I pushed that.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/01/27 10:06, Bharath Rupireddy wrote:
> On Tue, Jan 26, 2021 at 8:38 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> I will post "keep_connections" GUC and "keep_connection" server level
>> option patches later.
> 
> Attaching v19 patch set for "keep_connections" GUC and
> "keep_connection" server level option. Please review them further.

These options are no longer necessary because we now support idle_session_timeout? If we want to disconnect the foreign
server connections that sit idle, to prevent them from eating up the connection capacity of the foreign servers, we
can just set idle_session_timeout in those foreign servers. If we want to avoid the cluster-wide setting of
idle_session_timeout, we can set that per role. One issue with this approach is that the connection entry remains even
after idle_session_timeout happens. So postgres_fdw_get_connections() returns that connection even though it's actually
closed by the timeout. Which is confusing. But it doesn't cause any actual problem, right? When the foreign table is
accessed the next time, that connection entry is dropped, an error is detected, and then a new connection will be remade.

Sorry I've not read the past long discussion about this feature. If there is a consensus that these options are still
necessary and useful even when we have idle_session_timeout, please correct me.

ISTM that it's intuitive (at least for me) to add this kind of option to the foreign server. But I'm not sure if it's a
good idea to expose the option as a GUC. Also, if there is a consensus about this, please correct me.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Feb 1, 2021 at 12:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/30 9:28, Bharath Rupireddy wrote:
> > On Sat, Jan 30, 2021 at 12:14 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >> +                       /*
> >> +                        * It doesn't make sense to show this entry in the output with a
> >> +                        * NULL server_name as it will be closed at the xact end.
> >> +                        */
> >> +                       continue;
> >>
> >> -1 with this change because I still think that it's more useful to list
> >> all the open connections.
> >
> > If postgres_fdw_get_connections doesn't have a "valid" column, then I
> > thought it's better not showing server_name NULL in the output.
>
> Or if we don't have strong reason to remove "valid" column,
> the current design is enough?

My only worry was the statement from [1]: "A cache flush should
not cause user-visible state changes." But the newly added function
postgres_fdw_get_connections is VOLATILE, which means that the results
returned by postgres_fdw_get_connections() are also volatile. Isn't
that enough, so that users will not be surprised to see different
results when invalidations occur within the server between the time
they first run the query and any subsequent run?

Thoughts?

[1] https://www.postgresql.org/message-id/flat/2724627.1611886184%40sss.pgh.pa.us


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Feb 1, 2021 at 12:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/27 10:06, Bharath Rupireddy wrote:
> > On Tue, Jan 26, 2021 at 8:38 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >> I will post "keep_connections" GUC and "keep_connection" server level
> >> option patches later.
> >
> > Attaching v19 patch set for "keep_connections" GUC and
> > "keep_connection" server level option. Please review them further.
>
> These options are no longer necessary because we now support idle_session_timeout? If we want to disconnect the
> foreign server connections that sit idle, to prevent them from eating up the connection capacity of the foreign
> servers, we can just set idle_session_timeout in those foreign servers. If we want to avoid the cluster-wide setting of
> idle_session_timeout, we can set that per role. One issue with this approach is that the connection entry remains even
> after idle_session_timeout happens. So postgres_fdw_get_connections() returns that connection even though it's actually
> closed by the timeout. Which is confusing. But it doesn't cause any actual problem, right? When the foreign table is
> accessed the next time, that connection entry is dropped, an error is detected, and then a new connection will be remade.

First of all, idle_session_timeout is disabled by default (0), so
there is a chance that users don't use it and don't want to set it
just to avoid caching foreign server connections. A simple use case
where the server-level option can be useful is this: users access
foreign tables (maybe not that frequently, once in a while) from a
long-running local session, and they don't want the local session to
keep those connections cached. Setting the server-level option
keep_connections to false makes their life easier, without having to
depend on setting idle_session_timeout on the remote server.

And just using idle_session_timeout on a remote server may not help
us completely, because the remote session may go away while we are
still using that cached connection in an explicit txn on the local
session. Our connection retry will also not work because we are in the
middle of an xact, so the local explicit txn gets aborted.

So, IMO, we can still have both the server-level option and the
postgres_fdw contrib-level GUC (to tell the local session "I don't
want to keep any foreign connections active" instead of setting the
keep_connection server-level option for each foreign server).
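For illustration, a rough sketch of how the two settings might be used
(spellings follow the later patch versions in this thread; the GUC name
postgres_fdw.keep_connections and the server name myserver are
assumptions, not committed syntax):

    -- per-server: don't keep connections to this one foreign server cached
    ALTER SERVER myserver OPTIONS (ADD keep_connections 'off');

    -- per-session GUC: don't keep any postgres_fdw connection cached
    SET postgres_fdw.keep_connections TO off;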

> Sorry I've not read the past long discussion about this feature. If there is a consensus that these options are
> still necessary and useful even when we have idle_session_timeout, please correct me.
>
> ISTM that it's intuitive (at least for me) to add this kind of option to the foreign server. But I'm not sure if
> it's a good idea to expose the option as a GUC. Also, if there is a consensus about this, please correct me.

See here [1].

[1] - https://www.postgresql.org/message-id/f58d1df4ae58f6cf3bfa560f923462e0%40postgrespro.ru

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/02/01 16:13, Bharath Rupireddy wrote:
> On Mon, Feb 1, 2021 at 12:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/30 9:28, Bharath Rupireddy wrote:
>>> On Sat, Jan 30, 2021 at 12:14 AM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>>> +                       /*
>>>> +                        * It doesn't make sense to show this entry in the output with a
>>>> +                        * NULL server_name as it will be closed at the xact end.
>>>> +                        */
>>>> +                       continue;
>>>>
>>>> -1 with this change because I still think that it's more useful to list
>>>> all the open connections.
>>>
>>> If postgres_fdw_get_connections doesn't have a "valid" column, then I
>>> thought it's better not showing server_name NULL in the output.
>>
>> Or if we don't have strong reason to remove "valid" column,
>> the current design is enough?
> 
> My only worry was that the statement from [1] "A cache flush should
> not cause user-visible state changes."

If we follow this strictly, I'm afraid that postgres_fdw_get_connections()
itself would also be a problem because the cached connections are affected
by cache flush and postgres_fdw_get_connections() shows that to users.
I'm not sure if removing "valid" column is actually helpful for that statement.

Anyway, for now we have the following options;

(1) keep the feature as it is
(2) remove "valid" column
        (2-1) show NULL for the connection whose server was dropped
        (2-2) show fixed value (e.g., <dropped>) for the connection whose server was dropped
(3) remove "valid" column and don't display connection whose server was dropped
(4) remove postgres_fdw_get_connections()

For now I like (1), but if others think "valid" column should be dropped,
I'm fine with (2). But I'd like to avoid (3) because I think that
postgres_fdw_get_connections() should list all the connections that
are actually being established. I have no strong opinion about whether
(2-1) or (2-2) is better, for now.

> But the newly added function
> postgres_fdw_get_connections is VOLATILE which means that the results
> returned by postgres_fdw_get_connections() is also VOLATILE. Isn't
> this enough, so that users will not get surprised with different
> results in case invalidations occur within the server by the time they
> run the query subsequent times and see different results than what
> they saw in the first run?

I'm not sure about this...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/02/01 16:39, Bharath Rupireddy wrote:
> On Mon, Feb 1, 2021 at 12:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> On 2021/01/27 10:06, Bharath Rupireddy wrote:
>>> On Tue, Jan 26, 2021 at 8:38 AM Bharath Rupireddy
>>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>> I will post "keep_connections" GUC and "keep_connection" server level
>>>> option patches later.
>>>
>>> Attaching v19 patch set for "keep_connections" GUC and
>>> "keep_connection" server level option. Please review them further.
>>
>> These options are no longer necessary because we now support idle_session_timeout? If we want to disconnect the
>> foreign server connections that sit idle, to prevent them from eating up the connection capacity of the foreign
>> servers, we can just set idle_session_timeout in those foreign servers. If we want to avoid the cluster-wide setting of
>> idle_session_timeout, we can set that per role. One issue with this approach is that the connection entry remains even
>> after idle_session_timeout happens. So postgres_fdw_get_connections() returns that connection even though it's actually
>> closed by the timeout. Which is confusing. But it doesn't cause any actual problem, right? When the foreign table is
>> accessed the next time, that connection entry is dropped, an error is detected, and then a new connection will be remade.
> 
> First of all, idle_session_timeout is by default 0 i.e. disabled,
> there are chances that users may not use that and don't want to set it
> just for not caching any foreign server connection. A simple use case
> where server level option can be useful is that, users are accessing
> foreign tables (may be not that frequently, once in a while) from a
> long running local session using foreign servers and they don't want
> to keep the local session cache those connections, then setting this
> server level option, keep_connections to false makes their life
> easier, without having to depend on setting idle_session_timeout on
> the remote server.

Thanks for explaining this!

I understand that use case. But I still think that we can use
idle_session_timeout for that use case without keep_connections.
Per the past discussion, Robert seems to prefer controlling the cached
connection by a timeout rather than a boolean, at [1]. Bruce seems to think
that idle_session_timeout is enough for the use case, at [2]. So I'm not
sure what the current consensus is...

Also Alexey seems to have thought that idle_session_timeout is not
suitable for cached connections because it's a cluster-wide option, at [3].
But since it's marked as PGC_USERSET, we can set it per-role, e.g.,
by using ALTER ROLE SET, so that it can affect only the foreign server
connections.
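For example, on the remote server something like the following would
scope the timeout to just the role used by the postgres_fdw user mapping
(the role name fdw_user is hypothetical):

    -- only sessions of this role, i.e. the postgres_fdw connections,
    -- are terminated after being idle for a minute
    ALTER ROLE fdw_user SET idle_session_timeout = '1min';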

One merit of keep_connections that I found is that we can use it even
when connecting to an older PostgreSQL server that doesn't support
idle_session_timeout. Also it seems simpler to use keep_connections
rather than setting idle_session_timeout in multiple remote servers.
So I'm inclined to add this feature, but I'd like to hear more opinions.

[1]
https://www.postgresql.org/message-id/CA%2BTgmob_nF7NkBfVLUhmQ%2Bt8JGVV4hXy%2BzkuMUtTSd-%3DHPBeuA%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/20200714165822.GE7628%40momjian.us

[3]
https://www.postgresql.org/message-id/6df6525ca7a4b54a4a39f55e4dd6b3e9%40postgrespro.ru


> 
> And, just using idle_session_timeout on a remote server may not help
> us completely. Because the remote session may go away, while we are
> still using that cached connection in an explicit txn on the local
> session. Our connection retry will also not work because we are in the
> middle of an xact, so the local explicit txn gets aborted.

Regarding idle_in_transaction_session_timeout, this seems true. But
I was thinking that idle_session_timeout doesn't cause this issue because
it doesn't close the connection in the middle of transaction. No?


> 
> So, IMO, we can still have both server level option as well as
> postgres_fdw contrib level GUC (to tell the local session that "I
> don't want to keep any foreign connections active" instead of setting
> keep_connection server level option for each foreign server).
> 
>> Sorry I've not read the past long discussion about this feature. If there is a consensus that these options are
>> still necessary and useful even when we have idle_session_timeout, please correct me.
>>
>> ISTM that it's intuitive (at least for me) to add this kind of option to the foreign server. But I'm not sure if
>> it's a good idea to expose the option as a GUC. Also, if there is a consensus about this, please correct me.
> 
> See here [1].
> 
> [1] - https://www.postgresql.org/message-id/f58d1df4ae58f6cf3bfa560f923462e0%40postgrespro.ru

Thanks!


Here are some review comments.

-            (used_in_current_xact && !keep_connections))
+            (used_in_current_xact &&
+            (!keep_connections || !entry->keep_connection)))

The names of GUC and server-level option should be the same,
to make the thing less confusing?

IMO the server-level option should override GUC. IOW, GUC setting
should be used only when the server-level option is not specified.
But the above code doesn't seem to do that. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Tue, Feb 2, 2021 at 9:45 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> One merit of keep_connections that I found is that we can use it even
> when connecting to the older PostgreSQL that doesn't support
> idle_session_timeout. Also it seems simpler to use keep_connections
> rather than setting idle_session_timeout in multiple remote servers.
> So I'm inclined to add this feature, but I'd like to hear more opinions.

Thanks.

> > And, just using idle_session_timeout on a remote server may not help
> > us completely. Because the remote session may go away, while we are
> > still using that cached connection in an explicit txn on the local
> > session. Our connection retry will also not work because we are in the
> > middle of an xact, so the local explicit txn gets aborted.
>
> Regarding idle_in_transaction_session_timeout, this seems true. But
> I was thinking that idle_session_timeout doesn't cause this issue because
> it doesn't close the connection in the middle of transaction. No?

You are right. idle_session_timeout doesn't take effect when in the
middle of an explicit txn. I missed this point.

> Here are some review comments.
>
> -                       (used_in_current_xact && !keep_connections))
> +                       (used_in_current_xact &&
> +                       (!keep_connections || !entry->keep_connection)))
>
> The names of GUC and server-level option should be the same,
> to make the thing less confusing?

We can name the GUC keep_connections, as there can be multiple
connections within a local session, and I can change the server-level
option keep_connection to keep_connections, because a single foreign
server can have multiple connections, as we have seen in the use
case you identified. I will change that in the next patch set.

> IMO the server-level option should override GUC. IOW, GUC setting
> should be used only when the server-level option is not specified.
> But the above code doesn't seem to do that. Thought?

Note that the default values for the GUC and the server-level option are
on, i.e. connections are cached.

The main intention of the GUC is to avoid having to set the server-level
option to false for all the foreign servers when users don't want to keep
any foreign server connections. If the server-level option overrode the GUC,
then even if users set the GUC to off, they would have to set the server-level
option to false for all the foreign servers.

So the code below from the patch first checks the GUC. If the GUC is
off, it discards the connections. If the GUC is on, it further
checks the server-level option; if that's off it discards the connection,
otherwise not.

I would like to keep this behaviour as is. Thoughts?

         if (PQstatus(entry->conn) != CONNECTION_OK ||
             PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
             entry->changing_xact_state ||
             entry->invalidated ||
+            (used_in_current_xact &&
+            (!keep_connections || !entry->keep_connection)))
         {
             elog(DEBUG3, "discarding connection %p", entry->conn);
             disconnect_pg_server(entry);

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




On 2021/02/03 13:56, Bharath Rupireddy wrote:
> On Tue, Feb 2, 2021 at 9:45 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> One merit of keep_connections that I found is that we can use it even
>> when connecting to the older PostgreSQL that doesn't support
>> idle_session_timeout. Also it seems simpler to use keep_connections
>> rather than setting idle_session_timeout in multiple remote servers.
>> So I'm inclined to add this feature, but I'd like to hear more opinions.
> 
> Thanks.
> 
>>> And, just using idle_session_timeout on a remote server may not help
>>> us completely. Because the remote session may go away, while we are
>>> still using that cached connection in an explicit txn on the local
>>> session. Our connection retry will also not work because we are in the
>>> middle of an xact, so the local explicit txn gets aborted.
>>
>> Regarding idle_in_transaction_session_timeout, this seems true. But
>> I was thinking that idle_session_timeout doesn't cause this issue because
>> it doesn't close the connection in the middle of transaction. No?
> 
> You are right. idle_session_timeout doesn't take effect when in the
> middle of an explicit txn. I missed this point.
> 
>> Here are some review comments.
>>
>> -                       (used_in_current_xact && !keep_connections))
>> +                       (used_in_current_xact &&
>> +                       (!keep_connections || !entry->keep_connection)))
>>
>> The names of GUC and server-level option should be the same,
>> to make the thing less confusing?
> 
> We can have GUC name keep_connections as there can be multiple
> connections within a local session and I can change the server level
> option keep_connection to keep_connections because a single foreign
> server can have multiple connections as we have seen that in the use
> case identified by you. I will change that in the next patch set.
> 
>> IMO the server-level option should override GUC. IOW, GUC setting
>> should be used only when the server-level option is not specified.
>> But the above code doesn't seem to do that. Thought?
> 
> Note that default values for GUC and server level option are on i.e.
> connections are cached.
> 
> The main intention of the GUC is to not set server level options to
> false for all the foreign servers in case users don't want to keep any
> foreign server connections. If the server level option overrides GUC,
> then even if users set GUC to off, they have to set the server level
> option to false for all the foreign servers.

Maybe my explanation in the previous email was unclear. What I think is: if the server-level option is explicitly
specified, its setting is used whatever the GUC is. On the other hand, if the server-level option is NOT specified, the
GUC setting is used. For example, if we define the server as follows, the GUC setting is used because the server-level
option is NOT specified.

     CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;

If we define the server as follows, the server-level setting is used.

     CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (keep_connections 'on');


For example, the log_autovacuum_min_duration GUC and reloption work in a similar way. That is, the reloption setting
overrides the GUC. If the reloption is not specified, the GUC is used.
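For example, a rough sketch of that precedence (the table name is hypothetical):

    -- the GUC provides the default ...
    ALTER SYSTEM SET log_autovacuum_min_duration = '10s';
    SELECT pg_reload_conf();
    -- ... and the reloption, where set, overrides it for that table
    ALTER TABLE big_table SET (log_autovacuum_min_duration = 0);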
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Wed, Feb 3, 2021 at 4:22 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Maybe my explanation in the previous email was unclear. What I think is; If the server-level option is explicitly
> specified, its setting is used whatever GUC is. On the other hand, if the server-level option is NOT specified, GUC
> setting is used. For example, if we define the server as follows, GUC setting is used because the server-level option is
> NOT specified.
>
>      CREATE SERVER loopback FOREIGN DATA WRAPPER postgres;
>
> If we define the server as follows, the server-level setting is used.
>
>      CREATE SERVER loopback FOREIGN DATA WRAPPER postgres OPTIONS (keep_connections 'on');

Attaching v20 patch set. Now, the server-level option, if provided,
overrides the GUC. The GUC will be used only if the server-level option
is not provided. Also, both the server-level option and the GUC are now
named the same - "keep_connections".

Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/02/22 14:55, Bharath Rupireddy wrote:
> On Thu, Feb 4, 2021 at 9:36 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Wed, Feb 3, 2021 at 4:22 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Maybe my explanation in the previous email was unclear. What I think is; If the server-level option is explicitly
>>> specified, its setting is used whatever GUC is. On the other hand, if the server-level option is NOT specified, GUC
>>> setting is used. For example, if we define the server as follows, GUC setting is used because the server-level option is
>>> NOT specified.
>>>
>>>       CREATE SERVER loopback FOREIGN DATA WRAPPER postgres;
>>>
>>> If we define the server as follows, the server-level setting is used.
>>>
>>>       CREATE SERVER loopback FOREIGN DATA WRAPPER postgres OPTIONS (keep_connections 'on');
>>
>> Attaching v20 patch set. Now, the server-level option, if provided,
>> overrides the GUC. The GUC is used only if the server-level option is
>> not provided. Also, both the server-level option and the GUC are named
>> the same - "keep_connections".
>>
>> Please have a look.
> 
> Attaching v21 patch set, rebased onto the latest master.

I agree to add the server-level option. But I'm still not sure if it's a good idea to also expose that option as a GUC.
Isn't the server-level option enough for most cases?

Also, isn't it strange to expose only this option as a GUC while there are many other postgres_fdw options?

With the v21-002 patch, even when the keep_connections GUC is disabled, the existing open connections are not closed
immediately. Only connections used in the transaction are closed at the end of that transaction. That is, existing
connections that no transaction uses will never be closed. I'm not sure if this behavior is intuitive for users.
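
A hypothetical session illustrating that behavior (the foreign table "ft" and
the GUC name are assumptions for illustration only):

     SELECT * FROM ft;                            -- opens and caches a connection to ft's foreign server
     SET postgres_fdw.keep_connections = off;
     SELECT 1;                                    -- purely local transaction: the cached connection is
                                                  -- not used here, so it is never closed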

Therefore, for now, I'm thinking of supporting the server-level option first... Then, if we find it's not enough for most
cases in practice, I'd like to consider exposing postgres_fdw options, including keep_connections, as GUCs.

Thought?

BTW, these patches fail to apply to master because of commit 27e1f14563. I updated and simplified the 003 patch.
Patch attached.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment
On Thu, Apr 1, 2021 at 8:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> > Attaching v21 patch set, rebased onto the latest master.
>
> I agree to add the server-level option. But I'm still not sure if it's a good idea to also expose that option as a GUC.
> Isn't the server-level option enough for most cases?
>
> Also, isn't it strange to expose only this option as a GUC while there are many other postgres_fdw options?
>
> With the v21-002 patch, even when the keep_connections GUC is disabled, the existing open connections are not closed
> immediately. Only connections used in the transaction are closed at the end of that transaction. That is, existing
> connections that no transaction uses will never be closed. I'm not sure if this behavior is intuitive for users.
>
> Therefore, for now, I'm thinking of supporting the server-level option first... Then, if we find it's not enough for
> most cases in practice, I'd like to consider exposing postgres_fdw options, including keep_connections, as GUCs.
>
> Thought?

+1 to have only a server-level option for now and if the need arises
we could expose it as a GUC.

> BTW, these patches fail to apply to master because of commit 27e1f14563. I updated and simplified the 003 patch.
> Patch attached.

Thanks for updating the patch. It looks good to me. Just a minor
change: instead of using "true" and "off" for the option, I used "on"
and "off" in the docs. Attaching v23.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

On 2021/04/02 1:13, Bharath Rupireddy wrote:
> On Thu, Apr 1, 2021 at 8:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>> Attaching v21 patch set, rebased onto the latest master.
>>
>> I agree to add the server-level option. But I'm still not sure if it's a good idea to also expose that option as a GUC.
>> Isn't the server-level option enough for most cases?
>>
>> Also, isn't it strange to expose only this option as a GUC while there are many other postgres_fdw options?
>>
>> With the v21-002 patch, even when the keep_connections GUC is disabled, the existing open connections are not closed
>> immediately. Only connections used in the transaction are closed at the end of that transaction. That is, existing
>> connections that no transaction uses will never be closed. I'm not sure if this behavior is intuitive for users.
>>
>> Therefore, for now, I'm thinking of supporting the server-level option first... Then, if we find it's not enough for
>> most cases in practice, I'd like to consider exposing postgres_fdw options, including keep_connections, as GUCs.
>>
>> Thought?
> 
> +1 to have only a server-level option for now and if the need arises
> we could expose it as a GUC.
> 
>> BTW, these patches fail to apply to master because of commit 27e1f14563. I updated and simplified the 003 patch.
>> Patch attached.
> 
> Thanks for updating the patch. It looks good to me. Just a minor
> change: instead of using "true" and "off" for the option, I used "on"
> and "off" in the docs. Attaching v23.

Thanks a lot! Barring any objection, I will commit this version.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




On 2021/04/02 2:22, Fujii Masao wrote:
> Thanks a lot! Barring any objection, I will commit this version.

Pushed. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION