Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id CACawEhXAy4gjJ4G3-59Xv_yb8Xbtn1VMYP8SOj-0C9dG2zbAcQ@mail.gmail.com
Whole thread Raw
In response to RE: [Proposal] Add foreign-server health checks infrastructure  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Responses RE: [Proposal] Add foreign-server health checks infrastructure  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
Hi Hayato Kuroda,

 
If the checking function is called not periodically but GetConnection(),
it means that the health of foreign servers will be check only when remote connections are used.
So following workload described in [1] cannot handle the issue.

BEGIN --- remote operations--- local operations --- COMMIT


As far as I can see this patch is mostly useful for detecting the failures on the initial remote command. This is especially common when the remote server does a failover/switchover and postgres_fdw uses a cached connection to access to the remote server.

The remote server failures within a transaction block sounds like a much less common problem. Still, you can give a good error message before COMMIT is sent to the remote server.
 
I agree that this doesn't solve the issue you described, but I'm not sure if it is worthwhile to fix such a problem.


> Can we have this function/logic on Postgres core, so that other extensions
> can also use?

I was not sure about any use-case, but I think it can because it does quite general things.
Is there any good motivation to do that?

I think any extension that deals with multiple Postgres nodes can benefit from such logic. In fact, the reason I realized this patch is that on the Citus extension, we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets can benefit from such a function. 


> Do you see any performance implications of creating/freeing waitEventSets
> per check? I wonder if we can somehow re-use the same waitEventSet by
> modifyWaitEvent? I guess no, but still, if this check causes a performance
> implication, can we somehow cache 1 waitEventSet per connection?

I have not tested yet, but I agreed this will be caused performance decrease.
In next version first I will re-use the event set anyway, and it must be considered later.
Actually I'm not sure your suggestion,
but you mean to say that we can add a hash table that associates  PGconn and WaitEventSet,  right?


I think it also depends on where you decide to put pgfdw_connection_check_internal(). If you prefer the postgres_fdw side, could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c? 

But if you decide to put it into the Postgres side, the API for pgfdw_connection_check_internal() -- or equivalent function -- could be discussed. Do we pass a WaitEventSet and if it is NULL create a new one, else use what is passed to the function? Not sure, maybe you can come up with a better API.
 
Thanks,
Onder KALACI

pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: PostgreSQL 15 RC2 + GA release dates
Next
From: vignesh C
Date:
Subject: Miscellaneous tab completion issue fixes