Home > mailing lists

Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [Proposal] Add foreign-server health checks infrastructure
Date	October 3, 2022 17:19:10
Msg-id	CACawEhXAy4gjJ4G3-59Xv_yb8Xbtn1VMYP8SOj-0C9dG2zbAcQ@mail.gmail.com Whole thread Raw
In response to	RE: [Proposal] Add foreign-server health checks infrastructure ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Responses	RE: [Proposal] Add foreign-server health checks infrastructure
List	pgsql-hackers

Tree view

Hi Hayato Kuroda,

If the checking function is called not periodically but GetConnection(),
it means that the health of foreign servers will be check only when remote connections are used.
So following workload described in [1] cannot handle the issue.

BEGIN --- remote operations--- local operations --- COMMIT

As far as I can see this patch is mostly useful for detecting the failures on the initial remote command. This is especially common when the remote server does a failover/switchover and postgres_fdw uses a cached connection to access to the remote server.

The remote server failures within a transaction block sounds like a much less common problem. Still, you can give a good error message before COMMIT is sent to the remote server.

I agree that this doesn't solve the issue you described, but I'm not sure if it is worthwhile to fix such a problem.

> Can we have this function/logic on Postgres core, so that other extensions
> can also use?

I was not sure about any use-case, but I think it can because it does quite general things.
Is there any good motivation to do that?

I think any extension that deals with multiple Postgres nodes can benefit from such logic. In fact, the reason I realized this patch is that on the Citus extension, we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets can benefit from such a function.

> Do you see any performance implications of creating/freeing waitEventSets
> per check? I wonder if we can somehow re-use the same waitEventSet by
> modifyWaitEvent? I guess no, but still, if this check causes a performance
> implication, can we somehow cache 1 waitEventSet per connection?

I have not tested yet, but I agreed this will be caused performance decrease.
In next version first I will re-use the event set anyway, and it must be considered later.
Actually I'm not sure your suggestion,
but you mean to say that we can add a hash table that associates PGconn and WaitEventSet, right?

I think it also depends on where you decide to put pgfdw_connection_check_internal(). If you prefer the postgres_fdw side, could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?

But if you decide to put it into the Postgres side, the API for pgfdw_connection_check_internal() -- or equivalent function -- could be discussed. Do we pass a WaitEventSet and if it is NULL create a new one, else use what is passed to the function? Not sure, maybe you can come up with a better API.

Thanks,

Onder KALACI

[1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci · Pull Request #6259 · citusdata/citus (github.com)

[2]: Add a single connection retry in MULTI_CONNECTION_LOST state by marcocitus · Pull Request #6283 · citusdata/citus (github.com)

pgsql-hackers by date:

From: "Jonathan S. Katz"
Date: 03 October 2022, 17:07:33
Subject: PostgreSQL 15 RC2 + GA release dates

From: vignesh C
Date: 03 October 2022, 18:11:07
Subject: Miscellaneous tab completion issue fixes

Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

Previous

Next