
Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [Proposal] Add foreign-server health checks infrastructure
Date	October 12, 2022 16:30:34
Msg-id	CACawEhW19nPfbFpvfke9eidFDxAy+ic36wmY0s936T=xzxgHog@mail.gmail.com
In response to	RE: [Proposal] Add foreign-server health checks infrastructure ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Responses	RE: [Proposal] Add foreign-server health checks infrastructure
List	pgsql-hackers


Hi,

> Sounds reasonable. Do you mean that we could add an additional GUC such as "postgres_fdw.initial_check",
> wait for WL_SOCKET_CLOSED if the connection is found in the hash table, and reconnect if it might be closed, right?

Alright, it took me some time to realize that postgres_fdw already has a retry mechanism for when the first command fails: commit 32a9c0b, "postgres_fdw: reestablish new connection if cached one is detected as…" (postgres/postgres, github.com).

Still, the reestablishment mechanism could be simplified further with the WL_SOCKET_CLOSED event, for example as follows (where we should probably rename pgfdw_connection_check_internal):

	/*
	 * If the connection needs to be remade due to invalidation, disconnect as
	 * soon as we're out of all transactions.
	 */
+	bool		remoteSocketIsClosed = entry->conn != NULL ?
+		pgfdw_connection_check_internal(entry->conn) : false;

	if (entry->conn != NULL && (entry->invalidated || remoteSocketIsClosed) &&
		entry->xact_depth == 0)
	{
		elog(DEBUG3, "closing connection %p for option changes to take effect",
			 entry->conn);
		disconnect_pg_server(entry);
	}
+	else if (remoteSocketIsClosed && entry->xact_depth > 0)
+		ereport(ERROR, (errmsg("Remote Server is down ...")));
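To make the intended behavior of the hunk above easier to check, here is a minimal standalone model of the same decision in plain C. CachedConn, ConnAction and decide_conn_action() are hypothetical names that only mirror the relevant fields of the connection cache entry, and socket_closed stands in for whatever the renamed pgfdw_connection_check_internal() would report. An invalidated connection is still dropped only outside a transaction, while a closed socket inside a transaction becomes an error.

```c
#include <stdbool.h>

/* Hypothetical model; these are not postgres_fdw types or functions. */
typedef enum
{
	CONN_KEEP,				/* reuse the cached connection (or connect if none) */
	CONN_RECONNECT,			/* disconnect now; a new connection will be made */
	CONN_ERROR				/* socket lost inside a transaction: raise an error */
} ConnAction;

typedef struct
{
	bool		has_conn;		/* stands in for entry->conn != NULL */
	bool		invalidated;	/* stands in for entry->invalidated */
	int			xact_depth;		/* stands in for entry->xact_depth */
} CachedConn;

static ConnAction
decide_conn_action(const CachedConn *entry, bool socket_closed)
{
	/* A closed socket only matters if there is a connection at all. */
	bool		remote_closed = entry->has_conn && socket_closed;

	if (entry->has_conn && (entry->invalidated || remote_closed) &&
		entry->xact_depth == 0)
		return CONN_RECONNECT;
	if (remote_closed && entry->xact_depth > 0)
		return CONN_ERROR;
	return CONN_KEEP;
}
```

Note that an invalidated connection inside a transaction with a healthy socket is kept, which matches the existing behavior of deferring the disconnect until the transaction ends.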

In other words, a variation of pgfdw_connection_check_internal() could potentially be exposed through interfaces/libpq/libpq-fe.h (with the implementation in backend/libpq/pqcomm.c or src/interfaces/libpq/fe-connect.c). Then GetConnection() in postgres_fdw could force a reconnect, as it already does in some cases, or raise a proper error.
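A rough sketch of what such a libpq-level check might look like, assuming Linux. As far as I know, WL_SOCKET_CLOSED is implemented there with EPOLLRDHUP/POLLRDHUP, so a zero-timeout poll() for POLLRDHUP on the descriptor returned by PQsocket() should give a similar answer without blocking. socket_peer_closed() is a hypothetical name, not an existing libpq function, and other platforms would need a different mechanism.

```c
#define _GNU_SOURCE				/* POLLRDHUP is a Linux extension */
#include <poll.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if the peer on "sock" (e.g. the value of
 * PQsocket(conn)) appears to have closed the connection. Never blocks.
 */
static bool
socket_peer_closed(int sock)
{
	struct pollfd pfd;

	pfd.fd = sock;
	pfd.events = POLLRDHUP;		/* peer shut down its writing side */
	pfd.revents = 0;

	if (poll(&pfd, 1, 0) < 0)
		return false;			/* poll failed: cannot tell, assume alive */

	/* POLLHUP and POLLERR are reported even though they were not requested. */
	return (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR)) != 0;
}
```

Because the timeout is zero, the check costs a single system call per cached connection, which is what would make it cheap enough to run whenever GetConnection() picks a connection from the cache.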

> Based on on-and-off discussions, I modified my patch.

I still think it is probably too much work/code to detect the use case you described in [1]. Having each backend constantly call CallCheckingRemoteServersCallbacks() for this purpose doesn't sound like the optimal way to approach the "is the server down?" problem. You typically decide whether a server is down by establishing a connection to it (or pinging it, etc.), not by going over all the existing connections.

As far as I can tell, it should probably be a single background task that checks whether the server is down. If it is, the task could send an invalidation message to all the backends so that the related backends can act on the invalidation and throw an error. This would cover the use case you described in [1].
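For the single background task, the core probe could be as simple as attempting a TCP connection with a timeout. The sketch below assumes an IPv4 address and a known port; server_is_reachable() is a hypothetical helper, and sending the invalidation message to backends is left out. A successful connect only shows that something accepts connections on that port; libpq's PQping() performs a more complete check of whether a PostgreSQL server is accepting connections.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: can we open a TCP connection within timeout_ms? */
static bool
server_is_reachable(const char *ipv4, unsigned short port, int timeout_ms)
{
	struct sockaddr_in addr = {0};
	struct pollfd pfd;
	int			fd;
	int			err = 0;
	socklen_t	len = sizeof(err);
	bool		ok = false;

	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
		return false;			/* not a valid IPv4 address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return false;
	/* Non-blocking connect so that poll() can enforce the timeout. */
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

	if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0)
		ok = true;				/* connected immediately */
	else if (errno == EINPROGRESS)
	{
		pfd.fd = fd;
		pfd.events = POLLOUT;
		pfd.revents = 0;
		/* Writable means the connect finished; SO_ERROR tells us how. */
		if (poll(&pfd, 1, timeout_ms) == 1 &&
			getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
			err == 0)
			ok = true;
	}
	close(fd);
	return ok;
}
```

Running one such probe per foreign server from a single task keeps the cost independent of how many backends hold connections to that server.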

Also, maybe we could have a new catalog table, such as pg_foreign_server_health, where we keep the last time the check succeeded (and/or failed) and how many times it succeeded (and/or failed).
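A minimal sketch of the bookkeeping one row of such a table might hold, written as a C struct. ForeignServerHealth and record_health_check() are hypothetical names for illustration only; no pg_foreign_server_health catalog exists, and a real catalog would presumably store timestamptz values rather than time_t.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical in-memory shape of one pg_foreign_server_health row. */
typedef struct
{
	unsigned int server_oid;	/* foreign server being checked (an Oid) */
	time_t		last_success;	/* 0 = never succeeded */
	time_t		last_failure;	/* 0 = never failed */
	uint64_t	success_count;
	uint64_t	failure_count;
} ForeignServerHealth;

/* Record the outcome of one health check performed at time "now". */
static void
record_health_check(ForeignServerHealth *h, bool succeeded, time_t now)
{
	if (succeeded)
	{
		h->last_success = now;
		h->success_count++;
	}
	else
	{
		h->last_failure = now;
		h->failure_count++;
	}
}
```

Keeping both timestamps and both counters lets a user see at a glance whether a server is currently failing (last_failure newer than last_success) and how flaky it has been overall.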

This is, of course, just how I would approach this problem. I think other perspectives on this would be very useful to hear.

Thanks,

Onder KALACI

[1] https://www.postgresql.org/message-id/TYAPR01MB58662809E678253B90E82CE5F5889%40TYAPR01MB5866.jpnprd01.prod.outlook.com
