RE: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From kuroda.hayato@fujitsu.com
Subject RE: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id TYAPR01MB5866CE34430424A588F60FC2F55D9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: [Proposal] Add foreign-server health checks infrastructure  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Responses RE: [Proposal] Add foreign-server health checks infrastructure
RE: [Proposal] Add foreign-server health checks infrastructure
List pgsql-hackers
Dear Önder, Fujii-san,

Thanks for giving many comments! Based on off and on discussions, I modified my patch.
Note that this patch-set did not contain all of fixes you said.

[Modified]

1.
QueryCancelPending was cleaned up when the query cancel was skipped.
In the previous version the QueryCancelPending was set to true again even if the ereport(ERROR) was skipped.
It was because I thought the transaction that accessed to foreign servers but one of them crashed should be aborted
immediately.
But currenlty postgres does not have such "delayed" query cancel mechanism, so I followed the manner.

Another approach is documenting like "If QueryCancelPending is already filled, the callback function must send SIGINT
signal".
This seems simpler for us, but it may be troublesome for FDW authors. How do you think?

2.
The new filed servername was added to the hash entry. It was needed
because the server name must be logged in the pgfdw_connection_check(), but according to [1],
the catalog should not be read in ProcessInterrupts() to avoid the race condition.
The string is pstrdup()'d when the entry has been created, and replaced when ALTER SERER RENAME has been executed.

3.
The manner to implement the health check function was documented.

4.
raise() was replaced to kill(), because it was not used in core code. 

5.
APIs to handle QueryCancelMessage were added to avoid overriding the message.

6.
Some checks for socket was added in the checking function.

[1]: https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com



[Unmodified]

a.
The checking function is still in the postgres_fdw.
This is because I could not find any good file to move this function.
IIUC this function needs PGconn as an argument,
but it seems that it is declared in libpq-fe.h and it should not be included from core. 

b.
In the checking function, the eventset is still created, added, and freed.
Firstly I tried to declare eventset as inner-function static variable and reuse that, but I could not do. This is
because:

* The size of eventset cannot be determined when the function is called for the first time.
  The size must be set in CreateWaitEventSet(), and it must be more than the number of remote connection.
* To reuse the eventset the added event must be removed every time, but it cannot.
  The API ModifyWaitEvent() can be only used for "modifying" the event, not "removing".
  If the function is called as events=0, we will get assertion failure in WaitEventAdjustEpoll() or something.

To avoid creating/freeing the event set, hash table must be modified but it has not done yet.

c.
Currently the status of socket is not checked in the GetConnection().
I agreed it may be useful, but I thought it might be out of scope. It could not meet the requirements.
I think it can be added separately.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [Commitfest 2022-09] Date is Over.
Next
From: John Naylor
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum