RE: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From kuroda.hayato@fujitsu.com
Subject RE: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id TYAPR01MB5866D5297140DA5908ECFD27F5269@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [Proposal] Add foreign-server health checks infrastructure  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: [Proposal] Add foreign-server health checks infrastructure  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
Dear Fujii-san,

Thank you for reviewing! I attached the latest version.

> When more than one FDWs are used, even if one FDW calls this function to
> disable the timeout, its remote-server-check-callback can still be called. Is this
> OK?
> Please imagine the case where two FDWs are used and they registered their
> callbacks. Even when one FDW calls TryDisableRemoteServerCheckingTimeout(),
> if another FDW has not called that yet, the timeout is still being enabled. If the
> timeout is triggered during that period, even the callback registered by the FDW
> that has already called TryDisableRemoteServerCheckingTimeout() would be
> called.

Indeed and it should be avoided. I added a counter to CheckingRemoteServersCallbackItem.
The register function returns the registered item, and it must be set as the argument for
enable and disable functions.
Callback functions will be called when item->counter is larger than zero.

> +            if (remote_servers_connection_check_interval > 0)
> +            {
> +                CallCheckingRemoteServersCallbacks();
> +
>     enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +
> remote_servers_connection_check_interval);
> 
> LockErrorCleanup() needs to be called before reporting the error, doesn't it?

You are right. I think this suggests that error-reporting should be done in the core layer.
For meaningful error reporting, I changed a callback specification
that it should return ForeignServer* which points to downed remote server.

> This can cause an error even while DoingCommandRead == true. Is that safe?

I read codes again and I think it is not safe. It is OK when whereToSendOutput is DestRemote,
but not good in InteractiveBackend().

I changed that if-statement for CheckingRemoteServersTimeoutPending is moved just after
ClientConnectionLost, because the that may throw a FATAL error and disconnect from client.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Add DBState to pg_control_system function
Next
From: Bharath Rupireddy
Date:
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message