Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id 20220216.115058.1615761004338301624.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: [Proposal] Add foreign-server health checks infrastructure  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses RE: [Proposal] Add foreign-server health checks infrastructure  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
At Tue, 1 Feb 2022 23:51:54 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> This logic sounds complicated to me. I'm afraid that FDW developers
> may a bit easily misunderstand the logic and make the bug in their
> FDW.

> Isn't it simpler to just disable the timeout in core whenever the
> transaction ends whether committed or aborted, like statement_timeout
> is disabled after each command? For example, call something like
> DisableForeignCheckTimeout() in CommitTransaction() etc.

> This approach seems to assume that FDW must manage all the
> ForeignServer information so that the callback can return it. Is this
> assumption valid for all the FDW?

FWIW, I'm not sure this feature necessarily requires core support
dedicated to FDWs.  The core have USER_TIMEOUT feature already and
FDWs are not necessarily connection based.  It seems better if FDWs
can implement health check feature without core support and it seems
possible.  Or at least the core feature should be more generic and
simpler. Why don't we just expose InTransactionHealthCheckCallbacks or
something and operating functions on it?


> How about making FDW trigger a query cancel interrupt by signaling
> SIGINT to the backend, instead?

Mmm. AFAICS the running command will stop with "canceling statement
due to user request", which is a hoax.  We need a more decent message
there.

I understand that the motive of this patch is "to avoid wasted long
local work when fdw-connection dies".  In regard to the workload in
your first mail, it is easily avoided by ending the transaction as soon
as remote access ends. This feature doesn't work for the case "begin;
<long local query>; <fdw access>". But the same measure also works in
that case.  So the only case where this feature is useful is "begin;
<fdw-access>; <some long work>; <fdw-access>; end;".  But in the first
place how frequently do you expecting remote-connection close happens?
If that happens so frequently, you might need to recheck the system
health before implementing this feature.  Since it is correctly
detected when something really went wrong, I feel that it is a bit too
complex for the usefulness especially for the core part.


In conclusion, as my humble opinion I would like to propose to reduce
this feature to:

- Just periodically check health (in any aspect) of all live
  connections regardless of the session state.

- If an existing connection is found to be dead, just try canceling
  the query (or sending query cancel).

One issue with it is how to show the decent message for the query
cancel, but maybe we can have a global variable that suggests the
reason for the cancel.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: adding 'zstd' as a compression algorithm
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: [PATCH] Fix out-of-bouds access (src/common/wchar.c)