Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: [Proposal] Add foreign-server health checks infrastructure |
Date | |
Msg-id | 20220217.170832.627054018291482509.horikyota.ntt@gmail.com |
In response to | RE: [Proposal] Add foreign-server health checks infrastructure ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>) |
Responses | RE: [Proposal] Add foreign-server health checks infrastructure |
List | pgsql-hackers |
Hi, Kuroda-san.

At Thu, 17 Feb 2022 04:11:09 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in
> Dear Horiguchi-san,
>
> Thank you for giving your suggestions. I want to confirm your saying.
>
> > FWIW, I'm not sure this feature necessarily requires core support
> > dedicated to FDWs. The core have USER_TIMEOUT feature already and
> > FDWs are not necessarily connection based. It seems better if FDWs
> > can implement health check feature without core support and it seems
> > possible. Or at least the core feature should be more generic and
> > simpler. Why don't we just expose InTransactionHealthCheckCallbacks or
> > something and operating functions on it?
>
> I understood that core is too complicated and FDW side is too stupid, right?

I don't think the FDW side is stupid, but it seems too complex for the
benefit. I just think that maybe we don't need the core part.

> > Mmm. AFAICS the running command will stop with "canceling statement
> > due to user request", which is a hoax. We need a more decent message
> > there.
>
> +1 about better messages.
>
> > I understand that the motive of this patch is "to avoid wasted long
> > local work when fdw-connection dies".
>
> Yeah your understanding is right.
>
> > In regard to the workload in
> > your first mail, it is easily avoided by ending the transaction as soon
> > as remote access ends. This feature doesn't work for the case "begin;
> > <long local query>; <fdw access>". But the same measure also works in
> > that case. So the only case where this feature is useful is "begin;
> > <fdw-access>; <some long work>; <fdw-access>; end;". But in the first
> > place how frequently do you expecting remote-connection close happens?
> > If that happens so frequently, you might need to recheck the system
> > health before implementing this feature. Since it is correctly
> > detected when something really went wrong, I feel that it is a bit too
> > complex for the usefulness especially for the core part.
>
> Thanks for analyzing motivation.
> Indeed, some cases may be resolved by separating tx and this event rarely happens.
>
> > In conclusion, as my humble opinion I would like to propose to reduce
> > this feature to:
> >
> > - Just periodically check health (in any aspect) of all live
> > connections regardless of the session state.
>
> I understood here as removing following mechanism from core:
>
> * disable timeout at end of tx.
> * skip if held off or read commands

I think we're on the same page. Anyway, a query cancel interrupt is
ignored while reading input.

> > - If an existing connection is found to be dead, just try canceling
> > the query (or sending query cancel).
> > One issue with it is how to show the decent message for the query
> > cancel, but maybe we can have a global variable that suggests the
> > reason for the cancel.
>
> Currently I have no good idea for that but I'll try.

However, I would like to hear others' opinions about the direction, of
course.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
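For illustration only, the reduced design discussed above (a list of health check callbacks, a periodic check over all live connections, and a global variable that records why a cancel was requested) might be sketched roughly as follows. This is a standalone toy, not PostgreSQL code and not the patch under discussion. Every name in it (CancelReason, register_health_check_callback, run_health_checks, cancel_message, fake_connection_is_alive) is hypothetical, and the "lost connection" message text is made up; only "canceling statement due to user request" comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: why the current query is being canceled. */
typedef enum CancelReason
{
    CANCEL_REASON_NONE = 0,
    CANCEL_REASON_USER_REQUEST,
    CANCEL_REASON_REMOTE_CONNECTION_LOST
} CancelReason;

/* Hypothetical callback type: returns true while the connection is alive. */
typedef bool (*HealthCheckCallback) (void *arg);

typedef struct HealthCheckEntry
{
    HealthCheckCallback callback;
    void       *arg;
} HealthCheckEntry;

#define MAX_HEALTH_CHECK_CALLBACKS 8

/* Stand-in for the "InTransactionHealthCheckCallbacks or something" list. */
static HealthCheckEntry health_check_callbacks[MAX_HEALTH_CHECK_CALLBACKS];
static int  n_health_check_callbacks = 0;

/* Global state that suggests the reason for a pending cancel. */
static CancelReason query_cancel_reason = CANCEL_REASON_NONE;
static bool query_cancel_pending = false;

/* An FDW (or anything else) registers one callback per live connection. */
static void
register_health_check_callback(HealthCheckCallback callback, void *arg)
{
    assert(callback != NULL);
    assert(n_health_check_callbacks < MAX_HEALTH_CHECK_CALLBACKS);

    health_check_callbacks[n_health_check_callbacks].callback = callback;
    health_check_callbacks[n_health_check_callbacks].arg = arg;
    n_health_check_callbacks++;
}

/*
 * Intended to be called periodically, regardless of session state.
 * Returns false and requests a cancel if any connection is found dead.
 */
static bool
run_health_checks(void)
{
    for (int i = 0; i < n_health_check_callbacks; i++)
    {
        HealthCheckEntry *entry = &health_check_callbacks[i];

        if (!entry->callback(entry->arg))
        {
            query_cancel_reason = CANCEL_REASON_REMOTE_CONNECTION_LOST;
            query_cancel_pending = true;
            return false;
        }
    }
    return true;
}

/* Picks the message from the recorded reason instead of a fixed string. */
static const char *
cancel_message(void)
{
    switch (query_cancel_reason)
    {
        case CANCEL_REASON_REMOTE_CONNECTION_LOST:
            /* Hypothetical wording. */
            return "canceling statement due to lost connection to a foreign server";
        case CANCEL_REASON_USER_REQUEST:
            return "canceling statement due to user request";
        default:
            return "";
    }
}

/* Toy callback for experimenting: arg points to a bool "alive" flag. */
static bool
fake_connection_is_alive(void *arg)
{
    return *(bool *) arg;
}
```

A caller would register fake_connection_is_alive with a pointer to a flag, invoke run_health_checks() from whatever periodic hook is available, and consult cancel_message() when query_cancel_pending is set. How such a reason would actually reach the real cancel path is exactly the open question in the thread, so the sketch does not attempt to answer it.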