Re: allow specifying action when standby encounters incompatible parameter settings - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: allow specifying action when standby encounters incompatible parameter settings
Date
Msg-id 20220414.113611.1900283723994151474.horikyota.ntt@gmail.com
In response to allow specifying action when standby encounters incompatible parameter settings  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: allow specifying action when standby encounters incompatible parameter settings  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
At Wed, 13 Apr 2022 14:35:21 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> Hi hackers,
> 
> As of 15251c0, when a standby encounters an incompatible parameter change,
> it pauses replay so that read traffic can continue while the administrator
> fixes the parameters.  Once the server is restarted, replay can continue.
> Before this change, such incompatible parameter changes caused the standby
> to immediately shut down.
> 
> I noticed that there was some suggestion in the thread associated with
> 15251c0 [0] for making this behavior configurable, but there didn't seem to
> be much interest at the time.  I am interested in allowing administrators
> to specify the behavior before 15251c0 (i.e., immediately shut down the
> standby when an incompatible parameter change is detected).  The use-case I
> have in mind is when an administrator has automation in place for adjusting
> these parameters and would like to avoid stopping replay any longer than
> necessary.  FWIW this is what we do in RDS.
> 
> I've attached a patch that adds a new GUC where users can specify the
> action to take when an incompatible parameter change is detected on a
> standby.  For now, there are just two options: 'pause' and 'shutdown'.
> This new GUC is largely modeled after recovery_target_action.

The overall direction of shutting down without requiring user
interaction seems fine.  I think the same could be achieved with a
timeout.  That is, we provide a GUC named something like
insufficient_standby_setting_shutdown_timeout (hmm, too long..); recovery
then stays paused for that duration and shuts down afterwards.  -1 means
the current behavior, and 0 means what this patch is going to
introduce.  However, I don't see a concrete use case for the timeout.
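
To make the timeout semantics above concrete, here is a minimal sketch in
Python.  It is an illustration only, not PostgreSQL code and not part of the
patch.  The function name, the enum, and the choice of seconds as the unit
are all assumptions made for this example; only the meanings of -1 and 0
come from the description above.

```python
from enum import Enum


class StandbyAction(Enum):
    """Hypothetical outcomes for a standby with insufficient settings."""
    PAUSE = "pause"        # keep replay paused, read traffic continues
    SHUTDOWN = "shutdown"  # stop the standby so it can be reconfigured


def action_on_insufficient_settings(timeout_s: int, paused_for_s: float) -> StandbyAction:
    """Decide what a standby would do under the timeout idea.

    timeout_s    -- hypothetical GUC value (unit of seconds is an assumption):
                    -1 = pause indefinitely (current behavior),
                     0 = shut down immediately (what the patch introduces),
                    >0 = stay paused for that long, then shut down.
    paused_for_s -- how long replay has already been paused.
    """
    if timeout_s < -1:
        raise ValueError("timeout must be -1 or a non-negative value")
    if timeout_s == -1:
        return StandbyAction.PAUSE
    if paused_for_s >= timeout_s:
        return StandbyAction.SHUTDOWN
    return StandbyAction.PAUSE
```

With this shape, the two options in the patch ('pause' and 'shutdown')
correspond to the two special values -1 and 0, and any positive value would
be the new in-between case.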

> I initially set out to see if it was possible to automatically adjust these
> parameters on a standby, but that is considerably more difficult.  It isn't
> enough to just hook into the restart_after_crash functionality since it
> doesn't go back far enough in the postmaster logic.  IIUC we'd need to
> reload preloaded libraries (which there is presently no support for),
> recalculate MaxBackends, etc.  Another option I considered was to

Sure.

> automatically adjust the parameters during startup so that you just need to
> restart the server.  However, we need to know for sure that the server is
> going to be a hot standby, and I don't believe we have that information
> where such GUC changes would need to occur (I could be wrong about this).

Couldn't we use AlterSystemSetConfigFile for this purpose in
CheckRequiredParameterValues?

> Anyway, for now I'm just proposing the modest change described above, but
> I'd welcome any discussion about improving matters further in this area.
> 
> [0] https://postgr.es/m/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b%402ndquadrant.com

Is the reason for using an enum to leave room for adding a new choice
such as "auto-adjust"?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


