On Thu, Jun 18, 2020 at 11:56 AM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
> Regarding improvements to the current demote patch, I was considering digging
> in the following direction:
>
> * add a new state in the state machine where all backends are idle
> * this new state forbids any new writes, the same way we do on standby nodes
> * this state could either wait for end of xact, or cancel/kill
> RW backends, the same way the current smart/fast stop does
> * from this state, we might then roll back pending prepared xacts, stop other
> sub-processes etc (as the current patch does), and demote safely to
> PM_RECOVERY or PM_HOT_STANDBY (depending on the setup).
>
> Is it something worth considering?
> Maybe the code will be so close to ASRO that it would just be a kind of fusion
> of both patches? I don't know, I didn't look at the ASRO patch yet.

I don't think that the postmaster state machine is the interesting
part of this problem. The tricky parts have to do with updating shared
memory state, and with updating per-backend private state. For
example, snapshots are taken in a different way during recovery than
they are in normal operation, hence SnapshotData's takenDuringRecovery
member. And I think that we allocate extra shared memory space for
storing the data that those snapshots use if, and only if, the server
starts up in recovery. So if the server goes backward from normal
running into recovery, we might not have the space that we need in
shared memory to store the extra data, and even if we had the space it
might not be populated correctly, and the code that takes snapshots
might not be written properly to handle multiple transitions between
recovery and normal running, or even a single backward transition.
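
To make the shared memory concern concrete, here is a toy sketch in C.
It is not PostgreSQL code: ToyShmem, recovery_xids and the toy_*
functions are all made-up names, and it simply models my guess above
that the extra area is allocated only when the server starts up in
recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a shared memory segment; not PostgreSQL's real layout. */
typedef struct ToyShmem
{
    int    *recovery_xids;     /* extra area used only by recovery snapshots */
    size_t  recovery_xids_cap; /* 0 if the area was never allocated */
} ToyShmem;

/*
 * Size the segment once, at startup.  The assumption being modeled: the
 * extra area exists only when the server starts in recovery.
 */
static ToyShmem *
toy_shmem_create(bool start_in_recovery, size_t max_xids)
{
    ToyShmem *shm;

    assert(max_xids > 0);
    shm = calloc(1, sizeof(ToyShmem));
    if (shm == NULL)
        return NULL;
    if (start_in_recovery)
    {
        shm->recovery_xids = calloc(max_xids, sizeof(int));
        if (shm->recovery_xids == NULL)
        {
            free(shm);
            return NULL;
        }
        shm->recovery_xids_cap = max_xids;
    }
    return shm;
}

/* A recovery-style snapshot needs the extra area to exist. */
static bool
toy_can_take_recovery_snapshot(const ToyShmem *shm)
{
    assert(shm != NULL);
    return shm->recovery_xids != NULL && shm->recovery_xids_cap > 0;
}

/* Release the toy segment and its optional extra area. */
static void
toy_shmem_destroy(ToyShmem *shm)
{
    if (shm != NULL)
    {
        free(shm->recovery_xids);
        free(shm);
    }
}
```

In this model, a segment created with start_in_recovery = false has
nowhere to put the data a recovery-style snapshot would need, so
flipping the server back into recovery without recreating the segment
leaves toy_can_take_recovery_snapshot() returning false.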

In general, there's code scattered all throughout the system that
assumes the recovery -> normal running transition is one-way. If we go
back into recovery by killing off all backends and reinitializing
shared memory, then we don't have to worry about that stuff. If we do
anything less than that, we have to find all the code that relies on
never reentering recovery and fix it all. Now it's also true that we
have to do some other things, like restarting the startup process, and
stopping things like autovacuum, and the postmaster may need to be
involved in some of that. There's clearly some engineering work there,
but I think it's substantially less than the amount of engineering
work involved in fixing problems with shared memory contents and
backend-local state.
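
As an illustration of the backend-local side, here is another toy
sketch in C. Again, none of this is real PostgreSQL code; ToyShared,
ToyBackend and toy_recovery_in_progress() are hypothetical names. It
models a backend that caches "recovery is over" in private memory on
the assumption that the answer can never change back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy shared state: the authoritative "are we in recovery?" flag. */
typedef struct ToyShared
{
    bool in_recovery;
} ToyShared;

/* Toy per-backend private state: a cached copy of that flag. */
typedef struct ToyBackend
{
    bool local_in_recovery;
} ToyBackend;

/* A new backend starts out not knowing, so it assumes recovery. */
static void
toy_backend_init(ToyBackend *backend)
{
    assert(backend != NULL);
    backend->local_in_recovery = true;
}

/*
 * Once the cached flag has gone false, shared state is never consulted
 * again.  That is only safe if recovery -> normal running is one-way.
 */
static bool
toy_recovery_in_progress(ToyBackend *backend, const ToyShared *shared)
{
    assert(backend != NULL && shared != NULL);
    if (!backend->local_in_recovery)
        return false;
    backend->local_in_recovery = shared->in_recovery;
    return backend->local_in_recovery;
}
```

If shared->in_recovery is flipped back to true while an existing
backend has already cached false, that backend keeps answering false,
whereas a freshly initialized backend sees true. Killing off all
backends and reinitializing sidesteps the stale copy; anything less
means finding every cache of this kind.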
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company