Re: [patch] demote - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: [patch] demote
Date
Msg-id 20200618121627.1f21e376@firost
Whole thread Raw
In response to Re: [patch] demote  (Andres Freund <andres@anarazel.de>)
Responses Re: [patch] demote
List pgsql-hackers
On Wed, 17 Jun 2020 11:14:47 -0700
Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2020-06-17 17:44:51 +0200, Jehan-Guillaume de Rorthais wrote:
> > As Amul sent a patch about "ALTER SYSTEM READ ONLY"[1], with similar futur
> > objectives than mine, I decided to share the humble patch I am playing with
> > to step down an instance from primary to standby status.
>
> To make sure we are on the same page: What your patch intends to do is
> to leave the server running, but switch from being a primary to
> replicating from another system. Correct?

Yes. The instance status is retrograded from "in production" to "in archive
recovery".

Of course, it will start replicating depending on archive_mode/command and
primary_conninfo setup.

> > Main difference with Amul's patch is that all backends must be disconnected
> > to process with the demote. Either we wait for them to disconnect (smart)
> > or we kill them (fast). This makes life much easier from the code point of
> > view, but much more naive as well. Eg. calling "SELECT pg_demote('fast')"
> > as an admin would kill the session, with no options to wait for the action
> > to finish, as we do with pg_promote(). Keeping read only session around
> > could probably be achieved using global barrier as Amul did, but without
> > all the complexity related to WAL writes prohibition.
>
> FWIW just doing that for normal backends isn't sufficient, you also have
> to deal with bgwriter, checkpointer, ... triggering WAL writes (FPWs due
> to hint bits, the checkpoint record, and some more).

In fact, the patch relies on existing code path in the state machine. The
startup process is started when the code enters in PM_NO_CHILDREN state. This
state is set when «These other guys should be dead already» as stated in the
code:

            /* These other guys should be dead already */
            Assert(StartupPID == 0);
            Assert(WalReceiverPID == 0);
            Assert(BgWriterPID == 0);
            Assert(CheckpointerPID == 0);
            Assert(WalWriterPID == 0);
            Assert(AutoVacPID == 0);
            /* syslogger is not considered here */
            pmState = PM_NO_CHILDREN;

> > There's still some questions in the current patch. As I wrote, it's an
> > humble patch, a proof of concept, a bit naive.
> >
> > Does it worth discussing it and improving it further or do I miss something
> > obvious in this design that leads to a dead end?
>
> I don't think there's a fundamental issue, but I think it needs to deal
> with a lot more things than it does right now. StartupXLOG doesn't
> currently deal correctly with subsystems that are already
> initialized. And your patch doesn't make it so as far as I can tell.

If you are talking about bgwriter, checkpointer, etc, as far as I understand
the current state machine, my patch actually deal with them.

Thank you for your feedback!

I'll study how hard it would be to keep read only backends around during the
demote step.

Regards,



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Parallel Seq Scan vs kernel read ahead
Next
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: [Patch] ALTER SYSTEM READ ONLY