Re: Function to promote standby servers - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Function to promote standby servers
Date
Msg-id 20181019000810.GC2099@paquier.xyz
Whole thread Raw
In response to Re: Function to promote standby servers  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: Function to promote standby servers  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
On Mon, Oct 15, 2018 at 04:16:02PM +0200, Laurenz Albe wrote:
> Michael Paquier wrote:
>> Let's remove this restriction at code level, and instead use
>> function-level ACLs so as it is possible to allow non-superusers to
>> trigger a promotion as well, say a dedicated role.  We could add a
>> system role for this purpose, but I am not sure that it is worth the
>> addition yet.
>
> Agreed, done.

>> As of now, this function returns true if promotion has been triggered,
>> and false if not.  However it seems to me that we should have something
>> more consistent with "pg_ctl promote", so there are actually three
>> possible states:
>> 1) Node is in recovery, with promotion triggered.  pg_promote() returns
>> true in case of success in creating the trigger file.
>> 2) Node is in recovery, with promotion triggered.  pg_promote() returns
>> false in case of failure in creating the trigger file.
>> 3) Node is not in recovery, where I think a call to pg_promote should
>> trigger an error.  This is not handled in the patch.
>
> So far I had returned "false" in the last case, but I am fine with
> throwing an error in that case.  Done.

Thanks, that looks correct to me.  I think that consistency with pg_ctl
is quite important, as this is a feature present for many releases now.

>> At the end this patch needs a bit more work.
>
> Yes, it did.  Thanks for the thorough review!

+       /* wait for up to a minute for promotion */
+       for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i)
+       {
+               if (!RecoveryInProgress())
+                       PG_RETURN_BOOL(true);
+
+               pg_usleep(1000000L / WAITS_PER_SECOND);
+       }
I would recommend to avoid pg_usleep and instead use a WaitLatch() or
similar to generate a wait event.  The wait can then also be seen in
pg_stat_activity, which is useful for monitoring purposes.  Using
RecoveryInProgress is indeed doable, and that's more simple than what I
thought first.

Something I missed to mention in the previous review: the timeout should
be manually enforceable, with a default at 60s.

Is the function marked as strict?  Per the code it should be, I am not
able to test now though.

You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in
system_views.sql, or any users could trigger a promotion, no?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Function to promote standby servers
Next
From: Michael Paquier
Date:
Subject: Re: lowering pg_regress privileges on Windows