Thread: pg_rewind vs superuser

pg_rewind vs superuser

From

Magnus Hagander

Date:

03 April 2019, 09:28:50

As pointed out by Michael Banck as a comment on my blogpost, the pg_rewind documentation says it requires superuser permissions on the remote server.

Is that really so, though? I haven't tested it, but from a quick look at the code it looks like it needs pg_ls_dir(), pg_stat_file() and pg_read_binary_file(), all, of which are independently grantable.

Or am I missing something?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

04 April 2019, 04:11:29

On Wed, Apr 03, 2019 at 11:28:50AM +0200, Magnus Hagander wrote:
> As pointed out by Michael Banck as a comment on my blogpost, the pg_rewind
> documentation says it requires superuser permissions on the remote server.
>
> Is that really so, though? I haven't tested it, but from a quick look at
> the code it looks like it needs pg_ls_dir(), pg_stat_file() and
> pg_read_binary_file(), all, of which are independently grantable.
>
> Or am I missing something?

Somebody I heard of has mentioned that stuff on his blog some time
ago:
https://paquier.xyz/postgresql-2/postgres-11-superuser-rewind/

And what you need to do is just that:
CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean)
TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO
rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO
rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint,
bigint, boolean) TO rewind_user;

I think that we should document that and back-patch, as now the docs
only say that a superuser should be used, but that is wrong.

At the same time, let's also document that we need to use a checkpoint
on the promoted standby so as the control file gets a refresh and
pg_rewind is able to work properly.  I promised that some time ago and
got reminded of that issue after seeing this thread...

What do you think about the attached?
--
Michael

Attachment

Re: pg_rewind vs superuser

From

Magnus Hagander

Date:

04 April 2019, 08:18:45

On Thu, Apr 4, 2019 at 6:11 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Apr 03, 2019 at 11:28:50AM +0200, Magnus Hagander wrote:
> As pointed out by Michael Banck as a comment on my blogpost, the pg_rewind
> documentation says it requires superuser permissions on the remote server.
>
> Is that really so, though? I haven't tested it, but from a quick look at
> the code it looks like it needs pg_ls_dir(), pg_stat_file() and
> pg_read_binary_file(), all, of which are independently grantable.
>
> Or am I missing something?

Somebody I heard of has mentioned that stuff on his blog some time
ago:
https://paquier.xyz/postgresql-2/postgres-11-superuser-rewind/

Hah. I usually read your blog, but I had forgotten about that one :)

And what you need to do is just that:
CREATE USER rewind_user LOGIN;
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean)
TO rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO
rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO
rewind_user;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint,
bigint, boolean) TO rewind_user;

I think that we should document that and back-patch, as now the docs
only say that a superuser should be used, but that is wrong.

At the same time, let's also document that we need to use a checkpoint
on the promoted standby so as the control file gets a refresh and
pg_rewind is able to work properly. I promised that some time ago and
got reminded of that issue after seeing this thread...

What do you think about the attached?

Looks good. Maybe we should list the "role having sufficient permissions" before superuser, "just because", but not something I feel strongly about.

The part about CHECKPOINT also looks pretty good, but that's entirely unrelated, right? :)

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

04 April 2019, 10:43:10

On Thu, Apr 04, 2019 at 10:18:45AM +0200, Magnus Hagander wrote:
> Looks good. Maybe we should list the "role having sufficient permissions"
> before superuser, "just because", but not something I feel strongly about.

Listing the superuser after sounds fine to me.

> The part about CHECKPOINT also looks pretty good, but that's entirely
> unrelated, right? :)

Completely unrelated, but as we are on this part of the documentation
now, and as we discussed that stuff face-to-face last September where
I actually promised to write a patch without doing it for seven
months, I see no problems to tackle this issue as well now.  Better
later than never :)

I would like to apply this down to 9.5 for the checkpoint part and
down to 11 for the role part, so if anybody has any comments, please
feel free.
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Magnus Hagander

Date:

04 April 2019, 11:19:44

On Thu, Apr 4, 2019 at 12:43 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Apr 04, 2019 at 10:18:45AM +0200, Magnus Hagander wrote:
> Looks good. Maybe we should list the "role having sufficient permissions"
> before superuser, "just because", but not something I feel strongly about.

Listing the superuser after sounds fine to me.

> The part about CHECKPOINT also looks pretty good, but that's entirely
> unrelated, right? :)

Completely unrelated, but as we are on this part of the documentation
now, and as we discussed that stuff face-to-face last September where
I actually promised to write a patch without doing it for seven
months, I see no problems to tackle this issue as well now. Better
later than never :)

:) Nope, I definitely think we need to include that.

I would like to apply this down to 9.5 for the checkpoint part and
down to 11 for the role part, so if anybody has any comments, please
feel free.

All of it, or just the checkpoint part? I assume just the checkpoint part? AFAIK it does require superuser in those earlier versions?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

05 April 2019, 01:40:34

On Thu, Apr 04, 2019 at 01:19:44PM +0200, Magnus Hagander wrote:
> All of it, or just the checkpoint part? I assume just the checkpoint part?
> AFAIK it does require superuser in those earlier versions?

I meant of course the checkpoint part down to 9.5, and the rest down
to 11, so done this way.
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Michael Banck

Date:

05 April 2019, 07:41:58

Hi,

On Thu, Apr 04, 2019 at 01:11:29PM +0900, Michael Paquier wrote:
> At the same time, let's also document that we need to use a checkpoint
> on the promoted standby so as the control file gets a refresh and
> pg_rewind is able to work properly.  I promised that some time ago and
> got reminded of that issue after seeing this thread...

Is there a good reason why Postgres doesn't just issue a CHECKPOINT
after promote itself? After all, this seems to be about making the
control file having the proper content, which sounds like a good thing
to have in general.

Could this be a problem for anything else besides pg_rewind?

This looks like a needless footgun waiting to happen, and just
documenting it in pg_rewind's notes section looks a bit too hidden to me
(but is certainly an improvement).

Michael

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

05 April 2019, 07:56:32

On Fri, Apr 05, 2019 at 09:41:58AM +0200, Michael Banck wrote:
> Is there a good reason why Postgres doesn't just issue a CHECKPOINT
> after promote itself? After all, this seems to be about making the
> control file having the proper content, which sounds like a good thing
> to have in general.

The startup process requests a checkpoint since 9.3, and before that
it was doing the checkpoint by itself (grep for fast_promoted and
RequestCheckpoint() around line 7579 in xlog.c).  This allows the
recovery to finish much faster.

> Could this be a problem for anything else besides pg_rewind?

Not that I know of, at least not in the tree.

> This looks like a needless footgun waiting to happen, and just
> documenting it in pg_rewind's notes section looks a bit too hidden to me
> (but is certainly an improvement).

We had a couple of reports on the matter over the past years.  Perhaps
we could use a big fat warning but that feels a bit overdoing it.
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Magnus Hagander

Date:

05 April 2019, 07:59:29

On Fri, Apr 5, 2019 at 9:56 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 05, 2019 at 09:41:58AM +0200, Michael Banck wrote:
> Is there a good reason why Postgres doesn't just issue a CHECKPOINT
> after promote itself? After all, this seems to be about making the
> control file having the proper content, which sounds like a good thing
> to have in general.

The startup process requests a checkpoint since 9.3, and before that
it was doing the checkpoint by itself (grep for fast_promoted and
RequestCheckpoint() around line 7579 in xlog.c). This allows the
recovery to finish much faster.

> Could this be a problem for anything else besides pg_rewind?

Not that I know of, at least not in the tree.

> This looks like a needless footgun waiting to happen, and just
> documenting it in pg_rewind's notes section looks a bit too hidden to me
> (but is certainly an improvement).

We had a couple of reports on the matter over the past years. Perhaps
we could use a big fat warning but that feels a bit overdoing it.

A related question is, could we (for 12+) actually make the problem go away? As in, can we detect the state and just have pg_rewind issue the checkpoint as needed?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

05 April 2019, 08:06:01

On Fri, Apr 05, 2019 at 09:59:29AM +0200, Magnus Hagander wrote:
> A related question is, could we (for 12+) actually make the problem go
> away? As in, can we detect the state and just have pg_rewind issue the
> checkpoint as needed?

I am not sure as you can still bump into the legit case where one is
trying to rewind an instance which is on the same timeline as the
source, and nothing should happen in this case.
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Magnus Hagander

Date:

05 April 2019, 08:11:22

On Fri, Apr 5, 2019 at 10:06 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 05, 2019 at 09:59:29AM +0200, Magnus Hagander wrote:
> A related question is, could we (for 12+) actually make the problem go
> away? As in, can we detect the state and just have pg_rewind issue the
> checkpoint as needed?

I am not sure as you can still bump into the legit case where one is
trying to rewind an instance which is on the same timeline as the
source, and nothing should happen in this case.

If that is the case, would running a CHECKPOINT actually cause a problem?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Banck

Date:

05 April 2019, 08:39:26

Hi,

On Fri, Apr 05, 2019 at 04:56:32PM +0900, Michael Paquier wrote:
> On Fri, Apr 05, 2019 at 09:41:58AM +0200, Michael Banck wrote:
> > Is there a good reason why Postgres doesn't just issue a CHECKPOINT
> > after promote itself? After all, this seems to be about making the
> > control file having the proper content, which sounds like a good thing
> > to have in general.
>
> The startup process requests a checkpoint since 9.3, and before that
> it was doing the checkpoint by itself (grep for fast_promoted and
> RequestCheckpoint() around line 7579 in xlog.c).  This allows the
> recovery to finish much faster.

Ok, so the problem is that that checkpoint might be still ongoing when
you quickly issue a pg_rewind from the other side? If not, can you
explain once more the actual problem?

In any case, the updated documentation says:

|When executing pg_rewind using an online cluster as source which has
|been recently promoted, it is necessary to execute a CHECKPOINT after
|promotion so as its control file reflects up-to-date timeline
|information

I think it might be useful to specify more exactly which of the two
servers (the remote one AIUI) needs a CHECKPOINT in the abvoe. Also, if
it is the case that a CHECKPOINT is done automatically (see above), that
paragraph could be rewritten to say something like "pg_rewind needs to
wait for the checkoint on the remote server to finish. This can be
ensured by issueing an explicit checkpoint on the remote server prior to
running pg_rewind."

Finally, (and still, if I got the above correctly), to the suggestion of
Magnus of pg_rewind running the checkpoint itself on the remote: would
that again mean that pg_rewind needs SUPERUSER rights or is there
a(nother) GRANTable function that could be added to the list in this
case?

Sorry for being a bit dense here.

Michael

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

05 April 2019, 10:59:30

On Fri, Apr 05, 2019 at 10:11:22AM +0200, Magnus Hagander wrote:
> If that is the case, would running a CHECKPOINT actually cause a problem?

If you exclude the point that it may not be necessary and the
potential extra I/O, no.  However we would come back to the point of
pg_rewind requiring a superuser :(
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

05 April 2019, 11:05:29

On Fri, Apr 05, 2019 at 10:39:26AM +0200, Michael Banck wrote:
> Ok, so the problem is that that checkpoint might be still ongoing when
> you quickly issue a pg_rewind from the other side?

The end-of-recovery checkpoint may not have even begun.

> I think it might be useful to specify more exactly which of the two
> servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if
> it is the case that a CHECKPOINT is done automatically (see above), that
> paragraph could be rewritten to say something like "pg_rewind needs to
> wait for the checkoint on the remote server to finish. This can be
> ensured by issueing an explicit checkpoint on the remote server prior to
> running pg_rewind."

Well, the target server needs to be cleanly shut down, so it seems
pretty clear to me which one needs to have a checkpoint :)

> Finally, (and still, if I got the above correctly), to the suggestion of
> Magnus of pg_rewind running the checkpoint itself on the remote: would
> that again mean that pg_rewind needs SUPERUSER rights or is there
> a(nother) GRANTable function that could be added to the list in this
> case?

pg_rewind would require again a superuser.  So this could be
optional.  In one HA workflow I maintain, what I actually do is to
enforce directly a checkpoint immediately after the promotion is done
to make sure that the data is up-to-date, and I don't meddle with
pg_rewind workflow.
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Magnus Hagander

Date:

07 April 2019, 13:06:56

On Fri, Apr 5, 2019 at 1:05 PM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Apr 05, 2019 at 10:39:26AM +0200, Michael Banck wrote:
> Ok, so the problem is that that checkpoint might be still ongoing when
> you quickly issue a pg_rewind from the other side?

The end-of-recovery checkpoint may not have even begun.

So can we *detect* that this is the case? Because if so, we could perhaps just wait for it to be done? Because there will always be one?

The main point is -- we know from experience that it's pretty fragile to assume the user read the documentation :) So if we can find *any* way to handle this in code rather than docs, that'd be great. We would still absolutely want the docs change for back branches of course.

> I think it might be useful to specify more exactly which of the two
> servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if
> it is the case that a CHECKPOINT is done automatically (see above), that
> paragraph could be rewritten to say something like "pg_rewind needs to
> wait for the checkoint on the remote server to finish. This can be
> ensured by issueing an explicit checkpoint on the remote server prior to
> running pg_rewind."

Well, the target server needs to be cleanly shut down, so it seems
pretty clear to me which one needs to have a checkpoint :)

Clear to you and us of course, but quite possibly not to everybody. I'm sure there are a *lot* of users out there who do not realize that "cleanly shut down" means "ran a checkpoint just before it shut down".

> Finally, (and still, if I got the above correctly), to the suggestion of
> Magnus of pg_rewind running the checkpoint itself on the remote: would
> that again mean that pg_rewind needs SUPERUSER rights or is there
> a(nother) GRANTable function that could be added to the list in this
> case?

pg_rewind would require again a superuser. So this could be

Ugh, you are right of course.

optional. In one HA workflow I maintain, what I actually do is to
enforce directly a checkpoint immediately after the promotion is done
to make sure that the data is up-to-date, and I don't meddle with
pg_rewind workflow.

Sure. And every other HA setup also has to take care of it. That's why it would make sense to centralize it into the tool itself when it's *mandatory* to deal with it somehow.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

08 April 2019, 06:17:25

On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:
> So can we *detect* that this is the case? Because if so, we could perhaps
> just wait for it to be done? Because there will always be one?

Yes, this one is technically possible.  We could add a timeout option
which checks each N seconds the control file of the online source and
sees if its timeline differs or not with the target, waiting for the
change to happen.  If we do that, we may want to revisit the behavior
of not issuing an error if the source and the target are detected as
being on the same timeline, and consider it as a failure.

> The main point is -- we know from experience that it's pretty fragile to
> assume the user read the documentation :) So if we can find *any* way to
> handle this in code rather than docs, that'd be great. We would still
> absolutely want the docs change for back branches of course.

Any veeeeery recent experience on the matter perhaps? :)
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Kyotaro HORIGUCHI

Date:

08 April 2019, 07:14:42

At Mon, 8 Apr 2019 15:17:25 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190408061725.GF2712@paquier.xyz>
> On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:
> > So can we *detect* that this is the case? Because if so, we could perhaps
> > just wait for it to be done? Because there will always be one?
> 
> Yes, this one is technically possible.  We could add a timeout option
> which checks each N seconds the control file of the online source and
> sees if its timeline differs or not with the target, waiting for the
> change to happen.  If we do that, we may want to revisit the behavior
> of not issuing an error if the source and the target are detected as
> being on the same timeline, and consider it as a failure.
> 
> > The main point is -- we know from experience that it's pretty fragile to
> > assume the user read the documentation :) So if we can find *any* way to
> > handle this in code rather than docs, that'd be great. We would still
> > absolutely want the docs change for back branches of course.
> 
> Any veeeeery recent experience on the matter perhaps? :)

I (am not Magnus) saw a similar but a bit different case. Just
after master's promote, standby was killed in immediate mode
after catching up to master's latest TLI but before restartpoint
finished. They are in different TLIs in control data so *the
tool* decides to try pg_rewind. Restart->shutdown (*1) sequence
for cleanup made standby catch up to the master's TLI but their
histories have diverged from each other in the latest TLI. Of
course, pg_rewind says "no need to rewind since they're on the
same TLI". The subsequent replication starts from the segment
beginning and overwrote the WAL records already applied on the
standby. The result was a broken database. I suspect that it is
the result of a kind of misoperation and sane operation won't
cause the situation, but such situation could be "cleaned up" if
pg_rewind did the work for a replication set on the same TLI.

I haven't find exactly what happend yet in the case.

*1: It is somewhat strange, that recovery reaches to the next TLI
 despite that I heard that the restart is in non-standby,
 non-recovery mode.. Something should be wrong.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_rewind vs superuser

From

Peter Eisentraut

Date:

08 April 2019, 08:03:48

On 2019-04-04 12:43, Michael Paquier wrote:
> I would like to apply this down to 9.5 for the checkpoint part and
> down to 11 for the role part, so if anybody has any comments, please
> feel free.

How about some tests to show that this is actually true?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

09 April 2019, 01:38:19

On Mon, Apr 08, 2019 at 10:03:48AM +0200, Peter Eisentraut wrote:
> How about some tests to show that this is actually true?

Sure.  With something like the attached?  I don't think that there is
much point to complicate the test code with multiple roles if the
default is a superuser.
--
Michael

Attachment

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

11 April 2019, 04:14:46

On Tue, Apr 09, 2019 at 10:38:19AM +0900, Michael Paquier wrote:
> Sure.  With something like the attached?  I don't think that there is
> much point to complicate the test code with multiple roles if the
> default is a superuser.

As this topic differs from the original thread, I haev started a new
thread, so let's discuss the proposed patch there:
https://www.postgresql.org/message-id/20190411041336.GM2728@paquier.xyz
--
Michael

Attachment

signature.asc

Re: pg_rewind vs superuser

From

Magnus Hagander

Date:

11 April 2019, 08:33:13

On Mon, Apr 8, 2019 at 8:17 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:
> So can we *detect* that this is the case? Because if so, we could perhaps
> just wait for it to be done? Because there will always be one?

Yes, this one is technically possible. We could add a timeout option
which checks each N seconds the control file of the online source and
sees if its timeline differs or not with the target, waiting for the
change to happen. If we do that, we may want to revisit the behavior
of not issuing an error if the source and the target are detected as
being on the same timeline, and consider it as a failure.

I think doing something like that would be a good idea.

I mean, we should *always* detect if if we can, since it's a condition where things don't work properly.

And I think it would make sense to wait by default, but we could then also have a commandline parameter that says "don't wait, instead error out in case the checkpoint isn't done".

Or something like that?

> The main point is -- we know from experience that it's pretty fragile to
> assume the user read the documentation :) So if we can find *any* way to
> handle this in code rather than docs, that'd be great. We would still
> absolutely want the docs change for back branches of course.

Any veeeeery recent experience on the matter perhaps? :)

Actually no, I've been considering it for some time due to the number of questions we get on it that get exactly the same answer. And then you doing the docs patch reminded me of it :)

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: pg_rewind vs superuser

From

Michael Paquier

Date:

11 April 2019, 13:33:17

On Thu, Apr 11, 2019 at 10:33:13AM +0200, Magnus Hagander wrote:
> And I think it would make sense to wait by default, but we could then also
> have a commandline parameter that says "don't wait, instead error out in
> case the checkpoint isn't done".
>
> Or something like that?

Yes, that would be the idea.  You still need to cover the case where
both instances are on the same timeline, in which case you could wait
for a checkpoint forever, so we'd need to change the current behavior
a bit by making sure that we always throw an error if both nodes are
still on the same timeline after the timeout (see TAP test
005_same_timeline.pl).  I am not sure that you need a separate option
to control the case where you don't want to wait though.  Perhaps we
could have a separate switch, but a user could also just set
--timeout=0 to match that behavior.
--
Michael

Attachment

signature.asc