Thread: Pg_rewind cannot load history wal

Pg_rewind cannot load history wal

From

Richard Schmidt

Date:

02 August 2018, 00:09:30

We have been struggling to get pr_rewind to work. In desperation we have been trying to make the use case as simple as possibleJ

We have two databases servers running Postgres 10 on two different machine in the normal Primary/Standby configuration.

Both machines write their WAL archive logs to the same shared drive (called /ice_dev/wal_archive).

The configuration has the following terms

archive_mode = always

archive_command = 'test ! -f /ice-dev/wal_archive/%f && cp %p /ice-dev/wal_archive/%f'

full_page_writes = on

wal_log_hints = on

Checksums are enabled

Our procedure that runs on machine A and B is as follows:

Build new databases on A and B, and configure A as Primary and B as Standby databases.
Make some changes to the A (the primary) and check that they are replicated to the B (the standby)
Promote B to be the new primary
Switch of the A (the original primary)
Add the replication slot to B (the new primary) for A (soon to be standby)
Add a recovery.conf to A (soon to be standby). File contains recovery_target_timeline = 'latest' and restore_command = 'cp /ice-dev/wal_archive/%f "%p"
Run pg_rewind on A – this appears to work as it returns the message ‘source and target cluster are on the same timeline no rewind required’;
Start up server A (now a slave)

At this point A is in a read only mode but not replicating. Its logs contain the following repeating message

2018-08-01 20:30:58 UTC [7257]: [1] user=,db=,app=,client= FATAL: could not start WAL streaming: ERROR: requested starting point 0/6000000 on timeline 1 is not in this server's history

DETAIL: This server's history forked from timeline 1 at 0/57639D0.

cp: cannot stat ‘/ice-dev/wal_archive/00000002.history’: No such file or directory

cp: cannot stat ‘/ice-dev/wal_archive/00000003.history’: No such file or directory

cp: cannot stat ‘/ice-dev/wal_archive/00000002.history’: No such file or directory

2018-08-01 20:30:58 UTC [6840]: [48] user=,db=,app=,client= LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/6000098

cp: cannot stat ‘/ice-dev/wal_archive/000000010000000000000006’: No such file or directory

We can see the 00000002.history file in B’s wal directory…..but it never appears in the wal_archive directory – not even if we issue a checkout or even restart the server.

00000003.history does not appear to exist on either of the machines.

Any ideas what we are doing wrong?

Thanks. Richard

This email and any attachments may contain confidential information. If you are not the intended recipient, your use or communication of the information is strictly prohibited. If you have received this message in error please notify MetService immediately.

Re: Pg_rewind cannot load history wal

From

Abhinav Mehta

Date:

02 August 2018, 08:00:33

Whenever you do switch-over, postgres-wal creates a new timeline, which simplifies managing PITR process.

During switch-over(promoting B as master) you had some delta records written to A, that’s where it causes this timeline issue.

Now since A had some delta records, it can’t replicate from B and hence you are getting that issue.

Now once your master A can’t become slave of B.

— A

On 02-Aug-2018, at 2:39 AM, Richard Schmidt <Richard.Schmidt@metservice.com> wrote:

We have been struggling to get pr_rewind to work. In desperation we have been trying to make the use case as simple as possibleJ

We have two databases servers running Postgres 10 on two different machine in the normal Primary/Standby configuration.
Both machines write their WAL archive logs to the same shared drive (called /ice_dev/wal_archive).
The configuration has the following terms
   archive_mode = always
archive_command = 'test ! -f /ice-dev/wal_archive/%f && cp %p /ice-dev/wal_archive/%f'
full_page_writes = on
wal_log_hints = on
Checksums are enabled

Our procedure that runs on machine A and B is as follows:

Build new databases on A and B, and configure A as Primary and B as Standby databases.
Make some changes to the A (the primary) and check that they are replicated to the B (the standby)
Promote B to be the new primary
Switch of the A (the original primary)
Add the replication slot to B (the new primary) for A (soon to be standby)
Add a recovery.conf to A (soon to be standby). File contains recovery_target_timeline = 'latest' and restore_command = 'cp /ice-dev/wal_archive/%f "%p"
Run pg_rewind on A – this appears to work as it returns the message ‘source and target cluster are on the same timeline no rewind required’;
Start up server A (now a slave)

At this point A is in a read only mode but not replicating. Its logs contain the following repeating message

2018-08-01 20:30:58 UTC [7257]: [1] user=,db=,app=,client= FATAL: could not start WAL streaming: ERROR: requested starting point 0/6000000 on timeline 1 is not in this server's history
        DETAIL: This server's history forked from timeline 1 at 0/57639D0.
cp: cannot stat ‘/ice-dev/wal_archive/00000002.history’: No such file or directory
cp: cannot stat ‘/ice-dev/wal_archive/00000003.history’: No such file or directory
cp: cannot stat ‘/ice-dev/wal_archive/00000002.history’: No such file or directory
2018-08-01 20:30:58 UTC [6840]: [48] user=,db=,app=,client= LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/6000098
cp: cannot stat ‘/ice-dev/wal_archive/000000010000000000000006’: No such file or directory

We can see the 00000002.history file in B’s wal directory…..but it never appears in the wal_archive directory – not even if we issue a checkout or even restart the server.
00000003.history does not appear to exist on either of the machines.

Any ideas what we are doing wrong?
Thanks. Richard

This email and any attachments may contain confidential information. If you are not the intended recipient, your use or communication of the information is strictly prohibited. If you have received this message in error please notify MetService immediately.

FW: Pg_rewind cannot load history wal

From

Richard Schmidt

Date:

03 August 2018, 02:48:53

> Now once your master A can’t become slave of B.

Isn’t that the exact situation that pg_rewind should take care of?

Re: Pg_rewind cannot load history wal

From

Michael Paquier

Date:

03 August 2018, 23:59:22

On Wed, Aug 01, 2018 at 09:09:30PM +0000, Richard Schmidt wrote:
> Our procedure that runs on machine A and B is as follows:
>
>   1.  Build new databases on A and B, and configure A as Primary and B
>   as Standby databases.
>   2.  Make some changes to the A (the primary) and check that they are
>   replicated to the B (the standby)
>   3.  Promote B to be the new primary
>   4.  Switch of the A (the original primary)
>   5.  Add the replication slot to B (the new primary) for A (soon to
>   be standby)
>   6.  Add a recovery.conf to A (soon to be standby). File contains
>   recovery_target_timeline = 'latest' and restore_command = 'cp
>   /ice-dev/wal_archive/%f "%p"
>   7.  Run pg_rewind on A - this appears to work as it returns the
>   message 'source and target cluster are on the same timeline no
>   rewind required';
>   8.  Start up server A (now a slave)

Step 7 is incorrect here, after promotion of B you should see pg_rewind
actually do its work.  The problem is that you are missing a piece in
your flow in the shape of a checkpoint on the promoted standby to run
after 3 and before step 7.  This makes the promoted standby update its
timeline number in the on-disk control file, which is used by pg_rewind
to check if a rewind needs to happen or not.

We see too many reports of such mistakes, I am going to propose a patch
on the -hackers mailing list to mention that in the documentation...
--
Michael

Attachment

signature.asc

Re: Pg_rewind cannot load history wal

From

Simon Riggs

Date:

04 August 2018, 09:44:59

On 3 August 2018 at 21:59, Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Aug 01, 2018 at 09:09:30PM +0000, Richard Schmidt wrote:
>> Our procedure that runs on machine A and B is as follows:
>>
>>   1.  Build new databases on A and B, and configure A as Primary and B
>>   as Standby databases.
>>   2.  Make some changes to the A (the primary) and check that they are
>>   replicated to the B (the standby)
>>   3.  Promote B to be the new primary
>>   4.  Switch of the A (the original primary)
>>   5.  Add the replication slot to B (the new primary) for A (soon to
>>   be standby)
>>   6.  Add a recovery.conf to A (soon to be standby). File contains
>>   recovery_target_timeline = 'latest' and restore_command = 'cp
>>   /ice-dev/wal_archive/%f "%p"
>>   7.  Run pg_rewind on A - this appears to work as it returns the
>>   message 'source and target cluster are on the same timeline no
>>   rewind required';
>>   8.  Start up server A (now a slave)
>
> Step 7 is incorrect here, after promotion of B you should see pg_rewind
> actually do its work.  The problem is that you are missing a piece in
> your flow in the shape of a checkpoint on the promoted standby to run
> after 3 and before step 7.  This makes the promoted standby update its
> timeline number in the on-disk control file, which is used by pg_rewind
> to check if a rewind needs to happen or not.
>
> We see too many reports of such mistakes, I am going to propose a patch
> on the -hackers mailing list to mention that in the documentation...

I think the problem is that writing the online checkpoint is deferred
after promotion, so this is a timing issue that probably doesn't show
in our regression tests.

Sounds like we should write a pending timeline change to the control
file and have pg_rewind check that instead.

I'd call this a timing bug, not a doc issue.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Pg_rewind cannot load history wal

From

Michael Paquier

Date:

04 August 2018, 09:56:54

On Sat, Aug 04, 2018 at 07:44:59AM +0100, Simon Riggs wrote:
> I think the problem is that writing the online checkpoint is deferred
> after promotion, so this is a timing issue that probably doesn't show
> in our regression tests.

Somewhat.  It is a performance improvement of 9.3 to let the startup
request a checkpoint to the checkpointer process instead of doing it
itself.

> Sounds like we should write a pending timeline change to the control
> file and have pg_rewind check that instead.
>
> I'd call this a timing bug, not a doc issue.

Well, having pg_rewind enforce a checkpoint on the promoted standby
could cause a performance hit as well if we do it mandatorily as if
there is delay between the promotion and the rewind triggerring a
checkpoint could have already happen.  So it is for me a documentation
bug first regarding the failover workflow, and potentially a patch for a
new feature which makes pg_rewind trigger directly a checkpoint.
--
Michael

Attachment

signature.asc

Re: Pg_rewind cannot load history wal

From

Simon Riggs

Date:

04 August 2018, 12:54:22

On 4 August 2018 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Aug 04, 2018 at 07:44:59AM +0100, Simon Riggs wrote:
>> I think the problem is that writing the online checkpoint is deferred
>> after promotion, so this is a timing issue that probably doesn't show
>> in our regression tests.
>
> Somewhat.  It is a performance improvement of 9.3 to let the startup
> request a checkpoint to the checkpointer process instead of doing it
> itself.

Yes, and so issuing a manual CHECKPOINT would remove that benefit.

>> Sounds like we should write a pending timeline change to the control
>> file and have pg_rewind check that instead.
>>
>> I'd call this a timing bug, not a doc issue.
>
> Well, having pg_rewind enforce a checkpoint on the promoted standby
> could cause a performance hit as well if we do it mandatorily as if
> there is delay between the promotion and the rewind triggerring a
> checkpoint could have already happen.  So it is for me a documentation
> bug first regarding the failover workflow, and potentially a patch for a
> new feature which makes pg_rewind trigger directly a checkpoint.

pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Pg_rewind cannot load history wal

From

Andres Freund

Date:

04 August 2018, 14:59:45

On 2018-08-04 10:54:22 +0100, Simon Riggs wrote:
> On 4 August 2018 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:
> >> Sounds like we should write a pending timeline change to the control
> >> file and have pg_rewind check that instead.
> >>
> >> I'd call this a timing bug, not a doc issue.
> >
> > Well, having pg_rewind enforce a checkpoint on the promoted standby
> > could cause a performance hit as well if we do it mandatorily as if
> > there is delay between the promotion and the rewind triggerring a
> > checkpoint could have already happen.  So it is for me a documentation
> > bug first regarding the failover workflow, and potentially a patch for a
> > new feature which makes pg_rewind trigger directly a checkpoint.
> 
> pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.

Especially because most people will only understand this after they've
been hit, as test scenarios will often just be quick enough.

Greetings,

Andres Freund

Re: Pg_rewind cannot load history wal

From

Michael Paquier

Date:

04 August 2018, 16:13:12

On Sat, Aug 04, 2018 at 04:59:45AM -0700, Andres Freund wrote:
> On 2018-08-04 10:54:22 +0100, Simon Riggs wrote:
>> pg_rewind doesn't work correctly. Documenting a workaround doesn't change that.
>
> Especially because most people will only understand this after they've
> been hit, as test scenarios will often just be quick enough.

Well, since its creation we have the tool behave this way.  I am not
sure either that we can have pg_rewind create a checkpoint on the source
node each time a rewind is done, as it may not be necessary, and it
would enforce WAL segment recycling more than necessary, so if we were
to back-patch something like that I am pretty much convinced that we
would get complains from people already using the tool, with existing
failover flows which are broken.  Making this stuff to not need a
checkpoint is actually possible.  When the source is offline, the
control file can be relied on as the shutdown checkpoint would update
the on-disk control file.  When the source is online, pg_rewind only
needs to know the new timeline number from the source, which we could
provide via a SQL function, but that would work only on HEAD (look at
ControlFile_source, you would see that only the new TLI matters, and
that getTimelineHistory does not really need to know the contents of the
control file).
--
Michael

Attachment

signature.asc

Re: Pg_rewind cannot load history wal

From

Christophe Pettus

Date:

04 August 2018, 17:54:36

> On Aug 4, 2018, at 06:13, Michael Paquier <michael@paquier.xyz> wrote:
>
> Well, since its creation we have the tool behave this way.  I am not
> sure either that we can have pg_rewind create a checkpoint on the source
> node each time a rewind is done, as it may not be necessary, and it
> would enforce WAL segment recycling more than necessary, so if we were
> to back-patch something like that I am pretty much convinced that we
> would get complains from people already using the tool, with existing
> failover flows which are broken.

Would having pg_rewind do a checkpoint on the source actually cause anything to break, as opposed to a delay while the
checkpointcompletes?  The current situation can create a corrupted target, which seems far worse than just slowing down
pg_rewind.

--
-- Christophe Pettus
   xof@thebuild.com

Re: Pg_rewind cannot load history wal

From

Michael Paquier

Date:

04 August 2018, 23:50:26

On Sat, Aug 04, 2018 at 07:54:36AM -0700, Christophe Pettus wrote:
> Would having pg_rewind do a checkpoint on the source actually cause
> anything to break, as opposed to a delay while the checkpoint
> completes?

Users relying only on streaming without archives would be impacted as
potentially two checkpoints could be used on the promoted standby,
making all past segment needed from the divergence point not to be
around.  That's a problem which exists in v11 as only WAL segments worth
one checkpoint are kept around, not for 9.5, 9.6 and 10.

> The current situation can create a corrupted target, which seems far
> worse than just slowing down pg_rewind.

Hm?  pg_rewind requires the target to be stopped properly, meaning that
the divergence point is known to both nodes.  If the source is online
and has not created the first post-recovery checkpoint, then you would
get a no-op with pg_rewind, and when restarting the old master witha
recovery.conf you would get a failure.  If you stop the old master
so as at next startup it needs crash recovery to recover, then there is
indeed a risk of corrupted instance, but that would be the same problem
even if pg_rewind is used.
--
Michael

Attachment

signature.asc

Re: Pg_rewind cannot load history wal

From

Christophe Pettus

Date:

05 August 2018, 00:01:55

> On Aug 4, 2018, at 13:50, Michael Paquier <michael@paquier.xyz> wrote:
>
> Hm?

The specific situation is if pg_rewind is attached to the target before the forced post-recovery checkpoint completes,
thetarget can be corrupted: 

    https://www.postgresql.org/message-id/ECE3B665-E9DD-43FF-B6A6-734E7435221D@thebuild.com

--
-- Christophe Pettus
   xof@thebuild.com

FW: Pg_rewind cannot load history wal

From

Richard Schmidt

Date:

08 August 2018, 00:50:36

We think we have found our missing step. We needed to do an ordered shutdown of the original primary before promoting the standby

I.e.

>1. Make some changes to the A (the primary) and check that they are replicated to the B (the standby)

Missing step:

Perform ordered shutdown of A (the primary)

>2.Promote B to be the new primary

This know means that we have this simple use-case working.

Re: Pg_rewind cannot load history wal

From

Abhinav Mehta

Date:

08 August 2018, 20:54:45

Yes, consider using Repmgr.

On 08-Aug-2018, at 3:20 AM, Richard Schmidt <Richard.Schmidt@metservice.com> wrote:

We think we have found our missing step. We needed to do an ordered shutdown of the original primary before promoting the standby
I.e.

>1. Make some changes to the A (the primary) and check that they are replicated to the B (the standby)

Missing step:
Perform ordered shutdown of A (the primary)

>2.Promote B to be the new primary

This know means that we have this simple use-case working.

This email and any attachments may contain confidential information. If you are not the intended recipient, your use or communication of the information is strictly prohibited. If you have received this message in error please notify MetService immediately.