Re: Trouble using pg_rewind to undo standby promotion - Mailing list pgsql-general

From Craig McIlwee
Subject Re: Trouble using pg_rewind to undo standby promotion
Date
Msg-id CAGqBcTZKSYTuVmf6ppR=GKYPtgKKOp6DASaP6YZYUAks49EHoQ@mail.gmail.com
In response to Re: Trouble using pg_rewind to undo standby promotion  (Torsten Förtsch <tfoertsch123@gmail.com>)
Responses Re: Trouble using pg_rewind to undo standby promotion
List pgsql-general
On Thu, Nov 7, 2024 at 4:47 AM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
> Your point of divergence is in the middle of the 7718/000000BF file. So, you should have 2 such files eventually, one on timeline 1 and the other on timeline 2.
>
> Are you archiving WAL on the promoted machine in a way that your restore_command can find it? Check archive_command and archive_mode on the promoted machine.

No, the promoted machine is not archiving.  How should that work?  Is it OK for a log shipping standby that uses restore_command to also push to the same directory with an archive_command or would that cause issues of trying to read and write the same file simultaneously during WAL replay?  Or should I be setting up an archive_command that pushes to a separate directory and have a restore_command that knows to check both locations?

Hmm, as I write that out, I realize that I could use archive_mode = on instead of archive_mode = always to avoid the potential for read/write conflicts during WAL replay.  I can try this later and report back.

> Also, do your archive/restore scripts work properly for history files?

The scripts don't do anything special with history files.  They are based on the continuous archive docs [1] and this article [2], with a slight modification to include a throttled scp, since the log shipping server is located in a different data center from the promoted standby and there is limited bandwidth between the two.  (Also note that the archive script from [2] is adapted to properly handle file transfer failures: the one in the article exits with the exit code of the rm command, so postgres won't be informed when the file transfer fails, resulting in missing WAL in the archive.)
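
To illustrate the exit-code point: unless the status is captured explicitly, a shell script or function reports the status of the last command it ran. Below is a minimal sketch of the difference. The function names are made up for illustration, and `false` stands in for a failed scp; none of this is taken from the article itself.

```shell
#!/bin/bash

# Hypothetical stand-in for a failed transfer; `false` always returns 1.
transfer() { false; }

# Cleanup runs last, so the caller sees rm's status (0), not the transfer's.
archive_masked() {
    local tmp
    tmp=$(mktemp)
    transfer
    rm "$tmp"
}

# Save the transfer status before cleanup and return it afterwards.
archive_checked() {
    local tmp status
    tmp=$(mktemp)
    transfer
    status=$?
    rm "$tmp"
    return $status
}

archive_masked;  echo "masked:  $?"   # prints "masked:  0"
archive_checked; echo "checked: $?"   # prints "checked: 1"
```

With the first pattern, archive_command would appear to succeed and the server would be free to recycle a segment that never reached the archive.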

Archive script:
---
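#!/bin/bash

# $1 = %p (path of the WAL file, relative to the data directory)
# $2 = %f (file name only)

limit=10240 # scp -l takes Kbit/s, so this is 10 Mbit/s

# Compress into a staging file; give up if compression fails so that a
# truncated file is never shipped.
gzip < "/var/lib/pgsql/13/data/$1" > "/tmp/archive/$2.gz" || { rm -f "/tmp/archive/$2.gz"; exit 1; }

scp -l "$limit" "/tmp/archive/$2.gz" "postgres@x.x.x.x:/data/wal_archive/operational/$2.gz"
exit_code=$?

# Remove the staging file, then report scp's status rather than rm's.
rm "/tmp/archive/$2.gz"

exit $exit_code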
---

Restore script:
---
# $1 = %p (destination path), $2 = %f (file name only)
gunzip < "/data/wal_archive/operational/$2.gz" > "$1"
---
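
On the history-file question: since the scripts only ever see %f as an opaque name, a file such as 00000002.history is archived and restored the same way as a WAL segment. The documentation also requires restore_command to return nonzero when asked for a file that is not in the archive, which routinely happens with history files. Below is a rough sketch of a restore helper that makes both behaviors explicit. The function name `restore_wal`, its argument order, and the `.tmp` suffix are assumptions for illustration, not part of the setup described above.

```shell
#!/bin/bash

# restore_wal ARCHIVE_DIR FILE_NAME DEST_PATH
# Hypothetical helper: FILE_NAME corresponds to %f and DEST_PATH to %p.
restore_wal() {
    local archive_dir=$1 fname=$2 dest=$3
    local src="$archive_dir/$fname.gz"

    # A nonzero status tells the server the file is not in the archive.
    [ -f "$src" ] || return 1

    # Decompress to a temporary name first so that a failed gunzip never
    # leaves a truncated file at the destination.
    if gunzip < "$src" > "$dest.tmp"; then
        mv "$dest.tmp" "$dest"
    else
        rm -f "$dest.tmp"
        return 1
    fi
}
```

In a script invoked with %p and %f as its two arguments, the call would look like `restore_wal /data/wal_archive/operational "$2" "$1"`.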


Craig
