Thread: Trouble using pg_rewind to undo standby promotion
I have a primary -> standby 1 -> standby 2 setup with all servers running PG 13.8 (this effort is part of getting on to a newer version, but I think those details aren't relevant to this problem). The first standby uses streaming replication from the primary and the second standby is using a WAL archive with a restore_command. To make this standby chain work, standby 1 is set to archive_mode = always with a command that populates the WAL archive.
I would like to be able to promote standby 2 (hereon referred to just as 'standby'), perform some writes, then rewind it back to the point before promotion so it can become a standby again. The documentation for pg_rewind says that this is supported and it seems like it should be straightforward, but I'm not having any luck getting this to work so I'm hoping someone can point out what I'm doing wrong. Here's what I did:
First, observe that WAL is properly being applied from the archive. Note that we are currently on timeline 1.
2024-11-06 09:51:23.286 EST [5438] LOG: restored log file "0000000100007711000000F9" from archive
2024-11-06 09:51:23.434 EST [5438] LOG: restored log file "0000000100007711000000FA" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007711000000FB.gz: No such file or directory
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 09:51:23.286 EST [5438] LOG: restored log file "0000000100007711000000F9" from archive
2024-11-06 09:51:23.434 EST [5438] LOG: restored log file "0000000100007711000000FA" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007711000000FB.gz: No such file or directory
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
Next, stop postgres, set wal_log_hints = on as required by pg_rewind, and restart postgres. I also make a copy of the data directory while the postgres is not running so I can repeat my test, which works fine on a small test database but won't be possible for the multi TB database that I will eventually be doing this on.
Now promote the standby using "select pg_promote()" and see that it switches to a new timeline. You can also see that the last WAL applied from the archive is 7718/BF.
2024-11-06 12:10:10.831 EST [4336] LOG: restored log file "0000000100007718000000BD" from archive
2024-11-06 12:10:10.996 EST [4336] LOG: restored log file "0000000100007718000000BE" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000000BF.gz: No such file or directory
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.384 EST [4336] LOG: restored log file "0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000000C0.gz: No such file or directory
2024-11-06 12:10:15.457 EST [4336] LOG: received promote request
2024-11-06 12:10:15.457 EST [4336] LOG: redo done at 7718/BFFFFF30
2024-11-06 12:10:15.457 EST [4336] LOG: last completed transaction was at log time 2024-11-06 12:10:22.627074-05
2024-11-06 12:10:15.593 EST [4336] LOG: restored log file "0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG: selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG: archive recovery complete
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000001.history.gz: No such file or directory
2024-11-06 12:10:17.028 EST [4329] LOG: database system is ready to accept connections
2024-11-06 12:10:10.996 EST [4336] LOG: restored log file "0000000100007718000000BE" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000000BF.gz: No such file or directory
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.384 EST [4336] LOG: restored log file "0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000000C0.gz: No such file or directory
2024-11-06 12:10:15.457 EST [4336] LOG: received promote request
2024-11-06 12:10:15.457 EST [4336] LOG: redo done at 7718/BFFFFF30
2024-11-06 12:10:15.457 EST [4336] LOG: last completed transaction was at log time 2024-11-06 12:10:22.627074-05
2024-11-06 12:10:15.593 EST [4336] LOG: restored log file "0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG: selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG: archive recovery complete
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000001.history.gz: No such file or directory
2024-11-06 12:10:17.028 EST [4329] LOG: database system is ready to accept connections
Next, insert a record into just to make some changes that I want to rollback later. (What I will eventually be doing is creating a publication to ship data to a newer version, but again, that's not what's important here.)
Finally, shutdown postgres and attempt a rewind. The address used in the --source-server connection string is the address of the primary.
2024-11-06 12:11:11.139 EST [4329] LOG: received fast shutdown request
2024-11-06 12:11:11.143 EST [4329] LOG: aborting any active transactions
2024-11-06 12:11:11.144 EST [4329] LOG: background worker "logical replication launcher" (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4342] LOG: shutting down
2024-11-06 12:11:41.753 EST [4329] LOG: database system is shut down
2024-11-06 12:11:11.143 EST [4329] LOG: aborting any active transactions
2024-11-06 12:11:11.144 EST [4329] LOG: background worker "logical replication launcher" (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4342] LOG: shutting down
2024-11-06 12:11:41.753 EST [4329] LOG: database system is shut down
/usr/pgsql-13/bin/pg_rewind --target-pgdata=/data/pgsql/operational --source-server="host=x.x.x.x dbname=postgres user=xxx password=xxx" --dry-run --progress --restore-target-wal
pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 7718/BFFFFFE8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000200007718000000BF.gz: No such file or directory
pg_rewind: error: could not restore file "0000000200007718000000BF" from archive
pg_rewind: fatal: could not find previous WAL record at 7718/BFFFFFE8
pg_rewind: servers diverged at WAL location 7718/BFFFFFE8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000200007718000000BF.gz: No such file or directory
pg_rewind: error: could not restore file "0000000200007718000000BF" from archive
pg_rewind: fatal: could not find previous WAL record at 7718/BFFFFFE8
pg_rewind shows the point of divergence as 7718/BF on timeline 1, but when it tries to replay WAL using the restore command it is trying to find WAL from timeline 2 rather than picking back up on timeline 1. I tried setting recovery_target_timeline on the target database to 'current' and '1' but that gave the same result. Searching the archives, [1] mentions the need to force a checkpoint after promotion which I tried even though the problem description isn't the same. [2] mentions a problem that looks more like the one I am facing but has no responses. At this point I don't know what to do next and hope someone can point me in the right direction.
[1] https://www.postgresql.org/message-id/e7b16ddea93a92575cb6d143b6ef602cab22432e.camel%40cybertec.at
Craig
Your point of divergence is in the middle of the 7718/000000BF file. So, you should have 2 such files eventually, one on timeline 1 and the other on timeline 2.
Are you archiving WAL on the promoted machine in a way that your restore_command can find it? Check archive_command and archive_mode on the promoted machine.
Also, do your archive/restore scripts work properly for history files?
On Wed, Nov 6, 2024 at 7:48 PM Craig McIlwee <craigm@vt.edu> wrote:
I have a primary -> standby 1 -> standby 2 setup with all servers running PG 13.8 (this effort is part of getting on to a newer version, but I think those details aren't relevant to this problem). The first standby uses streaming replication from the primary and the second standby is using a WAL archive with a restore_command. To make this standby chain work, standby 1 is set to archive_mode = always with a command that populates the WAL archive.I would like to be able to promote standby 2 (hereon referred to just as 'standby'), perform some writes, then rewind it back to the point before promotion so it can become a standby again. The documentation for pg_rewind says that this is supported and it seems like it should be straightforward, but I'm not having any luck getting this to work so I'm hoping someone can point out what I'm doing wrong. Here's what I did:First, observe that WAL is properly being applied from the archive. Note that we are currently on timeline 1.
2024-11-06 09:51:23.286 EST [5438] LOG: restored log file "0000000100007711000000F9" from archive
2024-11-06 09:51:23.434 EST [5438] LOG: restored log file "0000000100007711000000FA" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007711000000FB.gz: No such file or directory
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directoryNext, stop postgres, set wal_log_hints = on as required by pg_rewind, and restart postgres. I also make a copy of the data directory while the postgres is not running so I can repeat my test, which works fine on a small test database but won't be possible for the multi TB database that I will eventually be doing this on.Now promote the standby using "select pg_promote()" and see that it switches to a new timeline. You can also see that the last WAL applied from the archive is 7718/BF.2024-11-06 12:10:10.831 EST [4336] LOG: restored log file "0000000100007718000000BD" from archive
2024-11-06 12:10:10.996 EST [4336] LOG: restored log file "0000000100007718000000BE" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000000BF.gz: No such file or directory
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.384 EST [4336] LOG: restored log file "0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000000C0.gz: No such file or directory
2024-11-06 12:10:15.457 EST [4336] LOG: received promote request
2024-11-06 12:10:15.457 EST [4336] LOG: redo done at 7718/BFFFFF30
2024-11-06 12:10:15.457 EST [4336] LOG: last completed transaction was at log time 2024-11-06 12:10:22.627074-05
2024-11-06 12:10:15.593 EST [4336] LOG: restored log file "0000000100007718000000BF" from archive
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG: selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG: archive recovery complete
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000001.history.gz: No such file or directory
2024-11-06 12:10:17.028 EST [4329] LOG: database system is ready to accept connectionsNext, insert a record into just to make some changes that I want to rollback later. (What I will eventually be doing is creating a publication to ship data to a newer version, but again, that's not what's important here.)Finally, shutdown postgres and attempt a rewind. The address used in the --source-server connection string is the address of the primary.2024-11-06 12:11:11.139 EST [4329] LOG: received fast shutdown request
2024-11-06 12:11:11.143 EST [4329] LOG: aborting any active transactions
2024-11-06 12:11:11.144 EST [4329] LOG: background worker "logical replication launcher" (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4342] LOG: shutting down
2024-11-06 12:11:41.753 EST [4329] LOG: database system is shut down/usr/pgsql-13/bin/pg_rewind --target-pgdata=/data/pgsql/operational --source-server="host=x.x.x.x dbname=postgres user=xxx password=xxx" --dry-run --progress --restore-target-walpg_rewind: connected to server
pg_rewind: servers diverged at WAL location 7718/BFFFFFE8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2: /data/wal_archive/operational/0000000200007718000000BF.gz: No such file or directory
pg_rewind: error: could not restore file "0000000200007718000000BF" from archive
pg_rewind: fatal: could not find previous WAL record at 7718/BFFFFFE8pg_rewind shows the point of divergence as 7718/BF on timeline 1, but when it tries to replay WAL using the restore command it is trying to find WAL from timeline 2 rather than picking back up on timeline 1. I tried setting recovery_target_timeline on the target database to 'current' and '1' but that gave the same result. Searching the archives, [1] mentions the need to force a checkpoint after promotion which I tried even though the problem description isn't the same. [2] mentions a problem that looks more like the one I am facing but has no responses. At this point I don't know what to do next and hope someone can point me in the right direction.[1] https://www.postgresql.org/message-id/e7b16ddea93a92575cb6d143b6ef602cab22432e.camel%40cybertec.atCraig
On Thu, Nov 7, 2024 at 4:47 AM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
Your point of divergence is in the middle of the 7718/000000BF file. So, you should have 2 such files eventually, one on timeline 1 and the other on timeline 2.Are you archiving WAL on the promoted machine in a way that your restore_command can find it? Check archive_command and archive_mode on the promoted machine.
No, the promoted machine is not archiving. How should that work? Is it OK for a log shipping standby that uses restore_command to also push to the same directory with an archive_command or would that cause issues of trying to read and write the same file simultaneously during WAL replay? Or should I be setting up an archive_command that pushes to a separate directory and have a restore_command that knows to check both locations?
Hmm, as I write that out, I realize that I could use archive_mode = on instead of archive_mode = always to avoid the potential for read/write conflicts during WAL replay. I can try this later and report back.
Also, do your archive/restore scripts work properly for history files?
The scripts don't do anything special with history files. They are based on the continuous archive docs [1] and this [2] article the with slight modification to include a throttled scp since the log shipping server is located in a different data center from the promoted standby and there is limited bandwidth between the two. (Also note that the archive script from [2] is adapted to properly handle file transfer failures - the one in the article will use the exit code of the rm command so postgres won't be informed the file transfer fails resulting in missing WAL in the archive.)
Archive script:
---
#!/bin/bash
# $1 = %p
# $2 = %f
limit=10240 # 10Mbps
gzip < /var/lib/pgsql/13/data/$1 > /tmp/archive/$2.gz
scp -l $limit /tmp/archive/$2.gz postgres@x.x.x.x:/data/wal_archive/operational/$2.gz
exit_code=$?
rm /tmp/archive/$2.gz
exit $exit_code
# $1 = %p
# $2 = %f
limit=10240 # 10Mbps
gzip < /var/lib/pgsql/13/data/$1 > /tmp/archive/$2.gz
scp -l $limit /tmp/archive/$2.gz postgres@x.x.x.x:/data/wal_archive/operational/$2.gz
exit_code=$?
rm /tmp/archive/$2.gz
exit $exit_code
---
Restore script:
---
gunzip < /data/wal_archive/operational/$2.gz > $1
---
Craig
Are you archiving WAL on the promoted machine in a way that your restore_command can find it? Check archive_command and archive_mode on the promoted machine.No, the promoted machine is not archiving. How should that work? Is it OK for a log shipping standby that uses restore_command to also push to the same directory with an archive_command or would that cause issues of trying to read and write the same file simultaneously during WAL replay? Or should I be setting up an archive_command that pushes to a separate directory and have a restore_command that knows to check both locations?Hmm, as I write that out, I realize that I could use archive_mode = on instead of archive_mode = always to avoid the potential for read/write conflicts during WAL replay. I can try this later and report back.
Setting archive_mode = on and a restore_command that reads from the WAL archive did the trick. With those changes in place, I was able to successfully run pg_rewind and get the promoted standby back onto timeline 1. Thanks for the tips.
Craig