On Wed, 2025-05-07 at 12:51 +0200, Luca Ferrari wrote:
> running 17.4 on ubuntu 24.04 machines. I've three hosts, pg-1
> (primary) and two physical replicas.
> I then promote host pg-3 as a master (pg_promote()) and want to rewind
> the pg-1 to follow the new master, so:
>
> ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D
> /var/lib/postgresql/17/main --source-server="user=replica_fluca
> host=pg-3 dbname=replica_fluca"'
> pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
> pg_rewind: error: could not open file
> "/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such
> file or directory
> pg_rewind: error: could not find previous WAL record at 0/AFFF4E8
>
> But the file 00000001000000000000000A is not there:
>
>
> % ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 00000001000000000000000B.partial
> 00000002.history
> 00000002000000000000000B
> 00000002000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E
> archive_status
> summaries
>
> % ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 000000010000000000000005.00000028.backup
> 00000001000000000000000B
> 00000001000000000000000C
> 00000001000000000000000D
> 00000001000000000000000E
> archive_status
> summaries
>
> Do I have to ensure the old primary pg-1 does a WAL switch before
> promoting the other one and trying to rewind?
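I don't think this is related to a WAL switch.
pg_rewind has to read the target's WAL starting from the last checkpoint
before the point of divergence, and the segment containing that record has
already been removed. I'd say that you should set "wal_keep_size" high
enough that all the WAL needed for pg_rewind is still present in pg_wal.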
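
To see which segment file pg_rewind is asking for, here is a rough sketch
that maps an LSN to a WAL segment name. It assumes the default 16 MB
wal_segment_size, and wal_file_name is a hypothetical helper written for
this mail, not part of PostgreSQL (the server offers pg_walfile_name() for
the same purpose).

```python
# Sketch: map a PostgreSQL LSN to the WAL segment file that contains it.
# Assumption: default 16 MB wal_segment_size; adjust if initdb used another size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-character WAL segment name holding `lsn` on `timeline`."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    position = (hi << 32) | lo                 # LSN as a 64-bit byte position
    segno = position // seg_size               # sequential segment number
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB "xlogid"
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


if __name__ == "__main__":
    # LSNs taken from the pg_rewind output quoted above
    print(wal_file_name(1, "0/AFFF4E8"))  # record pg_rewind could not find
    print(wal_file_name(1, "0/B8550F8"))  # point of divergence
```

Under that assumption, the record at 0/AFFF4E8 lives in segment
00000001000000000000000A, one segment before the one that contains the
point of divergence, which is why the "0B" file alone is not enough.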
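If you have a WAL archive, you could instead define a restore_command on
the server you want to rewind and run pg_rewind with --restore-target-wal,
so that it can fetch the missing segments from the archive.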
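
As a rough illustration, the two settings could look like this in
postgresql.conf on the server that will later be rewound. Both the size and
the archive path are placeholders I made up, not recommendations; pick
values that match your WAL volume and your actual archive.

```
# Assumption: 1GB is an arbitrary example value.
wal_keep_size = '1GB'

# Assumption: /mnt/wal_archive is a hypothetical archive directory.
restore_command = 'cp /mnt/wal_archive/%f "%p"'
```
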
Yours,
Laurenz Albe