The following bug has been logged on the website:
Bug reference: 15638
Logged by: Andre Piwoni
Email address: apiwoni@webmd.net
PostgreSQL version: 10.5
Operating system: CentOS 7.3
Description:
When new slave is created by taking base backup from the primary using
pg_basebackup with --wal-method=stream option the WAL file generated during
the backup is different (as compared with diff or cmp command) than that on
the master and in WAL archive directory. Furthermore, this file does not
exist in pg_wal/archive_status with .done extension on new slave, though it
exists in pg_wal directory, resulting in failed attempt to archive this file
when slave node is promoted as master node.
2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was:
test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
pg_wal/000000010000000000000002
/mnt/pgsql/archive/000000010000000000000002
pg_basebackup with --wal-method=fetch option correctly creates this WAL
segment on slave in pg_wal/archive_status with .done extension and in pg_wal
directory and its contents are equivalent to those on the old primary and in
WAL archive directory.
Here are contents of old_primary, new_primary and WAL archive:
OLD PRIMARY
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres postgres 16777216 Feb 15 13:39
000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Feb 15 13:40
000000010000000000000002
-rw-------. 1 postgres postgres 302 Feb 15 13:40
000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Feb 15 13:40
000000010000000000000003
drwx------. 2 postgres postgres 133 Feb 15 13:40 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
-rw-------. 1 postgres postgres 0 Feb 15 13:39
000000010000000000000001.done
-rw-------. 1 postgres postgres 0 Feb 15 13:40
000000010000000000000002.00000028.backup.done
-rw-------. 1 postgres postgres 0 Feb 15 13:40
000000010000000000000002.done
WAL ARCHIVE
-rw-rw-rw-. 1 root root 16777216 Feb 15 13:39 000000010000000000000001
-rw-rw-rw-. 1 root root 16777216 Feb 15 13:40 000000010000000000000002
-rw-rw-rw-. 1 root root 302 Feb 15 13:40
000000010000000000000002.00000028.backup
NEW PRIMARY (created with --wal-method=stream)
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres root 16777216 Feb 15 13:40
000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 15 13:40
000000010000000000000003
drwx------. 2 postgres root 6 Feb 15 13:40 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
total 0
Please note lack of 000000010000000000000002.done file in
/var/lib/pgsql/10/data/pg_wal/archive_status.
When using backup with --wal-method=fetch this file is created in
/var/lib/pgsql/10/data/pg_wal/archive_status and NEW_PRIMARY won't try to
archive it, which would fail because it has been archived when NEW_PRIMARY
was created as slave from OLD_PRIMARY.
Consequence of all this is that failover does not work correctly.