The following bug has been logged on the website:
Bug reference: 13644
Logged by: Amir Rohan
Email address: amir.rohan@mail.com
PostgreSQL version: 9.5alpha2
Operating system: Linux
Description:
Summary: when the archive directory is shared between a primary, a new
primary recovered from its backup, and a second new primary recovered from
the same backup (each shut down before the next is started), the last
primary fails during archiving because it tries to overwrite files that
already exist in the archive.
Walkthrough:
1. create a server, with wal_level=archive and archive_command copying files
to ARCHIVE_DIR
2. load some data
3. create a tar backup with pg_basebackup --format=t -x
4. shut down server with `pg_ctl stop`
5. delete the PGDATA directory and replace it by untarring the backup file;
place a recovery.conf containing only "recovery_target_timeline='latest'" and
a restore_command set to copy from ARCHIVE_DIR.
6. use the same postgresql.conf file, archiving to ARCHIVE_DIR again.
7. start the server, recovery completes.
8. a new timeline is created:
# ls -l archive/
-rw-------. 1 postgres postgres 16777216 Sep 26 06:07 /var/lib/pgsql/archive/000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Sep 26 06:07 /var/lib/pgsql/archive/000000010000000000000002
-rw-------. 1 postgres postgres 302 Sep 26 06:07 /var/lib/pgsql/archive/000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 41 Sep 26 06:07 /var/lib/pgsql/archive/00000002.history
9. without writing any data, shut down the server and repeat steps 5-7
10. no new timeline appears in ARCHIVE_DIR:
# ls -l archive/
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000010000000000000002
-rw-------. 1 postgres postgres 302 Sep 26 06:10 000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000020000000000000002
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000020000000000000003
-rw-------. 1 postgres postgres 41 Sep 26 06:10 00000002.history
and the log fills up with:
DETAIL: The failed archive command was: test ! -f /var/lib/pgsql/archive/00000002.history && cp pg_xlog/00000002.history /var/lib/pgsql/archive/00000002.history
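The failure of that command can be shown in isolation, without a running server. The following is only a sketch: the two temporary directories stand in for ARCHIVE_DIR and pg_xlog, the file contents are made up, and the `archive` function is a hypothetical helper that mirrors the shape of the failed archive command.

```shell
#!/bin/sh
# Stand-alone illustration; temp directories stand in for ARCHIVE_DIR and pg_xlog.
archive_dir=$(mktemp -d)
xlog_dir=$(mktemp -d)

# Hypothetical helper with the same shape as the failed archive command:
# it refuses to overwrite a file that is already in the archive.
archive() {
    test ! -f "$archive_dir/$1" && cp "$xlog_dir/$1" "$archive_dir/$1"
}

# First recovery: the history file for timeline 2 is archived successfully.
echo "history written by first recovery" > "$xlog_dir/00000002.history"
archive 00000002.history
first_status=$?

# Second recovery from the same backup produces the same file name.
echo "history written by second recovery" > "$xlog_dir/00000002.history"
archive 00000002.history
second_status=$?

# The archived copy is still the one from the first recovery.
archived=$(cat "$archive_dir/00000002.history")

echo "first=$first_status second=$second_status"
rm -rf "$archive_dir" "$xlog_dir"
```

The second call returns a non-zero status because the target file already exists, which matches the repeated failures in the log.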
I expected the last server to create timeline 3, but it appears to believe
it is on timeline 2. In terms of data it is at the same WAL point, but in
terms of events it should be on timeline 3. That's what I make of it,
anyway.
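To make the timeline observation concrete: the first 8 hex digits of each archived file name are the timeline ID. A small sketch that reads the highest timeline out of the file names from the second listing (the variable names are mine, and the list is pasted in by hand rather than read from the archive directory):

```shell
#!/bin/sh
# File names copied from the second archive listing in this report.
files="000000010000000000000001
000000010000000000000002
000000010000000000000002.00000028.backup
000000020000000000000002
000000020000000000000003
00000002.history"

max_tli=0
for name in $files; do
    # The leading 8 hex digits of the file name are the timeline ID.
    tli_hex=$(printf '%s' "$name" | cut -c1-8)
    tli=$((0x$tli_hex))
    if [ "$tli" -gt "$max_tli" ]; then
        max_tli=$tli
    fi
done

echo "highest timeline in archive: $max_tli"
```

After the second recovery the highest timeline present is still 2; no file beginning with 00000003 was archived.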
I happen to be testing with 9.5, but I have no reason to think this is
particular to it.
Amir