BUG #13644: 2xRecovery without new writes, causes archiving failure - Mailing list pgsql-bugs

From amir.rohan@mail.com
Subject BUG #13644: 2xRecovery without new writes, causes archiving failure
Date
Msg-id 20150926032247.3022.98224@wrigleys.postgresql.org
Responses Re: BUG #13644: 2xRecovery without new writes, causes archiving failure  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13644
Logged by:          Amir Rohan
Email address:      amir.rohan@mail.com
PostgreSQL version: 9.5alpha2
Operating system:   Linux
Description:

Summary: when the archive directory is shared between a primary, a new
primary recovered from its backup, and a second new primary recovered from
the same backup (each shut down before the next is started), the last
primary fails during archiving because it tries to overwrite files that
already exist in the archive.

Walkthrough:
1. create a server, with wal_level=archive and archive_command copying files
to ARCHIVE_DIR
2. load some data
3. create a tar backup with pg_basebackup --format=t -x
4. shut down server with `pg_ctl stop`
5. delete the PGDATA directory and replace it by untarring the backup file;
place a recovery.conf in it containing only recovery_target_timeline='latest'
and a restore_command set to copy from ARCHIVE_DIR.
6. use the same postgresql.conf file, archiving to ARCHIVE_DIR again.
7. start the server, recovery completes.
8. a new timeline is created:
# ls -l archive/
-rw-------. 1 postgres postgres 16777216 Sep 26 06:07 /var/lib/pgsql/archive/000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Sep 26 06:07 /var/lib/pgsql/archive/000000010000000000000002
-rw-------. 1 postgres postgres      302 Sep 26 06:07 /var/lib/pgsql/archive/000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres       41 Sep 26 06:07 /var/lib/pgsql/archive/00000002.history

9. without writing any data, shut down the server and repeat 5-7
10. no new timeline appears in ARCHIVE_DIR:
# ls -l archive/

-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000010000000000000002
-rw-------. 1 postgres postgres      302 Sep 26 06:10 000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000020000000000000002
-rw-------. 1 postgres postgres 16777216 Sep 26 06:10 000000020000000000000003
-rw-------. 1 postgres postgres       41 Sep 26 06:10 00000002.history

and the log fills up with:

DETAIL:  The failed archive command was: test ! -f /var/lib/pgsql/archive/00000002.history && cp pg_xlog/00000002.history /var/lib/pgsql/archive/00000002.history
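
The failing step can be reproduced without a server. This is only a sketch:
it assumes the archive_command has the form quoted in the log line above
(test ! -f DEST && cp SRC DEST). The function name archive_file and the
temporary directories are hypothetical stand-ins for the real pg_xlog and
ARCHIVE_DIR.

```shell
#!/bin/sh
# Hypothetical stand-in for the archive_command quoted in the log:
# refuse to overwrite a file that is already present in the archive.
archive_file() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    test ! -f "$dest_dir/$name" && cp "$src" "$dest_dir/$name"
}

# Temporary directories play the roles of pg_xlog and ARCHIVE_DIR.
work=$(mktemp -d)
mkdir "$work/pg_xlog" "$work/archive"

# First recovery: 00000002.history is not in the archive yet, so the copy succeeds.
echo "history written by the first recovery" > "$work/pg_xlog/00000002.history"
first_status=0
archive_file "$work/pg_xlog/00000002.history" "$work/archive" || first_status=$?

# Second recovery from the same backup produces the same file name again.
echo "history written by the second recovery" > "$work/pg_xlog/00000002.history"
second_status=0
archive_file "$work/pg_xlog/00000002.history" "$work/archive" || second_status=$?

# The archived copy is still the one from the first recovery.
archived_content=$(cat "$work/archive/00000002.history")

echo "first attempt exit status:  $first_status"
echo "second attempt exit status: $second_status"
echo "archived copy: $archived_content"

rm -rf "$work"
```

The second attempt exits with a non-zero status because the file already
exists. The server treats a non-zero exit status as a failed archive attempt
and retries it, which would explain why the log keeps filling up.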

I expected the last server to create timeline 3, but it looks as if it
thinks it is on timeline 2. In terms of data it is at the same WAL point,
but in terms of events it should be on timeline 3. That's what I make
of it anyway.
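
The timeline can be read off the file names in the listings. The sketch
below assumes the usual naming scheme, in which a history file is the
timeline ID as eight hex digits plus ".history" and a WAL segment name
starts with the same eight digits. The helper names history_file_name and
segment_timeline are hypothetical.

```shell
#!/bin/sh
# Name of the history file for a timeline ID:
# eight upper-case hex digits followed by ".history".
history_file_name() {
    printf '%08X.history\n' "$1"
}

# Timeline ID encoded in the first eight hex digits of a WAL segment name.
segment_timeline() {
    prefix=$(printf '%s' "$1" | cut -c1-8)
    printf '%d\n' "0x$prefix"
}

# Segment names taken from the second listing in this report.
segment_timeline 000000010000000000000002
segment_timeline 000000020000000000000003

# The file the second recovery tried to archive again, and the file
# name that a switch to timeline 3 would have used instead.
history_file_name 2
history_file_name 3
```

With these helpers, the segments in the second listing map to timelines 1
and 2, and a third timeline would have shown up as 00000003.history, which
is absent from the archive.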

I happen to be testing with 9.5, but I have no reason to think this is
particular to it.
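
For anyone trying this on another version, the settings from steps 1, 5
and 6 might look roughly like the following. This is a sketch under
assumptions: the archive_command is inferred from the failed command in the
log, the restore_command form and archive_mode = on are not stated in the
report, and the files are written to a scratch directory rather than to a
real PGDATA.

```shell
#!/bin/sh
# Shared archive location; the listings and the log in this report use this path.
ARCHIVE_DIR=/var/lib/pgsql/archive

# Scratch directory so that nothing is written into a real PGDATA.
out=$(mktemp -d)

# Settings for steps 1 and 6 (these would go into postgresql.conf).
# archive_mode = on is an assumption; the report only mentions wal_level
# and archive_command.
cat > "$out/postgresql.conf.fragment" <<EOF
wal_level = archive
archive_mode = on
archive_command = 'test ! -f $ARCHIVE_DIR/%f && cp %p $ARCHIVE_DIR/%f'
EOF

# recovery.conf for step 5. The exact restore_command is an assumption;
# the report only says it copies from ARCHIVE_DIR.
cat > "$out/recovery.conf" <<EOF
restore_command = 'cp $ARCHIVE_DIR/%f %p'
recovery_target_timeline = 'latest'
EOF

cat "$out/postgresql.conf.fragment" "$out/recovery.conf"
```

pg_basebackup also needs a replication connection to the server, which the
walkthrough does not cover.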

Amir
