Hello,
We have two PostgreSQL servers (v9.4.2), a master and a slave, in streaming
replication. The cluster is controlled by Pacemaker and Corosync, with the
pgsql cluster agent handling failover to, and promotion of, the slave.
Recently a failover occurred, and I noticed that log archiving was failing
on the master:
cp: cannot stat 'pg_xlog/000000020000000000000002': No such file or directory
2016-06-30 11:49:48 BST [13816]: [1235-1] db=,user=,client= LOG: archive command failed with exit code 1
2016-06-30 11:49:48 BST [13816]: [1236-1] db=,user=,client= DETAIL: The failed archive command was: cp pg_xlog/000000020000000000000002 /mnt/pgsql/data/pg_archive/000000020000000000000002
cp: cannot stat 'pg_xlog/000000020000000000000002': No such file or directory
2016-06-30 11:49:49 BST [13816]: [1237-1] db=,user=,client= LOG: archive command failed with exit code 1
2016-06-30 11:49:49 BST [13816]: [1238-1] db=,user=,client= DETAIL: The failed archive command was: cp pg_xlog/000000020000000000000002 /mnt/pgsql/data/pg_archive/000000020000000000000002
2016-06-30 11:49:49 BST [13816]: [1239-1] db=,user=,client= WARNING: archiving transaction log file "000000020000000000000002" failed too many times, will try again later
But the timeline we're currently on is 44, not 2:
# /usr/lib/postgresql/9.4/bin/pg_controldata /mnt/pgsql/data
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6198394727571912088
Database cluster state: in production
pg_control last modified: Thu 30 Jun 2016 11:42:42 BST
Latest checkpoint location: 2/EEE842E8
Prior checkpoint location: 2/EED64F68
Latest checkpoint's REDO location: 2/EEE4B610
Latest checkpoint's REDO WAL file: 0000002C00000002000000EE
Latest checkpoint's TimeLineID: 44
Latest checkpoint's PrevTimeLineID: 44
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/2947680
Latest checkpoint's NextOID: 74375
Latest checkpoint's NextMultiXactId: 464
Latest checkpoint's NextMultiOffset: 929
Latest checkpoint's oldestXID: 677
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 2947680
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Thu 30 Jun 2016 11:42:27 BST
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 250
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 10
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
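
To spell out the mismatch: the first 8 hex digits of a WAL segment file
name are the timeline ID, followed by 8 digits for the log number and 8 for
the segment number. Below is a minimal Python sketch that decodes the two
file names seen above. The helper name parse_wal_name is just something
made up for illustration; it is not part of PostgreSQL or any of its tools.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into
    (timeline, log, segment), all as integers.

    parse_wal_name is a hypothetical helper for illustration only.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL file name: %r" % name)
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log = int(name[8:16], 16)        # next 8: log number (high part of the LSN)
    segment = int(name[16:24], 16)   # last 8: segment number within that log
    return timeline, log, segment


if __name__ == "__main__":
    # The file the archiver keeps retrying, and the REDO WAL file
    # reported by pg_controldata.
    for name in ("000000020000000000000002", "0000002C00000002000000EE"):
        tli, log, seg = parse_wal_name(name)
        print("%s: timeline=%d log=%X segment=%X" % (name, tli, log, seg))
```

So the file the archiver is retrying decodes to timeline 2, while the REDO
WAL file reported by pg_controldata decodes to timeline 44 (0x2C), which
matches the "Latest checkpoint's TimeLineID" line.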
Why is the archiver trying to archive a WAL segment that belongs to an old
timeline (2) when the server is now on timeline 44?
Any thoughts would be much appreciated.
Regards
Chris