Thread: BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000
BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000
From
digoal@126.com
Date:
The following bug has been logged on the website: Bug reference: 8294 Logged by: digoal.zhou Email address: digoal@126.com PostgreSQL version: Unsupported/Unknown Operating system: CentOS 5.8 x64 Description: PostgreSQL 9.3 beta2 stream replication primary and standby cann't switchover. Primary : psql checkpont; pg_controldata pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 5899156703623933927 Database cluster state: in production pg_control last modified: Thu 11 Jul 2013 11:05:38 AM CST Latest checkpoint location: 0/C000060 Prior checkpoint location: 0/B000060 Latest checkpoint's REDO location: 0/C000028 Latest checkpoint's REDO WAL file: 00000004000000000000000C Latest checkpoint's TimeLineID: 4 Latest checkpoint's PrevTimeLineID: 4 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1850 Latest checkpoint's NextOID: 32768 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 1674 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 1850 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint: Thu 11 Jul 2013 11:05:38 AM CST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 10000 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 16384 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 Standby : pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 5899156703623933927 Database cluster state: in archive recovery pg_control last modified: Thu 11 Jul 2013 11:05:29 AM CST Latest checkpoint location: 0/B000060 Prior checkpoint location: 0/B000060 Latest checkpoint's REDO location: 0/B000028 Latest checkpoint's REDO WAL file: 00000004000000000000000B Latest checkpoint's TimeLineID: 4 Latest checkpoint's PrevTimeLineID: 4 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1850 Latest checkpoint's NextOID: 32768 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 1674 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 1850 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint: Thu 11 Jul 2013 11:04:57 AM CST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/B0000F0 Min recovery ending loc's timeline: 4 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 10000 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 16384 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 Primary : pg_ctl stop waiting for server to shut down.... done server stopped pg_controldata pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 5899156703623933927 Database cluster state: shut down pg_control last modified: Thu 11 Jul 2013 11:06:31 AM CST Latest checkpoint location: 0/D000028 Prior checkpoint location: 0/C000060 Latest checkpoint's REDO location: 0/D000028 Latest checkpoint's REDO WAL file: 00000004000000000000000D Latest checkpoint's TimeLineID: 4 Latest checkpoint's PrevTimeLineID: 4 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1850 Latest checkpoint's NextOID: 32768 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 1674 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint: Thu 11 Jul 2013 11:06:31 AM CST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 10000 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 16384 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 standby : pg_controldata pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 5899156703623933927 Database cluster state: in archive recovery pg_control last modified: Thu 11 Jul 2013 11:05:29 AM CST Latest checkpoint location: 0/B000060 Prior checkpoint location: 0/B000060 Latest checkpoint's REDO location: 0/B000028 Latest checkpoint's REDO WAL file: 00000004000000000000000B Latest checkpoint's TimeLineID: 4 Latest checkpoint's PrevTimeLineID: 4 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1850 Latest checkpoint's NextOID: 32768 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 1674 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 1850 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint: Thu 11 Jul 2013 11:04:57 AM CST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/B0000F0 Min recovery ending loc's timeline: 4 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 10000 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 16384 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 pg_ctl promote server promoting pg_controldata pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 5899156703623933927 Database cluster state: in production pg_control last modified: Thu 11 Jul 2013 11:07:37 AM CST Latest checkpoint location: 0/D000090 Prior checkpoint location: 0/B000060 Latest checkpoint's REDO location: 0/D000058 Latest checkpoint's REDO WAL file: 00000005000000000000000D Latest checkpoint's TimeLineID: 5 Latest checkpoint's PrevTimeLineID: 5 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1851 Latest checkpoint's NextOID: 32768 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 1674 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 1851 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint: Thu 11 Jul 2013 11:07:37 AM CST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 10000 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 16384 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 primary : cd $PGDATA mv recovery.done recovery.conf pg_ctl start log : 2013-07-11 11:10:11.386 CST,,,14911,,51de2213.3a3f,1,,2013-07-11 11:10:11 CST,,0,LOG,00000,"database system was shut down in recovery at 2013-07-11 11:09:51 CST",,,,,,,,"StartupXLOG, xlog.c:4895","" 2013-07-11 11:10:11.387 CST,,,14911,,51de2213.3a3f,2,,2013-07-11 11:10:11 CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, xlog.c:4968","" 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,3,,2013-07-11 11:10:11 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/D000090",,,,,,,,"CheckRecoveryConsistency, xlog.c:6187","" 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,4,,2013-07-11 11:10:11 CST,1/0,0,LOG,00000,"record with zero length at 0/D000090",,,,,,,,"ReadRecord, xlog.c:3285","" 2013-07-11 11:10:11.392 CST,,,14909,,51de2213.3a3d,1,,2013-07-11 11:10:11 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4658","" 2013-07-11 11:10:11.407 CST,,,14915,,51de2213.3a43,1,,2013-07-11 11:10:11 CST,,0,LOG,00000,"fetching timeline history file for timeline 5 from primary server",,,,,,,,"WalRcvFetchTimeLineHistoryFiles, walreceiver.c:666","" 2013-07-11 11:10:11.411 CST,,,14915,,51de2213.3a43,2,,2013-07-11 11:10:11 CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline 4",,,,,,,,"WalReceiverMain, walreceiver.c:529","" 2013-07-11 11:10:11.411 CST,,,14911,,51de2213.3a3f,5,,2013-07-11 11:10:11 CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system timeline 4 before current recovery point 0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441","" 2013-07-11 11:10:11.412 CST,,,14915,,51de2213.3a43,3,,2013-07-11 11:10:11 CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline 4",,,,,,,,"WalReceiverMain, walreceiver.c:529","" 2013-07-11 11:10:11.412 CST,,,14911,,51de2213.3a3f,6,,2013-07-11 11:10:11 CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system timeline 4 before current recovery point 0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441","" 2013-07-11 11:10:16.407 CST,,,14915,,51de2213.3a43,4,,2013-07-11 11:10:11 CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline 4",,,,,,,,"WalReceiverMain, walreceiver.c:529","" 2013-07-11 11:10:16.407 CST,,,14911,,51de2213.3a3f,7,,2013-07-11 11:10:11 CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system timeline 4 before current recovery point 0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441","" 2013-07-11 11:10:21.402 CST,,,14915,,51de2213.3a43,5,,2013-07-11 11:10:11 CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline 4",,,,,,,,"WalReceiverMain, walreceiver.c:529","" 2013-07-11 11:10:21.402 CST,,,14911,,51de2213.3a3f,8,,2013-07-11 11:10:11 CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system timeline 4 before current recovery point 0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441","" 2013-07-11 11:10:26.398 CST,,,14915,,51de2213.3a43,6,,2013-07-11 11:10:11 CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline 4",,,,,,,,"WalReceiverMain, walreceiver.c:529",""
Re: BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000
From
Heikki Linnakangas
Date:
(Quoted pg_controldata output edited to highlight the important parts) On 11.07.2013 06:26, digoal@126.com wrote: > PostgreSQL 9.3 beta2 stream replication primary and standby cann't > switchover. > ... Primary : > psql > checkpont; > pg_controldata > ... > Database cluster state: in production > Latest checkpoint location: 0/C000060 > Prior checkpoint location: 0/B000060 > Latest checkpoint's REDO location: 0/C000028 > Latest checkpoint's TimeLineID: 4 > Latest checkpoint's PrevTimeLineID: 4 > Minimum recovery ending location: 0/0 > Min recovery ending loc's timeline: 0 > > Standby : > Database cluster state: in archive recovery > Latest checkpoint location: 0/B000060 > Prior checkpoint location: 0/B000060 > Latest checkpoint's REDO location: 0/B000028 > Latest checkpoint's TimeLineID: 4 > Latest checkpoint's PrevTimeLineID: 4 > Minimum recovery ending location: 0/B0000F0 > Min recovery ending loc's timeline: 4 So at this point you have a primary, and a standby server following the primary through streaming replication. > Primary : > pg_ctl stop > waiting for server to shut down.... done > server stopped > pg_controldata > Database cluster state: shut down > Latest checkpoint location: 0/D000028 > Prior checkpoint location: 0/C000060 > Latest checkpoint's REDO location: 0/D000028 > Latest checkpoint's TimeLineID: 4 > Latest checkpoint's PrevTimeLineID: 4 > Minimum recovery ending location: 0/0 > Min recovery ending loc's timeline: 0 > > standby : > pg_controldata > Database cluster state: in archive recovery > Latest checkpoint location: 0/B000060 > Prior checkpoint location: 0/B000060 > Latest checkpoint's REDO location: 0/B000028 > Latest checkpoint's TimeLineID: 4 > Latest checkpoint's PrevTimeLineID: 4 > Minimum recovery ending location: 0/B0000F0 > Min recovery ending loc's timeline: 4 > > pg_ctl promote > server promoting > > pg_controldata > Database cluster state: in production > Latest checkpoint location: 0/D000090 > Prior checkpoint location: 0/B000060 > Latest checkpoint's REDO location: 0/D000058 > Latest checkpoint's TimeLineID: 5 > Latest checkpoint's PrevTimeLineID: 5 The primary is shut down, and the standby is promoted to become new master. > primary : > cd $PGDATA > mv recovery.done recovery.conf > pg_ctl start > log : > 2013-07-11 11:10:11.386 CST,,,14911,,51de2213.3a3f,1,,2013-07-11 11:10:11 > CST,,0,LOG,00000,"database system was shut down in recovery at 2013-07-11 > 11:09:51 CST",,,,,,,,"StartupXLOG, xlog.c:4895","" > 2013-07-11 11:10:11.387 CST,,,14911,,51de2213.3a3f,2,,2013-07-11 11:10:11 > CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, > xlog.c:4968","" > 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,3,,2013-07-11 11:10:11 > CST,1/0,0,LOG,00000,"consistent recovery state reached at > 0/D000090",,,,,,,,"CheckRecoveryConsistency, xlog.c:6187","" > 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,4,,2013-07-11 11:10:11 > CST,1/0,0,LOG,00000,"record with zero length at > 0/D000090",,,,,,,,"ReadRecord, xlog.c:3285","" > 2013-07-11 11:10:11.392 CST,,,14909,,51de2213.3a3d,1,,2013-07-11 11:10:11 > CST,,0,LOG,00000,"database system is ready to accept read only > connections",,,,,,,,"sigusr1_handler, postmaster.c:4658","" > 2013-07-11 11:10:11.407 CST,,,14915,,51de2213.3a43,1,,2013-07-11 11:10:11 > CST,,0,LOG,00000,"fetching timeline history file for timeline 5 from primary > server",,,,,,,,"WalRcvFetchTimeLineHistoryFiles, walreceiver.c:666","" > 2013-07-11 11:10:11.411 CST,,,14915,,51de2213.3a43,2,,2013-07-11 11:10:11 > CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline > 4",,,,,,,,"WalReceiverMain, walreceiver.c:529","" > 2013-07-11 11:10:11.411 CST,,,14911,,51de2213.3a3f,5,,2013-07-11 11:10:11 > CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system > timeline 4 before current recovery point > 0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441","" The old primary is restarted in standby-mode. It tries to connect to the new primary, but it refuses to follow it because there is some WAL applied on the old primary that was not replicated to the new primary before the switchover. I believe this is the same issue that Fujii reported in June: http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_VFwRe3oJWnTYsyBDUbRYa6rr0DQ@mail.gmail.com. This was fixed in commit 0b958f3efcfcc3d9b0e39d550b705a28763bc9e2 on June 25th: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b958f3efcfcc3d9b0e39d550b705a28763bc9e2 Unfortunately that commit didn't make it into 9.3beta2, which was wrapped just one day before that commit. Could you try again with a fresh checkout from REL9_3_STABLE branch? Or if you can't easily build from sources, you can wait for the 9.3rc1 release, which should be available later this week (http://www.postgresql.org/message-id/24973.1376419162@sss.pgh.pa.us). Thanks for the report! Please let us know if the next version fixes this. - Heikki