pg cluster not cleaning up after failover - Mailing list pgsql-admin
From | Peter Brunnengräber |
---|---|
Subject | pg cluster not cleaning up after failover |
Date | |
Msg-id | 1941931359.108.1468425520834.JavaMail.pbrunnen@Station8.local |
List | pgsql-admin |
Hello all,

I'm having an issue with a postgresql 9.2 cluster during failover and hope you all can help. I have been attempting to follow the guide provided at ClusterLabs(1) but am not having much luck, and I don't quite understand where the issue is. I'm running on Debian wheezy.

I have my crm_mon output below. One server is PRI and operating normally after taking over. I have pg set up to do the WAL archiving via rsync to the opposite node.

<archive_command = 'rsync -a %p test-node2:/db/data/postgresql/9.2/pg_archive/%f'>

The rsync is working and I do see WAL files going to the other host appropriately.

Node2 was the PRI. Last night node1, which was previously in HS:sync, was promoted to PRI, and node2 is stopped. The WAL files are arriving from node1 on node2. I cleaned up the /tmp/PGSQL.lock file and proceeded with a pg_basebackup restore from node1. This all went well, without error in the node1 postgresql log.

After running a crm cleanup on the msPostgresql resource, node2 keeps showing 'LATEST' but gets hung up at HS:alone. Plus, I don't understand why the xlog-loc of node2 shows 0000001EB9053DD8, which is farther ahead than node1's master-baseline of 0000001EB2000080. I saw the 'cannot stat ... 000000010000001E000000BB' error, but that seems to always happen for the current xlog filename.

And if I wasn't confused enough, the pg log on node2 says "streaming replication successfully connected to primary" and the pg_stat_replication query on node1 shows connected, but ASYNC.

Any ideas? Very much appreciated!

-With kind regards,
Peter Brunnengräber

References:
(1) http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#after_fail-over

###
============
Last updated: Wed Jul 13 14:51:53 2016
Last change: Wed Jul 13 14:49:17 2016 via crmd on test-node2
Stack: openais
Current DC: test-node1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ test-node1 test-node2 ]

Full list of resources:

 Resource Group: g_master
     ClusterIP-Net1 (ocf::heartbeat:IPaddr2): Started test-node1
     ReplicationIP-Net2 (ocf::heartbeat:IPaddr2): Started test-node1
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ test-node1 ]
     Slaves: [ test-node2 ]

Node Attributes:
* Node test-node1:
    + master-pgsql:0 : 1000
    + master-pgsql:1 : 1000
    + pgsql-data-status : LATEST
    + pgsql-master-baseline : 0000001EB2000080
    + pgsql-status : PRI
* Node test-node2:
    + master-pgsql:0 : -INFINITY
    + master-pgsql:1 : -INFINITY
    + pgsql-data-status : LATEST
    + pgsql-status : HS:alone
    + pgsql-xlog-loc : 0000001EB9053DD8

Migration summary:
* Node test-node2:
* Node test-node1:

#### Node2
2016-07-13 14:55:09 UTC LOG: database system was interrupted; last known up at 2016-07-13 14:54:27 UTC
2016-07-13 14:55:09 UTC LOG: creating missing WAL directory "pg_xlog/archive_status"
cp: cannot stat `/db/data/postgresql/9.2/pg_archive/00000002.history': No such file or directory
2016-07-13 14:55:09 UTC LOG: entering standby mode
2016-07-13 14:55:09 UTC LOG: restored log file "000000010000001E000000BA" from archive
2016-07-13 14:55:09 UTC FATAL: the database system is starting up
2016-07-13 14:55:09 UTC LOG: redo starts at 1E/BA000020
2016-07-13 14:55:09 UTC LOG: consistent recovery state reached at 1E/BA05FED8
2016-07-13 14:55:09 UTC LOG: database system is ready to accept read only connections
cp: cannot stat `/db/data/postgresql/9.2/pg_archive/000000010000001E000000BB': No such file or directory
cp: cannot stat `/db/data/postgresql/9.2/pg_archive/00000002.history': No such file or directory
2016-07-13 14:55:09 UTC LOG: streaming replication successfully connected to primary

#### Node1
postgres=# select application_name,upper(state),upper(sync_state) from pg_stat_replication;
+------------------+-----------+-------+
| application_name | upper     | upper |
+------------------+-----------+-------+
| test-node2       | STREAMING | ASYNC |
+------------------+-----------+-------+
(1 row)
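
For anyone comparing the two xlog values above by hand, here is a minimal Python sketch. It rests on assumptions that the post does not confirm: that the 16-digit attribute values are the two 32-bit halves of an xlog location written back to back in hex, and that the server uses the default 16 MB WAL segment size. The helper names are made up for illustration and are not part of PostgreSQL or the pgsql resource agent.

```python
# Sketch only: helper names are hypothetical, not part of PostgreSQL or the pgsql RA.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_attr(value):
    """Split a 16-hex-digit attribute value into (xlogid, xrecoff).

    Assumes the value is two 8-digit hex halves concatenated.
    """
    if len(value) != 16:
        raise ValueError("expected 16 hex digits: %r" % value)
    return int(value[:8], 16), int(value[8:], 16)


def parse_lsn(text):
    """Parse the 'xlogid/xrecoff' form printed in the server log, e.g. '1E/BA000020'."""
    hi, lo = text.split("/")
    return int(hi, 16), int(lo, 16)


def wal_segment_name(timeline, location):
    """Name of the WAL segment file that would contain the given location."""
    xlogid, xrecoff = location
    return "%08X%08X%08X" % (timeline, xlogid, xrecoff // WAL_SEGMENT_SIZE)


baseline = parse_attr("0000001EB2000080")  # pgsql-master-baseline on test-node1
standby = parse_attr("0000001EB9053DD8")   # pgsql-xlog-loc on test-node2

# Tuples compare xlogid first, then xrecoff.
print(standby > baseline)
print(wal_segment_name(1, standby))
print(wal_segment_name(1, parse_lsn("1E/BA000020")))
```

Under those assumptions, the last call yields 000000010000001E000000BA for the "redo starts at 1E/BA000020" location, which is the same file name as in the "restored log file" line of the node2 log. The comparison only orders the two locations; it says nothing about why the standby stays at HS:alone.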