Thread: recovery_target_timeline and multiple slave behavior when master fails

recovery_target_timeline and multiple slave behavior when master fails

From
Rick Pufky
Date:
I'm attempting to make use of the ability defined by the statement at the end of the first paragraph in this section of the documentation (http://www.postgresql.org/docs/9.1/static/warm-standby.html#STANDBY-SERVER-SETUP). Specifically, "If you plan to have multiple standby servers for high availability purposes, set recovery_target_timeline to latest, to make the standby server follow the timeline change that occurs at failover to another standby."

To attempt this, I've set up 3 pg machines: a master (192.168.56.5) and 2 pitr slaves (192.168.56.6, 192.168.56.7), both connected to the master. Below, the hostnames remain the same even if a slave is promoted to master, so slave 1 will always refer to 192.168.56.6. I've compiled/installed pg 9.1.2 and pgbench for testing purposes. I've made the following changes to postgresql.conf (for ease of testing, the same postgresql.conf is used on all servers):
wal_level = hot_standby
archive_mode = off
max_wal_senders = 3
wal_keep_segments = 100
hot_standby = on
hot_standby_feedback = on

And on the slaves, I've got this for recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=192.168.56.5 port=5432 user=postgres'
trigger_file = '/pgsql/omnipitr/finish.recovery'
recovery_target_timeline = latest


I've initialized a fresh DB on the master, then executed the following to create fresh slave copies (with .7 instead of .6 to create slave 2):
      pg_start_backup('start');
      rsync -avh --exclude=pg_log --exclude=postgresql.conf --exclude=postmaster.pid . 192.168.56.6:/pgsql/91/data
      pg_stop_backup();
      rsync -avh --exclude=pg_log --exclude=postgresql.conf --exclude=postmaster.pid . 192.168.56.6:/pgsql/91/data
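Spelled out as runnable commands, the base-backup sequence is roughly the following (this assumes the commands are run as postgres from the master's data directory, with passwordless rsync/ssh to the slaves; the backup label is arbitrary):

```shell
# Repeat with SLAVE=192.168.56.7 to build slave 2.
SLAVE=192.168.56.6

psql -U postgres -c "SELECT pg_start_backup('start');"
rsync -avh --exclude=pg_log --exclude=postgresql.conf \
    --exclude=postmaster.pid . ${SLAVE}:/pgsql/91/data
psql -U postgres -c "SELECT pg_stop_backup();"
# The second pass picks up files that changed while the backup was running.
rsync -avh --exclude=pg_log --exclude=postgresql.conf \
    --exclude=postmaster.pid . ${SLAVE}:/pgsql/91/data
```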

Both slaves were brought online, and to confirm that streaming replication was working, I created a table and inserted some simple data into it. The rows showed up on the slaves, so the setup appears good so far. To generate further data on the master, I used pgbench (time /opt/pgsql/bin/pgbench -i -s 35 test). In the middle of the pgbench data load (I let it load more than 1 million tuples first), I killed the postgresql/pgbench processes on the master:
root@pg912-master:~# ps -ef |grep post
postgres  1955     1  0 11:39 pts/0    00:00:00 /opt/pgsql/bin/postmaster -D /pgsql/91/data
postgres  1956  1955  0 11:39 ?        00:00:00 postgres: logger process                   
postgres  1958  1955  0 11:39 ?        00:00:00 postgres: writer process                   
postgres  1959  1955  0 11:39 ?        00:00:00 postgres: wal writer process               
postgres  1960  1955  0 11:39 ?        00:00:00 postgres: autovacuum launcher process      
postgres  1961  1955  0 11:39 ?        00:00:00 postgres: stats collector process          
postgres  2013  1955  1 11:50 ?        00:00:11 postgres: wal sender process postgres 192.168.56.6(35184) streaming 0/2B000000
postgres  2015  1955  0 11:50 ?        00:00:02 postgres: wal sender process postgres 192.168.56.7(44052) streaming 0/2B000000
postgres  2049  1709  1 12:05 pts/1    00:00:00 /opt/pgsql/bin/pgbench -i -s 35 test
postgres  2050  1955 12 12:05 ?        00:00:00 postgres: postgres test [local] COPY       
root      2052  1602  0 12:05 pts/0    00:00:00 grep post
root@pg912-master:~# kill -9 1955 1956 1958 1959 1960 1961 2013 2015 2049 2050

After the master terminated, I promoted slave 1 to become the new master by touching the trigger file (touch /pgsql/omnipitr/finish.recovery). Some relevant snippets from slave 1's log file:
2011-12-15 12:05:56.191 EST    2580:LOG:  recovery restart point at 0/2D0BE0B0
2011-12-15 12:05:56.191 EST    2580:DETAIL:  last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:05:57.031 EST    2580:LOG:  restartpoint starting: xlog
2011-12-15 12:06:03.067 EST    2580:LOG:  restartpoint complete: wrote 98 buffers (3.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=4.907 s, sync=1.101 s, total=6.035 s; sync files=2, longest=1.070 s, average=0.550 s
2011-12-15 12:06:03.067 EST    2580:LOG:  recovery restart point at 0/300BE0B0
2011-12-15 12:06:03.067 EST    2580:DETAIL:  last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:06:05.148 EST    2581:FATAL:  could not receive data from WAL stream:
2011-12-15 12:06:09.769 EST    2579:LOG:  invalid record length at 0/33C92290
2011-12-15 12:06:09.938 EST    2608:FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.5" and accepting
                TCP/IP connections on port 5432?
... more of the same primary server connection refused error ...
2011-12-15 12:08:09.892 EST    2579:LOG:  trigger file found: /pgsql/omnipitr/finish.recovery
2011-12-15 12:08:09.892 EST    2579:LOG:  redo done at 0/33C921F0
2011-12-15 12:08:09.892 EST    2579:LOG:  last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:08:09.892 EST    2579:LOG:  selected new timeline ID: 2
2011-12-15 12:08:10.078 EST    2579:LOG:  archive recovery complete
2011-12-15 12:08:10.491 EST    2580:LOG:  restartpoint complete: wrote 2692 buffers (87.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=112.051 s, sync=0.274 s, total=112.361 s; sync files=2, longest=0.273 s, average=0.137 s
2011-12-15 12:08:10.492 EST    2580:LOG:  recovery restart point at 0/330BE0B0
2011-12-15 12:08:10.492 EST    2580:DETAIL:  last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:08:10.493 EST    2580:LOG:  checkpoint starting: end-of-recovery immediate wait
2011-12-15 12:08:10.497 EST    2580:LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.005 s; sync files=0, longest=0.000 s, average=0.000 s
2011-12-15 12:08:10.951 EST    2651:LOG:  autovacuum launcher started
2011-12-15 12:08:10.960 EST    2577:LOG:  database system is ready to accept connections


Slave 2 was still attempting to connect to the master, so I modified its recovery.conf file to look like this (connect to slave 1 instead of master) and restarted pg on slave 2:
standby_mode = 'on'
primary_conninfo = 'host=192.168.56.6 port=5432 user=postgres'
trigger_file = '/pgsql/omnipitr/finish.recovery'
recovery_target_timeline = latest

Relevant log snippets from slave 2:
2011-12-15 12:05:54.992 EST    2626:LOG:  recovery restart point at 0/2D0BE0B0
2011-12-15 12:05:54.992 EST    2626:DETAIL:  last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:05:56.295 EST    2626:LOG:  restartpoint starting: xlog
2011-12-15 12:06:05.979 EST    2627:FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

2011-12-15 12:06:06.191 EST    2625:LOG:  unexpected pageaddr 0/28E40000 in log file 0, segment 50, offset 14942208
2011-12-15 12:06:06.218 EST    2626:LOG:  restartpoint complete: wrote 185 buffers (6.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=9.692 s, sync=0.222 s, total=9.923 s; sync files=2, longest=0.204 s, average=0.111 s
2011-12-15 12:06:06.218 EST    2626:LOG:  recovery restart point at 0/300BE0B0
2011-12-15 12:06:06.218 EST    2626:DETAIL:  last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:06:21.419 EST    2652:FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.5" and accepting
                TCP/IP connections on port 5432?
... more of the same primary server connection refused error ...
2011-12-15 12:08:44.924 EST    2623:LOG:  received fast shutdown request
2011-12-15 12:08:44.935 EST    2623:LOG:  aborting any active transactions
2011-12-15 12:08:44.935 EST    2626:LOG:  shutting down
2011-12-15 12:08:44.945 EST    2626:LOG:  database system is shut down
2011-12-15 12:08:46.802 EST    2737:LOG:  database system was shut down in recovery at 2011-12-15 12:08:44 EST
2011-12-15 12:08:46.804 EST    2737:LOG:  entering standby mode
2011-12-15 12:08:46.818 EST    2737:LOG:  redo starts at 0/300BE0B0
2011-12-15 12:08:47.313 EST    2737:LOG:  consistent recovery state reached at 0/32E3FFF0
2011-12-15 12:08:47.313 EST    2735:LOG:  database system is ready to accept read only connections
2011-12-15 12:08:47.313 EST    2737:LOG:  unexpected pageaddr 0/28E40000 in log file 0, segment 50, offset 14942208
2011-12-15 12:08:47.437 EST    2740:FATAL:  timeline 2 of the primary does not match recovery target timeline 1
2011-12-15 12:08:52.323 EST    2742:FATAL:  timeline 2 of the primary does not match recovery target timeline 1
... repeated continuously ...


Any thoughts on the above snippets? Am I interpreting the documentation correctly? Is there any further information needed to debug this?

Thanks,
Rick

Re: recovery_target_timeline and multiple slave behavior when master fails

From
Fujii Masao
Date:
On Fri, Dec 16, 2011 at 3:59 AM, Rick Pufky <rick@omniti.com> wrote:
> Any thoughts on the above snippets? Am I interpreting the documentation
> correctly? Is there any further information needed to debug this?

You need to share the archive directory between all three nodes to use that
trick.

To follow the timeline change that occurs at failover to another standby,
the standby needs to read the timeline history file. This file is created and
archived at failover by the new master (i.e., the promoted standby). It is not
shipped via replication, so the standby has to read it from the archive, which
is why you need a shared archive directory.
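Concretely, that means something like the following (the mount point /shared/archive is illustrative; any directory visible to all three nodes works, and a plain cp only suffices when the archive is a shared filesystem):

```
# postgresql.conf, on all nodes (the archive parameters take effect on
# whichever node is currently the master):
archive_mode = on
archive_command = 'cp %p /shared/archive/%f'

# recovery.conf, on the standbys:
restore_command = 'cp /shared/archive/%f %p'
recovery_target_timeline = 'latest'
```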

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: recovery_target_timeline and multiple slave behavior when master fails

From
Rick Pufky
Date:
Thanks for the comments. I'm not actually running with an archive directory in this configuration (archiving is disabled); however, scp'ing the new timeline history file and the last WAL file from the new master allowed the other slave to continue replay from where it left off.

Is this the expected behavior in a streaming-replication-only setup?
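For the record, the manual copy looked roughly like this. 00000002.history is the history file for the new timeline (ID 2, per the log above); the WAL segment name below is illustrative, so check pg_xlog on the new master for the actual last segment:

```shell
# On the new master (slave 1), after promotion, as postgres:
cd /pgsql/91/data/pg_xlog
scp 00000002.history 192.168.56.7:/pgsql/91/data/pg_xlog/
scp 000000020000000000000033 192.168.56.7:/pgsql/91/data/pg_xlog/
# Then restart (or let the standby retry) on slave 2.
```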

On Sun, Dec 18, 2011 at 9:51 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Dec 16, 2011 at 3:59 AM, Rick Pufky <rick@omniti.com> wrote:
>> Any thoughts on the above snippets? Am I interpreting the documentation
>> correctly? Is there any further information needed to debug this?
>
> You need to share the archive directory between all three nodes to use that
> trick.
>
> To follow the timeline change that occurs at failover to another standby,
> the standby needs to read the timeline history file. This file is created and
> archived at failover by new master (i.e., another standby). This file is not
> shipped via replication, so the standby needs to read it from the archive.
> So you need to have the shared archive directory.
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center



--
Rick Pufky
OmniTI Computer Consulting Inc.
Database Administrator