Thread: recovery_target_timeline and multiple slave behavior when master fails
I'm attempting to use the behavior described by the statement at the end of the first paragraph of this section of the documentation (http://www.postgresql.org/docs/9.1/static/warm-standby.html#STANDBY-SERVER-SETUP). Specifically: "If you plan to have multiple standby servers for high availability purposes, set recovery_target_timeline to latest, to make the standby server follow the timeline change that occurs at failover to another standby."
To test this, I've set up 3 PostgreSQL machines: a master (192.168.56.5) and 2 PITR slaves (192.168.56.6 and 192.168.56.7), both connected to the master. Below, the host labels remain fixed even if a slave is promoted to master, so "slave 1" always refers to 192.168.56.6. I've compiled/installed PostgreSQL 9.1.2 and pgbench for testing purposes. I've changed postgresql.conf as follows (for ease of testing, the same postgresql.conf is used on all servers):
wal_level = hot_standby
archive_mode = off
max_wal_senders = 3
wal_keep_segments = 100
hot_standby = on
hot_standby_feedback = on
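(Not shown above: pg_hba.conf on the master also has an entry permitting replication connections from the slaves. Mine is along these lines — adjust the auth method to suit your environment:)

host    replication    postgres    192.168.56.0/24    trust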
And on the slaves, I've got this for recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=192.168.56.5 port=5432 user=postgres'
trigger_file = '/pgsql/omnipitr/finish.recovery'
recovery_target_timeline = latest
I've initialized a fresh DB on the master, then executed the following to create fresh slave copies (substituting .7 for .6 to create slave 2):
pg_start_backup('start');
rsync -avh --exclude=pg_log --exclude=postgresql.conf --exclude=postmaster.pid . 192.168.56.6:/pgsql/91/data
pg_stop_backup();
rsync -avh --exclude=pg_log --exclude=postgresql.conf --exclude=postmaster.pid . 192.168.56.6:/pgsql/91/data
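(pg_start_backup() and pg_stop_backup() are run through psql from /pgsql/91/data on the master, so the full sequence, spelled out, is roughly:)

cd /pgsql/91/data
psql -U postgres -c "SELECT pg_start_backup('start');"
rsync -avh --exclude=pg_log --exclude=postgresql.conf --exclude=postmaster.pid . 192.168.56.6:/pgsql/91/data
psql -U postgres -c "SELECT pg_stop_backup();"
rsync -avh --exclude=pg_log --exclude=postgresql.conf --exclude=postmaster.pid . 192.168.56.6:/pgsql/91/data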
Both slaves were brought online, and to prove that streaming replication was working, I created a table and inserted some simple rows; these showed up on both slaves, so the setup appears good so far. To generate further data on the master, I ran pgbench (time /opt/pgsql/bin/pgbench -i -s 35 test). In the middle of the pgbench data load (I let it load more than 1 million tuples first), I killed the postgresql/pgbench processes on the master; the process list and kill command are shown below.
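(As a sanity check that streaming was active, 9.1 provides a view and functions for this; something like the following works:)

# on the master: one row per connected standby
psql -U postgres -c "SELECT client_addr, state, sent_location FROM pg_stat_replication;"
# on a slave: how far WAL has been received and replayed
psql -U postgres -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"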
root@pg912-master:~# ps -ef |grep post
postgres 1955 1 0 11:39 pts/0 00:00:00 /opt/pgsql/bin/postmaster -D /pgsql/91/data
postgres 1956 1955 0 11:39 ? 00:00:00 postgres: logger process
postgres 1958 1955 0 11:39 ? 00:00:00 postgres: writer process
postgres 1959 1955 0 11:39 ? 00:00:00 postgres: wal writer process
postgres 1960 1955 0 11:39 ? 00:00:00 postgres: autovacuum launcher process
postgres 1961 1955 0 11:39 ? 00:00:00 postgres: stats collector process
postgres 2013 1955 1 11:50 ? 00:00:11 postgres: wal sender process postgres 192.168.56.6(35184) streaming 0/2B000000
postgres 2015 1955 0 11:50 ? 00:00:02 postgres: wal sender process postgres 192.168.56.7(44052) streaming 0/2B000000
postgres 2049 1709 1 12:05 pts/1 00:00:00 /opt/pgsql/bin/pgbench -i -s 35 test
postgres 2050 1955 12 12:05 ? 00:00:00 postgres: postgres test [local] COPY
root 2052 1602 0 12:05 pts/0 00:00:00 grep post
root@pg912-master:~# kill -9 1955 1956 1958 1959 1960 1961 2013 2015 2049 2050
After the master terminated, I promoted slave 1 to become the new master by touching the trigger file (touch /pgsql/omnipitr/finish.recovery). Some relevant snippets from slave 1's log file:
2011-12-15 12:05:56.191 EST 2580:LOG: recovery restart point at 0/2D0BE0B0
2011-12-15 12:05:56.191 EST 2580:DETAIL: last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:05:57.031 EST 2580:LOG: restartpoint starting: xlog
2011-12-15 12:06:03.067 EST 2580:LOG: restartpoint complete: wrote 98 buffers (3.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=4.907 s, sync=1.101 s, total=6.035 s; sync files=2, longest=1.070 s, average=0.550 s
2011-12-15 12:06:03.067 EST 2580:LOG: recovery restart point at 0/300BE0B0
2011-12-15 12:06:03.067 EST 2580:DETAIL: last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:06:05.148 EST 2581:FATAL: could not receive data from WAL stream:
2011-12-15 12:06:09.769 EST 2579:LOG: invalid record length at 0/33C92290
2011-12-15 12:06:09.938 EST 2608:FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "192.168.56.5" and accepting
TCP/IP connections on port 5432?
... more of the same primary server connection refused error ...
2011-12-15 12:08:09.892 EST 2579:LOG: trigger file found: /pgsql/omnipitr/finish.recovery
2011-12-15 12:08:09.892 EST 2579:LOG: redo done at 0/33C921F0
2011-12-15 12:08:09.892 EST 2579:LOG: last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:08:09.892 EST 2579:LOG: selected new timeline ID: 2
2011-12-15 12:08:10.078 EST 2579:LOG: archive recovery complete
2011-12-15 12:08:10.491 EST 2580:LOG: restartpoint complete: wrote 2692 buffers (87.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=112.051 s, sync=0.274 s, total=112.361 s; sync files=2, longest=0.273 s, average=0.137 s
2011-12-15 12:08:10.492 EST 2580:LOG: recovery restart point at 0/330BE0B0
2011-12-15 12:08:10.492 EST 2580:DETAIL: last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:08:10.493 EST 2580:LOG: checkpoint starting: end-of-recovery immediate wait
2011-12-15 12:08:10.497 EST 2580:LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.005 s; sync files=0, longest=0.000 s, average=0.000 s
2011-12-15 12:08:10.951 EST 2651:LOG: autovacuum launcher started
2011-12-15 12:08:10.960 EST 2577:LOG: database system is ready to accept connections
Slave 2 was still attempting to connect to the old master, so I modified its recovery.conf as follows (pointing it at slave 1 instead of the master) and restarted PostgreSQL on slave 2:
standby_mode = 'on'
primary_conninfo = 'host=192.168.56.6 port=5432 user=postgres'
trigger_file = '/pgsql/omnipitr/finish.recovery'
recovery_target_timeline = latest
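(The restart itself was a plain pg_ctl cycle, roughly:)

/opt/pgsql/bin/pg_ctl -D /pgsql/91/data -m fast restart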
Relevant log snippets from slave 2:
2011-12-15 12:05:54.992 EST 2626:LOG: recovery restart point at 0/2D0BE0B0
2011-12-15 12:05:54.992 EST 2626:DETAIL: last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:05:56.295 EST 2626:LOG: restartpoint starting: xlog
2011-12-15 12:06:05.979 EST 2627:FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2011-12-15 12:06:06.191 EST 2625:LOG: unexpected pageaddr 0/28E40000 in log file 0, segment 50, offset 14942208
2011-12-15 12:06:06.218 EST 2626:LOG: restartpoint complete: wrote 185 buffers (6.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=9.692 s, sync=0.222 s, total=9.923 s; sync files=2, longest=0.204 s, average=0.111 s
2011-12-15 12:06:06.218 EST 2626:LOG: recovery restart point at 0/300BE0B0
2011-12-15 12:06:06.218 EST 2626:DETAIL: last completed transaction was at log time 2011-12-15 12:05:28.868118-05
2011-12-15 12:06:21.419 EST 2652:FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "192.168.56.5" and accepting
TCP/IP connections on port 5432?
... more of the same primary server connection refused error ...
2011-12-15 12:08:44.924 EST 2623:LOG: received fast shutdown request
2011-12-15 12:08:44.935 EST 2623:LOG: aborting any active transactions
2011-12-15 12:08:44.935 EST 2626:LOG: shutting down
2011-12-15 12:08:44.945 EST 2626:LOG: database system is shut down
2011-12-15 12:08:46.802 EST 2737:LOG: database system was shut down in recovery at 2011-12-15 12:08:44 EST
2011-12-15 12:08:46.804 EST 2737:LOG: entering standby mode
2011-12-15 12:08:46.818 EST 2737:LOG: redo starts at 0/300BE0B0
2011-12-15 12:08:47.313 EST 2737:LOG: consistent recovery state reached at 0/32E3FFF0
2011-12-15 12:08:47.313 EST 2735:LOG: database system is ready to accept read only connections
2011-12-15 12:08:47.313 EST 2737:LOG: unexpected pageaddr 0/28E40000 in log file 0, segment 50, offset 14942208
2011-12-15 12:08:47.437 EST 2740:FATAL: timeline 2 of the primary does not match recovery target timeline 1
2011-12-15 12:08:52.323 EST 2742:FATAL: timeline 2 of the primary does not match recovery target timeline 1
... repeated continuously ...
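(To confirm which timeline each node is actually on, pg_controldata can be used; for example:)

/opt/pgsql/bin/pg_controldata /pgsql/91/data | grep TimeLineID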
Any thoughts on the above snippets? Am I interpreting the documentation correctly? Is there any further information needed to debug this?
Thanks,
Rick
On Fri, Dec 16, 2011 at 3:59 AM, Rick Pufky <rick@omniti.com> wrote:
> Any thoughts on the above snippets? Am I interpreting the documentation
> correctly? Is there any further information needed to debug this?

You need to share the archive directory between all three nodes to use that trick.

To follow the timeline change that occurs at failover to another standby, the standby needs to read the timeline history file. This file is created and archived at failover by the new master (i.e., another standby). It is not shipped via replication, so the standby needs to read it from the archive. So you need a shared archive directory.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Thanks for the comments. I'm not actually running with an archive directory in this configuration (archiving is disabled); however, scp'ing the new timeline history file and the last WAL file from the new master allowed the other slave to simply continue replay from where it left off.
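(Illustrative commands — the actual file names follow the new timeline ID, which is 2 here, so roughly:)

scp 192.168.56.6:/pgsql/91/data/pg_xlog/00000002.history /pgsql/91/data/pg_xlog/
scp 192.168.56.6:/pgsql/91/data/pg_xlog/000000020000000000000033 /pgsql/91/data/pg_xlog/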
Is this expected behavior in a streaming-replication-only (no archive) configuration?
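(For completeness, I assume the shared-archive setup Fujii describes would look roughly like this, with /shared/archive standing in for any directory visible to all three nodes:)

# postgresql.conf on every node
archive_mode = on
archive_command = 'cp %p /shared/archive/%f'

# recovery.conf on each standby
restore_command = 'cp /shared/archive/%f %p'
recovery_target_timeline = 'latest'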
--
Rick Pufky
OmniTI Computer Consulting Inc.
Database Administrator