Thread: Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From

hubert depesz lubaczewski

Date:

15 December 2012, 18:06:21

I might be missing something, but what exactly does that commit give us?

I mean - we were able, previously, to make slave switch to new master
- as Phil Sorber described in here:
http://philsorber.blogspot.com/2012/03/what-to-do-when-your-timeline-isnt.html

After some talk on IRC, I understood that this patch will make it
possible to switch to new master in plain SR replication, with no WAL
archive (because if you have wal archive, you can use the method Phil
described, which basically "just works").

So I did setup three machines: master and two slaves.
Master had 2 IPs - its own, and a floating one.
Both slaves were connecting to the floating one, and recovery.conf
looked like:
---------
standby_mode = 'on'
primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
trigger_file = '/tmp/finish.replication'
recovery_target_timeline='latest'
---------

After I verified that replication works to both slaves, I did failover one of
the slaves, shut down master, and did ip takeover of floating ip to the slave
that did takeover.

new master did work nicely, but the 2nd slave, which was supposed to
switch to new master - didn't switch automatically.
In logs I saw:
2012-12-15 15:59:58.495 CET @ 24288  FATAL:  could not send data to WAL stream: server closed the connection
unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

2012-12-15 16:00:13.518 CET @ 24304  LOG:  fetching timeline history file for timeline 2 from primary server
2012-12-15 16:00:13.522 CET @ 24304  FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/F000000
ontimeline 1 is not in this server's history 
        DETAIL:  This server's history forked from timeline 1 at 0/E002D40

2012-12-15 16:00:13.523 CET @ 24287  LOG:  new timeline 2 forked off current database system timeline 1 before current
recoverypoint 0/F000000 
2012-12-15 16:00:18.535 CET @ 24309  FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/F000000
ontimeline 1 is not in this server's history 
        DETAIL:  This server's history forked from timeline 1 at 0/E002D40

2012-12-15 16:00:18.536 CET @ 24287  LOG:  new timeline 2 forked off current database system timeline 1 before current
recoverypoint 0/F000000 
2012-12-15 16:00:23.549 CET @ 24323  FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/F000000
ontimeline 1 is not in this server's history 
        DETAIL:  This server's history forked from timeline 1 at 0/E002D40

2012-12-15 16:00:23.550 CET @ 24287  LOG:  new timeline 2 forked off current database system timeline 1 before current
recoverypoint 0/F000000 
2012-12-15 16:00:28.564 CET @ 24327  FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/F000000
ontimeline 1 is not in this server's history 
        DETAIL:  This server's history forked from timeline 1 at 0/E002D40

2012-12-15 16:00:28.565 CET @ 24287  LOG:  new timeline 2 forked off current database system timeline 1 before current
recoverypoint 0/F000000 
2012-12-15 16:00:32.927 CET @ 24282  LOG:  received smart shutdown request
2012-12-15 16:00:33.132 CET @ 24289  LOG:  restartpoint complete: wrote 4427 buffers (27.0%); 1 transaction log file(s)
added,0 removed, 0 recycled; write=63.750 s, sync=0.095 s, total=63.943 s; sync files=2, longest=0.093 s, average=0.047
s
2012-12-15 16:00:33.132 CET @ 24289  LOG:  recovery restart point at 0/DD376B8
2012-12-15 16:00:33.132 CET @ 24289  DETAIL:  last completed transaction was at log time 2012-12-15 15:59:43.993318+01
2012-12-15 16:00:33.133 CET @ 24289  LOG:  shutting down
2012-12-15 16:00:33.140 CET @ 24289  LOG:  database system is shut down

I did shutdown the slave, and restarted it, but it fails to start, and instead it just shows in log:

2012-12-15 16:00:38.130 CET @ 24348  LOG:  database system was shut down in recovery at 2012-12-15 16:00:33 CET
2012-12-15 16:00:38.130 CET @ 24348  FATAL:  requested timeline 2 does not contain minimum recovery point 0/F000000 on
timeline1 
2012-12-15 16:00:38.130 CET @ 24343  LOG:  startup process (PID 24348) exited with exit code 1
2012-12-15 16:00:38.130 CET @ 24343  LOG:  aborting startup due to startup process failure

So, what is the problem, and how can I get it working?

Or, if wal archive is necessary - what is the point of this commit? What am I
missing in here?

Best regards,

depesz

On Thu, Dec 13, 2012 at 05:19:36PM +0000, Heikki Linnakangas wrote:
> Allow a streaming replication standby to follow a timeline switch.
>
> Before this patch, streaming replication would refuse to start replicating
> if the timeline in the primary doesn't exactly match the standby. The
> situation where it doesn't match is when you have a master, and two
> standbys, and you promote one of the standbys to become new master.
> Promoting bumps up the timeline ID, and after that bump, the other standby
> would refuse to continue.
>
> There's significantly more timeline related logic in streaming replication
> now. First of all, when a standby connects to primary, it will ask the
> primary for any timeline history files that are missing from the standby.
> The missing files are sent using a new replication command TIMELINE_HISTORY,
> and stored in standby's pg_xlog directory. Using the timeline history files,
> the standby can follow the latest timeline present in the primary
> (recovery_target_timeline='latest'), just as it can follow new timelines
> appearing in an archive directory.
>
> START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
> timeline to stream WAL from. This allows the standby to request the primary
> to send over WAL that precedes the promotion. The replication protocol is
> changed slightly (in a backwards-compatible way although there's little hope
> of streaming replication working across major versions anyway), to allow
> replication to stop when the end of timeline reached, putting the walsender
> back into accepting a replication command.
>
> Many thanks to Amit Kapila for testing and reviewing various versions of
> this patch.
>
> Branch
> ------
> master
>
> Details
> -------
> http://git.postgresql.org/pg/commitdiff/abfd192b1b5ba5216ac4b1f31dcd553106304b19
>
> Modified Files
> --------------
> doc/src/sgml/high-availability.sgml                |    7 +-
> doc/src/sgml/protocol.sgml                         |   77 +++-
> src/backend/access/transam/timeline.c              |   83 +++
> src/backend/access/transam/xlog.c                  |   55 +-
> src/backend/access/transam/xlogfuncs.c             |    4 +-
> src/backend/postmaster/postmaster.c                |   21 -
> src/backend/postmaster/startup.c                   |    2 +
> src/backend/replication/basebackup.c               |   21 +-
> .../libpqwalreceiver/libpqwalreceiver.c            |  202 +++++--
> src/backend/replication/repl_gram.y                |   47 ++-
> src/backend/replication/repl_scanner.l             |   10 +
> src/backend/replication/walreceiver.c              |  472 +++++++++++++---
> src/backend/replication/walreceiverfuncs.c         |   90 +++-
> src/backend/replication/walsender.c                |  587 +++++++++++++++----
> src/include/access/timeline.h                      |    1 +
> src/include/access/xlog.h                          |    4 +-
> src/include/nodes/nodes.h                          |    1 +
> src/include/nodes/replnodes.h                      |   12 +
> src/include/replication/walreceiver.h              |   52 ++-
> src/include/replication/walsender.h                |    1 -
> src/include/replication/walsender_private.h        |    2 +-
> src/interfaces/libpq/fe-exec.c                     |    8 +-
> src/interfaces/libpq/fe-protocol3.c                |    7 +-
> 23 files changed, 1396 insertions(+), 370 deletions(-)
>
>
> --
> Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-committers

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From

Heikki Linnakangas

Date:

17 December 2012, 15:01:23

On 15.12.2012 17:06, hubert depesz lubaczewski wrote:
> I might be missing something, but what exactly does that commit give us?
>
> I mean - we were able, previously, to make slave switch to new master
> - as Phil Sorber described in here:
> http://philsorber.blogspot.com/2012/03/what-to-do-when-your-timeline-isnt.html
>
> After some talk on IRC, I understood that this patch will make it
> possible to switch to new master in plain SR replication, with no WAL
> archive (because if you have wal archive, you can use the method Phil
> described, which basically "just works").

Right, that's exactly the point of the patch. A WAL archive is no longer
necessary for failover.

> So I did setup three machines: master and two slaves.
> Master had 2 IPs - its own, and a floating one.
> Both slaves were connecting to the floating one, and recovery.conf
> looked like:
> ---------
> standby_mode = 'on'
> primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
> trigger_file = '/tmp/finish.replication'
> recovery_target_timeline='latest'
> ---------
>
> After I verified that replication works to both slaves, I did failover one of
> the slaves, shut down master, and did ip takeover of floating ip to the slave
> that did takeover.

Hmm, is it possible that some WAL was generated in the old master, and
streamed to the standby, after the new master was already promoted? It's
important to kill the old master before promoting the new master.
Otherwise the timelines diverge, so that you have some WAL on the old
timeline that's not present in the new master, and some WAL in the new
master's timeline that's not present in the old master. In that
situation, if the standby has already replicated the WAL from the old
master, it can no longer start to follow the new master. I think that
would match the symptoms you're seeing.

I wouldn't rule out a bug in the patch either, though. Amit found a
worrying number of bugs in his testing, and although we stamped out all
the known bugs, it wouldn't surprise me if there's more :-(..

- Heikki

Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From

hubert depesz lubaczewski

Date:

17 December 2012, 17:11:05

On Mon, Dec 17, 2012 at 02:01:20PM +0200, Heikki Linnakangas wrote:
> Hmm, is it possible that some WAL was generated in the old master,
> and streamed to the standby, after the new master was already
> promoted? It's important to kill the old master before promoting the
> new master. Otherwise the timelines diverge, so that you have some
> WAL on the old timeline that's not present in the new master, and
> some WAL in the new master's timeline that's not present in the old
> master. In that situation, if the standby has already replicated the
> WAL from the old master, it can no longer start to follow the new
> master. I think that would match the symptoms you're seeing.
>
> I wouldn't rule out a bug in the patch either, though. Amit found a
> worrying number of bugs in his testing, and although we stamped out
> all the known bugs, it wouldn't surprise me if there's more :-(..

Will verify and reach back to you.
In my test I was doing first promote slave, and later (like 2 seconds
later, and there was no traffic, but still - later) kill master.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From

hubert depesz lubaczewski

Date:

18 December 2012, 14:42:22

On Mon, Dec 17, 2012 at 02:01:20PM +0200, Heikki Linnakangas wrote:
> Hmm, is it possible that some WAL was generated in the old master,
> and streamed to the standby, after the new master was already
> promoted? It's important to kill the old master before promoting the
> new master. Otherwise the timelines diverge, so that you have some
> WAL on the old timeline that's not present in the new master, and
> some WAL in the new master's timeline that's not present in the old
> master. In that situation, if the standby has already replicated the
> WAL from the old master, it can no longer start to follow the new
> master. I think that would match the symptoms you're seeing.

OK. Checked, and can't get it work.

Situation:
3 hosts:
ubuntu1 (172.28.173.140)
ubuntu2 (172.28.173.141)
ubuntu3 (172.28.173.142)
on all I have 9.3head compiled.
Also I have floating IP (172.28.173.253) which is aliased on ubuntu1.

On ubuntu1 I make simple Pg initdb, start backup, copy data to ubuntu2
and ubuntu3, and stop backup.

than, on both ubuntu2 and 3, I do:
rm data/pg_xlog/0* data/pg_xlog/archive_status/* data/pg_log/* data/postmaster.pid ; cp ~/recovery.conf ~/data/

recovery.conf is:

standby_mode = 'on'
primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
trigger_file = '/tmp/finish.replication'
recovery_target_timeline='latest'

relevant settings in master conf:
-----------------
=$ grep -i -E '(wal|checkpoint|archive)' data/postgresql.conf
# RESOURCE USAGE (except WAL)
wal_level = hot_standby
#wal_sync_method = fsync                # the default is the first option
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
#wal_writer_delay = 200ms               # 1-10000 milliseconds
# - Checkpoints -
#checkpoint_segments = 3                # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables
archive_mode = on
archive_command = '/bin/true'
                                # placeholders: %p = path of file to archive
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0            # force a logfile segment switch after this
max_wal_senders = 3            # max number of walsender processes
wal_keep_segments = 20          # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s       # in milliseconds; 0 disables
#max_standby_archive_delay = 30s        # max delay before canceling queries
                                        # when reading WAL from archive;
                                        # when reading streaming WAL;
#wal_receiver_status_interval = 10s     # send replies at least this often
#wal_receiver_timeout = 60s             # time that receiver waits for
log_checkpoints = on
-----------------

Afterwards I do:

on ubuntu1:
pg_ctl -m fast stop
sudo ifdown eth0:0

eth0:0 is the interface with shared ip.

all works fine.

then on ubuntu2 I do:
touch /tmp/finish.replication
sudo ifup eth0:0

result:
ubuntu2 is working OK, but ubuntu3 logs:
2012-12-18 12:33:34.037 CET @ 1513  LOG:  database system was interrupted; last known up at 2012-12-18 12:32:47 CET
2012-12-18 12:33:34.037 CET @ 1513  LOG:  entering standby mode
2012-12-18 12:33:49.075 CET @ 1514  LOG:  started streaming WAL from primary at 0/14000000 on timeline 1
2012-12-18 12:33:49.279 CET @ 1513  LOG:  redo starts at 0/14000028
2012-12-18 12:33:49.282 CET @ 1513  LOG:  consistent recovery state reached at 0/140000E8
2012-12-18 12:33:49.282 CET @ 1508  LOG:  database system is ready to accept read only connections
2012-12-18 12:34:14.007 CET @ 1514  LOG:  replication terminated by primary server
2012-12-18 12:34:14.007 CET @ 1514  DETAIL:  End of WAL reached on timeline 1
2012-12-18 12:34:14.014 CET @ 1514  FATAL:  could not send data to WAL stream: server closed the connection
unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

2012-12-18 12:34:14.014 CET @ 1513  LOG:  record with zero length at 0/15000088
2012-12-18 12:34:44.066 CET @ 1598  LOG:  fetching timeline history file for timeline 2 from primary server
2012-12-18 12:34:44.071 CET @ 1598  LOG:  started streaming WAL from primary at 0/15000000 on timeline 1
2012-12-18 12:34:44.071 CET @ 1598  LOG:  replication terminated by primary server
2012-12-18 12:34:44.071 CET @ 1598  DETAIL:  End of WAL reached on timeline 1
2012-12-18 12:34:44.071 CET @ 1598  FATAL:  error reading result of streaming command: ERROR:  requested WAL segment
000000010000000000000015has already been removed 

2012-12-18 12:34:44.072 CET @ 1513  LOG:  new target timeline is 2
2012-12-18 12:34:44.078 CET @ 1612  LOG:  started streaming WAL from primary at 0/15000000 on timeline 2
2012-12-18 12:34:44.078 CET @ 1612  LOG:  replication terminated by primary server
2012-12-18 12:34:44.078 CET @ 1612  DETAIL:  End of WAL reached on timeline 2
2012-12-18 12:34:44.078 CET @ 1612  FATAL:  error reading result of streaming command: ERROR:  requested WAL segment
000000020000000000000015has already been removed 
...

and so on.

Looks like it did switch timeline, but the new slave doesn't have some bits in WAL.
Which is weird, because, on ubuntu2:
$ select pg_current_xlog_location();
 pg_current_xlog_location
──────────────────────────
 0/150000E8
(1 row)

On ubuntu2 in pg_xlog:

12:41:05 pgdba@ubuntu2 ~/data/pg_xlog
=$ ls -la
total 49168
drwx------  3 pgdba pgdba     4096 Dec 18 12:34 ./
drwx------ 15 pgdba pgdba     4096 Dec 18 12:34 ../
-rw-------  1 pgdba pgdba 16777216 Dec 18 12:34 000000020000000000000016
-rw-------  1 pgdba pgdba 16777216 Dec 18 12:34 000000020000000000000017
-rw-------  1 pgdba pgdba 16777216 Dec 18 12:33 000000020000000000000018
-rw-------  1 pgdba pgdba       42 Dec 18 12:34 00000002.history
drwx------  2 pgdba pgdba     4096 Dec 18 12:34 archive_status/

12:41:06 pgdba@ubuntu2 ~/data/pg_xlog
=$ cat 00000002.history
1       0/15000088      no recovery target specified

In pg_log on ubuntu2 I see:

2012-12-18 12:41:34.428 CET [unknown]@[unknown] 1685  LOG:  connection received: host=172.28.173.142 port=45842
2012-12-18 12:41:34.430 CET replication@[unknown] 1685 172.28.173.142(45842) LOG:  replication connection authorized:
user=replication
2012-12-18 12:41:34.432 CET replication@[unknown] 1685 172.28.173.142(45842) ERROR:  requested WAL segment
000000020000000000000015has already been removed 
2012-12-18 12:41:34.433 CET replication@[unknown] 1685 172.28.173.142(45842) LOG:  disconnection: session time:
0:00:00.005user=replication database= host=172.28.173.142 port=45842 

Something looks weird. To put it lightly.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From

Heikki Linnakangas

Date:

20 December 2012, 15:50:11

On 18.12.2012 13:42, hubert depesz lubaczewski wrote:
> In pg_log on ubuntu2 I see:
>
> 2012-12-18 12:41:34.428 CET [unknown]@[unknown] 1685  LOG:  connection received: host=172.28.173.142 port=45842
> 2012-12-18 12:41:34.430 CET replication@[unknown] 1685 172.28.173.142(45842) LOG:  replication connection authorized:
user=replication
> 2012-12-18 12:41:34.432 CET replication@[unknown] 1685 172.28.173.142(45842) ERROR:  requested WAL segment
000000020000000000000015has already been removed 
> 2012-12-18 12:41:34.433 CET replication@[unknown] 1685 172.28.173.142(45842) LOG:  disconnection: session time:
0:00:00.005user=replication database= host=172.28.173.142 port=45842 
>
> Something looks weird. To put it lightly.

Hmm, that's a different error than you got before. Thom also reported a
"requested WAL segment ... has already been removed" error, but in his
test case, and as far as I could reproduce it, the error doesn't reoccur
when the standby reconnects. In other words, it eventually worked
despite that error. In any case, I just committed a fix for the scenario
that Thom reported. Can you try again with a fresh checkout?

- Heikki

Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

From

hubert depesz lubaczewski

Date:

22 December 2012, 16:13:25

On Thu, Dec 20, 2012 at 02:50:12PM +0200, Heikki Linnakangas wrote:
> Hmm, that's a different error than you got before. Thom also
> reported a "requested WAL segment ... has already been removed"
> error, but in his test case, and as far as I could reproduce it, the
> error doesn't reoccur when the standby reconnects. In other words,
> it eventually worked despite that error. In any case, I just
> committed a fix for the scenario that Thom reported. Can you try
> again with a fresh checkout?

Tested today with checkout of 1ff92eea140ccf0503b7399549031976e5c6642e

All worked fine. Thanks a lot.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/