Re: can we avoid pg_basebackup on planned switches? - Mailing list pgsql-general

From Fujii Masao
Subject Re: can we avoid pg_basebackup on planned switches?
Date
Msg-id CAHGQGwEbZQdqRjCrBgx6d48+QYxJxA1_BdUxhCEcU5-y0xENwg@mail.gmail.com
In response to Re: can we avoid pg_basebackup on planned switches?  (Ben Chobot <bench@silentmedia.com>)
Responses Re: can we avoid pg_basebackup on planned switches?  (Ben Chobot <bench@silentmedia.com>)
List pgsql-general
On Mon, Aug 6, 2012 at 3:29 AM, Ben Chobot <bench@silentmedia.com> wrote:
>
> On Aug 5, 2012, at 11:12 AM, Fujii Masao wrote:
>
>> On Sat, Jul 28, 2012 at 2:00 AM, Ben Chobot <bench@silentmedia.com> wrote:
>>> We make heavy use of streaming replication on PG 9.1 and it's been great for
>>> us. We do have one issue with it, though, and that's when we switch master
>>> nodes - currently, the documentation says that you must run pg_basebackup on
>>> your old master to turn it into a slave. That makes sense when the old
>>> master had crashed, but it seems that in the case of a planned switch, we
>>> could do better. Here's what we tried that seemed to work... are we shooting
>>> ourselves in the foot?
>>>
>>> 1. Cleanly shut down the current master.
>>> 2. Pick a slave, turn it into the new master.
>>
>> Before promoting the standby, you have to confirm that all WAL files
>> the old master generated have been shipped to the standby you are going
>> to promote, because the standby might terminate replication before
>> receiving all of them. Note that there is no clean way to confirm that.
>> One workaround is to execute CHECKPOINT in the standby, run pg_controldata
>> on both the old master and the standby, and check whether their latest
>> checkpoint locations are the same. You might think of comparing the latest
>> checkpoint location in the old master with pg_last_xlog_replay_location
>> in the standby. But the former indicates the *starting* location of the
>> last WAL record (i.e., the shutdown checkpoint WAL record), whereas the
>> latter indicates its *ending* location. So you should not compare them
>> without taking that mismatch into account.
>>
>> If the standby failed to receive some WAL files, you need to manually copy
>> them from pg_xlog on the old master to the standby.
>
> Oh, I would have thought that doing a clean shutdown of the old master (step 1)
> would have made sure that all the unstreamed WAL records would be flushed to any
> connected slaves as part of the master shutting down. In retrospect, I don't
> remember reading that anywhere, so I must have made that up because I wanted it
> to be that way. Is it wishful thinking?

When a clean shutdown is requested, the master sends all WAL records to
the standby, but it doesn't wait for the standby to receive them. So there
is no guarantee that all WAL records have been flushed to the standby. The
walreceiver process in the standby might detect the termination of the
replication connection and exit before receiving all WAL records.
Unfortunately, I've encountered that case several times.
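
The pg_controldata comparison described in the quoted text could be scripted
roughly as follows. This is only a sketch: it assumes you have already captured
the pg_controldata output from the old master (after its clean shutdown) and
from the standby (after running CHECKPOINT there) as text. The function names
and the sample output are made up for illustration; only the "Latest checkpoint
location" label is taken from what pg_controldata prints.

```python
import re

# Matches the "Latest checkpoint location" line of pg_controldata output.
# It does not match "Latest checkpoint's REDO location", which has a different label.
_CHECKPOINT_RE = re.compile(
    r"^Latest checkpoint location:[ \t]*([0-9A-Fa-f]+)/([0-9A-Fa-f]+)[ \t]*$",
    re.MULTILINE,
)


def latest_checkpoint_location(controldata_output):
    """Return the latest checkpoint location as a (high, low) pair of integers."""
    match = _CHECKPOINT_RE.search(controldata_output)
    if match is None:
        raise ValueError("no 'Latest checkpoint location' line found")
    return int(match.group(1), 16), int(match.group(2), 16)


def same_latest_checkpoint(master_output, standby_output):
    """True if both outputs report the same latest checkpoint location."""
    return (latest_checkpoint_location(master_output)
            == latest_checkpoint_location(standby_output))


# Hypothetical, abbreviated sample output (not taken from a real cluster).
MASTER_SAMPLE = """\
Database cluster state:               shut down
Latest checkpoint location:           0/3000020
"""
STANDBY_SAMPLE = """\
Database cluster state:               in archive recovery
Latest checkpoint location:           0/3000020
"""

if __name__ == "__main__":
    if same_latest_checkpoint(MASTER_SAMPLE, STANDBY_SAMPLE):
        print("checkpoint locations match")
    else:
        print("checkpoint locations differ; copy the missing WAL files first")
```

Both values here are checkpoint *starting* locations, so they can be compared
directly, unlike the pg_last_xlog_replay_location comparison mentioned above.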

Regards,

--
Fujii Masao
