On Fri, Apr 12, 2013 at 7:57 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-04-12 02:29:01 +0900, Fujii Masao wrote:
>> On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> >
>> > You just shut down the old master and let the standby catch
>> > up (takes a few microseconds ;) ) before you promote it.
>> >
>> > After this you can start up the former master with recovery.conf
>> > and it will follow nicely.
>>
>> No. When you shut down the old master, it might not have been
>> able to send all the WAL records to the standby. I have observed
>> this situation several times. So with your approach, the new standby
>> might fail to catch up with the master nicely.
>
> It seems most of this thread is focusing on the wrong thing then. If we
> really are only talking about planned failover then we need to solve
> *that* not some ominous "don't flush data too early" which has
> noticeable performance and implementation complexity problems.

At least I'd like to talk about not only planned failover but also normal
failover.

> I guess you're observing that not everything is replicated because you're
> doing an immediate shutdown

No. I did a fast shutdown.

During a fast shutdown, after walsender sends the checkpoint record and
closes the replication connection, walreceiver can detect that the
connection has closed before it has received all WAL records. This means
that, even if walsender sends all WAL records, walreceiver cannot always
receive all of them.

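
As a rough illustration of how one could check for this case before
promoting the standby, here is a minimal Python sketch. It only compares
two WAL locations in the textual "hi/lo" hexadecimal form. The helper
names parse_lsn and standby_has_all_wal are hypothetical, and how the two
locations are obtained is an assumption (for example, the old master's
final position from pg_controldata output after shutdown, and the
standby's from pg_last_xlog_receive_location()).

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual WAL location such as '0/3000060' to an integer.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of the position.
    """
    hi, lo = lsn.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def standby_has_all_wal(master_final_lsn: str, standby_receive_lsn: str) -> bool:
    """Return True if the standby has received WAL up to the master's final position.

    Hypothetical helper: both arguments are assumed to have been collected
    by the caller after the old master was shut down.
    """
    return parse_lsn(standby_receive_lsn) >= parse_lsn(master_final_lsn)


if __name__ == "__main__":
    # Made-up values: the standby stopped short of the master's final position.
    print(standby_has_all_wal("0/3000060", "0/3000000"))
```

The locations are converted to integers because comparing them as plain
strings gives the wrong order once the hexadecimal parts differ in length.
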
> You could even teach the standby not to increment the timeline in that
> case since that's safe.

I don't think this is required, thanks to Heikki's recent great work on
timelines.

Regards,
--
Fujii Masao