Re: Issues with Quorum Commit - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Issues with Quorum Commit
Date	October 14, 2010 08:22:03
Msg-id	AANLkTikccYZmZCBs6U912_r0XKYD16Ynx1F-k8VZRoBW@mail.gmail.com
In response to	Re: Issues with Quorum Commit (Fujii Masao <masao.fujii@gmail.com>)
List	pgsql-hackers

On Wed, Oct 13, 2010 at 5:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Oct 13, 2010 at 3:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> There's another problem here we should think about, too.  Suppose you
>> have a master and two standbys.  The master dies.  You promote one of
>> the standbys, which turns out to be behind the other.  You then
>> repoint the other standby at the one you promoted.  Congratulations,
>> your database is now very possibly corrupt, and you may very well get
>> no warning of that fact.  It seems to me that we would be well-advised
>> to install some kind of bullet-proof safeguard against this kind of
>> problem, so that you will KNOW that the standby needs to be re-synced.
>
> Yep. This is why I said it's not easy to implement that.
>
> To start the standby without taking a base backup from the new master after
> failover, the user basically has to promote the standby which is ahead
> of the other standbys (e.g., by comparing pg_last_xlog_replay_location
> on each standby).
>
> As a safeguard, we seem to need to compare the location at the switch
> of the timeline on the master with the last replay location on the standby.
> If the latter location is ahead AND the timeline ID of the standby is not
> the same as that of the master, we should emit a warning and terminate the
> replication connection.
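
For illustration only, the comparison proposed above could be sketched as
follows.  This is a rough sketch, not server code: the function names are
hypothetical, and it assumes WAL locations arrive in the textual "hi/lo"
hexadecimal form that pg_last_xlog_replay_location returns.  Locations are
compared as number pairs, because comparing the raw strings would order
"0/A000000" after "0/10000000".

```python
def parse_lsn(text):
    """Parse a WAL location such as '0/3000158' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def most_advanced_standby(replay_locations):
    """Return the name of the standby with the highest replay location.

    replay_locations maps a (hypothetical) standby name to the text
    reported by pg_last_xlog_replay_location on that standby.
    """
    return max(replay_locations, key=lambda name: parse_lsn(replay_locations[name]))


def must_reject_standby(master_tli, switch_lsn, standby_tli, standby_replay_lsn):
    """Apply the proposed rule: refuse a standby that is on a different
    timeline AND has replayed past the master's timeline switch point."""
    on_other_timeline = standby_tli != master_tli
    replayed_past_switch = parse_lsn(standby_replay_lsn) > parse_lsn(switch_lsn)
    return on_other_timeline and replayed_past_switch
```

With these helpers, a standby on timeline 1 that has replayed to 0/3000158
would be rejected by a master that switched to timeline 2 at 0/3000000.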

That doesn't seem very bullet-proof.  You can accidentally corrupt a
standby even when only one timeline is involved.  AFAIK, stopping a
standby, removing recovery.conf, and starting it up again does not
change timelines.  You can even shut down the standby, bring it up as
a master, generate a little WAL, shut it back down, and bring it back
up as a standby pointing to the same master.  It would be nice to
embed in each checkpoint record an identifier that changes randomly on
each transition to normal running, so that if you do something like
this we can notice and complain loudly.
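
To make the idea concrete, here is a toy model of it.  Everything in it is
an assumption made for illustration: the Server class, the run_id field,
and the diverged() check are hypothetical names, and real checkpoint
records look nothing like a Python dict.  It only shows the mechanism: a
random identifier is regenerated on each transition to normal running,
stamped into every checkpoint, and compared when a standby reconnects.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Server:
    run_id: str = ""
    # Maps a checkpoint position to the run identifier stamped into it.
    checkpoints: dict = field(default_factory=dict)

    def enter_normal_running(self):
        # A fresh random identifier on every transition to normal running.
        self.run_id = secrets.token_hex(8)

    def write_checkpoint(self, position):
        # A checkpoint written locally carries this server's identifier.
        self.checkpoints[position] = self.run_id

    def replay_checkpoint(self, position, run_id):
        # A checkpoint replayed from the master keeps the master's identifier.
        self.checkpoints[position] = run_id


def diverged(master, standby):
    """True if the standby's newest checkpoint is missing from the master's
    history or carries a different identifier there."""
    position = max(standby.checkpoints)
    return master.checkpoints.get(position) != standby.checkpoints[position]
```

In this model, a standby that is briefly run as a master writes checkpoints
stamped with its own identifier, so diverged() reports the problem when it
is pointed back at the original master, even though no timeline changed.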

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
