Re: Configuring synchronous replication - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Configuring synchronous replication
Msg-id: AANLkTi=DhqLuG+R4zX-cRJCEyECDmZnUk1cjM5ZJV9vJ@mail.gmail.com
In response to: Re: Configuring synchronous replication (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Fri, Sep 24, 2010 at 6:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Earlier you argued that centralizing parameters would make this nice and
>> > simple. Now you're pointing out that we aren't centralizing this at all,
>> > and it won't be simple. We'll have to have a standby.conf set up that is
>> > customised in advance for each standby that might become a master. Plus
>> > we may even need multiple standby.confs in case that we have multiple
>> > nodes down. This is exactly what I was seeking to avoid and exactly what
>> > I meant when I asked for an analysis of the failure modes.
>>
>> If you're operating on the notion that no reconfiguration will be
>> necessary when nodes go down, then we have very different notions of
>> what is realistic.  I think that "copy the new standby.conf file in
>> place" is going to be the least of the fine admin's problems.
>
> Earlier you argued that setting parameters on each standby was difficult
> and we should centralize things on the master. Now you tell us that
> actually we do need lots of settings on each standby and that to think
> otherwise is not realistic. That's a contradiction.

You've repeatedly accused me and others of contradicting ourselves.  I
don't think that's helpful in advancing the debate, and I don't think
it's what I'm doing.

The point I'm trying to make is that when failover happens, lots of
reconfiguration is going to be needed.  There is just no getting
around that.  Let's ignore synchronous replication entirely for a
moment.  You're running 9.0 and you have 10 slaves.  The master dies.
You promote a slave.  Guess what?  You need to look at each slave you
didn't promote and adjust its primary_conninfo to point at the new
master.  You also need to check whether that slave has received an
xlog record with a higher LSN than the last one received by the slave
you promoted.  If it has, you need to take a new base backup.  If you
skip that step, you may have data corruption - very possibly silent
data corruption.

Do you dispute this?  If so, on which point?
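
To make that concrete, here is roughly what the repointing looks like
on 9.0.  The hostname is a placeholder and the details of the conninfo
string will vary with your setup:

    # recovery.conf on each slave you didn't promote, repointed at the
    # newly promoted master (restart the standby afterward)
    standby_mode = 'on'
    primary_conninfo = 'host=new-master port=5432'

    -- run on each such slave before repointing it; if this is ahead
    -- of the new master's promotion point, that slave needs a fresh
    -- base backup instead
    SELECT pg_last_xlog_receive_location();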

The reason I think that we should centralize parameters on the master
is because they affect *the behavior of the master*.  Controlling
whether the master will wait for the slave on the slave strikes me
(and others) as spooky action at a distance.  Configuring whether the
master will retain WAL for a disconnected slave on the slave is
outright byzantine.  Of course, configuring these parameters on the
master means that when the master changes, you're going to need a
configuration (possibly the same, possibly different) for said
parameters on the new master.  But since you may be doing a lot of
other adjustment at that point anyway (e.g. new base backups, changes
in the set of synchronous slaves) that doesn't seem like a big deal.
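
Just to illustrate the shape of the thing, a centralized standby.conf
on the master might look something like this.  The syntax and setting
names here are hypothetical, not a worked-out proposal:

    # standby.conf on the master; section and setting names are made up
    [standby1]
    synchronous = on    # master waits for this standby to ack commits
    retain_wal = on     # master keeps WAL while this standby is down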

> The chain of argument used to support this as being a sensible design
> choice is broken or contradictory in more than one place. I think we
> should be looking for a design using the KISS principle, while
> retaining sensible tuning options.

The KISS principle is exactly what I am attempting to apply.
Configuring parameters that affect the master on some machine other
than the master isn't KISS, to me.  You may find that broken or
contradictory, but I disagree.  I am attempting to disagree
respectfully, but statements like the above make me feel like you're
flaming, and that's getting under my skin.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

