On Wed, 2008-09-10 at 11:07 +0200, Dimitri Fontaine wrote:
> Hi,
>
> Le mercredi 10 septembre 2008, Heikki Linnakangas a écrit :
> > Sure. That's the fundamental problem with synchronous replication.
> > That's why many people choose asynchronous replication instead. Clearly
> > at some point you'll want to give up and continue without the slave, or
> > kill the master and fail over to the slave. I'm wondering how that's
> > different than the lag between master and server in asynchronous
> > replication from the client's point of view.
>
> As a future user of these new facilities, the difference from the client's POV is
> simple: in normal mode of operation, we want a strong guarantee that any
> COMMIT has made it to both the master and the slave at commit time. No lag
> whatsoever.

Agreed.

> You're considering lag as an option in case of failure, but I don't see this
> as acceptable when you need sync commit. In case of network timeout, the cluster
> is down. So you want to either continue servicing in degraded mode or take the
> service down while you repair the cluster, but neither of those choices can be
> transparent to the admins, I'd argue.
>
> Of course, the main use case is high availability, which tends to say you do not
> have the option to stop service,

We have a number of choices at the point of failure:
* Does the whole primary server stay up (probably)?
* Do we continue to allow new transactions in degraded mode? (Continuing
at that time increases the risk of transaction loss.)
(The answer sounds like it will be "of course, stupid", but this cluster
may be part of an even higher-level HA mechanism, so the answer isn't
always clear.)
* For each transaction that is trying to commit: do we want to wait
forever? If not, how long? If we stop waiting, do we throw an ERROR, or do
we say, let's get on with another transaction?

If the server is up, yet all connections in a session pool are stuck
waiting for their last commits to complete, then most sysadmins would
agree that the server is actually "down", since no useful work is
happening or can be initiated, not even read-only work. We don't need to
address that issue in the same way for all transactions, is all I'm
saying.
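
To make the per-transaction choices concrete, here is a minimal sketch in
Python. It is not a proposed interface and says nothing about how the
server would implement this: the names CommitWaitPolicy, OnTimeout,
ReplicationTimeout and wait_for_slave_ack are all made up for
illustration, and a threading.Event stands in for the slave's
acknowledgement of the commit.

```python
import enum
import threading


class OnTimeout(enum.Enum):
    """What to do when the slave has not acknowledged in time (hypothetical)."""
    ERROR = "error"        # give up and report an error to the client
    CONTINUE = "continue"  # stop waiting and carry on in degraded mode


class ReplicationTimeout(Exception):
    """Raised when the wait is abandoned under the ERROR policy."""


class CommitWaitPolicy:
    """Per-transaction answer to: wait forever? If not, how long? Then what?"""

    def __init__(self, timeout, on_timeout):
        self.timeout = timeout        # seconds; None means wait forever
        self.on_timeout = on_timeout  # an OnTimeout value


def wait_for_slave_ack(ack_event, policy):
    """Block until the slave acknowledges the commit or the policy gives up.

    Returns "synced" if the acknowledgement arrived, "degraded" if we
    stopped waiting and carried on, or raises ReplicationTimeout.
    """
    # Event.wait returns True once the event is set, False on timeout.
    if ack_event.wait(policy.timeout):
        return "synced"
    if policy.on_timeout is OnTimeout.ERROR:
        raise ReplicationTimeout("slave did not acknowledge commit in time")
    return "degraded"
```

A policy with timeout=None is the "wait forever" answer. Because the
policy is an argument rather than a global setting, different transactions
could carry different policies, which is the point I am making above.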
-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support