Re: Synchronization levels in SR - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronization levels in SR
Date
Msg-id 1274804936.6203.2110.camel@ebony
Whole thread Raw
In response to Synchronization levels in SR  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Synchronization levels in SR
Confused about the buffer pool size
Re: Synchronization levels in SR
List pgsql-hackers
On Mon, 2010-05-24 at 22:20 +0900, Fujii Masao wrote:

> Second, we need to discuss how to specify the synch
> level. There are three approaches:
> 
> * Per standby
>   Since the purpose, location and H/W resource often differ
>   from one standby to another, specifying level per standby
>   (i.e., we set the level in recovery.conf) is a
>   straightforward approach, I think. For example, we can
>   choose #3 for high-availability standby near the master,
>   and choose #1 (async) for the disaster recovery standby
>   remote.
> 
> * Per transaction
>   Define the PGC_USERSET option specifying the level and
>   specify it on the master in response to the purpose of
>   transaction. In this approach, for example, we can choose
>   #4 for the transaction which should be visible on the
>   standby as soon as a "success" of the commit has been
>   returned to a client. We can also choose #1 for
> time-critical but not mission-critical transactions.
> 
> * Mix
>   Allow users to specify the level per standby and
>   transaction at the same time, and then calculate the real
>   level from them by using some algorithm.
> 
> Which should we adopt for 9.1? I'd like to implement the
> "per-standby" approach at first since it's simple and seems
> to cover more use cases. Thoughts?

-1

Synchronous replication implies that a commit should wait. This wait is
experienced by the transaction, not by other parts of the system. If we
define robustness at the standby level then robustness depends upon
unseen administrators, as well as the current up/down state of standbys.
This is action-at-a-distance in its worst form. 

Imagine having 2 standbys, 1 synch, 1 async. If the synch server goes
down, performance will improve and robustness will have been lost. What
good would that be?

Imagine a standby connected over a long distance. The DBA accidentally
brings up the standby in synch mode, and the primary server hits massive
performance problems with no direct way of controlling this.

The worst aspect of standby-level controls is that nobody ever knows how
safe a transaction is. There is no definition or test for us to check
exactly how safe any particular transaction is. Also, the lack of safety
occurs at the time when you least want it - when one of your servers is
already down.

So I call "per-standby" settings simple, and broken in multiple ways.

Putting the control in the hands of the transaction owner (i.e. on the
master) is exactly where the control should be. I personally like the
idea of that being a USERSET, though could live with system wide
settings if need be. But the control must be on the *master* not on the
standbys.
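For illustration, per-transaction control on the master could look much
like the existing asynchronous-commit switch. synchronous_commit really
is a USERSET GUC today; the synchronous_replication parameter below is
purely a hypothetical sketch of the kind of setting being argued for
here, not a committed design:

```sql
-- Real, existing USERSET behaviour: the transaction owner trades
-- durability for speed on a time-critical, non-mission-critical
-- transaction. (Table name is invented for illustration.)
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO clickstream_log VALUES (now(), 'page_view');
COMMIT;

-- Hypothetical analogue for replication: the transaction owner, on
-- the *master*, chooses how long COMMIT waits for standbys.
-- (Parameter name and values invented for illustration.)
SET synchronous_replication = 'apply';  -- or 'fsync', 'recv', 'async'
```

The point of the sketch is only that the knob sits with the transaction
owner on the master, not in a standby's recovery.conf.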

The best parameter we can specify is the number of servers that we wish
to wait for confirmation from. That is a definition that easily manages
the complexity of having various servers up/down at any one time. It
also survives misconfiguration more easily, and it provides a workaround
when replicating across a bursty network where we can't guarantee
response times, even if the typical response time is good.
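The "wait for the first k acknowledgements, whichever standbys they come
from" rule is easy to pin down precisely. A minimal sketch in Python
(names invented for illustration; this is not PostgreSQL code) of how a
commit wait under that rule behaves as standbys come and go:

```python
def quorum_satisfied(acks_received, quorum):
    """Commit may return once `quorum` standbys have acknowledged."""
    return acks_received >= quorum


def commit_wait(standby_acks, quorum):
    """Simulate a commit waiting for the first `quorum` acknowledgements.

    `standby_acks` is the stream of acks in arrival order, e.g.
    ['s1', 's3'] if standby s2 is down or slow. Returns the list of
    standbys the commit actually waited for, or None if fewer acks
    than `quorum` ever arrive (the commit would keep waiting).
    """
    waited_for = []
    for name in standby_acks:
        waited_for.append(name)
        if quorum_satisfied(len(waited_for), quorum):
            return waited_for  # commit returns "success" here
    return None  # quorum unreachable: commit cannot complete


# With three standbys and quorum = 2, losing any one standby still
# lets the commit complete: robustness is defined by a number, not by
# which particular servers happen to be up at the moment.
print(commit_wait(['s1', 's3'], 2))
print(commit_wait(['s1'], 2))
```

This is why a count survives misconfiguration and outages: the
guarantee ("k copies exist before commit returns") holds regardless of
which individual standbys answered.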

(We've discussed this many times before over a period of years and not
really sure why we have to re-discuss this repeatedly just because
people disagree. You don't mention the earlier discussions, not sure
why. If we want to follow the community process, then all previous
discussions need to be taken into account, unless things have changed -
which they haven't: same topic, same people, AFAICS.)


-- Simon Riggs           www.2ndQuadrant.com


