Re: Configuring synchronous replication - Mailing list pgsql-hackers
From: Heikki Linnakangas
Subject: Re: Configuring synchronous replication
Msg-id: 4C9345CF.8000708@enterprisedb.com
In response to: Re: Configuring synchronous replication (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On 17/09/10 12:49, Simon Riggs wrote:
> This isn't just about UI, there are significant and important
> differences between the proposals in terms of the capability and control
> they offer.

Sure. The point of focusing on the UI is that the UI demonstrates what
capability and control a proposal offers.

>> So what should the user interface be like? Given the 1st and 2nd
>> requirement, we need standby registration. If some standbys are
>> important and others are not, the master needs to distinguish between
>> them to be able to determine that a transaction is safely delivered to
>> the important standbys.
>
> My patch provides those two requirements without standby registration,
> so we very clearly don't "need" standby registration.

It's still not clear to me how you would configure things like "wait for
an ack from the reporting slave, but not from other slaves", or "wait
until replayed in the server on the west coast", in your proposal. Maybe
it's possible, but it doesn't seem very intuitive, and it requires
careful configuration in both the master and the slaves.

In your proposal, you also need to be careful not to connect e.g. a test
slave with "synchronous_replication_service = apply" to the master, or
it can possibly shadow a real production slave, acknowledging
transactions that have not yet been received by the real slave. It's
certainly possible to screw up with standby registration too, but you
have more direct control, because the master's behavior is configured in
the master itself rather than being distributed across all the slaves.

> The question is do we want standby registration on master and if so,
> why?

Well, aside from how to configure synchronous replication, standby
registration would help with retaining the right amount of WAL in the
master. wal_keep_segments doesn't guarantee that enough WAL is retained,
and on the other hand, when all standbys are connected, you retain much
more than might be required.

Giving names to slaves also allows you to view their status in the
master in a more intuitive format.
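To make the registration idea concrete: the master could carry a file
along these lines. The file name standby.conf comes from Fujii-san's
patch, but the syntax and the level names here are purely hypothetical,
not what the patch actually implements:

```ini
# standby.conf on the master -- hypothetical syntax, for illustration only
# <standby name>   <synchronization level>
reporting        fsync    # wait until WAL is fsync'd on the reporting slave
ha-standby       apply    # wait until WAL is replayed on the HA standby
testserver       async    # never wait for the test server
```

The point being that with something like this, all the control sits in
one place in the master, and a misconfigured test slave cannot
accidentally acknowledge transactions on behalf of a production slave.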
Something like:

postgres=# SELECT * FROM pg_slave_status;
    name    | connected |  received  |   fsyncd   |  applied
------------+-----------+------------+------------+------------
 reporting  | t         | 0/26000020 | 0/26000020 | 0/25550020
 ha-standby | t         | 0/26000020 | 0/26000020 | 0/26000020
 testserver | f         |            | 0/15000020 |
(3 rows)

>> For the control between async/recv/fsync/replay, I like to think in
>> terms of
>> a) asynchronous vs synchronous
>> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
>>
>> I think it makes most sense to set sync vs. async in the master, and
>> the level of synchronicity in the slave. Although I have sympathy for
>> the argument that it's simpler if you configure it all from the master
>> side as well.
>
> I have catered for such requests by suggesting a plugin that allows you
> to implement that complexity without overburdening the core code.

Well, plugins are certainly one possibility, but then we need to design
the plugin API. I've been thinking along the lines of a proxy, which can
implement whatever logic you want to decide when to send the
acknowledgment. Either way, if we push any features people want to a
proxy or plugin, we need to make sure that the proxy/plugin has all the
necessary information available.

> This strikes me as an "ad absurdum" argument. Since the above
> over-complexity would doubtless be seen as insane by Tom et al, it
> attempts to persuade that we don't need recv, fsync and apply either.
>
> Fujii has long talked about 4 levels of service also. Why change? I had
> thought that part was pretty much agreed between all of us.

Now you lost me. I agree that we need 4 levels of service (at least
ultimately, not necessarily in the first phase).

> Without performance tests to demonstrate "why", these do sound hard to
> understand. But we should note that DRBD offers recv ("B") and fsync
> ("C") as separate options. And Oracle implements all 3 of recv, fsync
> and apply.
> Neither of them describe those options so simply and easily
> as the way we are proposing with a 4 valued enum (with async as the
> fourth option).
>
> If we have only one option for sync_rep = 'on' which of recv | fsync |
> apply would it implement? You don't mention that. Which do you choose?

You would choose between recv, fsync and apply in the slave, with a GUC.

> I no longer seek to persuade by words alone. The existence of my patch
> means that I think that only measurements and tests will show why I
> have been saying these things. We need performance tests.

I don't expect any meaningful differences in terms of performance
between any of the discussed options. The big question right now is what
features we provide and how they're configured. Performance will depend
primarily on the mode you use, and secondarily on the implementation of
that mode. It would be completely premature to do performance testing
yet, IMHO.

>> Putting all of that together, I think Fujii-san's standby.conf is
>> pretty close. What it needs is the additional GUC for
>> transaction-level control.
>
> The difference between the patches is not a simple matter of a GUC.
>
> My proposal allows a single standby to provide efficient replies to
> multiple requested durability levels all at the same time. With
> efficient use of network resources. ISTM that because the other patch
> cannot provide that you'd like to persuade us that we don't need that,
> ever. You won't sell me on that point, cos I can see lots of uses for
> it.

Simon, how the replies are sent is an implementation detail I haven't
given much thought to yet. The reason we delved into that discussion
earlier was that you seemed to contradict yourself, claiming both that
you don't need to send more than one reply per transaction, and that the
standby doesn't need to know the synchronization level. Other than the
curiosity about that contradiction, it doesn't seem like a very
interesting detail to me right now.
It's not a question that drives the rest of the design, but the other
way round. But FWIW, something like your proposal of sending 3
XLogRecPtrs in each reply seems like a good approach.

I'm not sure about using walwriter. I can see that it helps with getting
the 'recv' and 'replay' acknowledgments out faster, but I still have the
scars from starting bgwriter during recovery.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com