Thread: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

On Mon, Mar 7, 2011 at 5:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Efficient transaction-controlled synchronous replication.
>> If a standby is broadcasting reply messages and we have named
>> one or more standbys in synchronous_standby_names then allow
>> users who set synchronous_replication to wait for commit, which
>> then provides strict data integrity guarantees. Design avoids
>> sending and receiving transaction state information so minimises
>> bookkeeping overheads. We synchronize with the highest priority
>> standby that is connected and ready to synchronize. Other standbys
>> can be defined to takeover in case of standby failure.
>>
>> This version has very strict behaviour; more relaxed options
>> may be added at a later date.
>
> Pretty cool! I'd appreciate very much your efforts and contributions.

Here are another comments:

    if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
SyncRepRequested())

Whenever synchronous_replication is TRUE, we disable synchronous_commit.
But, before disabling that, we should check also max_wal_senders and
synchronous_standby_names? Otherwise, synchronous_commit can
be disabled unexpectedly even in non replication case.

-            /* Let the master know that we received some data. */
-            XLogWalRcvSendReply();
-            XLogWalRcvSendHSFeedback();

This change completely eliminates the difference between write_location
and flush_location in pg_stat_replication. If this change is reasoable, we
should get rid of write_location from pg_stat_replication since it's useless.
If not, this change should be reverted. I'm not sure whether monitoring
the difference between write and flush locations is useful. But I guess that
someone thought so and that code was added.

+    /*
+     * Current location of the head of the queue. All waiters should have
+     * a waitLSN that follows this value, or they are currently being woken
+     * to remove themselves from the queue. Protected by SyncRepLock.
+     */
+    XLogRecPtr    lsn;

The comment ", or they are currently being woken to remove themselves
from the queue" is no longer required because the proc is currently removed
by walsender.

I found some typos. The attached patch fixes them.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment
On Mon, Mar 7, 2011 at 6:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Mar 7, 2011 at 5:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Efficient transaction-controlled synchronous replication.
>>> If a standby is broadcasting reply messages and we have named
>>> one or more standbys in synchronous_standby_names then allow
>>> users who set synchronous_replication to wait for commit, which
>>> then provides strict data integrity guarantees. Design avoids
>>> sending and receiving transaction state information so minimises
>>> bookkeeping overheads. We synchronize with the highest priority
>>> standby that is connected and ready to synchronize. Other standbys
>>> can be defined to takeover in case of standby failure.
>>>
>>> This version has very strict behaviour; more relaxed options
>>> may be added at a later date.
>>
>> Pretty cool! I'd appreciate very much your efforts and contributions.
>
> Here are another comments:
>
>        if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
> SyncRepRequested())
>
> Whenever synchronous_replication is TRUE, we disable synchronous_commit.
> But, before disabling that, we should check also max_wal_senders and
> synchronous_standby_names? Otherwise, synchronous_commit can
> be disabled unexpectedly even in non replication case.

Yeah, that's bad.  At the risk of repeating myself, I don't think this
code should be checking SyncRepRequested() in the first place.  If the
user has turned off synchronous_commit, then we should just commit
asynchronously, even if sync rep is otherwise in force.  Otherwise,
this if statement is going to get really complicated.   The logic is
already at least mildly wrong here anyway: clearly we do NOT need to
commit synchronously if the transaction has not written xlog, even if
sync rep is enabled.

> -                       /* Let the master know that we received some data. */
> -                       XLogWalRcvSendReply();
> -                       XLogWalRcvSendHSFeedback();
>
> This change completely eliminates the difference between write_location
> and flush_location in pg_stat_replication. If this change is reasoable, we
> should get rid of write_location from pg_stat_replication since it's useless.
> If not, this change should be reverted. I'm not sure whether monitoring
> the difference between write and flush locations is useful. But I guess that
> someone thought so and that code was added.

I could go either way on this but clearly we need to do one or the other.

> +       /*
> +        * Current location of the head of the queue. All waiters should have
> +        * a waitLSN that follows this value, or they are currently being woken
> +        * to remove themselves from the queue. Protected by SyncRepLock.
> +        */
> +       XLogRecPtr      lsn;
>
> The comment ", or they are currently being woken to remove themselves
> from the queue" is no longer required because the proc is currently removed
> by walsender.

Fixed.

> I found some typos. The attached patch fixes them.

Committed with minor changes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, Mar 11, 2011 at 5:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>        if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
>> SyncRepRequested())
>>
>> Whenever synchronous_replication is TRUE, we disable synchronous_commit.
>> But, before disabling that, we should check also max_wal_senders and
>> synchronous_standby_names? Otherwise, synchronous_commit can
>> be disabled unexpectedly even in non replication case.
>
> Yeah, that's bad.  At the risk of repeating myself, I don't think this
> code should be checking SyncRepRequested() in the first place.  If the
> user has turned off synchronous_commit, then we should just commit
> asynchronously, even if sync rep is otherwise in force.  Otherwise,
> this if statement is going to get really complicated.   The logic is
> already at least mildly wrong here anyway: clearly we do NOT need to
> commit synchronously if the transaction has not written xlog, even if
> sync rep is enabled.

Yeah, not to wait for replication when synchronous_commit is disabled
seems to be more reasonable.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Fri, Mar 11, 2011 at 5:46 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 5:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>        if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
>>> SyncRepRequested())
>>>
>>> Whenever synchronous_replication is TRUE, we disable synchronous_commit.
>>> But, before disabling that, we should check also max_wal_senders and
>>> synchronous_standby_names? Otherwise, synchronous_commit can
>>> be disabled unexpectedly even in non replication case.
>>
>> Yeah, that's bad.  At the risk of repeating myself, I don't think this
>> code should be checking SyncRepRequested() in the first place.  If the
>> user has turned off synchronous_commit, then we should just commit
>> asynchronously, even if sync rep is otherwise in force.  Otherwise,
>> this if statement is going to get really complicated.   The logic is
>> already at least mildly wrong here anyway: clearly we do NOT need to
>> commit synchronously if the transaction has not written xlog, even if
>> sync rep is enabled.
>
> Yeah, not to wait for replication when synchronous_commit is disabled
> seems to be more reasonable.

On further review, I've changed my mind.  Making synchronous_commit
trump synchronous_replication is appealing conceptually, but it's
going to lead to some weird corner cases.  For example, a transaction
that drops a non-temporary relation always commits synchronously; and
2PC also ignores synchronous_commit.  In the case where
synchronous_commit=off and synchronous_replication=on, we'd either
have to decide that these sorts of transactions aren't going to
replicate synchronously (which would give synchronous_commit a rather
long reach into areas it doesn't currently touch) or else that it's OK
for CREATE TABLE foo () to be totally asynchronous but that DROP TABLE
foo requires sync commit AND sync rep.  That's pretty weird.

What makes more sense to me after having thought about this more
carefully is to simply make a blanket rule that when
synchronous_replication=on, synchronous_commit has no effect.  That is
easy to understand and document.  I'm inclined to think it's OK to let
synchronous_replication have this effect even if max_wal_senders=0 or
synchronous_standby_names=''; you shouldn't turn
synchronous_replication on just for kicks, and I don't think we want
to complicate the test in RecordTransactionCommit() more than
necessary.  We should, however, adjust the logic so that a transaction
which has not written WAL can still commit asynchronously, because
such a transaction has only touched temp or unlogged tables and so
it's not important for it to make it to the standby, where that data
doesn't exist anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Thu, Mar 10, 2011 at 3:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> -                       /* Let the master know that we received some data. */
>> -                       XLogWalRcvSendReply();
>> -                       XLogWalRcvSendHSFeedback();
>>
>> This change completely eliminates the difference between write_location
>> and flush_location in pg_stat_replication. If this change is reasoable, we
>> should get rid of write_location from pg_stat_replication since it's useless.
>> If not, this change should be reverted. I'm not sure whether monitoring
>> the difference between write and flush locations is useful. But I guess that
>> someone thought so and that code was added.
>
> I could go either way on this but clearly we need to do one or the other.

I'm not really sure why this was part of the synchronous replication
patch, but after mulling it over I think it's probably right to rip
out write_location completely.  There shouldn't ordinarily be much of
a gap between write location and flush location, so it's probably not
worth the extra network overhead to keep track of it.  We might need
to re-add some form of this in the future if we have a version of
synchronous replication that only waits for confirmation of receipt
rather than for confirmation of flush, but we don't have that in 9.1,
so why bother?

Barring objections, I'll go do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Thu, 2011-03-17 at 13:46 -0400, Robert Haas wrote:
> On Fri, Mar 11, 2011 at 5:46 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Fri, Mar 11, 2011 at 5:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>>        if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
> >>> SyncRepRequested())
> >>>
> >>> Whenever synchronous_replication is TRUE, we disable synchronous_commit.
> >>> But, before disabling that, we should check also max_wal_senders and
> >>> synchronous_standby_names? Otherwise, synchronous_commit can
> >>> be disabled unexpectedly even in non replication case.
> >>
> >> Yeah, that's bad.  At the risk of repeating myself, I don't think this
> >> code should be checking SyncRepRequested() in the first place.  If the
> >> user has turned off synchronous_commit, then we should just commit
> >> asynchronously, even if sync rep is otherwise in force.  Otherwise,
> >> this if statement is going to get really complicated.   The logic is
> >> already at least mildly wrong here anyway: clearly we do NOT need to
> >> commit synchronously if the transaction has not written xlog, even if
> >> sync rep is enabled.
> >
> > Yeah, not to wait for replication when synchronous_commit is disabled
> > seems to be more reasonable.
> 
> On further review, I've changed my mind.  Making synchronous_commit
> trump synchronous_replication is appealing conceptually, but it's
> going to lead to some weird corner cases.  For example, a transaction
> that drops a non-temporary relation always commits synchronously; and
> 2PC also ignores synchronous_commit.  In the case where
> synchronous_commit=off and synchronous_replication=on, we'd either
> have to decide that these sorts of transactions aren't going to
> replicate synchronously (which would give synchronous_commit a rather
> long reach into areas it doesn't currently touch) or else that it's OK
> for CREATE TABLE foo () to be totally asynchronous but that DROP TABLE
> foo requires sync commit AND sync rep.  That's pretty weird.
> 
> What makes more sense to me after having thought about this more
> carefully is to simply make a blanket rule that when
> synchronous_replication=on, synchronous_commit has no effect.  That is
> easy to understand and document.  I'm inclined to think it's OK to let
> synchronous_replication have this effect even if max_wal_senders=0 or
> synchronous_standby_names=''; you shouldn't turn
> synchronous_replication on just for kicks, and I don't think we want
> to complicate the test in RecordTransactionCommit() more than
> necessary.  We should, however, adjust the logic so that a transaction
> which has not written WAL can still commit asynchronously, because
> such a transaction has only touched temp or unlogged tables and so
> it's not important for it to make it to the standby, where that data
> doesn't exist anyway.

Agree to that.

Not read your other stuff yet, will do that later.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Fri, Mar 18, 2011 at 2:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 10, 2011 at 3:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> -                       /* Let the master know that we received some data. */
>>> -                       XLogWalRcvSendReply();
>>> -                       XLogWalRcvSendHSFeedback();
>>>
>>> This change completely eliminates the difference between write_location
>>> and flush_location in pg_stat_replication. If this change is reasoable, we
>>> should get rid of write_location from pg_stat_replication since it's useless.
>>> If not, this change should be reverted. I'm not sure whether monitoring
>>> the difference between write and flush locations is useful. But I guess that
>>> someone thought so and that code was added.
>>
>> I could go either way on this but clearly we need to do one or the other.
>
> I'm not really sure why this was part of the synchronous replication
> patch, but after mulling it over I think it's probably right to rip
> out write_location completely.  There shouldn't ordinarily be much of
> a gap between write location and flush location, so it's probably not
> worth the extra network overhead to keep track of it.  We might need
> to re-add some form of this in the future if we have a version of
> synchronous replication that only waits for confirmation of receipt
> rather than for confirmation of flush, but we don't have that in 9.1,
> so why bother?
>
> Barring objections, I'll go do that.

I agree to get rid of write_location.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Fri, Mar 18, 2011 at 2:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On further review, I've changed my mind.  Making synchronous_commit
> trump synchronous_replication is appealing conceptually, but it's
> going to lead to some weird corner cases.  For example, a transaction
> that drops a non-temporary relation always commits synchronously; and
> 2PC also ignores synchronous_commit.  In the case where
> synchronous_commit=off and synchronous_replication=on, we'd either
> have to decide that these sorts of transactions aren't going to
> replicate synchronously (which would give synchronous_commit a rather
> long reach into areas it doesn't currently touch) or else that it's OK
> for CREATE TABLE foo () to be totally asynchronous but that DROP TABLE
> foo requires sync commit AND sync rep.  That's pretty weird.
>
> What makes more sense to me after having thought about this more
> carefully is to simply make a blanket rule that when
> synchronous_replication=on, synchronous_commit has no effect.  That is
> easy to understand and document.  I'm inclined to think it's OK to let
> synchronous_replication have this effect even if max_wal_senders=0 or
> synchronous_standby_names=''; you shouldn't turn
> synchronous_replication on just for kicks, and I don't think we want
> to complicate the test in RecordTransactionCommit() more than
> necessary.  We should, however, adjust the logic so that a transaction
> which has not written WAL can still commit asynchronously, because
> such a transaction has only touched temp or unlogged tables and so
> it's not important for it to make it to the standby, where that data
> doesn't exist anyway.

In the first place, I think that it's complicated to keep those two parameters
separately. What about merging them to one parameter? What I'm thinking
is to remove synchronous_replication and to increase the valid values of
synchronous_commit from on/off to async/local/remote/both. Each value
works as follows.
   async   = (synchronous_commit = off && synchronous_replication = off)   "async" makes a transaction do local WAL
flushand replication 
asynchronously.
   local     = (synchronous_commit = on && synchronous_replication = off)   "local" makes a transaction wait for only
localWAL flush. 
   remote = (synchronous_commit = off && synchronous_replication = on)   "remote" makes a transaction wait for only
replication.Local WAL flush is   performed by walwriter. This is useless in 9.1 because we always must   wait for local
WALflush when we wait for replication. But in the future,   if we'll be able to send WAL before WAL write (i.e., send
WALfrom   wal_buffers), this might become useful. In 9.1, it seems reasonable to   remove this value. 
   both     = (synchronous_commit = on && synchronous_replication = on)   "both" makes a transaction wait for local WAL
flushand replication. 

Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Fri, 2011-03-18 at 14:45 +0900, Fujii Masao wrote:
> On Fri, Mar 18, 2011 at 2:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Mar 10, 2011 at 3:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>> -                       /* Let the master know that we received some data. */
> >>> -                       XLogWalRcvSendReply();
> >>> -                       XLogWalRcvSendHSFeedback();
> >>>
> >>> This change completely eliminates the difference between write_location
> >>> and flush_location in pg_stat_replication. If this change is reasoable, we
> >>> should get rid of write_location from pg_stat_replication since it's useless.
> >>> If not, this change should be reverted. I'm not sure whether monitoring
> >>> the difference between write and flush locations is useful. But I guess that
> >>> someone thought so and that code was added.
> >>
> >> I could go either way on this but clearly we need to do one or the other.
> >
> > I'm not really sure why this was part of the synchronous replication
> > patch, but after mulling it over I think it's probably right to rip
> > out write_location completely.  There shouldn't ordinarily be much of
> > a gap between write location and flush location, so it's probably not
> > worth the extra network overhead to keep track of it.  We might need
> > to re-add some form of this in the future if we have a version of
> > synchronous replication that only waits for confirmation of receipt
> > rather than for confirmation of flush, but we don't have that in 9.1,
> > so why bother?
> >
> > Barring objections, I'll go do that.
> 
> I agree to get rid of write_location.

No, don't remove it.

We seem to be just looking for things to tweak without any purpose.
Removing this adds nothing for us.

We will have the column in the future, it is there now, so leave it.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I agree to get rid of write_location.
>
> No, don't remove it.
>
> We seem to be just looking for things to tweak without any purpose.
> Removing this adds nothing for us.
>
> We will have the column in the future, it is there now, so leave it.

Well then can we revert the part of your patch that causes it to not
actually work any more?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, Mar 18, 2011 at 2:25 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> In the first place, I think that it's complicated to keep those two parameters
> separately. What about merging them to one parameter? What I'm thinking
> is to remove synchronous_replication and to increase the valid values of
> synchronous_commit from on/off to async/local/remote/both. Each value
> works as follows.
>
>    async   = (synchronous_commit = off && synchronous_replication = off)
>    "async" makes a transaction do local WAL flush and replication
> asynchronously.
>
>    local     = (synchronous_commit = on && synchronous_replication = off)
>    "local" makes a transaction wait for only local WAL flush.
>
>    remote = (synchronous_commit = off && synchronous_replication = on)
>    "remote" makes a transaction wait for only replication. Local WAL flush is
>    performed by walwriter. This is useless in 9.1 because we always must
>    wait for local WAL flush when we wait for replication. But in the future,
>    if we'll be able to send WAL before WAL write (i.e., send WAL from
>    wal_buffers), this might become useful. In 9.1, it seems reasonable to
>    remove this value.
>
>    both     = (synchronous_commit = on && synchronous_replication = on)
>    "both" makes a transaction wait for local WAL flush and replication.
>
> Thought?

Well, if we want to make this all use one parameter, the obvious way
to do it that wouldn't break backward compatibility is to remove the
synchronous_replication parameter altogether and let
synchronous_commit take on the values on/local/off, where on means
wait for sync rep if it's enabled (i.e.
synchronous_standby_names!=''&&max_wal_senders>0) or otherwise just
wait for local WAL flush, local means just wait for local WAL flush,
and off means commit asynchronously.

I'm OK with doing it that way if there's consensus on it, but I'm not
eager to break backward compatibility.  Simon/Heikki, any opinion on
that approach?

If we don't have consensus on that then I think we should just do what
I proposed above (and Simon agreed to).  I am not eager to spend any
longer than necessary hammering this out; I want to get to beta.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Thu, Mar 17, 2011 at 5:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> What makes more sense to me after having thought about this more
> carefully is to simply make a blanket rule that when
> synchronous_replication=on, synchronous_commit has no effect.  That is
> easy to understand and document.

For what it's worth "has no effect" doesn't make much sense to me.
It's a boolean, either commits are going to block or they're not.

What happened to the idea of a three-way switch?

synchronous_commit = off
synchronous_commit = disk
synchronous_commit = replica

With "on" being a synonym for "disk" for backwards compatibility.

Then we could add more options later for more complex conditions like
waiting for one server in each data centre or waiting for one of a
certain set of servers ignoring the less reliable mirrors, etc.

--
greg


On Fri, Mar 18, 2011 at 10:55 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Thu, Mar 17, 2011 at 5:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> What makes more sense to me after having thought about this more
>> carefully is to simply make a blanket rule that when
>> synchronous_replication=on, synchronous_commit has no effect.  That is
>> easy to understand and document.
>
> For what it's worth "has no effect" doesn't make much sense to me.
> It's a boolean, either commits are going to block or they're not.
>
> What happened to the idea of a three-way switch?
>
> synchronous_commit = off
> synchronous_commit = disk
> synchronous_commit = replica
>
> With "on" being a synonym for "disk" for backwards compatibility.
>
> Then we could add more options later for more complex conditions like
> waiting for one server in each data centre or waiting for one of a
> certain set of servers ignoring the less reliable mirrors, etc.

This is similar to what I suggested upthread, except that I suggested
on/local/off, with the default being on.  That way if you set
synchronous_standby_names, you get synchronous replication without
changing another setting, but you can say local instead if for some
reason you want the middle behavior.  If we're going to do it all with
one GUC, I think that way makes more sense.  If you're running sync
rep, you might still have some transactions that you don't care about,
but that's what async commit is for.  It's a funny kind of transaction
that we're OK with losing if we have a failover but we're not OK with
losing if we have a local crash from which we recover without failing
over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, 2011-03-18 at 11:07 -0400, Robert Haas wrote:
> On Fri, Mar 18, 2011 at 10:55 AM, Greg Stark <gsstark@mit.edu> wrote:
> > On Thu, Mar 17, 2011 at 5:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> What makes more sense to me after having thought about this more
> >> carefully is to simply make a blanket rule that when
> >> synchronous_replication=on, synchronous_commit has no effect.  That is
> >> easy to understand and document.
> >
> > For what it's worth "has no effect" doesn't make much sense to me.
> > It's a boolean, either commits are going to block or they're not.
> >
> > What happened to the idea of a three-way switch?
> >
> > synchronous_commit = off
> > synchronous_commit = disk
> > synchronous_commit = replica
> >
> > With "on" being a synonym for "disk" for backwards compatibility.
> >
> > Then we could add more options later for more complex conditions like
> > waiting for one server in each data centre or waiting for one of a
> > certain set of servers ignoring the less reliable mirrors, etc.
> 
> This is similar to what I suggested upthread, except that I suggested
> on/local/off, with the default being on.  That way if you set
> synchronous_standby_names, you get synchronous replication without
> changing another setting, but you can say local instead if for some
> reason you want the middle behavior.  If we're going to do it all with
> one GUC, I think that way makes more sense.  If you're running sync
> rep, you might still have some transactions that you don't care about,
> but that's what async commit is for.  It's a funny kind of transaction
> that we're OK with losing if we have a failover but we're not OK with
> losing if we have a local crash from which we recover without failing
> over.

I much prefer a single switch, which is what I originally suggested.
Changing the meaning of synchronous_commit seems a problem.

durability = localmemory
durability = localdisk
(durability = remotereceive - has no meaning in current code)
durability = remotedisk
durability = remoteapply

it also allows us to have in the future

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Sat, Mar 19, 2011 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 10:55 AM, Greg Stark <gsstark@mit.edu> wrote:
>> On Thu, Mar 17, 2011 at 5:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> What makes more sense to me after having thought about this more
>>> carefully is to simply make a blanket rule that when
>>> synchronous_replication=on, synchronous_commit has no effect.  That is
>>> easy to understand and document.
>>
>> For what it's worth "has no effect" doesn't make much sense to me.
>> It's a boolean, either commits are going to block or they're not.
>>
>> What happened to the idea of a three-way switch?
>>
>> synchronous_commit = off
>> synchronous_commit = disk
>> synchronous_commit = replica
>>
>> With "on" being a synonym for "disk" for backwards compatibility.
>>
>> Then we could add more options later for more complex conditions like
>> waiting for one server in each data centre or waiting for one of a
>> certain set of servers ignoring the less reliable mirrors, etc.
>
> This is similar to what I suggested upthread, except that I suggested
> on/local/off, with the default being on.  That way if you set
> synchronous_standby_names, you get synchronous replication without
> changing another setting, but you can say local instead if for some
> reason you want the middle behavior.  If we're going to do it all with
> one GUC, I think that way makes more sense.  If you're running sync
> rep, you might still have some transactions that you don't care about,
> but that's what async commit is for.  It's a funny kind of transaction
> that we're OK with losing if we have a failover but we're not OK with
> losing if we have a local crash from which we recover without failing
> over.

I'm OK with this.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center