Thread: Sync Rep v17
Well, good news all round.

v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with. It can be enjoyed best at DEBUG3.

The patch is a very light touch on a few areas of code, plus a chunk of specific code, all on the master side. Pretty straightforward, really. I'm sure problems will be found, since it's not long since I completed this; thanks to Daniel Farina for your help with patch assembly.

Which is just as well, because the other news is that I'm off on holiday for a few days, which is most inconvenient. I won't be committing this for at least a week and will be absent from the list. OTOH, I think it's ready for a final review and commit, so I'm OK if you commit it, and OK if you leave it for me.

That's not the end of it. I can see a few things we could/should do in this release, but this includes all the must-do things. The docs could do with a little love also. So I expect work for me when I return.

Config Summary
==============

Most parameters are set on the primary. Set

    primary: synchronous_standby_names = 'node1, node2, node3'

which means that whichever of those standbys connects first will become the main synchronous standby. Servers arriving later will be potential standbys ("standby standbys" doesn't sound right...). synchronous_standby_names can change at reload.

Currently, the standby_name is the application_name parameter set in the primary_conninfo.

When we set this for a client, or in postgresql.conf

    primary: synchronous_replication = on

then we will wait at commit until the synchronous standby has reached the WAL location of our commit point.

If the current synchronous standby dies, then one of the other standbys will take over. (I think it would be a great idea to make the list a priority order, but I haven't had time to code that.)

If none of the standbys are available, then we don't wait at all if allow_standalone_primary is set. allow_standalone_primary can change at reload.

Whatever happens, if you set sync_replication_timeout_client then backends will give up waiting if some WALsender doesn't wake them quickly enough.

You can generally leave these parameters at their default settings:

    primary: sync_replication_timeout_client = 120s
    primary: allow_standalone_primary = on
    standby: wal_receiver_status_interval = 10s

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
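Since the standby identifies itself through the application_name in its connection string, the standby side of the configuration above might look like the following sketch; the host, port, user, and node name are placeholders, not values from the patch:

    standby: primary_conninfo = 'host=primary.example.com port=5432 user=replicator application_name=node1'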
On 19 February 2011 00:06, Simon Riggs <simon@2ndquadrant.com> wrote:
> Well, good news all round.
>
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.
> [...]

I see the updated patch I provided last time to fix various errata and spacing issues got lost in the last round of conversations. Maybe it's safer to provide a patch for the patch.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On Sat, 2011-02-19 at 00:32 +0000, Thom Brown wrote:
> I see the updated patch I provided last time to fix various errata and
> spacing issues got lost in the last round of conversations. Maybe
> it's safer to provide a patch for the patch.

I'm sorry about that, Thom; your changes were and are welcome. The docs were assembled rather quickly about 2 hours ago, and we'll definitely need your care and attention to bring them to a good level of quality.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Excerpts from Thom Brown's message of vie feb 18 21:32:17 -0300 2011:
> I see the updated patch I provided last time to fix various errata and
> spacing issues got lost in the last round of conversations. Maybe
> it's safer to provide a patch for the patch.

interdiff?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Feb 18, 2011 at 7:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The patch is a very light touch on a few areas of code, plus a chunk of
> specific code, all on the master side. Pretty straightforward, really.
> I'm sure problems will be found, since it's not long since I completed
> this; thanks to Daniel Farina for your help with patch assembly.

This looks like it's in far better shape than the previous version. Thanks to you and Dan for working on it.

As you have this coded, enabling synchronous_replication forces all transactions to commit synchronously. This means that, when synchronous_replication=on, you get synchronous_commit=on behavior even if you have synchronous_commit=off. I'd prefer to see us go the other way, and say that you can only get synchronous replication when you're also getting synchronous commit. Even if not, we should at least arrange the test so that we don't force synchronous commit for transactions that haven't written XLOG. That is, instead of:

    ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 || SyncRepRequested())

...we should instead say:

    ((wrote_xlog && (XactSyncCommit || SyncRepRequested())) || forceSyncCommit || nrels > 0)

...but as I say, I'd be inclined to instead keep the existing test, and just skip SyncRepWaitForLSN() if we aren't committing synchronously.

I'm unsure of the value of sync_replication_timeout_client (which, incidentally, is spelled replication_timeout_client in one place, and elsewhere asynchornus -> asynchronous). I think in practice, if you start hitting that timeout, your server will slow to such a crawl that you might as well be down. On the other hand, I see no particular harm in leaving the option in there either, though I definitely think the default should be changed to -1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
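The alternative Robert prefers might be sketched as below; the variable names come from the test quoted above, and the exact call site and signature of SyncRepWaitForLSN() are assumptions, not confirmed from the patch:

    /*
     * Sketch only: keep the existing synchronous-commit test unchanged,
     * and skip the sync rep wait entirely for asynchronous commits.
     * XactLastRecEnd is the end of the commit record just written;
     * the SyncRepWaitForLSN() signature is assumed.
     */
    if (wrote_xlog && XactSyncCommit && SyncRepRequested())
        SyncRepWaitForLSN(XactLastRecEnd);  /* wait for standby ack */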
Simon Riggs wrote:
> Most parameters are set on the primary. Set
>
>     primary: synchronous_standby_names = 'node1, node2, node3'
>
> [...]
>
> If none of the standbys are available, then we don't wait at all
> if allow_standalone_primary is set. allow_standalone_primary can
> change at reload.

Are we going to allow a command to be run when these things happen, like archive_command does, or is that something for 9.2?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
> As you have this coded, enabling synchronous_replication forces all
> transactions to commit synchronously. This means that, when
> synchronous_replication=on, you get synchronous_commit=on behavior
> even if you have synchronous_commit=off. I'd prefer to see us go the
> other way, and say that you can only get synchronous replication when
> you're also getting synchronous commit.

First, we should be clear to explain that you are referring to the fact that the request

    synchronous_commit = off
    synchronous_replication = on

makes no sense in the way the replication system is currently designed, even though it is a wish-list item to make it work in 9.2+.

Since that combination makes no sense, we need to decide how we will react when it is requested. I think fail-safe is the best way. We should make the default the safe way and allow performance options in non-default paths. Hence the above combination of options is taken in the patch to be the same as

    synchronous_commit = on
    synchronous_replication = on

If you want performance, you can still get it with

    synchronous_commit = off
    synchronous_replication = off

which does have meaning.

So the way the patch is coded makes most sense for a safety feature. I think it does need to be documented also.

Must fly now...

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
> Are we going to allow a command to be run when these things happen, like
> archive_command does, or is that something for 9.2?

Not really sure which events you're speaking of, sorry. As I write, I don't see a place or a reason to run a command, but that might change with a longer explanation of what you're thinking. I wouldn't rule out adding something like that in this release, but I'm not around for the next week to add it.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
> On the other hand, I see no particular
> harm in leaving the option in there either, though I definitely think
> the default should be changed to -1.

The default setting should be to *not* freeze up if you lose the standby. That behaviour unexpectedly leads to an effective server-down situation, rather than 2 minutes of slow running. This is supposed to be High Availability software, so it will be bad for us if we allow that. We've discussed this many times already. The people that wish to wait forever have their wait-forever option, but let's not force it on everyone.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs wrote:
> On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
> > Are we going to allow a command to be run when these things happen, like
> > archive_command does, or is that something for 9.2?
>
> Not really sure which events you're speaking of, sorry. [...]

Sorry. I was thinking of allowing a command to alert an administrator when we switch standby machines, or if we can't do synchronous standby any longer. I assume we put a message in the logs, and the admin could have a script that monitors that, but I thought a pager-like command would be helpful, though I don't think we do that in any other case, so maybe it is overkill. These are all optional ideas.

Also, I like that you are using application_name here.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On 2/18/11 4:06 PM, Simon Riggs wrote:
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.

Yes, but what wine do I serve with it? ;-)

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Sat, Feb 19, 2011 at 15:23, Bruce Momjian <bruce@momjian.us> wrote:
> Sorry. I was thinking of allowing a command to alert an administrator
> when we switch standby machines, or if we can't do synchronous standby
> any longer. I assume we put a message in the logs, and the admin could
> have a script that monitors that, but I thought a pager-like command
> would be helpful

For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps that could be used?

Regards,
Marti
Marti Raudsepp wrote:
> On Sat, Feb 19, 2011 at 15:23, Bruce Momjian <bruce@momjian.us> wrote:
> > Sorry. I was thinking of allowing a command to alert an administrator
> > when we switch standby machines, or if we can't do synchronous standby
> > any longer. [...] I thought a pager-like command would be helpful
>
> For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
> that could be used?

Well, those are going to require work to notify someone externally, like sending an email alert.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On Sat, Feb 19, 2011 at 6:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Marti Raudsepp wrote:
>> For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
>> that could be used?
>
> Well, those are going to require work to notify someone externally, like
> sending an email alert.

Although that seems like work for a monitoring tool.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> First, we should be clear to explain that you are referring to the fact
> that the request
>     synchronous_commit = off
>     synchronous_replication = on
> makes no sense in the way the replication system is currently designed,
> even though it is a wish-list item to make it work in 9.2+.

What exactly do you mean by "make it work"? We can either (1) wait for the local commit and the remote commit (synchronous_commit=on, synchronous_replication=on), (2) wait for the local commit only (synchronous_commit=on, synchronous_replication=off), or (3) wait for neither (synchronous_commit=off, synchronous_replication=off). There's no fourth possible behavior, AFAICS.

The question is whether synchronous_commit=off, synchronous_replication=on should behave like (1) or (3). You have it as #1; I'm arguing it should be #3. I realize it's an arguable point; I'm just arguing for what makes most sense to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The default setting should be to *not* freeze up if you lose the
> standby. That behaviour unexpectedly leads to an effective server-down
> situation, rather than 2 minutes of slow running.

My understanding is that the parameter will wait on every commit, not just once. There's no mechanism to do anything else.

But I did some testing this evening and actually it appears to not work at all. I hit walreceiver with a SIGSTOP and the commit never completes, even after the two-minute timeout. Also, when I restarted walreceiver after a long time, I got a server crash.

    DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
    DEBUG:  released 0 procs up to 0/3014690
    DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
    DEBUG:  released 2 procs up to 0/3027BC8
    WARNING:  could not locate ourselves on wait queue
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset:
    DEBUG:  shmem_exit(-1): 0 callbacks to make
    DEBUG:  proc_exit(-1): 0 callbacks to make
    FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    Failed.
    !> LOG:  record with zero length at 0/3027BC8
    DEBUG:  CommitTransaction
    DEBUG:  name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
    DEBUG:  received replication command: IDENTIFY_SYSTEM
    DEBUG:  received replication command: START_REPLICATION 0/3000000
    LOG:  streaming replication successfully connected to primary
    DEBUG:  standby "standby" is a potential synchronous standby
    DEBUG:  write 0/0 flush 0/0 apply 0/3027BC8
    DEBUG:  released 0 procs up to 0/0
    DEBUG:  standby "standby" has now caught up with primary
    DEBUG:  write 0/3027C18 flush 0/0 apply 0/3027BC8
    DEBUG:  standby "standby" is now the synchronous replication standby
    DEBUG:  released 0 procs up to 0/0
    DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
    DEBUG:  released 0 procs up to 0/3027C18
    DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027C18
    DEBUG:  released 0 procs up to 0/3027C18

(lots more copies of those last two messages)

I believe the problem is that the definition of IsOnSyncRepQueue is bogus, so that the loop in SyncRepWaitOnQueue always takes the first branch.

It was a little confusing to me, setting this up, that setting only synchronous_replication did nothing; I had to also set synchronous_standby_names. We might need a cross-check there.

I believe the docs for synchronous_replication also need some updating; this part appears to be out of date:

+        between primary and standby. The commit wait will last until the
+        first reply from any standby. Multiple standby servers allow
+        increased availability and possibly increase performance as well.

The words "on the primary" in the next sentence may not be necessary any more either, as I believe this parameter now has no effect anywhere else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 19, 2011 at 10:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> The question is whether synchronous_commit=off,
> synchronous_replication=on should behave like (1) or (3). You have it
> as #1; I'm arguing it should be #3. I realize it's an arguable point;
> I'm just arguing for what makes most sense to me.

IMHO, we should stick to the safest option.

Considering that synchronous_replication=on means that we *want* durability, and that synchronous_commit=off means we don't *care* about durability, the real question here is: in the presence of synchronous_replication=on (which means we want durability), are we allowed to assume we can lose data?

One way to manage that is simply to disallow that combination with an error; maybe that is a bit strict, but then we don't have to make assumptions. The other option is to keep safe, which means keeping durability: if you want to risk some data, then you should disable synchronous_replication as well as synchronous_commit... maybe sending a message to the log when you detect the conflicting situation.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Feb 20, 2011, at 08:12, Jaime Casanova wrote:
> Considering that synchronous_replication=on means that we *want*
> durability, and that synchronous_commit=off means we don't *care*
> about durability, the real question here is: in the presence of
> synchronous_replication=on (which means we want durability), are we
> allowed to assume we can lose data?

From that angle, shouldn't we turn synchronous_replication=on into a third possible state of synchronous_commit? We'd then have

    synchronous_commit=off      # Same as now
    synchronous_commit=local    # Same as synchronous_commit=on,
                                # synchronous_replication=off
    synchronous_commit=standby  # Same as synchronous_commit=on,
                                # synchronous_replication=on

> One way to manage that is simply to disallow that combination with an
> error; maybe that is a bit strict, but then we don't have to make
> assumptions. [...]

The question is where we'd raise the error, though. Doing it on GUC assignment makes the behaviour depend on assignment order. That's a bad idea, I believe, since it possibly breaks ALTER ROLE/DATABASE SET ....

best regards,
Florian Pflug
On Sun, Feb 20, 2011 at 7:20 AM, Florian Pflug <fgp@phlo.org> wrote:
> From that angle, shouldn't we turn synchronous_replication=on into a
> third possible state of synchronous_commit?
> [...]

That would be a little confusing and difficult to document. At least I know that as soon as we say something like "to activate synchronous replication, set synchronous_commit to standby", users will somehow get the impression that locally the commit is asynchronous (OK, I have had bad experiences with Ecuadorian users ;)

> The question is where we'd raise the error, though. Doing it on GUC
> assignment makes the behaviour depend on assignment order. That's a
> bad idea, I believe, since it possibly breaks ALTER ROLE/DATABASE SET ....

Well, yeah... maybe it's just too much worthless work... but we could check before commit and send a NOTICE message; see the sketch below.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
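A minimal sketch of that commit-time cross-check, under the assumption that the patch's SyncRepRequested() macro and the existing XactSyncCommit flag are both visible at the commit call site; the message wording is illustrative only:

    /*
     * Hypothetical check, per the suggestion above: warn at commit time
     * when the conflicting GUC combination is in effect, rather than
     * erroring at GUC assignment (which is order-dependent).
     */
    if (SyncRepRequested() && !XactSyncCommit)
        ereport(NOTICE,
                (errmsg("synchronous_replication=on forces synchronous commit, "
                        "overriding synchronous_commit=off")));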
On Feb 21, 2011, at 00:09, Jaime Casanova wrote:
> That would be a little confusing and difficult to document. At least I
> know that as soon as we say something like "to activate synchronous
> replication, set synchronous_commit to standby", users will somehow get
> the impression that locally the commit is asynchronous [...]

I figured we'd say something like: 'To guarantee durability of transactions except in the fail-over case, set synchronous_commit to "local". To additionally guarantee durability even in the case of a fail-over to a standby node, set synchronous_commit to "standby". This causes a COMMIT to only report success once the commit record has been acknowledged by the current synchronous standby node, and will therefore induce a higher latency of COMMIT requests.'

Other possible names for the "standby" option that come to mind are "replication", "replica" or "replicate". Or maybe "cluster", but that seems confusing, since we sometimes call the collection of databases managed by one postgres instance a cluster.

best regards,
Florian Pflug
On Sat, Feb 19, 2011 at 9:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.
> [...]

Thanks for the patch! Here are the comments:

PREPARE TRANSACTION and ROLLBACK PREPARED should wait for replication as well as COMMIT PREPARED?

What if fast shutdown is requested while RecordTransactionCommit is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete until replication has been successfully done (i.e., until at least one synchronous standby has connected to the master, especially if allow_standalone_primary is disabled). Is this OK?

We should emit a WARNING when a synchronous standby with wal_receiver_status_interval = 0 connects to the master. Because, in that case, a transaction unexpectedly would wait for replication infinitely.

+       /* Need a modifiable copy of string */
+       rawstring = pstrdup(SyncRepStandbyNames);
+
+       /* Parse string into list of identifiers */
+       if (!SplitIdentifierString(rawstring, ',', &elemlist))

pfree(rawstring) and list_free(elemlist) should be called also if SplitIdentifierString returns TRUE. Otherwise, the memory would leak.

+               ereport(FATAL,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("invalid list syntax for parameter \"synchronous_standby_names\"")));
+               return false;

"return false" is not required here, though it might be harmless.

I've read about a tenth of the patch, so I'll submit more comments about the rest later.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
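A sketch of the leak-free shape Fujii describes; the parsing code is from the quoted patch hunk, and the cleanup placement on the success path is the suggested addition:

    /* Need a modifiable copy of string */
    rawstring = pstrdup(SyncRepStandbyNames);

    /* Parse string into list of identifiers */
    if (!SplitIdentifierString(rawstring, ',', &elemlist))
        ereport(FATAL,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid list syntax for parameter \"synchronous_standby_names\"")));

    /* ... walk elemlist here ... */

    /* the suggested addition: free on the success path as well */
    pfree(rawstring);
    list_free(elemlist);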
> Well, good news all round.
>
> v17 implements what I believe to be the final set of features for sync
> rep. [...]

+        <primary><varname>synchronous_standby_names</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+       <para>
+        Specifies a list of standby names that can become the sole
+        synchronous standby. Other standby servers connect that are also on
+        the list become potential standbys. If the current synchronous standby
+        goes away it will be replaced with one of the potential standbys.
+        Specifying more than one standby name can allow very high availability.
+       </para>

Can anybody please enlighten me? I do not quite follow the "Other standby servers connect that are also on the list become potential standbys" part.

Can I read this as "Other standby servers that are also on the list become potential synchronous standbys"?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Mon, Feb 21, 2011 at 4:35 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Well, good news all round.

Hello on this thread,

I'm taking a look at replication timeout with non-blocking, which would be "nice" but not required for this patch, in my understanding. But before that, we're going to put this patch through some exercise via pgbench to determine two things:

* How much performance is impacted
* Does it crash?

As it will be somewhat hard to prove the durability guarantees of commit without special heroics, unless someone can suggest a mechanism.

Expect some results Real Soon.

--
fdr
On Tue, Feb 22, 2011 at 7:55 AM, Daniel Farina <daniel@heroku.com> wrote:
> I'm taking a look at replication timeout with non-blocking, which would
> be "nice" but not required for this patch, in my understanding.

Why do you think so? Do you think sync_replication_timeout_client is sufficient for sync rep?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I've read about a tenth of the patch, so I'll submit more comments
> about the rest later.

Here are more comments:

SyncRepReleaseWaiters should be called when the walsender exits. Otherwise, if the standby crashes while a transaction is waiting for replication, it waits infinitely.

sync_rep_service and potential_sync_standby are not required to be in the WalSnd shmem, because only the walsender accesses them.

+static bool
+SyncRepServiceAvailable(void)
+{
+    bool     result = false;
+
+    SpinLockAcquire(&WalSndCtl->ctlmutex);
+    result = WalSndCtl->sync_rep_service_available;
+    SpinLockRelease(&WalSndCtl->ctlmutex);
+
+    return result;
+}

A volatile pointer needs to be used to prevent code rearrangement.

+    slock_t        ctlmutex;        /* locks shared variables shown above */

ctlmutex should be initialized by calling SpinLockInit.

+            /*
+             * Stop providing the sync rep service, even if there are
+             * waiting backends.
+             */
+            {
+                SpinLockAcquire(&WalSndCtl->ctlmutex);
+                WalSndCtl->sync_rep_service_available = false;
+                SpinLockRelease(&WalSndCtl->ctlmutex);
+            }

sync_rep_service_available should be set to false only when there is no synchronous walsender.

+    /*
+     * When we first start replication the standby will be behind the primary.
+     * For some applications, for example, synchronous replication, it is
+     * important to have a clear state for this initial catchup mode, so we
+     * can trigger actions when we change streaming state later. We may stay
+     * in this state for a long time, which is exactly why we want to be
+     * able to monitor whether or not we are still here.
+     */
+    WalSndSetState(WALSNDSTATE_CATCHUP);
+

The above has already been committed. Please remove it from the patch.

I don't like calling SyncRepReleaseWaiters for each feedback, because I guess that it's too frequent. How about receiving all the feedbacks available from the socket, and then calling SyncRepReleaseWaiters once, as walreceiver does?

+    bool    ownLatch;    /* do we own the above latch? */

We can just remove the ownLatch flag.

I've read about two-tenths of the patch, so I'll submit more comments about the rest later. Sorry for the slow reviewing...

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
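A rough sketch of the batching Fujii suggests: drain every reply currently readable before releasing waiters once. pq_getbyte_if_available() is the existing non-blocking read used by walsender, while ProcessStandbyReply() is a hypothetical stand-in for whatever the patch's per-message handler is called:

    /*
     * Sketch only: process all queued standby replies, then wake waiting
     * backends a single time with the newest write/flush/apply positions.
     */
    static void
    ProcessRepliesIfAny(void)
    {
        unsigned char firstchar;
        int           r = 0;

        while ((r = pq_getbyte_if_available(&firstchar)) > 0)
            ProcessStandbyReply(firstchar); /* hypothetical handler */

        if (r < 0)
            proc_exit(0);                   /* standby closed the connection */

        SyncRepReleaseWaiters();            /* single wakeup per batch */
    }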
On Mon, Feb 21, 2011 at 9:35 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Can I read this as "Other standby servers that are also on the list
> become potential synchronous standbys"?

I read it in the same way.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Feb 22, 2011 at 07:38, Fujii Masao <masao.fujii@gmail.com> wrote:
> +    SpinLockAcquire(&WalSndCtl->ctlmutex);
> +    result = WalSndCtl->sync_rep_service_available;
> +    SpinLockRelease(&WalSndCtl->ctlmutex);
>
> A volatile pointer needs to be used to prevent code rearrangement.

I don't think that's necessary. Spinlock functions already prevent reordering using __asm__ __volatile__.

Otherwise, surely they would be utterly broken?

Regards,
Marti
On Tuesday 22 February 2011 09:59:21 Marti Raudsepp wrote:
> I don't think that's necessary. Spinlock functions already prevent
> reordering using __asm__ __volatile__.
>
> Otherwise, surely they would be utterly broken?

It's not the spinlock that's the problem, but that "result" may already be loaded into a register. That's not prohibited by __asm__ __volatile__.

Andres
Daniel Farina wrote:
> As it will be somewhat hard to prove the durability guarantees of
> commit without special heroics, unless someone can suggest a
> mechanism.

Could you introduce a hack creating deterministic server-side crashes in order to test this out? The simplest thing that comes to mind is a rule like "kick shared memory in the teeth to force a crash after every 100 commits", then see if #100 shows up as expected. Pick two different small numbers for the interval and you could probably put that on both sides to simulate all sorts of badness.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
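A minimal sketch of such a test-only fault-injection hack, assuming a counter patched into the commit path; nothing here is in the actual patch, and abort() is simply a cheap way to simulate an unclean backend death:

    /*
     * Hypothetical test-only hack: die hard on every Nth commit, so an
     * external script can verify that every acknowledged commit survives
     * on the standby.
     */
    #define CRASH_EVERY_N_COMMITS 100

    static int  commits_since_crash = 0;

    static void
    maybe_crash_for_testing(void)
    {
        if (++commits_since_crash >= CRASH_EVERY_N_COMMITS)
            abort();    /* roughly equivalent to kill -9 on this backend */
    }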
Marti Raudsepp <marti@juffo.org> writes:
> On Tue, Feb 22, 2011 at 07:38, Fujii Masao <masao.fujii@gmail.com> wrote:
>> A volatile pointer needs to be used to prevent code rearrangement.

> I don't think that's necessary. Spinlock functions already prevent
> reordering using __asm__ __volatile__.

You're mistaken. We started using that volatile-pointer convention after noting that some compilers would misoptimize the code otherwise. It's not a problem with LWLock-protected stuff, because the LWLock calls are actual out-of-line function calls, and the compiler knows it doesn't know what those functions might do. But gcc is a lot more willing to reorder stuff around asm operations, so you can't assume that SpinLockAcquire/SpinLockRelease are equally safe. The way to prevent optimization bugs is to make sure that the fetches/stores protected by a spinlock are done through volatile pointers.

            regards, tom lane
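Applied to the SyncRepServiceAvailable() snippet Fujii quoted, the convention Tom describes would look roughly like this; the field names come from the patch, while WalSndCtlData as the type of the control struct is an assumption:

    static bool
    SyncRepServiceAvailable(void)
    {
        /* use a volatile pointer to prevent code rearrangement */
        volatile WalSndCtlData *walsndctl = WalSndCtl;
        bool        result;

        SpinLockAcquire(&walsndctl->ctlmutex);
        result = walsndctl->sync_rep_service_available;
        SpinLockRelease(&walsndctl->ctlmutex);

        return result;
    }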
On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> DEBUG:  released 2 procs up to 0/3027BC8
> WARNING:  could not locate ourselves on wait queue
> server closed the connection unexpectedly

You can make this happen more easily: I just ran "pgbench -n -c10 -j10 test" and got that warning, and sometimes a segmentation fault and sometimes a failed assertion.

The problematic code starts at src/backend/replication/syncrep.c:277; here are my suggestions on that code. I still get a failed assertion because of the second Assert (I think we should just remove that one).

*************** SyncRepRemoveFromQueue(void)
*** 288,299 ****
  
  			if (proc->lwWaitLink == NULL)
  				elog(WARNING, "could not locate ourselves on wait queue");
! 			proc = proc->lwWaitLink;
  		}
  
  		if (proc->lwWaitLink == NULL)	/* At tail */
  		{
! 			Assert(proc == MyProc);
  			/* Remove ourselves from tail of queue */
  			Assert(queue->tail == MyProc);
  			queue->tail = proc;
--- 288,300 ----
  
  			if (proc->lwWaitLink == NULL)
  				elog(WARNING, "could not locate ourselves on wait queue");
! 			else
! 				proc = proc->lwWaitLink;
  		}
  
  		if (proc->lwWaitLink == NULL)	/* At tail */
  		{
! 			Assert(proc != MyProc);
  			/* Remove ourselves from tail of queue */
  			Assert(queue->tail == MyProc);
  			queue->tail = proc;

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Tue, Feb 22, 2011 at 5:04 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Could you introduce a hack creating deterministic server-side crashes in
> order to test this out? The simplest thing that comes to mind is a rule
> like "kick shared memory in the teeth to force a crash after every 100
> commits", then see if #100 shows up as expected. [...]

I probably could via a function; would a kill -9 also be of interest to you?

--
fdr
On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.

I've been messing with this patch and am wondering if this behavior is expected:

I've been frobbing the server around (I was messing around with the syncrep feature, but do not know if this is related just yet), and came upon a case I did not expect: it would appear that prior to establishing a connection to do streaming replication, the "startup process" (which is recovering) is very slowly catching up (or so it would be indicated by txid_current_snapshot()) and eating up enormous amounts of memory, such as 6GB at a time in RES, monotonically increasing. Furthermore, the incrementing of the txid snapshot is very slow, and it doesn't seem like I'm coming close to making full use of my resources: CPU and block devices are not very busy. There may have been a brief spurt of pgbench activity that would generate such WAL traffic to replay.

I have not done a hard shutdown to my knowledge, and the server does allow me to query relatively quickly as a standby. Looks like I'm about to hit 7+GB.

Is there something I'm missing?

--
fdr
On Wed, Feb 23, 2011 at 10:39 PM, Daniel Farina <daniel@heroku.com> wrote:
> it would appear that prior to establishing a connection to do streaming
> replication, the "startup process" (which is recovering) is very slowly
> catching up [...] and eating up enormous amounts of memory, such as 6GB
> at a time in RES, monotonically increasing.

Oh, yes, this reproduces past shutdowns/startups, and there are quite a few txids to go before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around, especially at MemoryContextStats(TopMemoryContext), but was not rewarded).

--
fdr
On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I've read about two-tenths of the patch, so I'll submit more comments
> about the rest later. Sorry for the slow reviewing...

Here are more comments:

+        {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION,
+            gettext_noop("List of potential standby names to synchronise with."),
+            NULL,
+            GUC_LIST_INPUT | GUC_IS_NAME

Why did you add GUC_IS_NAME here? I don't think that it's reasonable to limit the length of this parameter to 63, because dozens of standby names might be specified in the parameter.

SyncRepQueue->qlock should be initialized by calling SpinLockInit?

+ * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group

Typo: s/2010/2011

sync_replication_timeout_client would mess up the "wait-forever" option. So, when allow_standalone_primary is disabled, ISTM that sync_replication_timeout_client should have no effect.

Please check max_wal_senders before calling SyncRepWaitForLSN for the non-replication case.

SyncRepRemoveFromQueue seems not to be so short-term that we can use a spinlock. Instead, an LW lock should be used there.

+    old_status = get_ps_display(&len);
+    new_status = (char *) palloc(len + 21 + 1);
+    memcpy(new_status, old_status, len);
+    strcpy(new_status + len, " waiting for sync rep");
+    set_ps_display(new_status, false);
+    new_status[len] = '\0'; /* truncate off " waiting" */

Updating the PS display should be skipped if update_process_title is false.

+        /*
+         * XXX extra code needed here to maintain sorted invariant.

Yeah, such code is required. I think that we can shorten the time it takes to find an insert position by searching the list backwards, because the given LSN is expected to be relatively new in the queue.

+         * Our approach should be same as racing car - slow in, fast out.
+         */

Really? Even when removing the entry from the queue, we need to search the queue as well as we do in the add-entry case. Why don't you make the walsenders remove the entry from the queue instead?

+    long    timeout = SyncRepGetWaitTimeout();
<snip>
+        bool    timeout = false;
<snip>
+        else if (timeout > 0 &&
+            TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
+                now, timeout))
+        {
+            release = true;
+            timeout = true;
+        }

You seem to have mixed up two "timeout" variables.

+            if (proc->lwWaitLink == MyProc)
+            {
+                /*
+                 * Remove ourselves from middle of queue.
+                 * No need to touch head or tail.
+                 */
+                proc->lwWaitLink = MyProc->lwWaitLink;
+            }

When we find ourselves, we should break out of the loop immediately, instead of continuing the loop to the end?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> You can make this happen more easily: I just ran "pgbench -n -c10 -j10
> test" and got that warning, and sometimes a segmentation fault and
> sometimes a failed assertion.

I have also reproduced this. Notably, things seem fine as long as pgbench is confined to one backend, but as soon as two are used (-c 2) by the feature, I can get segfaults.

In the UI department, I am finding it somewhat difficult to confirm that I am, in fact, synchronously replicating anything in the HA scenario (where I never want to block). However, by enjoying the patch at DEBUG3 and running what I think to be syncrepped and non-syncrepped runs, I believe that I am not committing user error (seeing syncrep waiting vs. the lack thereof).

This is in part hard to confirm because the single-backend performance of syncrep (if DEBUG3 is to be believed; I will write a real torture test later) is actually very good; I was expecting a more perceptible performance dropoff. But then again, I imagine the real kicker will happen when we can run concurrent backends. Also, Amazon EBS doesn't have the fastest disks, and within an availability zone network latency is awfully low.

I can't quite explain what I was seeing before w.r.t. memory usage and, more pressingly, a very slow recovery. I turned off hot standby and was messing around and, before I knew it, the server was caught up. I do not know if that was just coincidence (probably) or overhead imposed by HS. The very high RES number was Linux fooling me, as most of it was SHR and in shmem.

--
fdr
With some more fooling around, I have also managed to get this elog(WARNING):

    if (proc->lwWaitLink == NULL)
        elog(WARNING, "could not locate ourselves on wait queue");

--
fdr
On Mon, Feb 21, 2011 at 4:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
> replication as well as COMMIT PREPARED?

Maybe ROLLBACK PREPARED, but I'm not sure... I'm pretty sure we don't need to wait for PREPARE TRANSACTION, but I could be wrong.

> What if fast shutdown is requested while RecordTransactionCommit
> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
> until replication has been successfully done [...]. Is this OK?

I guess that is debatable. IMHO, if there is a synch standby, then wait (because the user requested durability of the data); if there isn't a synch standby, wait until the timeout (even if allow_standalone_primary is disabled), because probably this is a misconfiguration or a special case like the ones I'm trying to handle (network broke, data center of standbys exploded, etc.).

> We should emit a WARNING when a synchronous standby with
> wal_receiver_status_interval = 0 connects to the master. Because,
> in that case, a transaction unexpectedly would wait for replication
> infinitely.

Actually, I think we should reject such a standby as a synch standby, and look for another one in the synchronous_standby_names list.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On 2011-02-22 20:43, Jaime Casanova wrote:
> You can make this happen more easily: I just ran "pgbench -n -c10 -j10
> test" and got that warning, and sometimes a segmentation fault and
> sometimes a failed assertion.
>
> The problematic code starts at src/backend/replication/syncrep.c:277;
> here are my suggestions on that code. I still get a failed assertion
> because of the second Assert (I think we should just remove that one).
> [...]

I also did some initial testing on this patch and got the queue-related errors with > 1 clients. With the code change from Jaime above, I still got a lot of 'not on queue' warnings.

I tried to understand how the queue was supposed to work, resulting in the changes below, which also incorporate a suggestion from Fujii upthread to exit early when MyProc is found. With the changes below, all seems to work without warnings. I now see that the note about the list invariant is too short; better would be: "if queue length = 1 then head = tail".

--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
 	}
 	else
 	{
+		bool	found = false;
+
 		while (proc->lwWaitLink != NULL)
 		{
 			/* Are we the next proc in our traversal of the queue? */
@@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
 				 * No need to touch head or tail.
 				 */
 				proc->lwWaitLink = MyProc->lwWaitLink;
+				found = true;
+				break;
 			}
 
-			if (proc->lwWaitLink == NULL)
-				elog(WARNING, "could not locate ourselves on wait queue");
 			proc = proc->lwWaitLink;
 		}
+		if (!found)
+			elog(WARNING, "could not locate ourselves on wait queue");
 
-		if (proc->lwWaitLink == NULL)	/* At tail */
+		/* If MyProc was removed from the tail, maintain list invariant head==tail */
+		if (proc->lwWaitLink == NULL)
 		{
-			Assert(proc == MyProc);
-			/* Remove ourselves from tail of queue */
+			Assert(proc != MyProc);	/* impossible, since that is the head==MyProc branch above */
 			Assert(queue->tail == MyProc);
 			queue->tail = proc;
 			proc->lwWaitLink = NULL;

I needed to add this to make the documentation compile:

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
         You should also consider setting <varname>hot_standby_feedback</>
         as an alternative to using this parameter.
        </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
      </sect2>
 
      <sect2 id="runtime-config-sync-rep">

regards,
Yeb Havinga
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> I also did some initial testing on this patch and got the queue-related
> errors with > 1 clients. With the code change from Jaime above, I still
> got a lot of 'not on queue' warnings.
>
> I tried to understand how the queue was supposed to work, resulting in
> the changes below, which also incorporate a suggestion from Fujii
> upthread to exit early when MyProc is found.

Yes, looking at the code, the warning and your patch... it seems yours is the right solution... I'm compiling right now to test again and see the effects. Robert, maybe you can test your failure case again? I'm really sure it's related to this...

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
> Oh, yes, this reproduces past shutdowns/startups, and there are quite a
> few txids to go before I catch up. I'm also comfortable poking around
> with gdb (I have already recompiled with debugging symbols and
> optimizations off and was poking around, especially at
> MemoryContextStats(TopMemoryContext), but was not rewarded).

Where is all of that memory going during recovery? Recovery shouldn't use much memory at all, as far as I can tell.

What's even allocating memory at all?

Regards,
Jeff Davis
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
>> Oh, yes, this reproduces past shutdowns/startups, and there's quite a
>> few txids before I catch up. I'm also comfortable poking around with
>> gdb (I have already recompiled with debugging symbols and
>> optimizations off and was poking around, especially at
>> MemoryContextStats(TopMemoryContext), but was not rewarded.
>
> Where is all of that memory going during recovery? Recovery shouldn't
> use much memory at all, as far as I can tell.
>
> What's even allocating memory at all?

It turns out this was RSS fooling me: as pages of shared memory get
touched, RSS keeps growing, and SHR grows along with it. Still, the
long recovery time was mystifying to me, considering the lack of
unclean shutdowns.

--
fdr
On 2011-02-25 20:40, Jaime Casanova wrote:
> On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>> I also did some initial testing on this patch and got the queue-related
>> errors with > 1 clients. With the code change from Jaime above I still
>> got a lot of 'not on queue' warnings.
>>
>> I tried to understand how the queue was supposed to work - resulting in
>> the changes below, which also incorporate a suggestion from Fujii
>> upthread, to exit early when MyProc is found.
> yes, looking at the code, the warning and your patch... it seems yours
> is the right solution...
> I'm compiling right now to test again and see the effects. Robert,
> maybe you can test your failure case again? I'm really sure it's
> related to this...

I did some more testing over the weekend with this patched v17 patch.
Since you've posted a v18 patch, let me write down some findings with
the v17 patch before continuing with the v18 patch. The tests were done
on an x86_64 platform, 1Gbit network interfaces, 3 servers. Non-default
configuration changes are copy-pasted at the end of this mail.

1) No automatic switch to another synchronous standby
- start master server, add synchronous standby 1
- change allow_standalone_primary to off
- add second synchronous standby
- wait until pg_stat_replication shows both standbys are in STREAMING state
- stop standby 1

What happens is that the master stalls, where I expected that it
would've switched to standby 2 to acknowledge commits.

The following thing was pilot error, but since I was test-piloting a new
plane, I still think it might be useful feedback. In my opinion, any
number and order of pg_ctl stops and starts on both the master and
standby servers, as long as they are not with -m immediate, should never
cause the state I reached.

2) Reaching some sort of shutdown deadlock state
- start master server, add synchronous standby
- change allow_standalone_primary to off

Then I did all sorts of test things, everything still OK. Then I wanted
to shut down everything, and maybe because of some symmetry (stack-like)
I did the following because I didn't think it through:
- pg_ctl stop on standby (didn't actually wait until done, but
  immediately in another terminal)
- pg_ctl stop on master
Oh wait... the master needs to sync transactions.
- start standby again, but now: FATAL: the database system is shutting down

There is no clean way to get out of this situation.
allow_standalone_primary in the face of shutdowns might be tricky. Maybe
shutdown should be prevented from entering the shutting-down phase when
allow_standalone_primary = off and there is no sync standby; that would
allow the sync standby to attach again.

3) PANIC on standby server
At some point a standby suddenly disconnected after I started a new
pgbench run on an existing master/standby pair, with the following error
in the logfile:

LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
PANIC:  XX000: heap_update_redo: failed to add tuple
CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 305453/102
LOCATION:  heap_xlog_update, heapam.c:4724
LOG:  00000: startup process (PID 32597) was terminated by signal 6: Aborted

This might be due to pilot error as well; I did several tests over the
weekend, and after this error I was more alert on remembering immediate
shutdowns and starting from a clean backup after those, and didn't see
similar errors since.
4) The performance of the syncrep seems to be quite an improvement over
the previous syncrep patches: I've seen tps figures of O(650) where the
others were more like O(20). The O(650) tps is limited by the speed of
the standby server I used - at several times the master would halt only
because of heavy disk activity at the standby. A warning in the docs
might be right: be sure to use good IO hardware for your synchronous
replicas! With that bottleneck gone, I suspect the current syncrep
version can go beyond 1000 tps over 1 Gbit.

regards,
Yeb Havinga

recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=mg73 user=repuser password=pwd application_name=standby1'
trigger_file = '/tmp/postgresql.trigger.5432'

postgresql.conf non-default parameters:
log_error_verbosity = verbose
log_min_messages = warning
log_min_error_statement = warning
listen_addresses = '*'          # what IP address(es) to listen on;
search_path = '\"$user\", public, hl7'
archive_mode = on
archive_command = 'test ! -f /data/backup_in_progress || cp -i %p /archive/%f < /dev/null'
checkpoint_completion_target = 0.9
checkpoint_segments = 16
default_statistics_target = 500
constraint_exclusion = on
max_connections = 120
maintenance_work_mem = 128MB
effective_cache_size = 1GB
work_mem = 44MB
wal_buffers = 8MB
shared_buffers = 128MB
wal_level = 'archive'
max_wal_senders = 4
wal_keep_segments = 1000        # 16000MB (for production increase this)
synchronous_standby_names = 'standby1,standby2,standby3'
synchronous_replication = on
allow_standalone_primary = off
On Mon, 2011-02-21 at 18:06 +0900, Fujii Masao wrote:
> Thanks for the patch!

Thanks for the review.

Code available at git://github.com/simon2ndQuadrant/postgres.git

> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
> replication as well as COMMIT PREPARED?

PREPARE - Yes
ROLLBACK - No

Further discussion welcome.

> What if fast shutdown is requested while RecordTransactionCommit
> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
> until replication has been successfully done (i.e., until at least one
> synchronous standby has connected to the master especially if
> allow_standalone_primary is disabled). Is this OK?

A "behaviour" question - important, though it needs further discussion.

> We should emit WARNING when the synchronous standby with
> wal_receiver_status_interval = 0 connects to the master. Because,
> in that case, a transaction unexpectedly would wait for replication
> infinitely.

This can't happen, because a WALSender only activates as a sync standby
once it has received a reply from the chosen standby.

> + /* Need a modifiable copy of string */
> + rawstring = pstrdup(SyncRepStandbyNames);
> +
> + /* Parse string into list of identifiers */
> + if (!SplitIdentifierString(rawstring, ',', &elemlist))
>
> pfree(rawstring) and list_free(elemlist) should be called also if
> SplitIdentifierString returns TRUE. Otherwise, memory-leak would
> happen.

Fixed, thanks.

> + ereport(FATAL,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid list syntax for parameter
> \"synchronous_standby_names\"")));
> + return false;
>
> "return false" is not required here though that might be harmless.

The compiler likes it.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
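[For readers following along, the leak-free shape of that parsing code is roughly the following sketch. Variable names come from the fragment quoted above; the point is simply that the copies are freed on the success path too.]

char       *rawstring;
List       *elemlist;

/* Need a modifiable copy of the GUC string */
rawstring = pstrdup(SyncRepStandbyNames);

/* Parse string into a list of identifiers */
if (!SplitIdentifierString(rawstring, ',', &elemlist))
{
    /* syntax error in list */
    pfree(rawstring);
    list_free(elemlist);
    ereport(FATAL,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("invalid list syntax for parameter \"synchronous_standby_names\"")));
}

/* ... walk elemlist here to match standby names ... */

/* free on the success path as well, per the review comment */
pfree(rawstring);
list_free(elemlist);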
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:
> 1) no automatic switch to other synchronous standby
> - start master server, add synchronous standby 1
> - change allow_standalone_primary to off
> - add second synchronous standby
> - wait until pg_stat_replication shows both standbys are in STREAMING state
> - stop standby 1
> what happens is that the master stalls, where I expected that it
> would've switched to standby 2 to acknowledge commits.
>
> The following thing was pilot error, but since I was test-piloting a new
> plane, I still think it might be useful feedback. In my opinion, any
> number and order of pg_ctl stops and starts on both the master and
> standby servers, as long as they are not with -m immediate, should never
> cause the state I reached.

The behaviour of "allow_standalone_primary = off" is pretty much
untested and does seem to have various gotchas in there.

> 2) reaching some sort of shutdown deadlock state
> - start master server, add synchronous standby
> - change allow_standalone_primary to off
> then I did all sorts of test things, everything still OK. Then I wanted
> to shut down everything, and maybe because of some symmetry (stack-like)
> I did the following because I didn't think it through:
> - pg_ctl stop on standby (didn't actually wait until done, but
>   immediately in another terminal)
> - pg_ctl stop on master
> Oh wait... the master needs to sync transactions.
> - start standby again, but now: FATAL: the database system is shutting down
>
> There is no clean way to get out of this situation.
> allow_standalone_primary in the face of shutdowns might be tricky. Maybe
> shutdown should be prevented from entering the shutting-down phase when
> allow_standalone_primary = off and there is no sync standby; that would
> allow the sync standby to attach again.

The behaviour of "allow_standalone_primary = off" is not something I'm
worried about personally, and I've argued all along it sounds pretty
silly to me. If someone wants to spend some time defining how it
*should* work, that might help matters.

I'm inclined to remove it before commit if it can't work cleanly, to be
re-added at a later date if it makes sense.

> 3) PANIC on standby server
> At some point a standby suddenly disconnected after I started a new
> pgbench run on an existing master/standby pair, with the following error
> in the logfile:
>
> LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
> PANIC:  XX000: heap_update_redo: failed to add tuple
> CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 305453/102
> LOCATION:  heap_xlog_update, heapam.c:4724
> LOG:  00000: startup process (PID 32597) was terminated by signal 6: Aborted
>
> This might be due to pilot error as well; I did several tests over the
> weekend, and after this error I was more alert on remembering immediate
> shutdowns and starting from a clean backup after those, and didn't see
> similar errors since.

Good. There are no changes in the patch for that section of code.

> 4) The performance of the syncrep seems to be quite an improvement over
> the previous syncrep patches: I've seen tps figures of O(650) where the
> others were more like O(20). The O(650) tps is limited by the speed of
> the standby server I used - at several times the master would halt only
> because of heavy disk activity at the standby. A warning in the docs
> might be right: be sure to use good IO hardware for your synchronous
> replicas! With that bottleneck gone, I suspect the current syncrep
> version can go beyond 1000 tps over 1 Gbit.

Good, thanks.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2011-02-24 at 22:08 +0900, Fujii Masao wrote: > On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > I've read about two-tenths of the patch, so I'll submit another comments > > about the rest later. Sorry for the slow reviewing... > > Here are another comments: Thanks for your comments Code available at git://github.com/simon2ndQuadrant/postgres.git > + {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION, > + gettext_noop("List of potential standby names to synchronise with."), > + NULL, > + GUC_LIST_INPUT | GUC_IS_NAME > > Why did you add GUC_IS_NAME here? I don't think that it's reasonable > to limit the length of this parameter to 63. Because dozens of standby > names might be specified in the parameter. OK, misunderstanding by me causing bug. Fixed > SyncRepQueue->qlock should be initialized by calling SpinLockInit? Fixed > + * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group > > Typo: s/2010/2011 Fixed > sync_replication_timeout_client would mess up the "wait-forever" option. > So, when allow_standalone_primary is disabled, ISTM that > sync_replication_timeout_client should have no effect. Agreed, done. > Please check max_wal_senders before calling SyncRepWaitForLSN for > non-replication case. SyncRepWaitForLSN() handles this > SyncRepRemoveFromQueue seems not to be as short-term as we can > use the spinlock. Instead, LW lock should be used there. > > + old_status = get_ps_display(&len); > + new_status = (char *) palloc(len + 21 + 1); > + memcpy(new_status, old_status, len); > + strcpy(new_status + len, " waiting for sync rep"); > + set_ps_display(new_status, false); > + new_status[len] = '\0'; /* truncate off " waiting" */ > > Updating the PS display should be skipped if update_process_title is false. Fixed. > + /* > + * XXX extra code needed here to maintain sorted invariant. > > Yeah, such a code is required. I think that we can shorten the time > it takes to find an insert position by searching the list backwards. > Because the given LSN is expected to be relatively new in the queue. Sure, just skipped that because of time pressure. Will add. > + * Our approach should be same as racing car - slow in, fast out. > + */ > > Really? Even when removing the entry from the queue, we need > to search the queue as well as we do in the add-entry case. > Why don't you make walsenders remove the entry from the queue, > instead? This models wakeup behaviour of LWlocks > + long timeout = SyncRepGetWaitTimeout(); > <snip> > + bool timeout = false; > <snip> > + else if (timeout > 0 && > + TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(), > + now, timeout)) > + { > + release = true; > + timeout = true; > + } > > You seem to mix up two "timeout" variables. Yes, good catch. Fixed. > + if (proc->lwWaitLink == MyProc) > + { > + /* > + * Remove ourselves from middle of queue. > + * No need to touch head or tail. > + */ > + proc->lwWaitLink = MyProc->lwWaitLink; > + } > > When we find ourselves, we should break out of the loop soon, > instead of continuing the loop to the end? Incorporated in Yeb's patch -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
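[One of the fixes above, the ps-display guard, comes out to roughly the following sketch. It follows the fragment quoted in the review, not the committed code; the only change is wrapping the update in the existing update_process_title GUC check.]

/*
 * Sketch: update the ps display only when update_process_title is set.
 * new_status is kept (with the suffix truncated off) so the original
 * title can be restored once the wait is over.
 */
if (update_process_title)
{
    int         len;
    const char *old_status;
    char       *new_status;

    old_status = get_ps_display(&len);
    new_status = (char *) palloc(len + 21 + 1);
    memcpy(new_status, old_status, len);
    strcpy(new_status + len, " waiting for sync rep");
    set_ps_display(new_status, false);
    new_status[len] = '\0';     /* truncate off " waiting for sync rep" for the later reset */
}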
On Tue, 2011-02-22 at 14:38 +0900, Fujii Masao wrote:
> On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > I've read about a tenth of the patch, so I'll submit more comments
> > about the rest later.
>
> Here are more comments:

Thanks for your comments.

Code available at git://github.com/simon2ndQuadrant/postgres.git

> SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
> if the standby crashes while a transaction is waiting for replication,
> it waits infinitely.

Will think on this.

> sync_rep_service and potential_sync_standby are not required to be in the
> WalSnd shmem because only walsender accesses them.

Kept for use in debugging, if not for later monitoring.

> +static bool
> +SyncRepServiceAvailable(void)
> +{
> +	bool	result = false;
> +
> +	SpinLockAcquire(&WalSndCtl->ctlmutex);
> +	result = WalSndCtl->sync_rep_service_available;
> +	SpinLockRelease(&WalSndCtl->ctlmutex);
> +
> +	return result;
> +}
>
> volatile pointer needs to be used to prevent code rearrangement.

Fixed.

> +	slock_t	ctlmutex;	/* locks shared variables shown above */
>
> ctlmutex should be initialized by calling SpinLockInit.

Fixed.

> +	/*
> +	 * Stop providing the sync rep service, even if there are
> +	 * waiting backends.
> +	 */
> +	{
> +		SpinLockAcquire(&WalSndCtl->ctlmutex);
> +		WalSndCtl->sync_rep_service_available = false;
> +		SpinLockRelease(&WalSndCtl->ctlmutex);
> +	}
>
> sync_rep_service_available should be set to false only when
> there is no synchronous walsender.

The way I had it is "correct" because "if (MyWalSnd->sync_rep_service)"
means we're the sync walsender, so if we stop being it, then there isn't
one. A potential walsender might then become the sync walsender.

I think you'd like it if there was no gap at the point the potential
walsender takes over? Just not sure how to do that robustly at present.
Will think some more.

> +	/*
> +	 * When we first start replication the standby will be behind the primary.
> +	 * For some applications, for example, synchronous replication, it is
> +	 * important to have a clear state for this initial catchup mode, so we
> +	 * can trigger actions when we change streaming state later. We may stay
> +	 * in this state for a long time, which is exactly why we want to be
> +	 * able to monitor whether or not we are still here.
> +	 */
> +	WalSndSetState(WALSNDSTATE_CATCHUP);
>
> The above has already been committed. Please remove that from the patch.

Removed.

> I don't like calling SyncRepReleaseWaiters for each feedback because
> I guess that it's too frequent. How about receiving all the feedbacks
> available from the socket, and then calling SyncRepReleaseWaiters, as
> walreceiver does?

Possibly, but that's an optimisation for later, once we have the
behaviour correct.

> +	bool	ownLatch;	/* do we own the above latch? */
>
> We can just remove the ownLatch flag.

Agreed, removed.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
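[For reference, the volatile-pointer idiom Fujii asked for looks roughly like this. It is a sketch: WalSndCtlData and the ctlmutex/sync_rep_service_available fields follow the patch fragments quoted above. Accessing shared state through a volatile-qualified pointer stops the compiler from reordering the loads and stores around the spinlock.]

static bool
SyncRepServiceAvailable(void)
{
    /* use a volatile pointer to prevent code rearrangement */
    volatile WalSndCtlData *walsndctl = WalSndCtl;
    bool        result;

    SpinLockAcquire(&walsndctl->ctlmutex);
    result = walsndctl->sync_rep_service_available;
    SpinLockRelease(&walsndctl->ctlmutex);

    return result;
}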
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:
> --- a/src/backend/replication/syncrep.c
> +++ b/src/backend/replication/syncrep.c
> @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
>  	}
>  	else
>  	{
> +		bool	found = false;
> +
>  		while (proc->lwWaitLink != NULL)
>  		{
>  			/* Are we the next proc in our traversal of the queue? */
> @@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
>  			 * No need to touch head or tail.
>  			 */
>  			proc->lwWaitLink = MyProc->lwWaitLink;
> +			found = true;
> +			break;
>  		}
>
> -		if (proc->lwWaitLink == NULL)
> -			elog(WARNING, "could not locate ourselves on wait queue");
>  		proc = proc->lwWaitLink;
>  	}
> +	if (!found)
> +		elog(WARNING, "could not locate ourselves on wait queue");
>
> -	if (proc->lwWaitLink == NULL)	/* At tail */
> +	/* If MyProc was removed from the tail, maintain list invariant head==tail */
> +	if (proc->lwWaitLink == NULL)
>  	{
> -		Assert(proc == MyProc);
> -		/* Remove ourselves from tail of queue */
> +		Assert(proc != MyProc);	/* impossible since that is the head=MyProc branch above */
>  		Assert(queue->tail == MyProc);
>  		queue->tail = proc;
>  		proc->lwWaitLink = NULL;

Used your suggested fix.

Code available at git://github.com/simon2ndQuadrant/postgres.git

> I needed to add this to make the documentation compile
>
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
>          You should also consider setting <varname>hot_standby_feedback</>
>          as an alternative to using this parameter.
>         </para>
> +      </listitem>
> +     </varlistentry>
> +    </variablelist>
>  </sect2>
>
>  <sect2 id="runtime-config-sync-rep">

Separate bug, will fix.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2011-02-24 at 18:13 -0800, Daniel Farina wrote: > I have also reproduced this. Notably, things seem fine as long as > pgbench is confined to one backend, but as soon as two are used (-c 2) > by the feature I can get segfaults. Sorry that you all experienced this. I wasn't able to get concurrent queue accesses even with -c 8, so I spent about half a day last week investigating a possible spinlock locking flaw. That meant the code in that area was untested, which is most obvious now. I guess that means I should test on different hardware in future. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, 2011-02-21 at 21:35 +0900, Tatsuo Ishii wrote:
> > Well, good news all round.
> >
> > v17 implements what I believe to be the final set of features for sync
> > rep. This one I'm actually fairly happy with. It can be enjoyed best at
> > DEBUG3.
> >
> > The patch is very lite touch on a few areas of code, plus a chunk of
> > specific code, all on master-side. Pretty straight really. I'm sure
> > problems will be found, its not long since I completed this; thanks to
> > Daniel Farina for your help with patch assembly.
>
> + <primary><varname>synchronous_standby_names</> configuration parameter</primary>
> + </indexterm>
> + <listitem>
> + <para>
> + Specifies a list of standby names that can become the sole
> + synchronous standby. Other standby servers connect that are also on
> + the list become potential standbys. If the current synchronous standby
> + goes away it will be replaced with one of the potential standbys.
> + Specifying more than one standby name can allow very high availability.
> + </para>
>
> Can anybody please enlighten me? I do not quite follow "Other standby
> servers connect that are also on the list become potential standbys"
> part.
>
> Can I read this as "Other standby servers that are also on the list
> become potential synchronous standbys"?

Yes. I have reworded it to see if that improves the explanation.

Code available at git://github.com/simon2ndQuadrant/postgres.git

Untagged text included here for clarity:

synchronous_standby_names

    Specifies a list of standby names that can become the sole
    synchronous standby. At any time there can be only one synchronous
    standby server. The first standby to connect that is listed here
    will become the synchronous standby server. Other standby servers
    that connect will then become potential synchronous standbys.
    If the current synchronous standby disconnects for whatever reason,
    it will be replaced with one of the potential standbys.
    Specifying more than one standby name can allow very high availability.

    The standby name is currently taken as the application_name of the
    standby, as set in the primary_conninfo on the standby. Names are
    not enforced for uniqueness, though clearly that could lead to
    confusion and misconfiguration. Specifying multiple standbys with the
    same name does not allow more than one standby to be the current
    synchronous standby.

    If a standby is removed from the list of servers then it will stop
    being the synchronous standby, allowing another to take its place.
    Standbys may also be added to the list without restarting the server.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:
> I needed to add this to make the documentation compile
>
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
>          You should also consider setting <varname>hot_standby_feedback</>
>          as an alternative to using this parameter.
>         </para>
> +      </listitem>
> +     </varlistentry>
> +    </variablelist>
>  </sect2>
>
>  <sect2 id="runtime-config-sync-rep">

Corrected, thanks.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Sat, 2011-02-19 at 22:52 -0500, Robert Haas wrote: > On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > First, we should be clear to explain that you are referring to the fact > > that the request > > synchronous_commit = off > > synchronous_replication = on > > makes no sense in the way the replication system is currently designed, > > even though it is a wish-list item to make it work in 9.2+ > > What exactly do you mean by "make it work"? We can either (1) wait > for the local commit and the remote commit (synchronous_commit=on, > synchronous_replication=on), (2) wait for the local commit only > (synchronous_commit=on, synchronous_replication=off), or (3) wait for > neither (synchronous_commit=off, synchronous_replication=off). > There's no fourth possible behavior, AFAICS. Currently, no, since as we discussed earlier we currently need to fsync WAL locally before it gets sent to standby. > The question is whether synchronous_commit=off, > synchronous_replication=on should behave like (1) or (3) Yes, that is the right question. > You have it as #1; I'm arguing > it should be #3. I realize it's an arguable point; I'm just arguing > for what makes most sense to me. Various comments follow on thread. We can pick this up once we've committed the main patch. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: > I believe the problem is that the definition of IsOnSyncRepQueue is > bogus, so that the loop in SyncRepWaitOnQueue always takes the first > branch. Sorry, don't see that. Jaime/Yeb fix applied. > It was a little confusing to me setting this up that setting only > synchronous_replication did nothing; I had to also set > synchronous_standby_names. We might need a cross-check there. I'm inclined to make an empty "synchronous_standby_names" mean that any standby can become the sync standby. That seems more useful behaviour and avoids the need for a cross-check (what exactly would we check??). > I > believe the docs for synchronous_replication also need some updating; > this part appears to be out of date: > > + between primary and standby. The commit wait will last until > the > + first reply from any standby. Multiple standby servers allow > + increased availability and possibly increase performance as > well. Agreed > The words "on the primary" in the next sentence may not be necessary > any more either, as I believe this parameter now has no effect > anywhere else. Agreed Docs changed: git://github.com/simon2ndQuadrant/postgres.git -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Feb 28, 2011 at 4:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: > >> I believe the problem is that the definition of IsOnSyncRepQueue is >> bogus, so that the loop in SyncRepWaitOnQueue always takes the first >> branch. > > Sorry, don't see that. Jaime/Yeb fix applied. > >> It was a little confusing to me setting this up that setting only >> synchronous_replication did nothing; I had to also set >> synchronous_standby_names. We might need a cross-check there. > > I'm inclined to make an empty "synchronous_standby_names" mean that any > standby can become the sync standby. That seems more useful behaviour > and avoids the need for a cross-check (what exactly would we check??). Hmm, that is a little surprising but might be reasonable. My thought was that we would check that if synchronous_replication=on then synchronous_standbys must be non-empty. I think there ought to be some way for the admin to turn synchronous replication *off* though, in a way that an individual user cannot override. How will we do that? > Docs changed: git://github.com/simon2ndQuadrant/postgres.git I'm hoping you're going to post an updated patch once the current rash of updates is all done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > > Docs changed: git://github.com/simon2ndQuadrant/postgres.git > > I'm hoping you're going to post an updated patch once the current rash > of updates is all done. Immediately prior to commit, yes. Everybody else has been nudging me towards developing in public view, commit by commit on a public repo. So that's what I'm doing now, as promised earlier. That should help people object to specific commits if they no likey. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > >> > Docs changed: git://github.com/simon2ndQuadrant/postgres.git >> >> I'm hoping you're going to post an updated patch once the current rash >> of updates is all done. > > Immediately prior to commit, yes. > > Everybody else has been nudging me towards developing in public view, > commit by commit on a public repo. So that's what I'm doing now, as > promised earlier. That should help people object to specific commits if > they no likey. It took a few days for the problems with the last version to shake out. I think you should give people about that much time again. It's not realistic to suppose that everyone will follow your git repo in detail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-02-28 at 16:55 -0500, Robert Haas wrote: > On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > > > >> > Docs changed: git://github.com/simon2ndQuadrant/postgres.git > >> > >> I'm hoping you're going to post an updated patch once the current rash > >> of updates is all done. > > > > Immediately prior to commit, yes. > > > > Everybody else has been nudging me towards developing in public view, > > commit by commit on a public repo. So that's what I'm doing now, as > > promised earlier. That should help people object to specific commits if > > they no likey. > > It took a few days for the problems with the last version to shake > out. I think you should give people about that much time again. It's > not realistic to suppose that everyone will follow your git repo in > detail. Yeh, I'm not rushing to commit. And even afterwards I expect comments that will mean I'm editing this for next month at least. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: > > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > > if the standby crashes while a transaction is waiting for replication, > > it waits infinitely. > > Will think on this. The behaviour seems correct to me: If allow_standalone_primary = off then you wish to wait forever (at your request...) If allow_standalone_primary = on then we sit and wait until we hit client timeout, which occurs even after last standby has gone. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, >> > if the standby crashes while a transaction is waiting for replication, >> > it waits infinitely. >> >> Will think on this. > > The behaviour seems correct to me: > > If allow_standalone_primary = off then you wish to wait forever (at your > request...) No, I've never wished wait-forever option for now. I'd like to make the primary work alone when there is no connected standby, for high-availability. > If allow_standalone_primary = on then we sit and wait until we hit > client timeout, which occurs even after last standby has gone. In that case, why do backends need to wait until the timeout occurs? We can make those backends resume their transaction as soon as the last standby has gone. No? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Thanks for update of the patch! On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> SyncRepRemoveFromQueue seems not to be as short-term as we can >> use the spinlock. Instead, LW lock should be used there. You seem to have forgotten to fix the above-mentioned issue. A spinlock can be used only for very short-term operation like read/write of some shared-variables. The operation on the queue is not short, so should be protected by LWLock, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
>> replication as well as COMMIT PREPARED?
>
> PREPARE - Yes
> ROLLBACK - No
>
> Further discussion welcome.

If we don't make ROLLBACK PREPARED wait for replication, we might need to
issue ROLLBACK PREPARED to the new master again after failover, even if
we've already received a success indication of ROLLBACK PREPARED from the
old master. This looks strange to me because, OTOH, in the simple
COMMIT/ROLLBACK case, we don't need to issue that to the new master again
after failover.

>> What if fast shutdown is requested while RecordTransactionCommit
>> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
>> until replication has been successfully done (i.e., until at least one
>> synchronous standby has connected to the master especially if
>> allow_standalone_primary is disabled). Is this OK?
>
> A "behaviour" question - important, though it needs further discussion.

One of the scenarios which I'm concerned about is:

1. The primary is running with allow_standalone_primary = on.
2. While some backends are waiting for replication, the user requests
   fast shutdown.
3. Since the timeout expires, those backends stop waiting and return the
   success indication to the client (but the commit is not replicated to
   the standby).
4. Since there is no backend waiting for replication, fast shutdown
   completes.
5. The clusterware like pacemaker detects the death of the primary and
   triggers the failover.
6. The new primary doesn't have some transactions committed to the
   client, i.e., transaction loss happens!!

To avoid such a transaction loss, we should prevent the primary from
returning the success indication to the client while fast shutdown is
being executed, even if allow_standalone_primary is enabled, I think.
Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: > >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > >> > if the standby crashes while a transaction is waiting for replication, > >> > it waits infinitely. > >> > >> Will think on this. > > > > The behaviour seems correct to me: > > > > If allow_standalone_primary = off then you wish to wait forever (at your > > request...) > > No, I've never wished wait-forever option for now. I'd like to make > the primary work alone when there is no connected standby, for > high-availability. Good news, please excuse that reference. > > If allow_standalone_primary = on then we sit and wait until we hit > > client timeout, which occurs even after last standby has gone. > > In that case, why do backends need to wait until the timeout occurs? > We can make those backends resume their transaction as soon as > the last standby has gone. No? The guarantee provided is that we will wait for up to client timeout for the sync standby to confirm. If we stop waiting right at the point that an "event" occurs, it breaks the whole purpose of the feature. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
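[As a rough picture of that guarantee, the backend-side wait is a loop of the following shape. This is a sketch only: SyncRepGetWaitTimeout and IsOnSyncRepQueue are the patch's helpers mentioned elsewhere in this thread, and pg_usleep stands in for the patch's latch wait.]

/*
 * Sketch of a backend waiting for its commit LSN to be acknowledged.
 * The walsender removes us from the queue once the standby confirms;
 * otherwise we give up after the client timeout.
 */
long        timeout = SyncRepGetWaitTimeout();  /* ms, from the patch */

for (;;)
{
    /* Walsender removed us from the queue: the standby confirmed our LSN */
    if (!IsOnSyncRepQueue())
        break;

    /* Give up waiting once the client timeout has elapsed */
    if (timeout > 0 &&
        TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
                                   GetCurrentTimestamp(), (int) timeout))
    {
        elog(WARNING, "giving up waiting for synchronous replication");
        break;
    }

    pg_usleep(100000L);     /* 100ms; the real patch sleeps on a latch */
}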
On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: > Thanks for update of the patch! > > On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> SyncRepRemoveFromQueue seems not to be as short-term as we can > >> use the spinlock. Instead, LW lock should be used there. > > You seem to have forgotten to fix the above-mentioned issue. Not forgotten. > A spinlock can be used only for very short-term operation like > read/write of some shared-variables. The operation on the queue > is not short, so should be protected by LWLock, I think. There's no need to sleep while holding locks and the operations are very short in most cases. The code around it isn't trivial, but that's no reason to use LWlocks. LWlocks are just spinlocks plus sem sleeps, so I don't see the need for that in the current code. Other views welcome. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-03-01 at 16:28 +0900, Fujii Masao wrote:
> On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
> >> replication as well as COMMIT PREPARED?
> >
> > PREPARE - Yes
> > ROLLBACK - No
> >
> > Further discussion welcome.
>
> If we don't make ROLLBACK PREPARED wait for replication, we might need to
> issue ROLLBACK PREPARED to the new master again after failover, even if
> we've already received a success indication of ROLLBACK PREPARED from the
> old master. This looks strange to me because, OTOH, in the simple
> COMMIT/ROLLBACK case, we don't need to issue that to the new master again
> after failover.

OK

> One of the scenarios which I'm concerned about is:
>
> 1. The primary is running with allow_standalone_primary = on.
> 2. While some backends are waiting for replication, the user requests
>    fast shutdown.
> 3. Since the timeout expires, those backends stop waiting and return the
>    success indication to the client (but the commit is not replicated to
>    the standby).
> 4. Since there is no backend waiting for replication, fast shutdown
>    completes.
> 5. The clusterware like pacemaker detects the death of the primary and
>    triggers the failover.
> 6. The new primary doesn't have some transactions committed to the
>    client, i.e., transaction loss happens!!
>
> To avoid such a transaction loss, we should prevent the primary from
> returning the success indication to the client while fast shutdown is
> being executed, even if allow_standalone_primary is enabled, I think.
> Thought?

Walsenders don't shut down until after they have sent the shutdown
checkpoint. We could make them wait for the reply from the standby at
that point.

I'll think some more, not convinced yet.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Mar 1, 2011 at 5:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > > There's no need to sleep while holding locks and the operations are very > short in most cases. The code around it isn't trivial, but that's no > reason to use LWlocks. > > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > that in the current code. Other views welcome. The following description in in src/backend/storage/lmgr/README leads me to further think that the spinlock should not be used there. Because, in the patch, while the spinlock is being held, whole of the waiting list (if there are many concurrent running transactions, this might be large) can be searched, SetLatch can be called and elog(WARNING) can be called (though this should not happen). ---------------- * Spinlocks. These are intended for *very* short-term locks. If a lock is to be held more than a few dozen instructions, or across any sort of kernel call (or even a call to a nontrivial subroutine), don't use a spinlock. Spinlocks are primarily used as infrastructure for lightweight locks. They are implemented using a hardware atomic-test-and-set instruction, if available. Waiting processes busy-loop until they can get the lock. There is no provision for deadlock detection, automatic release on error, or any other nicety. There is a timeout if the lock cannot be gotten after a minute or so (which is approximately forever in comparison to the intended lock hold time, so this is certainly an error condition). ---------------- Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 1, 2011 at 4:56 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > If allow_standalone_primary = on then we sit and wait until we hit
>> > client timeout, which occurs even after last standby has gone.
>>
>> In that case, why do backends need to wait until the timeout occurs?
>> We can make those backends resume their transaction as soon as
>> the last standby has gone. No?
>
> The guarantee provided is that we will wait for up to client timeout for
> the sync standby to confirm. If we stop waiting right at the point that
> an "event" occurs, it breaks the whole purpose of the feature.

ISTM that at least people (including me) who want to use synchronous
replication for high availability don't like this behavior. Because,
when there is one synchronous standby and it's shut down, transactions
would get blocked until the timeout happens. The system down time gets
longer.

Of course, if the walsender doesn't detect the termination of the
replication connection, I can't complain that transactions wait until
the timeout happens. But, otherwise, releasing the waiting transactions
would help high availability, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Mar 1, 2011 at 5:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> What if fast shutdown is requested while RecordTransactionCommit >> >> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete >> >> until replication has been successfully done (i.e., until at least one >> >> synchronous standby has connected to the master especially if >> >> allow_standalone_primary is disabled). Is this OK? >> > >> > A "behaviour" - important, though needs further discussion. >> >> One of the scenarios which I'm concerned is: >> >> 1. The primary is running with allow_standalone_primary = on. >> 2. While some backends are waiting for replication, the user requests >> fast shutdown. >> 3. Since the timeout expires, those backends stop waiting and return the success >> indication to the client (but not replicated to the standby). >> 4. Since there is no backend waiting for replication, fast shutdown completes. >> 5. The clusterware like pacemaker detects the death of the primary and >> triggers the >> failover. >> 6. New primary doesn't have some transactions committed to the client, i.e., >> transaction lost happens!! >> >> To avoid such a transaction lost, we should prevent the primary from >> returning the >> success indication to the client while fast shutdown is being executed, even if >> allow_standalone_primary is enabled, I think. Thought? > > Walsenders don't shutdown until after they have sent the shutdown > checkpoint. > > We could make them wait for the reply from the standby at that point. Right. But what if the replication connection is closed before #3? In this case, obviously walsender cannot send WAL up to the shutdown checkpoint. > I'll think some more, not convinced yet. Thanks! I'll think more, too, to make sync rep more reliable! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 2011-02-28 20:32, Simon Riggs wrote: > > I have reworded it to see if that improves the explanation > Code available at git://github.com/simon2ndQuadrant/postgres.git > > > If a standby is removed from the list of servers then it will stop > being the synchronous standby, allowing another to take it's place. How can I see at the master which one is the synchronous standby? It would be nice if it was an extra attribute in the pg_stat_replication view. As part of understanding syncrep better, we're going to work on that. regards, Yeb Havinga
On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: >> Thanks for update of the patch! >> >> On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> SyncRepRemoveFromQueue seems not to be as short-term as we can >> >> use the spinlock. Instead, LW lock should be used there. >> >> You seem to have forgotten to fix the above-mentioned issue. > > Not forgotten. > >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > > There's no need to sleep while holding locks and the operations are very > short in most cases. The code around it isn't trivial, but that's no > reason to use LWlocks. > > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > that in the current code. Other views welcome. An LWLock is a lot safer, in general, than a spinlock. A spinlock mustn't do anything that could emit an error or abort (among other things). I doubt that the performance cost of using an LWLock rather than a spin lock here is enough to matter, and the spin lock seems more likely to result in hard-to-find bugs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes: > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > There's no need to sleep while holding locks and the operations are very > short in most cases. The code around it isn't trivial, but that's no > reason to use LWlocks. What does "in most cases" mean? > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > that in the current code. Other views welcome. Simon, that is absolutely NOT acceptable. Spinlocks are to be used only for short straight-line code segments. If the lock has any potential to be held for more than nanoseconds, use an LWLock. The contention costs of the shortcut you propose are too high. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> LWlocks are just spinlocks plus sem sleeps, so I don't see the need for >> that in the current code. Other views welcome. > An LWLock is a lot safer, in general, than a spinlock. A spinlock > mustn't do anything that could emit an error or abort (among other > things). I doubt that the performance cost of using an LWLock rather > than a spin lock here is enough to matter, and the spin lock seems > more likely to result in hard-to-find bugs. Well, stuck spinlocks aren't exactly hard to identify. But I agree that the lack of any release-on-error infrastructure is a killer reason not to use a spinlock for anything but short straight-line code segments. regards, tom lane
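[Concretely, the change Tom and Robert are asking for is small. A sketch of the LWLock version follows; "SyncRepLock" is a hypothetical lock id here that the patch would have to register, not an existing lock.]

/*
 * Sketch: protect wait-queue manipulation with an LWLock instead of a
 * spinlock. "SyncRepLock" is a hypothetical lock id for illustration.
 */
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

/*
 * The queue traversal can take more than a few instructions and can
 * elog(WARNING). Both are fine under an LWLock, which is released
 * automatically on error - unlike a spinlock, which would be left stuck.
 */
SyncRepRemoveFromQueue();

LWLockRelease(SyncRepLock);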
On Tue, 2011-03-01 at 10:02 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: > >> A spinlock can be used only for very short-term operation like > >> read/write of some shared-variables. The operation on the queue > >> is not short, so should be protected by LWLock, I think. > > > There's no need to sleep while holding locks and the operations are very > > short in most cases. The code around it isn't trivial, but that's no > > reason to use LWlocks. > > What does "in most cases" mean? > > > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > > that in the current code. Other views welcome. > > Simon, that is absolutely NOT acceptable. Spinlocks are to be used only > for short straight-line code segments. If the lock has any potential to > be held for more than nanoseconds, use an LWLock. The contention costs > of the shortcut you propose are too high. No problem to change. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:
> No, I've never wished wait-forever option for now. I'd like to make
> the primary work alone when there is no connected standby, for
> high-availability.

allow_standalone_primary seems to need to be better thought through than
it is now, yet neither of us thinks it's worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: > >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > >> > if the standby crashes while a transaction is waiting for replication, > >> > it waits infinitely. > >> > >> Will think on this. > > > > The behaviour seems correct to me: > > If allow_standalone_primary = on then we sit and wait until we hit > > client timeout, which occurs even after last standby has gone. > > In that case, why do backends need to wait until the timeout occurs? > We can make those backends resume their transaction as soon as > the last standby has gone. No? I'm getting a little confused as to what you're asking for regarding shutdowns and WALSender disconnects. The current behaviour is that when a reply is received we attempt to wake up backends waiting for an LSN. If that reply is not received within a timeout then we just return to user anyway. You can wait forever, if you choose, by setting the timeout to 0. The WALSender deliberately does *not* wake waiting users if the standby disconnects. Doing so would break the whole reason for having sync rep in the first place. What we do is allow a potential standby to takeover the role of sync standby, if one is available. Or the failing standby can reconnect and then release waiters. If we shutdown, then we wait for the shutdown commit record to be transferred to our standby, so a normal or fast shutdown of the master always leaves all connected standbys up to date. We already do that, so sync rep doesn't touch that behaviour. If a standby is disconnected, then it doesn't receive the shutdown checkpoint record. The wait state for a commit does not persist when we shutdown and restart. Can you restate which bits of the above you think need to be changed? -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
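[In code terms, the reply-driven wakeup described above can be pictured as the following sketch. It is a sketch only: the SyncRepQueue struct, the waitLSN and procLatch fields, and the sorted-by-LSN queue follow the patch fragments and discussion earlier in the thread, not committed code.]

/*
 * Sketch: on receipt of a standby reply, the walsender releases every
 * waiting backend whose commit LSN the standby has confirmed as flushed.
 * Assumes the queue is kept sorted by waitLSN and that the caller holds
 * the lock protecting it.
 */
static void
SyncRepWakeQueueSketch(SyncRepQueue *queue, XLogRecPtr flushLSN)
{
    PGPROC *proc = queue->head;

    while (proc != NULL && XLByteLE(proc->waitLSN, flushLSN))
    {
        PGPROC *next = proc->lwWaitLink;

        /* Unlink the satisfied waiter and wake it */
        proc->lwWaitLink = NULL;
        queue->head = next;
        if (next == NULL)
            queue->tail = NULL;

        SetLatch(&proc->procLatch);
        proc = next;
    }
}

[Because the queue is sorted, the scan stops at the first waiter whose LSN the standby has not yet reached; a disconnect generates no reply, so no waiter is woken - which is exactly the behaviour being debated here.]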
On 2011-03-02 11:40, Simon Riggs wrote:
> allow_standalone_primary seems to need to be better thought through than
> it is now, yet neither of us thinks it's worth having.
>
> If the people that want it can think it through a little better then it
> might make this release, but I propose to remove it from this current
> patch to allow us to commit with greater certainty and fewer bugs.

As somebody who actually thought having it was a good idea: +1 for
removal. I can monitor having one or two sync standbys at all times and
let bells ring when they fail, but especially when things do break and
the standbys are gone, allow_standalone_primary = off is a very
efficient way to limit your options on the master to effectively zero.

regards,
Yeb Havinga
On Wed, Mar 2, 2011 at 6:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The WALSender deliberately does *not* wake waiting users if the standby > disconnects. Doing so would break the whole reason for having sync rep > in the first place. What we do is allow a potential standby to takeover > the role of sync standby, if one is available. Or the failing standby > can reconnect and then release waiters. If the transaction would have been allowed to commit without waiting had the standby not been connected in the first place, then presumably it should also be allowed to commit if the standby disconnects later, too. Otherwise, it doesn't seem very consistent. A commit should either wait for a disconnected standby to reconnect, or it should not wait. It shouldn't wait in some situations but not others, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The WALSender deliberately does *not* wake waiting users if the standby
> disconnects. Doing so would break the whole reason for having sync rep
> in the first place. What we do is allow a potential standby to takeover
> the role of sync standby, if one is available. Or the failing standby
> can reconnect and then release waiters.

If there is a potential standby when the synchronous standby has gone, I
agree that it's not a good idea to release the waiting backends soon. In
this case, those backends should wait for the next synchronous standby.

On the other hand, if there is no potential standby, I think that the
waiting backends should not wait for the timeout and should wake up as
soon as the synchronous standby has gone. Otherwise, those backends are
suspended for a long time (i.e., until the timeout expires), which would
decrease availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is an
idea. But this looks like the behavior for "allow_standalone_primary =
off". If allow_standalone_primary = on, it looks more natural to make
the primary work alone without waiting for the timeout.

> If we shutdown, then we wait for the shutdown commit record to be
> transferred to our standby, so a normal or fast shutdown of the master
> always leaves all connected standbys up to date. We already do that, so
> sync rep doesn't touch that behaviour. If a standby is disconnected,
> then it doesn't receive the shutdown checkpoint record.
>
> The wait state for a commit does not persist when we shutdown and
> restart.
>
> Can you restate which bits of the above you think need to be changed?

What I'm thinking is: when the waiting backends are released because of
the timeout while a fast shutdown is being done on the master, those
backends should not return the success indication to the client. Of
course, in that case, WAL has already been flushed on the master, but I
think that those backends should exit with a FATAL error before
returning the success. This is for avoiding breaking the synchronous
replication rule, i.e., every transaction which the client knows as
committed must be committed on the synchronous standby after failover.

If we allow those backends to return the success in that situation, the
following scenario, which can cause data loss, can happen:

1. The primary is running with allow_standalone_primary = on. There is
   only one (synchronous) standby connected.
2. The replication connection is closed because of a network outage.
3. While some backends are waiting for replication, the user requests
   fast shutdown on the master.
4. Since the timeout expires, those backends stop waiting and return the
   success indication to the client (but the commit is not replicated to
   the standby).
5. Since there is no backend waiting for replication, fast shutdown
   completes.
6. The clusterware like pacemaker detects the death of the primary and
   triggers the failover.
7. The new primary doesn't have some transactions committed to the
   client, i.e., transaction loss happens!!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Mar 2, 2011 at 7:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > >> No, I've never wished wait-forever option for now. I'd like to make >> the primary work alone when there is no connected standby, for >> high-availability. > > allow_standalone_primary seems to need to be better through than it is > now, yet neither of us think its worth having. > > If the people that want it can think it through a little better then it > might make this release, but I propose to remove it from this current > patch to allow us to commit with greater certainty and fewer bugs. +1 to remove the "wait-forever" behavior and the parameter for 9.1. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 02.03.2011 12:40, Simon Riggs wrote: > allow_standalone_primary seems to need to be better through than it is > now, yet neither of us think its worth having. > > If the people that want it can think it through a little better then it > might make this release, but I propose to remove it from this current > patch to allow us to commit with greater certainty and fewer bugs. If you leave it out, then let's rename the feature to "semi-synchronous replication" or such. The point of synchronous replication is zero-data-loss, and you don't achieve that with allow_standalone_primary=on. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > What I'm thinking is: when the waiting backends are released because > of the timeout while a fast shutdown is being done on the master, > those backends should not return a success indication to the client. > Of course, in that case, the WAL has already been flushed on the master, > but I think that those backends should exit with a FATAL error before > returning success. This is to avoid breaking the synchronous > replication rule, i.e., every transaction which the client knows as > committed must also be committed on the synchronous standby after failover. That seems like an extremely bad idea. Now any client that assumes that FATAL means his transaction didn't commit is broken. Clients should be entitled to assume that a successful COMMIT means the transaction committed (with whatever the operative durability guarantee is) and that an error means it rolled back. If the connection is closed before either one of those things happens, the client can't assume anything. It might be reasonable to COMMIT but also issue a warning message, or to just close the connection without telling the client what happened, but sending an error seems poor. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
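[To make that client contract concrete, here is a minimal libpq sketch of the three cases Robert describes. The function name and return convention are illustrative only, not part of any existing or proposed API.]

    #include <libpq-fe.h>

    /*
     * Illustrative only: classify the outcome of a COMMIT issued on an
     * already-open transaction.  Returns 1 if definitely committed,
     * 0 if the server reported an error (rolled back), and -1 if the
     * connection dropped before a definite answer arrived, in which
     * case the client may not assume anything either way.
     */
    static int
    commit_outcome(PGconn *conn)
    {
        PGresult   *res = PQexec(conn, "COMMIT");
        int         outcome;

        if (res != NULL && PQresultStatus(res) == PGRES_COMMAND_OK)
            outcome = 1;        /* server confirmed the commit */
        else if (PQstatus(conn) == CONNECTION_BAD)
            outcome = -1;       /* connection lost: outcome unknown */
        else
            outcome = 0;        /* server sent an error: rolled back */

        if (res != NULL)
            PQclear(res);
        return outcome;
    }

[The -1 branch is the case under discussion: a dropped connection forces the application to reconcile state after reconnecting, which is the only honest answer when the server cannot promise the commit was replicated.]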
On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 02.03.2011 12:40, Simon Riggs wrote: >> >> allow_standalone_primary seems to need to be better thought through than it is >> now, yet neither of us thinks it's worth having. >> >> If the people that want it can think it through a little better then it >> might make this release, but I propose to remove it from this current >> patch to allow us to commit with greater certainty and fewer bugs. > > If you leave it out, then let's rename the feature to "semi-synchronous > replication" or some such. The point of synchronous replication is > zero data loss, and you don't achieve that with allow_standalone_primary=on. I think that'd just be adding confusion. Replication will still be synchronous; it'll just be more likely to not be happening when you think it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02.03.2011 17:07, Robert Haas wrote: > On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> What I'm thinking is: when the waiting backends are released because >> of the timeout while a fast shutdown is being done on the master, >> those backends should not return a success indication to the client. >> Of course, in that case, the WAL has already been flushed on the master, >> but I think that those backends should exit with a FATAL error before >> returning success. This is to avoid breaking the synchronous >> replication rule, i.e., every transaction which the client knows as >> committed must also be committed on the synchronous standby after failover. > > That seems like an extremely bad idea. Now any client that assumes > that FATAL means his transaction didn't commit is broken. Clients > should be entitled to assume that a successful COMMIT means the > transaction committed (with whatever the operative durability > guarantee is) and that an error means it rolled back. If the > connection is closed before either one of those things happens, the > client can't assume anything. To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing whether the transaction committed or not. > It might be reasonable to COMMIT but also issue a warning message, or > to just close the connection without telling the client what happened, > but sending an error seems poor. Yeah, I guess that would work too, if the client knows to watch out for those warnings. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 02.03.2011 17:07, Robert Haas wrote: > On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 02.03.2011 12:40, Simon Riggs wrote: >>> >>> allow_standalone_primary seems to need to be better thought through than it is >>> now, yet neither of us thinks it's worth having. >>> >>> If the people that want it can think it through a little better then it >>> might make this release, but I propose to remove it from this current >>> patch to allow us to commit with greater certainty and fewer bugs. >> >> If you leave it out, then let's rename the feature to "semi-synchronous >> replication" or some such. The point of synchronous replication is >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > I think that'd just be adding confusion. Replication will still be > synchronous; it'll just be more likely to not be happening when you > think it is. The defining property of synchronous replication is that when a transaction is acknowledged as committed to the client, it has also been replicated to the standby. You don't achieve that with allow_standalone_primary=on, plain and simple. That's fine for a lot of people; most people don't actually want synchronous replication because they're not willing to pay the availability penalty. But IMHO it would be disingenuous to call it synchronous replication if you can't achieve zero data loss with it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > To achieve the effect Fujii is looking for, we would have to silently drop > the connection. That would correctly leave the client not knowing whether > the transaction committed or not. +1 >> It might be reasonable to COMMIT but also issue a warning message, or >> to just close the connection without telling the client what happened, >> but sending an error seems poor. > > Yeah, I guess that would work too, if the client knows to watch out for > those warnings. -1 Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > The defining property of synchronous replication is that when a > transaction is acknowledged as committed to the client, it has > also been replicated to the standby. You don't achieve that with > allow_standalone_primary=on, plain and simple. That's fine for a > lot of people, most people don't actually want synchronous > replication because they're not willing to pay the availability > penalty. But IMHO it would be disingenuous to call it synchronous > replication if you can't achieve zero data loss with it. Right. While there may be more people who favor high availability than the guarantees of synchronous replication, let's not blur the lines by mislabeling things. It's not synchronous replication if a commit returns successfully without the data being persisted on a second server. -Kevin
On Wed, Mar 2, 2011 at 2:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > 1. The primary is running with allow_standalone_primary = on. There > is only one (synchronous) standby connected. OK. Explicitly configured to allow the master to report as committed stuff which isn't on a/any slave. > 7. The new primary doesn't have some of the transactions reported > committed to the client, i.e., transactions are lost!! And this is a surprise? I'm not saying there isn't a better way to sequence/control a shutdown to make this risk less, but isn't that the whole point of the "allow_standalone_primary" debate/option? "If there isn't a sync slave for whatever reason, just march on, I'll deal with the transactions that are committed and not replicated some other way". I guess complaining that it shouldn't be possible to "just march on when no sync slave is available" is one possible way of "dealing with" them ;-) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> To achieve the effect Fujii is looking for, we would have to silently drop >> the connection. That would correctly leave the client not knowing whether >> the transaction committed or not. > > +1 > >>> It might be reasonable to COMMIT but also issue a warning message, or >>> to just close the connection without telling the client what happened, >>> but sending an error seems poor. >> >> Yeah, I guess that would work too, if the client knows to watch out for >> those warnings. > > -1 yeah, unless by warning, you meant 'error'. merlin
On Wed, Mar 2, 2011 at 12:39 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> To achieve the effect Fujii is looking for, we would have to silently drop >>> the connection. That would correctly leave the client not knowing whether >>> the transaction committed or not. >> >> +1 >> >>>> It might be reasonable to COMMIT but also issue a warning message, or >>>> to just close the connection without telling the client what happened, >>>> but sending an error seems poor. >>> >>> Yeah, I guess that would work too, if the client knows to watch out for >>> those warnings. >> >> -1 > > yeah, unless by warning, you meant 'error'. Well, as mentioned upthread, throwing an error when the transaction is actually committed seems poor. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-03-02 at 17:23 +0200, Heikki Linnakangas wrote: > On 02.03.2011 17:07, Robert Haas wrote: > > On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas > > <heikki.linnakangas@enterprisedb.com> wrote: > >> On 02.03.2011 12:40, Simon Riggs wrote: > >>> > >>> allow_standalone_primary seems to need to be better thought through than it is > >>> now, yet neither of us thinks it's worth having. > >>> > >>> If the people that want it can think it through a little better then it > >>> might make this release, but I propose to remove it from this current > >>> patch to allow us to commit with greater certainty and fewer bugs. > >> > >> If you leave it out, then let's rename the feature to "semi-synchronous > >> replication" or some such. The point of synchronous replication is > >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > > > I think that'd just be adding confusion. Replication will still be > > synchronous; it'll just be more likely to not be happening when you > > think it is. > > The defining property of synchronous replication is that when a > transaction is acknowledged as committed to the client, it has also been > replicated to the standby. You don't achieve that with > allow_standalone_primary=on, plain and simple. That's fine for a lot of > people; most people don't actually want synchronous replication because > they're not willing to pay the availability penalty. But IMHO it would > be disingenuous to call it synchronous replication if you can't achieve > zero data loss with it. I agree with some of what you say, but that focuses on one corner case and not on the whole solution. Yes, if we pick the allow_standalone_primary=on behaviour AND you are stupid enough to lose *all* of your sync standbys then you may get data loss. What I've spent a lot of time doing is trying to ensure that we never lose all our sync standbys, via a clear recommendation to use multiple servers AND functionality that allows the standbys to work together to give true high availability without data loss. The current design allows for an arbitrary number of potential standby servers, so your risk of data loss can be as many 9s as you care to make it. Not really bothered what we call it, but if you intend to make a song and dance about whether it is "proper" or not, then I would object to that because you're not presenting the full situation. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: > On 02.03.2011 12:40, Simon Riggs wrote: > > allow_standalone_primary seems to need to be better thought through than it is > > now, yet neither of us thinks it's worth having. > > > > If the people that want it can think it through a little better then it > > might make this release, but I propose to remove it from this current > > patch to allow us to commit with greater certainty and fewer bugs. > > If you leave it out, then let's rename the feature to "semi-synchronous > replication" or some such. The point of synchronous replication is > zero data loss, and you don't achieve that with allow_standalone_primary=on. The reason I have suggested leaving that parameter out is that the behaviour is not fully specified and Yeb has reported cases that don't (yet) make sense. If you want to fully specify it then we could yet add it, assuming we have time. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On 02.03.2011 21:48, Simon Riggs wrote: > On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: >> On 02.03.2011 12:40, Simon Riggs wrote: >>> allow_standalone_primary seems to need to be better thought through than it is >>> now, yet neither of us thinks it's worth having. >>> >>> If the people that want it can think it through a little better then it >>> might make this release, but I propose to remove it from this current >>> patch to allow us to commit with greater certainty and fewer bugs. >> >> If you leave it out, then let's rename the feature to "semi-synchronous >> replication" or some such. The point of synchronous replication is >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > The reason I have suggested leaving that parameter out is that the > behaviour is not fully specified and Yeb has reported cases that don't > (yet) make sense. If you want to fully specify it then we could yet add > it, assuming we have time. Fair enough. All I'm saying is that if we end up shipping without that parameter (implying allow_standalone_primary=on), we need to call the feature something else. The GUCs and code can probably stay as they are, but we shouldn't use the term "synchronous replication" in the documentation, release notes and such. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > All I'm saying is that if we end up shipping without that > parameter (implying allow_standalone_primary=on), we need to call > the feature something else. The GUCs and code can probably stay as > it is, but we shouldn't use the term "synchronous replication" in > the documentation, and release notes and such. I think including "synchronous" is OK as long as it's properly qualified. Off-hand thoughts in no particular order: semi-synchronous conditionally synchronous synchronous with automatic failover to standalone Perhaps the qualifications can be relaxed in some places but not others? The documentation should certainly emphasize that there is no guarantee that a successful commit means that the data is on at least two separate servers, if no such guarantee exists. If we expect that losing all replicas is such a terribly thin long shot that we don't need to worry about this difference, it is such a long shot that we don't need to worry about "wait forever" behavior, either; and we should just implement *that* so that no qualification is needed. I think that is an odd assumption, though; and I think a HA failover to weaker persistence guarantees in exchange for increased up-time would be popular. -Kevin
On Wed, 2011-03-02 at 14:26 -0600, Kevin Grittner wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > > All I'm saying is that if we end up shipping without that > > parameter (implying allow_standalone_primary=on), we need to call > > the feature something else. The GUCs and code can probably stay as > > it is, but we shouldn't use the term "synchronous replication" in > > the documentation, and release notes and such. > > I think including "synchronous" is OK as long as it's properly > qualified. Off-hand thoughts in no particular order: > > semi-synchronous You mean asynchronous > conditionally synchronous You mean asynchronous JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: > On 02.03.2011 21:48, Simon Riggs wrote: > > On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: > >> On 02.03.2011 12:40, Simon Riggs wrote: > >>> allow_standalone_primary seems to need to be better thought through than it is > >>> now, yet neither of us thinks it's worth having. > >>> > >>> If the people that want it can think it through a little better then it > >>> might make this release, but I propose to remove it from this current > >>> patch to allow us to commit with greater certainty and fewer bugs. > >> > >> If you leave it out, then let's rename the feature to "semi-synchronous > >> replication" or some such. The point of synchronous replication is > >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > > > The reason I have suggested leaving that parameter out is that the > > behaviour is not fully specified and Yeb has reported cases that don't > > (yet) make sense. If you want to fully specify it then we could yet add > > it, assuming we have time. > > Fair enough. All I'm saying is that if we end up shipping without that > parameter (implying allow_standalone_primary=on), we need to call the > feature something else. The GUCs and code can probably stay as they are, > but we shouldn't use the term "synchronous replication" in the > documentation, release notes and such. allow_standalone_primary=off means "wait forever". It does nothing to reduce data loss since you can't replicate to a server that isn't there. As we discussed it, allow_standalone_primary=off was not a persistent state, so shutting down the database would simply leave the data committed. Which gives the same problem, but implicitly. Truly "synchronous" requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called allow_standalone_primary should have no bearing on what we call this feature. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On 03/02/2011 03:39 PM, Simon Riggs wrote: > Truly "synchronous" requires two-phase commit, which this never was. So > the absence or presence of the poorly specified parameter called > allow_standalone_primary should have no bearing on what we call this > feature. > I haven't been following this very closely, but to me this screams out that we simply must not call it "synchronous". Just my $0.02 worth. cheers andrew
Simon Riggs <simon@2ndQuadrant.com> wrote: > allow_standalone_primary=off means "wait forever". It does nothing > to reduce data loss since you can't replicate to a server that > isn't there. Unless you're pulling from some persistent source which will then feel free to discard what you have retrieved. You can't assume otherwise; it's not that rare a pattern for interfaces. We use such a pattern for accepting criminal complaints from district attorneys and sending warrant information to police agencies. Waiting a long time (it won't *actually* be forever) is better than losing information. In other words, making a persistence promise which is not kept can lose data on the client side, if the clients actually trust your guarantees. -Kevin
On Wed, Mar 2, 2011 at 3:45 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > >> allow_standalone_primary=off means "wait forever". It does nothing >> to reduce data loss since you can't replicate to a server that >> isn't there. > > Unless you're pulling from some persistent source which will then > feel free to discard what you have retrieved. You can't assume > otherwise; it's not that rare a pattern for interfaces. We use such > a pattern for accepting criminal complaints from district attorneys > and sending warrant information to police agencies. Waiting a long > time (it won't *actually* be forever) is better than losing > information. > > In other words, making a persistence promise which is not kept can > lose data on the client side, if the clients actually trust your > guarantees. I agree. I assumed that when Simon was talking about removing allow_standalone_primary, he meant making the code always behave as if it were turned OFF. The common scenario here is bound to be: 1. Everything is humming along. 2. The network link between the master and standby drops. 3. Then it comes back up again. After (2) and before (3), what should the behavior of the master be? It seems clear to me that it should WAIT. Otherwise, a crash on the master now leaves you with transactions that were confirmed committed but not actually replicated to the standby. If you were OK with that scenario, you would have used asynchronous replication in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: > I assumed that when Simon was talking about removing > allow_standalone_primary, he meant making the code always behave as if > it were turned OFF. That is the part that is currently not fully specified, so no, that is not currently included in the patch. That isn't double-talk for "and I will not include it". What I mean is I'd rather have something than nothing, whatever we decide to call it. But the people that want it had better come up with a clear definition of how it will actually work, covering all cases, not just "splish splash, it kinda works and we'll let Simon work out the rest". -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> wrote: > On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: > >> I assumed that when Simon was talking about removing >> allow_standalone_primary, he meant making the code always behave >> as if it were turned OFF. > > That is the part that is currently not fully specified, so no, that > is not currently included in the patch. > > That isn't double-talk for "and I will not include it". > > What I mean is I'd rather have something than nothing, whatever we > decide to call it. +1 on that. > But the people that want it had better come up with a clear > definition of how it will actually work What is ill-defined? I would have thought that the commit request would hang indefinitely until the server was able to provide its usual guarantees. I'm not clear on what cases aren't covered by that. -Kevin
On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: > > On 03/02/2011 03:39 PM, Simon Riggs wrote: > > Truly "synchronous" requires two-phase commit, which this never was. So > > the absence or presence of the poorly specified parameter called > > allow_standalone_primary should have no bearing on what we call this > > feature. > > > > I haven't been following this very closely, but to me this screams out > that we simply must not call it "synchronous". As long as we describe it via its characteristics, then I'll be happy: * significantly reduces the possibility of data loss in a sensibly configured cluster * allows arbitrary N+k resilience that can meet and easily exceed "5 nines" data durability * isn't two-phase commit * isn't a magic bullet that will protect your data even after your hardware fails or is disconnected -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: >> Fair enough. All I'm saying is that if we end up shipping without that >> parameter (implying allow_standalone_primary=on), we need to call the >> feature something else. The GUCs and code can probably stay as they are, >> but we shouldn't use the term "synchronous replication" in the >> documentation, release notes and such. > allow_standalone_primary=off means "wait forever". It does nothing to > reduce data loss since you can't replicate to a server that isn't there. This is irrelevant to the point. The point is that sync rep implies that we will not *tell a client* its data is committed unless the commit is down to disk in two places. I agree with the people saying that not having this parameter makes it not real sync rep. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > 1. Everything is humming along. > 2. The network link between the master and standby drops. > 3. Then it comes back up again. > > After (2) and before (3), what should the behavior of the master be? It > seems clear to me that it should WAIT. Otherwise, a crash on the That just means you want data high availability, not service HA. Some people want the *service* to stay available in such a situation. > master now leaves you with transactions that were confirmed committed > but not actually replicated to the standby. If you were OK with that > scenario, you would have used asynchronous replication in the first > place. What is so hard to understand about "worst case scenario" being different from "expected conditions"? We all know that getting the last percent is more expensive than getting the first 99. We have no reason to force people into building for the last percent whatever their context. So, what cases need to be defined wrt forbidding the primary to continue alone? - in-flight commits are blocked until we can offer the durability asked for, waiting forever - shutdown requests are blocked until the standby acknowledges the final checkpoint; are immediate shutdown requests permitted? what do they do? What other cases are to be fully designed here? Please note that the above list is just a way to start the design, not a definitive proposal from me. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
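[For the first case in that list, a toy C sketch may help anchor the design discussion. This is emphatically not the patch's code: the GUC variables, the stubbed functions, the polling loop, and the exact precedence of the exits are all stand-ins for the behaviour being debated.]

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t XLogRecPtr;    /* stand-in for a WAL position */

    /* Stand-ins for the GUCs under discussion. */
    static bool allow_standalone_primary = true;
    static int  sync_replication_timeout_client = 120000;  /* ms; 0 = none */

    /* Stubs for walsender state; the real code would read shared memory. */
    static bool       sync_standby_connected(void) { return false; }
    static XLogRecPtr standby_flush_lsn(void)      { return 0; }
    static void       sleep_ms(int ms)             { (void) ms; }

    /* Returns true when the commit may be acknowledged to the client. */
    static bool
    wait_for_sync_rep(XLogRecPtr commit_lsn)
    {
        int     waited_ms = 0;

        for (;;)
        {
            if (sync_standby_connected() &&
                standby_flush_lsn() >= commit_lsn)
                return true;    /* replicated: safe to acknowledge */

            if (!sync_standby_connected() && allow_standalone_primary)
                return true;    /* no standby: degrade to standalone */

            if (sync_replication_timeout_client > 0 &&
                waited_ms >= sync_replication_timeout_client)
                return true;    /* released by timeout, acked anyway */

            sleep_ms(10);       /* the real code waits on a latch */
            waited_ms += 10;
        }
    }

[With allow_standalone_primary = off and no client timeout, the loop never exits until a standby catches up; that is the "wait forever" behaviour. Whether the timeout exit should even exist in that mode is exactly the part still to be pinned down.]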
On 03/02/2011 04:13 PM, Simon Riggs wrote: > On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: >> On 03/02/2011 03:39 PM, Simon Riggs wrote: >>> Truly "synchronous" requires two-phase commit, which this never was. So >>> the absence or presence of the poorly specified parameter called >>> allow_standalone_primary should have no bearing on what we call this >>> feature. >>> >> I haven't been following this very closely, but to me this screams out >> that we simply must not call it "synchronous". > As long as we describe it via its characteristics, then I'll be happy: > > * significantly reduces the possibility of data loss in a sensibly > configured cluster > > * allow arbitrary N+k resilience that can meet and easily exceed > "5 nines" data durability > > * isn't two phase commit > > * isn't a magic bullet that will protect your data even after your > hardware fails or is disconnected > Ok, so let's call it "enhanced safety" or something else that isn't a term of art. cheers andrew
On Wed, Mar 2, 2011 at 4:19 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> 1. Everything is humming along. >> 2. The network link between the master and standby drops. >> 3. Then it comes back up again. >> >> After (2) and before (3), what should the behavior of the master be? It >> seems clear to me that it should WAIT. Otherwise, a crash on the > > That just means you want data high availability, not service HA. Some > people want the *service* to stay available in such a situation. > >> master now leaves you with transactions that were confirmed committed >> but not actually replicated to the standby. If you were OK with that >> scenario, you would have used asynchronous replication in the first >> place. > > What is so hard to understand about "worst case scenario" being different > from "expected conditions"? We all know that getting the last percent > is more expensive than getting the first 99. We have no reason to > force people into building for the last percent whatever their context. I don't understand how synchronous replication with allow_standalone_primary=on gives you ANY extra nines. AFAICS, the only point of having synchronous replication is that you wait to acknowledge the commit to the client until the commit record has been replicated. Doing that only when the standby happens to be connected doesn't seem like it helps much. If the master is up, then it doesn't really matter what the standby does; we don't need high availability in that case, because we have just plain regular old availability. If the master goes down, then we need to know that we haven't lost any confirmed-committed transactions. With allow_standalone_primary=on, we don't know that. They might be lost, or they might not be. Even if we have 100 separate standbys, there is no way of knowing whether there was a time period just before the crash during which the master couldn't get out to the Internet, and some commits by clients on the local network went through. Maybe with some careful network engineering you can convince yourself that that isn't very likely, but I sure wouldn't bet on it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2011-03-02 21:26, Kevin Grittner wrote: > > I think including "synchronous" is OK as long as it's properly > qualified. Off-hand thoughts in no particular order: > > semi-synchronous > conditionally synchronous > synchronous with automatic failover to standalone It would be good to name the concept the same way other DBMSes do, if they have a similar concept - I don't know if MySQL's semisynchronous replication is the same, but after a quick read it sounds like it is. regards, Yeb Havinga
On Wed, 2011-03-02 at 16:16 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: > >> Fair enough. All I'm saying is that if we end up shipping without that > >> parameter (implying allow_standalone_primary=on), we need to call the > >> feature something else. The GUCs and code can probably stay as they are, > >> but we shouldn't use the term "synchronous replication" in the > >> documentation, release notes and such. > > > allow_standalone_primary=off means "wait forever". It does nothing to > > reduce data loss since you can't replicate to a server that isn't there. > > This is irrelevant to the point. The point is that sync rep implies > that we will not *tell a client* its data is committed unless the commit > is down to disk in two places. I agree with the people saying that not > having this parameter makes it not real sync rep. Depends what you think the point is. Your comments go exactly to *my* point, which is that the behaviour I'm looking to commit maximises data durability *and* availability, and is the real choice that people will make. Claiming it "isn't real" will make people scared of it, when it's actually what they have been waiting for. There is nothing half-arsed or unreal about what is being delivered. Let's not get hung up on textbook definitions, let's do the useful thing. And to repeat, I am not against the other way. It has its place, in a few cases. And I'm not against including it in this release either, given we have a good definition. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Yeb Havinga <yebhavinga@gmail.com> wrote: > On 2011-03-02 21:26, Kevin Grittner wrote: >> >> I think including "synchronous" is OK as long as it's properly >> qualified. Off-hand thoughts in no particular order: >> >> semi-synchronous >> conditionally synchronous >> synchronous with automatic failover to standalone > It would be good to name the concept the same way other DBMSes > do, if they have a similar concept - I don't know if MySQL's > semisynchronous replication is the same, but after a quick read it > sounds like it is. I had no idea MySQL used that terminology; it just seemed apt for describing a setup which is synchronous except when it isn't. Using the same terminology for equivalent functionality has its pluses, but might there be trademark or other IP issues here? -Kevin
On Wed, 2011-03-02 at 16:24 -0500, Andrew Dunstan wrote: > > On 03/02/2011 04:13 PM, Simon Riggs wrote: > > On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: > >> On 03/02/2011 03:39 PM, Simon Riggs wrote: > >>> Truly "synchronous" requires two-phase commit, which this never was. So > >>> the absence or presence of the poorly specified parameter called > >>> allow_standalone_primary should have no bearing on what we call this > >>> feature. > >>> > >> I haven't been following this very closely, but to me this screams out > >> that we simply must not call it "synchronous". > > As long as we describe it via its characteristics, then I'll be happy: > > > > * significantly reduces the possibility of data loss in a sensibly > > configured cluster > > > > * allows arbitrary N+k resilience that can meet and easily exceed > > "5 nines" data durability > > > > * isn't two-phase commit > > > > * isn't a magic bullet that will protect your data even after your > > hardware fails or is disconnected > > > > > Ok, so let's call it "enhanced safety" or something else that isn't a > term of art. Good plan. Oracle avoided the whole issue by referring to the two modes as "maximum availability" and "maximum protection". I'm not sure if that is patented or copyrighted etc., but I'm guessing they had this exact same discussion, just a little less friendly. Perhaps we can coin the term "High Durability". -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Mar 3, 2011 at 5:50 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I agree. I assumed that when Simon was talking about removing > allow_standalone_primary, he meant making the code always behave as if > it were turned OFF. I feel the same way. Despite what he says, the patch implements sync_replication_timeout_client, and if its value is set large enough, the primary behaves like "wait-forever". The primary behaves like "standalone" only when it's first started without any connected standbys, though. So if we have sync_replication_timeout_client, allow_standalone_primary no longer looks useful. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Mar 3, 2011 at 6:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't understand how synchronous replication with > allow_standalone_primary=on gives you ANY extra nines. When you start the primary (or when there is only one connected standby and it crashes), allow_standalone_primary = on allows the database service to proceed. OTOH, setting the parameter to off keeps the service down until a new standby has connected and caught up with the primary. This would cause long service downtime, and decrease availability. Of course, running the primary alone has its risks. If its disk gets corrupted before a new standby appears, some committed transactions are lost. But we can decrease this risk to a certain extent by using RAID or the like for the storage. So I think that some systems can accept the risk and prefer the availability of the database service. Josh explained clearly before why allow_standalone_primary = off is required for his case. http://archives.postgresql.org/message-id/4CAE2488.9020207%40agliodbs.com Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > To achieve the effect Fujii is looking for, we would have to silently drop > the connection. That would correctly leave the client not knowing whether > the transaction committed or not. Yeah, this seems to make more sense. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, 2011-03-03 at 13:35 +0900, Fujii Masao wrote: > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > To achieve the effect Fujii is looking for, we would have to silently drop > > the connection. That would correctly leave the client not knowing whether > > the transaction committed or not. > > Yeah, this seems to make more sense. How do you propose we do that? -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Fujii Masao <masao.fujii@gmail.com> writes: > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> To achieve the effect Fujii is looking for, we would have to silently drop >> the connection. That would correctly leave the client not knowing whether >> the transaction committed or not. > Yeah, this seems to make more sense. It was pointed out that sending an ERROR would not do because it would likely lead to client code assuming the transaction failed, which might or might not be the case. But maybe we could send a WARNING and then close the connection? That would give humans a clue what had happened, but not do anything to the state of automated clients. regards, tom lane
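[If the server did send such a WARNING before closing the connection, a client could surface it with a libpq notice receiver. A sketch under that assumption only: no warning of this kind is actually defined, and the message text in the comment is invented.]

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Print server notices/warnings to stderr, e.g. a hypothetical
     * "transaction committed locally but not known to be replicated"
     * warning sent just before the server drops the connection.
     */
    static void
    notice_handler(void *arg, const PGresult *res)
    {
        (void) arg;                 /* unused */
        fprintf(stderr, "server: %s", PQresultErrorMessage(res));
    }

    /* Install the receiver once, right after connecting. */
    static void
    install_notice_handler(PGconn *conn)
    {
        PQsetNoticeReceiver(conn, notice_handler, NULL);
    }

[An automated client would still have to treat the subsequent connection loss as "commit outcome unknown"; the warning mainly helps humans and log scrapers.]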
On Thu, 2011-03-03 at 02:14 -0500, Tom Lane wrote: > Fujii Masao <masao.fujii@gmail.com> writes: > > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > > <heikki.linnakangas@enterprisedb.com> wrote: > >> To achieve the effect Fujii is looking for, we would have to silently drop > >> the connection. That would correctly leave the client not knowing whether > >> the transaction committed or not. > > > Yeah, this seems to make more sense. > > It was pointed out that sending an ERROR would not do because it would > likely lead to client code assuming the transaction failed, which might > or might not be the case. But maybe we could send a WARNING and then > close the connection? That would give humans a clue what had happened, > but not do anything to the state of automated clients. So when we perform a Fast Shutdown we want to do something fairly similar to quickdie()? Please review the attached patch. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > I don't understand how synchronous replication with > allow_standalone_primary=on gives you ANY extra nines. AFAICS, the > only point of having synchronous replication is that you wait to > acknowledge the commit to the client until the commit record has been > replicated. Doing that only when the standby happens to be connected > doesn't seem like it helps much. Because you're still thinking in terms of data availability, rather than in terms of service availability. With the latter in mind, what you want is to be able to continue servicing from the standby should the primary crash, and you want a good guarantee about the standby's data. Of course in such a situation you will have some monitoring to ensure that the standby remains in sync, and you want to know that at failover time. But a standby failure, when you want service availability, should never bring the service down. That's what happens, though, if you've been setting up *data* availability, because then there's no choice. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Mar 2, 2011 at 5:10 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > On 2011-03-02 21:26, Kevin Grittner wrote: >> >> I think including "synchronous" is OK as long as it's properly >> qualified. Off-hand thoughts in no particular order: >> >> semi-synchronous >> conditionally synchronous >> synchronous with automatic failover to standalone > > It would be good to name the concept the same way other DBMSes do, if > they have a similar concept - I don't know if MySQL's semisynchronous > replication is the same, but after a quick read it sounds like it is. Here's the link: http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html I think this is mostly about how many slaves have to ack the commit. It's not entirely clear to me what happens if a slave is set up but not connected at the moment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company