Thread: Sync Rep v17
Well, good news all round.

v17 implements what I believe to be the final set of features for sync rep. This one I'm actually fairly happy with. It can be enjoyed best at DEBUG3.

The patch is a very light touch on a few areas of code, plus a chunk of specific code, all on the master side. Pretty straightforward, really. I'm sure problems will be found, since it's not long since I completed this; thanks to Daniel Farina for your help with patch assembly.

Which is just as well, because the other news is that I'm off on holiday for a few days, which is most inconvenient. I won't be committing this for at least a week and will be absent from the list. OTOH, I think it's ready for a final review and commit, so I'm OK if you commit it, and OK if you leave it for me.

That's not the end of it. I can see a few things we could/should do in this release, but this includes all the must-do things. The docs could do with a little love also. So I expect work for me when I return.

Config Summary
==============

Most parameters are set on the primary. Set

    primary: synchronous_standby_names = 'node1, node2, node3'

which means that whichever of those standbys connects first will become the main synchronous standby. Servers arriving later will be potential standbys ("standby standbys" doesn't sound right...). synchronous_standby_names can change at reload.

Currently, the standby_name is the application_name parameter set in the primary_conninfo.

When we set this for a client, or in postgresql.conf

    primary: synchronous_replication = on

then we will wait at commit until the synchronous standby has reached the WAL location of our commit point.

If the current synchronous standby dies, then one of the other standbys will take over. (I think it would be a great idea to make the list a priority order, but I haven't had time to code that.)

If none of the standbys are available, then we don't wait at all if allow_standalone_primary is set. allow_standalone_primary can change at reload.

Whatever happens, if you set sync_replication_timeout_client then backends will give up waiting if some WALsender doesn't wake them quickly enough.

You can generally leave these parameters at their default settings:

    primary: sync_replication_timeout_client = 120s
    primary: allow_standalone_primary = on
    standby: wal_receiver_status_interval = 10s

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
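Since the standby identifies itself through the application_name in its connection string, the standby side of the configuration above might look like the following sketch; the host, port, user, and node name are placeholders, not values from the patch:

    standby: primary_conninfo = 'host=primary.example.com port=5432 user=replicator application_name=node1'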
On 19 February 2011 00:06, Simon Riggs <simon@2ndquadrant.com> wrote:
> Well, good news all round.
>
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.
> [...]

I see the updated patch I provided last time to fix various errata and spacing issues got lost in the last round of conversations. Maybe it's safer to provide a patch for the patch.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On Sat, 2011-02-19 at 00:32 +0000, Thom Brown wrote:
> I see the updated patch I provided last time to fix various errata and
> spacing issues got lost in the last round of conversations. Maybe
> it's safer to provide a patch for the patch.

I'm sorry about that, Thom; your changes were and are welcome. The docs were assembled rather quickly about 2 hours ago, and we'll definitely need your care and attention to bring them to a good level of quality.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Excerpts from Thom Brown's message of vie feb 18 21:32:17 -0300 2011:
> I see the updated patch I provided last time to fix various errata and
> spacing issues got lost in the last round of conversations. Maybe
> it's safer to provide a patch for the patch.

interdiff?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Feb 18, 2011 at 7:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The patch is a very light touch on a few areas of code, plus a chunk of
> specific code, all on the master side. Pretty straightforward, really.
> I'm sure problems will be found, since it's not long since I completed
> this; thanks to Daniel Farina for your help with patch assembly.

This looks like it's in far better shape than the previous version. Thanks to you and Dan for working on it.

As you have this coded, enabling synchronous_replication forces all transactions to commit synchronously. This means that, when synchronous_replication=on, you get synchronous_commit=on behavior even if you have synchronous_commit=off. I'd prefer to see us go the other way, and say that you can only get synchronous replication when you're also getting synchronous commit. Even if not, we should at least arrange the test so that we don't force synchronous commit for transactions that haven't written XLOG. That is, instead of:

    ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 || SyncRepRequested())

...we should instead say:

    ((wrote_xlog && (XactSyncCommit || SyncRepRequested())) || forceSyncCommit || nrels > 0)

...but as I say, I'd be inclined to instead keep the existing test, and just skip SyncRepWaitForLSN() if we aren't committing synchronously.

I'm unsure of the value of sync_replication_timeout_client (which, incidentally, is spelled replication_timeout_client in one place, and elsewhere asynchornus -> asynchronous). I think in practice, if you start hitting that timeout, your server will slow to such a crawl that you might as well be down. On the other hand, I see no particular harm in leaving the option in there either, though I definitely think the default should be changed to -1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
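The alternative Robert prefers might be sketched as below; the variable names come from the test quoted above, and the exact call site and signature of SyncRepWaitForLSN() are assumptions, not confirmed from the patch:

    /*
     * Sketch only: keep the existing synchronous-commit test unchanged,
     * and skip the sync rep wait entirely for asynchronous commits.
     * XactLastRecEnd is the end of the commit record just written;
     * the SyncRepWaitForLSN() signature is assumed.
     */
    if (wrote_xlog && XactSyncCommit && SyncRepRequested())
        SyncRepWaitForLSN(XactLastRecEnd);  /* wait for standby ack */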
Simon Riggs wrote:
> Most parameters are set on the primary. Set
>
>     primary: synchronous_standby_names = 'node1, node2, node3'
>
> [...]
>
> If none of the standbys are available, then we don't wait at all
> if allow_standalone_primary is set. allow_standalone_primary can
> change at reload.

Are we going to allow a command to be run when these things happen, like archive_command does, or is that something for 9.2?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
> As you have this coded, enabling synchronous_replication forces all
> transactions to commit synchronously. This means that, when
> synchronous_replication=on, you get synchronous_commit=on behavior
> even if you have synchronous_commit=off. I'd prefer to see us go the
> other way, and say that you can only get synchronous replication when
> you're also getting synchronous commit.

First, we should be clear to explain that you are referring to the fact that the request

    synchronous_commit = off
    synchronous_replication = on

makes no sense in the way the replication system is currently designed, even though it is a wish-list item to make it work in 9.2+.

Since that combination makes no sense, we need to decide how we will react when it is requested. I think fail-safe is the best way. We should make the default the safe way and allow performance options in non-default paths. Hence the above combination of options is taken in the patch to be the same as

    synchronous_commit = on
    synchronous_replication = on

If you want performance, you can still get it with

    synchronous_commit = off
    synchronous_replication = off

which does have meaning.

So the way the patch is coded makes most sense for a safety feature. I think it does need to be documented also.

Must fly now...

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
> Are we going to allow a command to be run when these things happen, like
> archive_command does, or is that something for 9.2?

Not really sure which events you're speaking of, sorry. As I write, I don't see a place or a reason to run a command, but that might change with a longer explanation of what you're thinking. I wouldn't rule out adding something like that in this release, but I'm not around for the next week to add it.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
> On the other hand, I see no particular
> harm in leaving the option in there either, though I definitely think
> the default should be changed to -1.

The default setting should be to *not* freeze up if you lose the standby. That behaviour unexpectedly leads to an effective server-down situation, rather than 2 minutes of slow running. This is supposed to be High Availability software, so it will be bad for us if we allow that. We've discussed this many times already. The people that wish to wait forever have their wait-forever option, but let's not force it on everyone.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs wrote:
> On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
> > Are we going to allow a command to be run when these things happen, like
> > archive_command does, or is that something for 9.2?
>
> Not really sure which events you're speaking of, sorry. [...]

Sorry. I was thinking of allowing a command to alert an administrator when we switch standby machines, or if we can't do synchronous standby any longer. I assume we put a message in the logs, and the admin could have a script that monitors that, but I thought a pager-like command would be helpful, though I don't think we do that in any other case, so maybe it is overkill. These are all optional ideas.

Also, I like that you are using application_name here.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On 2/18/11 4:06 PM, Simon Riggs wrote:
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.

Yes, but what wine do I serve with it? ;-)

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Sat, Feb 19, 2011 at 15:23, Bruce Momjian <bruce@momjian.us> wrote:
> Sorry. I was thinking of allowing a command to alert an administrator
> when we switch standby machines, or if we can't do synchronous standby
> any longer. I assume we put a message in the logs, and the admin could
> have a script that monitors that, but I thought a pager-like command
> would be helpful

For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps that could be used?

Regards,
Marti
Marti Raudsepp wrote:
> On Sat, Feb 19, 2011 at 15:23, Bruce Momjian <bruce@momjian.us> wrote:
> > Sorry. I was thinking of allowing a command to alert an administrator
> > when we switch standby machines, or if we can't do synchronous standby
> > any longer. [...] I thought a pager-like command would be helpful
>
> For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
> that could be used?

Well, those are going to require work to notify someone externally, like sending an email alert.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On Sat, Feb 19, 2011 at 6:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Marti Raudsepp wrote:
>> For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
>> that could be used?
>
> Well, those are going to require work to notify someone externally, like
> sending an email alert.

Although that seems like work for a monitoring tool.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> First, we should be clear to explain that you are referring to the fact
> that the request
>     synchronous_commit = off
>     synchronous_replication = on
> makes no sense in the way the replication system is currently designed,
> even though it is a wish-list item to make it work in 9.2+.

What exactly do you mean by "make it work"? We can either (1) wait for the local commit and the remote commit (synchronous_commit=on, synchronous_replication=on), (2) wait for the local commit only (synchronous_commit=on, synchronous_replication=off), or (3) wait for neither (synchronous_commit=off, synchronous_replication=off). There's no fourth possible behavior, AFAICS.

The question is whether synchronous_commit=off, synchronous_replication=on should behave like (1) or (3). You have it as #1; I'm arguing it should be #3. I realize it's an arguable point; I'm just arguing for what makes most sense to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The default setting should be to *not* freeze up if you lose the
> standby. That behaviour unexpectedly leads to an effective server-down
> situation, rather than 2 minutes of slow running.

My understanding is that the parameter will wait on every commit, not just once. There's no mechanism to do anything else.

But I did some testing this evening and actually it appears to not work at all. I hit walreceiver with a SIGSTOP and the commit never completes, even after the two-minute timeout. Also, when I restarted walreceiver after a long time, I got a server crash.

    DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
    DEBUG:  released 0 procs up to 0/3014690
    DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
    DEBUG:  released 2 procs up to 0/3027BC8
    WARNING:  could not locate ourselves on wait queue
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset:
    DEBUG:  shmem_exit(-1): 0 callbacks to make
    DEBUG:  proc_exit(-1): 0 callbacks to make
    FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    Failed.
    !> LOG:  record with zero length at 0/3027BC8
    DEBUG:  CommitTransaction
    DEBUG:  name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
    DEBUG:  received replication command: IDENTIFY_SYSTEM
    DEBUG:  received replication command: START_REPLICATION 0/3000000
    LOG:  streaming replication successfully connected to primary
    DEBUG:  standby "standby" is a potential synchronous standby
    DEBUG:  write 0/0 flush 0/0 apply 0/3027BC8
    DEBUG:  released 0 procs up to 0/0
    DEBUG:  standby "standby" has now caught up with primary
    DEBUG:  write 0/3027C18 flush 0/0 apply 0/3027BC8
    DEBUG:  standby "standby" is now the synchronous replication standby
    DEBUG:  released 0 procs up to 0/0
    DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
    DEBUG:  released 0 procs up to 0/3027C18
    DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027C18
    DEBUG:  released 0 procs up to 0/3027C18

(lots more copies of those last two messages)

I believe the problem is that the definition of IsOnSyncRepQueue is bogus, so that the loop in SyncRepWaitOnQueue always takes the first branch.

It was a little confusing to me, setting this up, that setting only synchronous_replication did nothing; I had to also set synchronous_standby_names. We might need a cross-check there.

I believe the docs for synchronous_replication also need some updating; this part appears to be out of date:

+        between primary and standby. The commit wait will last until the
+        first reply from any standby. Multiple standby servers allow
+        increased availability and possibly increase performance as well.

The words "on the primary" in the next sentence may not be necessary any more either, as I believe this parameter now has no effect anywhere else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 19, 2011 at 10:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> The question is whether synchronous_commit=off,
> synchronous_replication=on should behave like (1) or (3). You have it
> as #1; I'm arguing it should be #3. I realize it's an arguable point;
> I'm just arguing for what makes most sense to me.

IMHO, we should stick to the safest option.

Considering that synchronous_replication=on means that we *want* durability, and that synchronous_commit=off means we don't *care* about durability, the real question here is: in the presence of synchronous_replication=on (which means we want durability), are we allowed to assume we can lose data?

One way to manage that is simply to disallow that combination with an error; maybe that is a bit strict, but then we don't have to make assumptions. The other option is to keep safe, which means keeping durability: if you want to risk some data, then you should disable synchronous_replication as well as synchronous_commit... maybe sending a message to the log when you detect the conflicting situation.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Feb 20, 2011, at 08:12, Jaime Casanova wrote:
> Considering that synchronous_replication=on means that we *want*
> durability, and that synchronous_commit=off means we don't *care*
> about durability, the real question here is: in the presence of
> synchronous_replication=on (which means we want durability), are we
> allowed to assume we can lose data?

From that angle, shouldn't we turn synchronous_replication=on into a third possible state of synchronous_commit? We'd then have

    synchronous_commit=off      # Same as now
    synchronous_commit=local    # Same as synchronous_commit=on,
                                # synchronous_replication=off
    synchronous_commit=standby  # Same as synchronous_commit=on,
                                # synchronous_replication=on

> One way to manage that is simply to disallow that combination with an
> error; maybe that is a bit strict, but then we don't have to make
> assumptions. [...]

The question is where we'd raise the error, though. Doing it on GUC assignment makes the behaviour depend on assignment order. That's a bad idea, I believe, since it possibly breaks ALTER ROLE/DATABASE SET ....

best regards,
Florian Pflug
On Sun, Feb 20, 2011 at 7:20 AM, Florian Pflug <fgp@phlo.org> wrote:
> From that angle, shouldn't we turn synchronous_replication=on into a
> third possible state of synchronous_commit?
> [...]

That would be a little confusing and difficult to document. At least I know that as soon as we say something like "to activate synchronous replication, set synchronous_commit to standby", users will somehow get the impression that locally the commit is asynchronous (OK, I have had bad experiences with Ecuadorian users ;)

> The question is where we'd raise the error, though. Doing it on GUC
> assignment makes the behaviour depend on assignment order. That's a
> bad idea, I believe, since it possibly breaks ALTER ROLE/DATABASE SET ....

Well, yeah... maybe it's just too much worthless work... but we could check before commit and send a NOTICE message; see the sketch below.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
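A minimal sketch of that commit-time cross-check, under the assumption that the patch's SyncRepRequested() macro and the existing XactSyncCommit flag are both visible at the commit call site; the message wording is illustrative only:

    /*
     * Hypothetical check, per the suggestion above: warn at commit time
     * when the conflicting GUC combination is in effect, rather than
     * erroring at GUC assignment (which is order-dependent).
     */
    if (SyncRepRequested() && !XactSyncCommit)
        ereport(NOTICE,
                (errmsg("synchronous_replication=on forces synchronous commit, "
                        "overriding synchronous_commit=off")));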
On Feb 21, 2011, at 00:09, Jaime Casanova wrote:
> That would be a little confusing and difficult to document. At least I
> know that as soon as we say something like "to activate synchronous
> replication, set synchronous_commit to standby", users will somehow get
> the impression that locally the commit is asynchronous [...]

I figured we'd say something like: 'To guarantee durability of transactions except in the fail-over case, set synchronous_commit to "local". To additionally guarantee durability even in the case of a fail-over to a standby node, set synchronous_commit to "standby". This causes a COMMIT to only report success once the commit record has been acknowledged by the current synchronous standby node, and will therefore induce a higher latency of COMMIT requests.'

Other possible names for the "standby" option that come to mind are "replication", "replica" or "replicate". Or maybe "cluster", but that seems confusing, since we sometimes call the collection of databases managed by one postgres instance a cluster.

best regards,
Florian Pflug
On Sat, Feb 19, 2011 at 9:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.
> [...]

Thanks for the patch! Here are the comments:

PREPARE TRANSACTION and ROLLBACK PREPARED should wait for replication as well as COMMIT PREPARED?

What if fast shutdown is requested while RecordTransactionCommit is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete until replication has been successfully done (i.e., until at least one synchronous standby has connected to the master, especially if allow_standalone_primary is disabled). Is this OK?

We should emit a WARNING when a synchronous standby with wal_receiver_status_interval = 0 connects to the master. Because, in that case, a transaction unexpectedly would wait for replication infinitely.

+       /* Need a modifiable copy of string */
+       rawstring = pstrdup(SyncRepStandbyNames);
+
+       /* Parse string into list of identifiers */
+       if (!SplitIdentifierString(rawstring, ',', &elemlist))

pfree(rawstring) and list_free(elemlist) should be called also if SplitIdentifierString returns TRUE. Otherwise, the memory would leak.

+               ereport(FATAL,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("invalid list syntax for parameter \"synchronous_standby_names\"")));
+               return false;

"return false" is not required here, though it might be harmless.

I've read about a tenth of the patch, so I'll submit more comments about the rest later.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
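A sketch of the leak-free shape Fujii describes; the parsing code is from the quoted patch hunk, and the cleanup placement on the success path is the suggested addition:

    /* Need a modifiable copy of string */
    rawstring = pstrdup(SyncRepStandbyNames);

    /* Parse string into list of identifiers */
    if (!SplitIdentifierString(rawstring, ',', &elemlist))
        ereport(FATAL,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid list syntax for parameter \"synchronous_standby_names\"")));

    /* ... walk elemlist here ... */

    /* the suggested addition: free on the success path as well */
    pfree(rawstring);
    list_free(elemlist);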
> Well, good news all round.
>
> v17 implements what I believe to be the final set of features for sync
> rep. [...]

+        <primary><varname>synchronous_standby_names</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+       <para>
+        Specifies a list of standby names that can become the sole
+        synchronous standby. Other standby servers connect that are also on
+        the list become potential standbys. If the current synchronous standby
+        goes away it will be replaced with one of the potential standbys.
+        Specifying more than one standby name can allow very high availability.
+       </para>

Can anybody please enlighten me? I do not quite follow the "Other standby servers connect that are also on the list become potential standbys" part.

Can I read this as "Other standby servers that are also on the list become potential synchronous standbys"?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Mon, Feb 21, 2011 at 4:35 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Well, good news all round.

Hello on this thread,

I'm taking a look at replication timeout with non-blocking, which would be "nice" but not required for this patch, in my understanding. But before that, we're going to put this patch through some exercise via pgbench to determine two things:

* How much performance is impacted
* Does it crash?

As it will be somewhat hard to prove the durability guarantees of commit without special heroics, unless someone can suggest a mechanism.

Expect some results Real Soon.

--
fdr
On Tue, Feb 22, 2011 at 7:55 AM, Daniel Farina <daniel@heroku.com> wrote:
> I'm taking a look at replication timeout with non-blocking, which would
> be "nice" but not required for this patch, in my understanding.

Why do you think so? Do you think sync_replication_timeout_client is sufficient for sync rep?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I've read about a tenth of the patch, so I'll submit more comments
> about the rest later.

Here are more comments:

SyncRepReleaseWaiters should be called when the walsender exits. Otherwise, if the standby crashes while a transaction is waiting for replication, it waits infinitely.

sync_rep_service and potential_sync_standby are not required to be in the WalSnd shmem, because only the walsender accesses them.

+static bool
+SyncRepServiceAvailable(void)
+{
+    bool     result = false;
+
+    SpinLockAcquire(&WalSndCtl->ctlmutex);
+    result = WalSndCtl->sync_rep_service_available;
+    SpinLockRelease(&WalSndCtl->ctlmutex);
+
+    return result;
+}

A volatile pointer needs to be used to prevent code rearrangement.

+    slock_t        ctlmutex;        /* locks shared variables shown above */

ctlmutex should be initialized by calling SpinLockInit.

+            /*
+             * Stop providing the sync rep service, even if there are
+             * waiting backends.
+             */
+            {
+                SpinLockAcquire(&WalSndCtl->ctlmutex);
+                WalSndCtl->sync_rep_service_available = false;
+                SpinLockRelease(&WalSndCtl->ctlmutex);
+            }

sync_rep_service_available should be set to false only when there is no synchronous walsender.

+    /*
+     * When we first start replication the standby will be behind the primary.
+     * For some applications, for example, synchronous replication, it is
+     * important to have a clear state for this initial catchup mode, so we
+     * can trigger actions when we change streaming state later. We may stay
+     * in this state for a long time, which is exactly why we want to be
+     * able to monitor whether or not we are still here.
+     */
+    WalSndSetState(WALSNDSTATE_CATCHUP);
+

The above has already been committed. Please remove it from the patch.

I don't like calling SyncRepReleaseWaiters for each feedback, because I guess that it's too frequent. How about receiving all the feedbacks available from the socket, and then calling SyncRepReleaseWaiters once, as walreceiver does?

+    bool    ownLatch;    /* do we own the above latch? */

We can just remove the ownLatch flag.

I've read about two-tenths of the patch, so I'll submit more comments about the rest later. Sorry for the slow reviewing...

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
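A rough sketch of the batching Fujii suggests: drain every reply currently readable before releasing waiters once. pq_getbyte_if_available() is the existing non-blocking read used by walsender, while ProcessStandbyReply() is a hypothetical stand-in for whatever the patch's per-message handler is called:

    /*
     * Sketch only: process all queued standby replies, then wake waiting
     * backends a single time with the newest write/flush/apply positions.
     */
    static void
    ProcessRepliesIfAny(void)
    {
        unsigned char firstchar;
        int           r = 0;

        while ((r = pq_getbyte_if_available(&firstchar)) > 0)
            ProcessStandbyReply(firstchar); /* hypothetical handler */

        if (r < 0)
            proc_exit(0);                   /* standby closed the connection */

        SyncRepReleaseWaiters();            /* single wakeup per batch */
    }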
On Mon, Feb 21, 2011 at 9:35 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Can I read this as "Other standby servers that are also on the list
> become potential synchronous standbys"?

I read it in the same way.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Feb 22, 2011 at 07:38, Fujii Masao <masao.fujii@gmail.com> wrote:
> +    SpinLockAcquire(&WalSndCtl->ctlmutex);
> +    result = WalSndCtl->sync_rep_service_available;
> +    SpinLockRelease(&WalSndCtl->ctlmutex);
>
> A volatile pointer needs to be used to prevent code rearrangement.

I don't think that's necessary. Spinlock functions already prevent reordering using __asm__ __volatile__.

Otherwise, surely they would be utterly broken?

Regards,
Marti
On Tuesday 22 February 2011 09:59:21 Marti Raudsepp wrote:
> I don't think that's necessary. Spinlock functions already prevent
> reordering using __asm__ __volatile__.
>
> Otherwise, surely they would be utterly broken?

It's not the spinlock that's the problem, but that "result" may already be loaded into a register. That's not prohibited by __asm__ __volatile__.

Andres
Daniel Farina wrote:
> As it will be somewhat hard to prove the durability guarantees of
> commit without special heroics, unless someone can suggest a
> mechanism.

Could you introduce a hack creating deterministic server-side crashes in order to test this out? The simplest thing that comes to mind is a rule like "kick shared memory in the teeth to force a crash after every 100 commits", then see if #100 shows up as expected. Pick two different small numbers for the interval and you could probably put that on both sides to simulate all sorts of badness.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
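A minimal sketch of such a test-only fault-injection hack, assuming a counter patched into the commit path; nothing here is in the actual patch, and abort() is simply a cheap way to simulate an unclean backend death:

    /*
     * Hypothetical test-only hack: die hard on every Nth commit, so an
     * external script can verify that every acknowledged commit survives
     * on the standby.
     */
    #define CRASH_EVERY_N_COMMITS 100

    static int  commits_since_crash = 0;

    static void
    maybe_crash_for_testing(void)
    {
        if (++commits_since_crash >= CRASH_EVERY_N_COMMITS)
            abort();    /* roughly equivalent to kill -9 on this backend */
    }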
Marti Raudsepp <marti@juffo.org> writes:
> On Tue, Feb 22, 2011 at 07:38, Fujii Masao <masao.fujii@gmail.com> wrote:
>> A volatile pointer needs to be used to prevent code rearrangement.

> I don't think that's necessary. Spinlock functions already prevent
> reordering using __asm__ __volatile__.

You're mistaken. We started using that volatile-pointer convention after noting that some compilers would misoptimize the code otherwise. It's not a problem with LWLock-protected stuff, because the LWLock calls are actual out-of-line function calls, and the compiler knows it doesn't know what those functions might do. But gcc is a lot more willing to reorder stuff around asm operations, so you can't assume that SpinLockAcquire/SpinLockRelease are equally safe. The way to prevent optimization bugs is to make sure that the fetches/stores protected by a spinlock are done through volatile pointers.

            regards, tom lane
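Applied to the SyncRepServiceAvailable() snippet Fujii quoted, the convention Tom describes would look roughly like this; the field names come from the patch, while WalSndCtlData as the type of the control struct is an assumption:

    static bool
    SyncRepServiceAvailable(void)
    {
        /* use a volatile pointer to prevent code rearrangement */
        volatile WalSndCtlData *walsndctl = WalSndCtl;
        bool        result;

        SpinLockAcquire(&walsndctl->ctlmutex);
        result = walsndctl->sync_rep_service_available;
        SpinLockRelease(&walsndctl->ctlmutex);

        return result;
    }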
On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> DEBUG:  released 2 procs up to 0/3027BC8
> WARNING:  could not locate ourselves on wait queue
> server closed the connection unexpectedly

You can make this happen more easily: I just ran "pgbench -n -c10 -j10 test" and got that warning, and sometimes a segmentation fault and sometimes a failed assertion.

The problematic code starts at src/backend/replication/syncrep.c:277; here are my suggestions on that code. I still get a failed assertion because of the second Assert (I think we should just remove that one).

*************** SyncRepRemoveFromQueue(void)
*** 288,299 ****
  
  			if (proc->lwWaitLink == NULL)
  				elog(WARNING, "could not locate ourselves on wait queue");
! 			proc = proc->lwWaitLink;
  		}
  
  		if (proc->lwWaitLink == NULL)	/* At tail */
  		{
! 			Assert(proc == MyProc);
  			/* Remove ourselves from tail of queue */
  			Assert(queue->tail == MyProc);
  			queue->tail = proc;
--- 288,300 ----
  
  			if (proc->lwWaitLink == NULL)
  				elog(WARNING, "could not locate ourselves on wait queue");
! 			else
! 				proc = proc->lwWaitLink;
  		}
  
  		if (proc->lwWaitLink == NULL)	/* At tail */
  		{
! 			Assert(proc != MyProc);
  			/* Remove ourselves from tail of queue */
  			Assert(queue->tail == MyProc);
  			queue->tail = proc;

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Tue, Feb 22, 2011 at 5:04 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Could you introduce a hack creating deterministic server-side crashes in
> order to test this out? The simplest thing that comes to mind is a rule
> like "kick shared memory in the teeth to force a crash after every 100
> commits", then see if #100 shows up as expected. [...]

I probably could via a function; would a kill -9 also be of interest to you?

--
fdr
On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> v17 implements what I believe to be the final set of features for sync
> rep. This one I'm actually fairly happy with. It can be enjoyed best at
> DEBUG3.

I've been messing with this patch and am wondering if this behavior is expected:

I've been frobbing the server around (I was messing around with the syncrep feature, but do not know if this is related just yet), and came upon a case I did not expect: it would appear that prior to establishing a connection to do streaming replication, the "startup process" (which is recovering) is very slowly catching up (or so it would be indicated by txid_current_snapshot()) and eating up enormous amounts of memory, such as 6GB at a time in RES, monotonically increasing. Furthermore, the incrementing of the txid snapshot is very slow, and it doesn't seem like I'm coming close to making full use of my resources: CPU and block devices are not very busy. There may have been a brief spurt of pgbench activity that would generate such WAL traffic to replay.

I have not done a hard shutdown to my knowledge, and the server does allow me to query relatively quickly as a standby. Looks like I'm about to hit 7+GB.

Is there something I'm missing?

--
fdr
On Wed, Feb 23, 2011 at 10:39 PM, Daniel Farina <daniel@heroku.com> wrote:
> it would appear that prior to establishing a connection to do streaming
> replication, the "startup process" (which is recovering) is very slowly
> catching up [...] and eating up enormous amounts of memory, such as 6GB
> at a time in RES, monotonically increasing.

Oh, yes, this reproduces past shutdowns/startups, and there are quite a few txids to go before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around, especially at MemoryContextStats(TopMemoryContext), but was not rewarded).

--
fdr
On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I've read about two-tenths of the patch, so I'll submit more comments
> about the rest later. Sorry for the slow reviewing...

Here are more comments:

+        {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION,
+            gettext_noop("List of potential standby names to synchronise with."),
+            NULL,
+            GUC_LIST_INPUT | GUC_IS_NAME

Why did you add GUC_IS_NAME here? I don't think that it's reasonable to limit the length of this parameter to 63, because dozens of standby names might be specified in the parameter.

SyncRepQueue->qlock should be initialized by calling SpinLockInit?

+ * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group

Typo: s/2010/2011

sync_replication_timeout_client would mess up the "wait-forever" option. So, when allow_standalone_primary is disabled, ISTM that sync_replication_timeout_client should have no effect.

Please check max_wal_senders before calling SyncRepWaitForLSN for the non-replication case.

SyncRepRemoveFromQueue seems not to be so short-term that we can use a spinlock. Instead, an LW lock should be used there.

+    old_status = get_ps_display(&len);
+    new_status = (char *) palloc(len + 21 + 1);
+    memcpy(new_status, old_status, len);
+    strcpy(new_status + len, " waiting for sync rep");
+    set_ps_display(new_status, false);
+    new_status[len] = '\0'; /* truncate off " waiting" */

Updating the PS display should be skipped if update_process_title is false.

+        /*
+         * XXX extra code needed here to maintain sorted invariant.

Yeah, such code is required. I think that we can shorten the time it takes to find an insert position by searching the list backwards, because the given LSN is expected to be relatively new in the queue.

+         * Our approach should be same as racing car - slow in, fast out.
+         */

Really? Even when removing the entry from the queue, we need to search the queue as well as we do in the add-entry case. Why don't you make the walsenders remove the entry from the queue instead?

+    long    timeout = SyncRepGetWaitTimeout();
<snip>
+        bool    timeout = false;
<snip>
+        else if (timeout > 0 &&
+            TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
+                now, timeout))
+        {
+            release = true;
+            timeout = true;
+        }

You seem to have mixed up two "timeout" variables.

+            if (proc->lwWaitLink == MyProc)
+            {
+                /*
+                 * Remove ourselves from middle of queue.
+                 * No need to touch head or tail.
+                 */
+                proc->lwWaitLink = MyProc->lwWaitLink;
+            }

When we find ourselves, we should break out of the loop immediately, instead of continuing the loop to the end?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> You can make this happen more easily: I just ran "pgbench -n -c10 -j10
> test" and got that warning, and sometimes a segmentation fault and
> sometimes a failed assertion.

I have also reproduced this. Notably, things seem fine as long as pgbench is confined to one backend, but as soon as two are used (-c 2) by the feature, I can get segfaults.

In the UI department, I am finding it somewhat difficult to confirm that I am, in fact, synchronously replicating anything in the HA scenario (where I never want to block). However, by enjoying the patch at DEBUG3 and running what I think to be syncrepped and non-syncrepped runs, I believe that I am not committing user error (seeing syncrep waiting vs. the lack thereof).

This is in part hard to confirm because the single-backend performance of syncrep (if DEBUG3 is to be believed; I will write a real torture test later) is actually very good; I was expecting a more perceptible performance dropoff. But then again, I imagine the real kicker will happen when we can run concurrent backends. Also, Amazon EBS doesn't have the fastest disks, and within an availability zone network latency is awfully low.

I can't quite explain what I was seeing before w.r.t. memory usage and, more pressingly, a very slow recovery. I turned off hot standby and was messing around and, before I knew it, the server was caught up. I do not know if that was just coincidence (probably) or overhead imposed by HS. The very high RES number was Linux fooling me, as most of it was SHR and in shmem.

--
fdr
With some more fooling around, I have also managed to get this elog(WARNING):

    if (proc->lwWaitLink == NULL)
        elog(WARNING, "could not locate ourselves on wait queue");

--
fdr
On Mon, Feb 21, 2011 at 4:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
> replication as well as COMMIT PREPARED?

Maybe ROLLBACK PREPARED, but I'm not sure... I'm pretty sure we don't need to wait for PREPARE TRANSACTION, but I could be wrong.

> What if fast shutdown is requested while RecordTransactionCommit
> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
> until replication has been successfully done [...]. Is this OK?

I guess that is debatable. IMHO, if there is a synch standby, then wait (because the user requested durability of the data); if there isn't a synch standby, wait until the timeout (even if allow_standalone_primary is disabled), because probably this is a misconfiguration or a special case like the ones I'm trying to handle (network broke, data center of standbys exploded, etc.).

> We should emit a WARNING when a synchronous standby with
> wal_receiver_status_interval = 0 connects to the master. Because,
> in that case, a transaction unexpectedly would wait for replication
> infinitely.

Actually, I think we should reject such a standby as a synch standby, and look for another one in the synchronous_standby_names list.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On 2011-02-22 20:43, Jaime Casanova wrote:
> You can make this happen more easily: I just ran "pgbench -n -c10 -j10
> test" and got that warning, and sometimes a segmentation fault and
> sometimes a failed assertion.
>
> The problematic code starts at src/backend/replication/syncrep.c:277;
> here are my suggestions on that code. I still get a failed assertion
> because of the second Assert (I think we should just remove that one).
> [...]

I also did some initial testing on this patch and got the queue-related errors with > 1 clients. With the code change from Jaime above, I still got a lot of 'not on queue' warnings.

I tried to understand how the queue was supposed to work, resulting in the changes below, which also incorporate a suggestion from Fujii upthread to exit early when MyProc is found. With the changes below, all seems to work without warnings. I now see that the note about the list invariant is too short; better would be: "if queue length = 1 then head = tail".

--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
 	}
 	else
 	{
+		bool	found = false;
+
 		while (proc->lwWaitLink != NULL)
 		{
 			/* Are we the next proc in our traversal of the queue? */
@@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
 				 * No need to touch head or tail.
 				 */
 				proc->lwWaitLink = MyProc->lwWaitLink;
+				found = true;
+				break;
 			}
 
-			if (proc->lwWaitLink == NULL)
-				elog(WARNING, "could not locate ourselves on wait queue");
 			proc = proc->lwWaitLink;
 		}
+		if (!found)
+			elog(WARNING, "could not locate ourselves on wait queue");
 
-		if (proc->lwWaitLink == NULL)	/* At tail */
+		/* If MyProc was removed from the tail, maintain list invariant head==tail */
+		if (proc->lwWaitLink == NULL)
 		{
-			Assert(proc == MyProc);
-			/* Remove ourselves from tail of queue */
+			Assert(proc != MyProc);	/* impossible, since that is the head==MyProc branch above */
 			Assert(queue->tail == MyProc);
 			queue->tail = proc;
 			proc->lwWaitLink = NULL;

I needed to add this to make the documentation compile:

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
         You should also consider setting <varname>hot_standby_feedback</>
         as an alternative to using this parameter.
        </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
      </sect2>
 
      <sect2 id="runtime-config-sync-rep">

regards,
Yeb Havinga
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> I also did some initial testing on this patch and got the queue-related
> errors with > 1 clients. With the code change from Jaime above, I still
> got a lot of 'not on queue' warnings.
>
> I tried to understand how the queue was supposed to work, resulting in
> the changes below, which also incorporate a suggestion from Fujii
> upthread to exit early when MyProc is found.

Yes, looking at the code, the warning and your patch... it seems yours is the right solution... I'm compiling right now to test again and see the effects. Robert, maybe you can test your failure case again? I'm really sure it's related to this...

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
> Oh, yes, this reproduces past shutdowns/startups, and there are quite a
> few txids to go before I catch up. I'm also comfortable poking around
> with gdb (I have already recompiled with debugging symbols and
> optimizations off and was poking around, especially at
> MemoryContextStats(TopMemoryContext), but was not rewarded).

Where is all of that memory going during recovery? Recovery shouldn't use much memory at all, as far as I can tell.

What's even allocating memory at all?

Regards,
Jeff Davis
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
>> Oh, yes, this reproduces past shutdowns/startups, and there's quite a
>> few txids before I catch up. I'm also comfortable poking around with
>> gdb (I have already recompiled with debugging symbols and
>> optimizations off and was poking around, especially at
>> MemoryContextStats(TopMemoryContext), but was not rewarded.
>
> Where is all of that memory going during recovery? Recovery shouldn't
> use much memory at all, as far as I can tell.
>
> What's even allocating memory at all?

It turns out this was RSS fooling me: as pages of shared memory get
touched, RSS keeps growing, and SHR grows along with it. Still, the
long recovery time was mystifying to me, considering the lack of
unclean shutdowns.

--
fdr
On 2011-02-25 20:40, Jaime Casanova wrote:
> On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>> I also did some initial testing on this patch and got the queue-related
>> errors with > 1 clients. With the code change from Jaime above I still
>> got a lot of 'not on queue' warnings.
>>
>> I tried to understand how the queue was supposed to work - resulting in
>> the changes below, which also incorporate a suggestion from Fujii
>> upthread, to exit early when MyProc is found.
> yes, looking at the code, the warning and your patch... it seems yours
> is the right solution...
> I'm compiling right now to test again and see the effects. Robert,
> maybe you can test your failure case again? I'm really sure it's
> related to this...

I did some more testing over the weekend with this patched v17 patch.
Since you've posted a v18 patch, let me write down some findings with
the v17 patch before continuing with the v18 patch. The tests were done
on an x86_64 platform, 1Gbit network interfaces, 3 servers. Non-default
configuration changes are copy-pasted at the end of this mail.

1) No automatic switch to another synchronous standby
- start master server, add synchronous standby 1
- change allow_standalone_primary to off
- add second synchronous standby
- wait until pg_stat_replication shows both standbys are in STREAMING state
- stop standby 1

What happens is that the master stalls, where I expected that it
would've switched to standby 2 to acknowledge commits.

The following thing was pilot error, but since I was test-piloting a new
plane, I still think it might be useful feedback. In my opinion, any
number and order of pg_ctl stops and starts on both the master and
standby servers, as long as they are not with -m immediate, should never
cause the state I reached.

2) Reaching some sort of shutdown deadlock state
- start master server, add synchronous standby
- change allow_standalone_primary to off

Then I did all sorts of test things, everything still OK. Then I wanted
to shut down everything, and maybe because of some symmetry (stack-like)
I did the following because I didn't think it through:
- pg_ctl stop on standby (didn't actually wait until done, but
  immediately in another terminal)
- pg_ctl stop on master
Oh wait... the master needs to sync transactions.
- start standby again, but now: FATAL: the database system is shutting down

There is no clean way to get out of this situation.
allow_standalone_primary in the face of shutdowns might be tricky. Maybe
shutdown should be prevented from entering the shutting-down phase when
allow_standalone_primary = off and there is no sync standby; that would
allow the sync standby to attach again.

3) PANIC on standby server
At some point a standby suddenly disconnected after I started a new
pgbench run on an existing master/standby pair, with the following error
in the logfile:

LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
PANIC:  XX000: heap_update_redo: failed to add tuple
CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 305453/102
LOCATION:  heap_xlog_update, heapam.c:4724
LOG:  00000: startup process (PID 32597) was terminated by signal 6: Aborted

This might be due to pilot error as well; I did several tests over the
weekend, and after this error I was more alert on remembering immediate
shutdowns and starting from a clean backup after those, and didn't see
similar errors since.
4) The performance of the syncrep seems to be quite an improvement over
the previous syncrep patches: I've seen tps figures of O(650) where the
others were more like O(20). The O(650) tps is limited by the speed of
the standby server I used - at several times the master would halt only
because of heavy disk activity at the standby. A warning in the docs
might be right: be sure to use good IO hardware for your synchronous
replicas! With that bottleneck gone, I suspect the current syncrep
version can go beyond 1000 tps over 1 Gbit.

regards,
Yeb Havinga

recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=mg73 user=repuser password=pwd application_name=standby1'
trigger_file = '/tmp/postgresql.trigger.5432'

postgresql.conf non-default parameters:
log_error_verbosity = verbose
log_min_messages = warning
log_min_error_statement = warning
listen_addresses = '*'          # what IP address(es) to listen on;
search_path = '\"$user\", public, hl7'
archive_mode = on
archive_command = 'test ! -f /data/backup_in_progress || cp -i %p /archive/%f < /dev/null'
checkpoint_completion_target = 0.9
checkpoint_segments = 16
default_statistics_target = 500
constraint_exclusion = on
max_connections = 120
maintenance_work_mem = 128MB
effective_cache_size = 1GB
work_mem = 44MB
wal_buffers = 8MB
shared_buffers = 128MB
wal_level = 'archive'
max_wal_senders = 4
wal_keep_segments = 1000        # 16000MB (for production increase this)
synchronous_standby_names = 'standby1,standby2,standby3'
synchronous_replication = on
allow_standalone_primary = off
On Mon, 2011-02-21 at 18:06 +0900, Fujii Masao wrote:
> Thanks for the patch!

Thanks for the review.

Code available at git://github.com/simon2ndQuadrant/postgres.git

> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
> replication as well as COMMIT PREPARED?

PREPARE - Yes
ROLLBACK - No

Further discussion welcome.

> What if fast shutdown is requested while RecordTransactionCommit
> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
> until replication has been successfully done (i.e., until at least one
> synchronous standby has connected to the master especially if
> allow_standalone_primary is disabled). Is this OK?

A "behaviour" question - important, though it needs further discussion.

> We should emit WARNING when the synchronous standby with
> wal_receiver_status_interval = 0 connects to the master. Because,
> in that case, a transaction unexpectedly would wait for replication
> infinitely.

This can't happen, because a WALSender only activates as a sync standby
once it has received a reply from the chosen standby.

> + /* Need a modifiable copy of string */
> + rawstring = pstrdup(SyncRepStandbyNames);
> +
> + /* Parse string into list of identifiers */
> + if (!SplitIdentifierString(rawstring, ',', &elemlist))
>
> pfree(rawstring) and list_free(elemlist) should be called also if
> SplitIdentifierString returns TRUE. Otherwise, memory-leak would
> happen.

Fixed, thanks.

> + ereport(FATAL,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid list syntax for parameter
> \"synchronous_standby_names\"")));
> + return false;
>
> "return false" is not required here though that might be harmless.

The compiler likes it.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
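[For readers following along, the leak-free shape of that parsing code is roughly the following sketch. Variable names come from the fragment quoted above; the point is simply that the copies are freed on the success path too.]

char       *rawstring;
List       *elemlist;

/* Need a modifiable copy of the GUC string */
rawstring = pstrdup(SyncRepStandbyNames);

/* Parse string into a list of identifiers */
if (!SplitIdentifierString(rawstring, ',', &elemlist))
{
    /* syntax error in list */
    pfree(rawstring);
    list_free(elemlist);
    ereport(FATAL,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("invalid list syntax for parameter \"synchronous_standby_names\"")));
}

/* ... walk elemlist here to match standby names ... */

/* free on the success path as well, per the review comment */
pfree(rawstring);
list_free(elemlist);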
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:
> 1) no automatic switch to other synchronous standby
> - start master server, add synchronous standby 1
> - change allow_standalone_primary to off
> - add second synchronous standby
> - wait until pg_stat_replication shows both standbys are in STREAMING state
> - stop standby 1
> what happens is that the master stalls, where I expected that it
> would've switched to standby 2 to acknowledge commits.
>
> The following thing was pilot error, but since I was test-piloting a new
> plane, I still think it might be useful feedback. In my opinion, any
> number and order of pg_ctl stops and starts on both the master and
> standby servers, as long as they are not with -m immediate, should never
> cause the state I reached.

The behaviour of "allow_standalone_primary = off" is pretty much
untested and does seem to have various gotchas in there.

> 2) reaching some sort of shutdown deadlock state
> - start master server, add synchronous standby
> - change allow_standalone_primary to off
> then I did all sorts of test things, everything still OK. Then I wanted
> to shut down everything, and maybe because of some symmetry (stack-like)
> I did the following because I didn't think it through:
> - pg_ctl stop on standby (didn't actually wait until done, but
>   immediately in another terminal)
> - pg_ctl stop on master
> Oh wait... the master needs to sync transactions.
> - start standby again, but now: FATAL: the database system is shutting down
>
> There is no clean way to get out of this situation.
> allow_standalone_primary in the face of shutdowns might be tricky. Maybe
> shutdown should be prevented from entering the shutting-down phase when
> allow_standalone_primary = off and there is no sync standby; that would
> allow the sync standby to attach again.

The behaviour of "allow_standalone_primary = off" is not something I'm
worried about personally, and I've argued all along it sounds pretty
silly to me. If someone wants to spend some time defining how it
*should* work, that might help matters.

I'm inclined to remove it before commit if it can't work cleanly, to be
re-added at a later date if it makes sense.

> 3) PANIC on standby server
> At some point a standby suddenly disconnected after I started a new
> pgbench run on an existing master/standby pair, with the following error
> in the logfile:
>
> LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
> PANIC:  XX000: heap_update_redo: failed to add tuple
> CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 305453/102
> LOCATION:  heap_xlog_update, heapam.c:4724
> LOG:  00000: startup process (PID 32597) was terminated by signal 6: Aborted
>
> This might be due to pilot error as well; I did several tests over the
> weekend, and after this error I was more alert on remembering immediate
> shutdowns and starting from a clean backup after those, and didn't see
> similar errors since.

Good. There are no changes in the patch for that section of code.

> 4) The performance of the syncrep seems to be quite an improvement over
> the previous syncrep patches: I've seen tps figures of O(650) where the
> others were more like O(20). The O(650) tps is limited by the speed of
> the standby server I used - at several times the master would halt only
> because of heavy disk activity at the standby. A warning in the docs
> might be right: be sure to use good IO hardware for your synchronous
> replicas! With that bottleneck gone, I suspect the current syncrep
> version can go beyond 1000 tps over 1 Gbit.

Good, thanks.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2011-02-24 at 22:08 +0900, Fujii Masao wrote: > On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > I've read about two-tenths of the patch, so I'll submit another comments > > about the rest later. Sorry for the slow reviewing... > > Here are another comments: Thanks for your comments Code available at git://github.com/simon2ndQuadrant/postgres.git > + {"synchronous_standby_names", PGC_SIGHUP, WAL_REPLICATION, > + gettext_noop("List of potential standby names to synchronise with."), > + NULL, > + GUC_LIST_INPUT | GUC_IS_NAME > > Why did you add GUC_IS_NAME here? I don't think that it's reasonable > to limit the length of this parameter to 63. Because dozens of standby > names might be specified in the parameter. OK, misunderstanding by me causing bug. Fixed > SyncRepQueue->qlock should be initialized by calling SpinLockInit? Fixed > + * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group > > Typo: s/2010/2011 Fixed > sync_replication_timeout_client would mess up the "wait-forever" option. > So, when allow_standalone_primary is disabled, ISTM that > sync_replication_timeout_client should have no effect. Agreed, done. > Please check max_wal_senders before calling SyncRepWaitForLSN for > non-replication case. SyncRepWaitForLSN() handles this > SyncRepRemoveFromQueue seems not to be as short-term as we can > use the spinlock. Instead, LW lock should be used there. > > + old_status = get_ps_display(&len); > + new_status = (char *) palloc(len + 21 + 1); > + memcpy(new_status, old_status, len); > + strcpy(new_status + len, " waiting for sync rep"); > + set_ps_display(new_status, false); > + new_status[len] = '\0'; /* truncate off " waiting" */ > > Updating the PS display should be skipped if update_process_title is false. Fixed. > + /* > + * XXX extra code needed here to maintain sorted invariant. > > Yeah, such a code is required. I think that we can shorten the time > it takes to find an insert position by searching the list backwards. > Because the given LSN is expected to be relatively new in the queue. Sure, just skipped that because of time pressure. Will add. > + * Our approach should be same as racing car - slow in, fast out. > + */ > > Really? Even when removing the entry from the queue, we need > to search the queue as well as we do in the add-entry case. > Why don't you make walsenders remove the entry from the queue, > instead? This models wakeup behaviour of LWlocks > + long timeout = SyncRepGetWaitTimeout(); > <snip> > + bool timeout = false; > <snip> > + else if (timeout > 0 && > + TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(), > + now, timeout)) > + { > + release = true; > + timeout = true; > + } > > You seem to mix up two "timeout" variables. Yes, good catch. Fixed. > + if (proc->lwWaitLink == MyProc) > + { > + /* > + * Remove ourselves from middle of queue. > + * No need to touch head or tail. > + */ > + proc->lwWaitLink = MyProc->lwWaitLink; > + } > > When we find ourselves, we should break out of the loop soon, > instead of continuing the loop to the end? Incorporated in Yeb's patch -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
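[One of the fixes above, the ps-display guard, comes out to roughly the following sketch. It follows the fragment quoted in the review, not the committed code; the only change is wrapping the update in the existing update_process_title GUC check.]

/*
 * Sketch: update the ps display only when update_process_title is set.
 * new_status is kept (with the suffix truncated off) so the original
 * title can be restored once the wait is over.
 */
if (update_process_title)
{
    int         len;
    const char *old_status;
    char       *new_status;

    old_status = get_ps_display(&len);
    new_status = (char *) palloc(len + 21 + 1);
    memcpy(new_status, old_status, len);
    strcpy(new_status + len, " waiting for sync rep");
    set_ps_display(new_status, false);
    new_status[len] = '\0';     /* truncate off " waiting for sync rep" for the later reset */
}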
On Tue, 2011-02-22 at 14:38 +0900, Fujii Masao wrote:
> On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > I've read about a tenth of the patch, so I'll submit more comments
> > about the rest later.
>
> Here are more comments:

Thanks for your comments.

Code available at git://github.com/simon2ndQuadrant/postgres.git

> SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
> if the standby crashes while a transaction is waiting for replication,
> it waits infinitely.

Will think on this.

> sync_rep_service and potential_sync_standby are not required to be in the
> WalSnd shmem because only walsender accesses them.

Kept for use in debugging, if not for later monitoring.

> +static bool
> +SyncRepServiceAvailable(void)
> +{
> +	bool	result = false;
> +
> +	SpinLockAcquire(&WalSndCtl->ctlmutex);
> +	result = WalSndCtl->sync_rep_service_available;
> +	SpinLockRelease(&WalSndCtl->ctlmutex);
> +
> +	return result;
> +}
>
> volatile pointer needs to be used to prevent code rearrangement.

Fixed.

> +	slock_t	ctlmutex;	/* locks shared variables shown above */
>
> ctlmutex should be initialized by calling SpinLockInit.

Fixed.

> +	/*
> +	 * Stop providing the sync rep service, even if there are
> +	 * waiting backends.
> +	 */
> +	{
> +		SpinLockAcquire(&WalSndCtl->ctlmutex);
> +		WalSndCtl->sync_rep_service_available = false;
> +		SpinLockRelease(&WalSndCtl->ctlmutex);
> +	}
>
> sync_rep_service_available should be set to false only when
> there is no synchronous walsender.

The way I had it is "correct" because "if (MyWalSnd->sync_rep_service)"
means we're the sync walsender, so if we stop being it, then there isn't
one. A potential walsender might then become the sync walsender.

I think you'd like it if there was no gap at the point the potential
walsender takes over? Just not sure how to do that robustly at present.
Will think some more.

> +	/*
> +	 * When we first start replication the standby will be behind the primary.
> +	 * For some applications, for example, synchronous replication, it is
> +	 * important to have a clear state for this initial catchup mode, so we
> +	 * can trigger actions when we change streaming state later. We may stay
> +	 * in this state for a long time, which is exactly why we want to be
> +	 * able to monitor whether or not we are still here.
> +	 */
> +	WalSndSetState(WALSNDSTATE_CATCHUP);
>
> The above has already been committed. Please remove that from the patch.

Removed.

> I don't like calling SyncRepReleaseWaiters for each feedback because
> I guess that it's too frequent. How about receiving all the feedbacks
> available from the socket, and then calling SyncRepReleaseWaiters, as
> walreceiver does?

Possibly, but that's an optimisation for later, once we have the
behaviour correct.

> +	bool	ownLatch;	/* do we own the above latch? */
>
> We can just remove the ownLatch flag.

Agreed, removed.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
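[For reference, the volatile-pointer idiom Fujii asked for looks roughly like this. It is a sketch: WalSndCtlData and the ctlmutex/sync_rep_service_available fields follow the patch fragments quoted above. Accessing shared state through a volatile-qualified pointer stops the compiler from reordering the loads and stores around the spinlock.]

static bool
SyncRepServiceAvailable(void)
{
    /* use a volatile pointer to prevent code rearrangement */
    volatile WalSndCtlData *walsndctl = WalSndCtl;
    bool        result;

    SpinLockAcquire(&walsndctl->ctlmutex);
    result = walsndctl->sync_rep_service_available;
    SpinLockRelease(&walsndctl->ctlmutex);

    return result;
}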
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:
> --- a/src/backend/replication/syncrep.c
> +++ b/src/backend/replication/syncrep.c
> @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
>  	}
>  	else
>  	{
> +		bool	found = false;
> +
>  		while (proc->lwWaitLink != NULL)
>  		{
>  			/* Are we the next proc in our traversal of the queue? */
> @@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
>  			 * No need to touch head or tail.
>  			 */
>  			proc->lwWaitLink = MyProc->lwWaitLink;
> +			found = true;
> +			break;
>  		}
>
> -		if (proc->lwWaitLink == NULL)
> -			elog(WARNING, "could not locate ourselves on wait queue");
>  		proc = proc->lwWaitLink;
>  	}
> +	if (!found)
> +		elog(WARNING, "could not locate ourselves on wait queue");
>
> -	if (proc->lwWaitLink == NULL)	/* At tail */
> +	/* If MyProc was removed from the tail, maintain list invariant head==tail */
> +	if (proc->lwWaitLink == NULL)
>  	{
> -		Assert(proc == MyProc);
> -		/* Remove ourselves from tail of queue */
> +		Assert(proc != MyProc);	/* impossible since that is the head=MyProc branch above */
>  		Assert(queue->tail == MyProc);
>  		queue->tail = proc;
>  		proc->lwWaitLink = NULL;

Used your suggested fix.

Code available at git://github.com/simon2ndQuadrant/postgres.git

> I needed to add this to make the documentation compile
>
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
>          You should also consider setting <varname>hot_standby_feedback</>
>          as an alternative to using this parameter.
>         </para>
> +      </listitem>
> +     </varlistentry>
> +    </variablelist>
>  </sect2>
>
>  <sect2 id="runtime-config-sync-rep">

Separate bug, will fix.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Thu, 2011-02-24 at 18:13 -0800, Daniel Farina wrote: > I have also reproduced this. Notably, things seem fine as long as > pgbench is confined to one backend, but as soon as two are used (-c 2) > by the feature I can get segfaults. Sorry that you all experienced this. I wasn't able to get concurrent queue accesses even with -c 8, so I spent about half a day last week investigating a possible spinlock locking flaw. That meant the code in that area was untested, which is most obvious now. I guess that means I should test on different hardware in future. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, 2011-02-21 at 21:35 +0900, Tatsuo Ishii wrote:
> > Well, good news all round.
> >
> > v17 implements what I believe to be the final set of features for sync
> > rep. This one I'm actually fairly happy with. It can be enjoyed best at
> > DEBUG3.
> >
> > The patch is very lite touch on a few areas of code, plus a chunk of
> > specific code, all on master-side. Pretty straight really. I'm sure
> > problems will be found, its not long since I completed this; thanks to
> > Daniel Farina for your help with patch assembly.
>
> + <primary><varname>synchronous_standby_names</> configuration parameter</primary>
> + </indexterm>
> + <listitem>
> + <para>
> + Specifies a list of standby names that can become the sole
> + synchronous standby. Other standby servers connect that are also on
> + the list become potential standbys. If the current synchronous standby
> + goes away it will be replaced with one of the potential standbys.
> + Specifying more than one standby name can allow very high availability.
> + </para>
>
> Can anybody please enlighten me? I do not quite follow "Other standby
> servers connect that are also on the list become potential standbys"
> part.
>
> Can I read this as "Other standby servers that are also on the list
> become potential synchronous standbys"?

Yes. I have reworded it to see if that improves the explanation.

Code available at git://github.com/simon2ndQuadrant/postgres.git

Untagged text included here for clarity:

synchronous_standby_names

    Specifies a list of standby names that can become the sole
    synchronous standby. At any time there can be only one synchronous
    standby server. The first standby to connect that is listed here
    will become the synchronous standby server. Other standby servers
    that connect will then become potential synchronous standbys.
    If the current synchronous standby disconnects for whatever reason,
    it will be replaced with one of the potential standbys.
    Specifying more than one standby name can allow very high availability.

    The standby name is currently taken as the application_name of the
    standby, as set in the primary_conninfo on the standby. Names are
    not enforced for uniqueness, though clearly that could lead to
    confusion and misconfiguration. Specifying multiple standbys with the
    same name does not allow more than one standby to be the current
    synchronous standby.

    If a standby is removed from the list of servers then it will stop
    being the synchronous standby, allowing another to take its place.
    Standbys may also be added to the list without restarting the server.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:
> I needed to add this to make the documentation compile
>
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
>          You should also consider setting <varname>hot_standby_feedback</>
>          as an alternative to using this parameter.
>         </para>
> +      </listitem>
> +     </varlistentry>
> +    </variablelist>
>  </sect2>
>
>  <sect2 id="runtime-config-sync-rep">

Corrected, thanks.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Sat, 2011-02-19 at 22:52 -0500, Robert Haas wrote: > On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > First, we should be clear to explain that you are referring to the fact > > that the request > > synchronous_commit = off > > synchronous_replication = on > > makes no sense in the way the replication system is currently designed, > > even though it is a wish-list item to make it work in 9.2+ > > What exactly do you mean by "make it work"? We can either (1) wait > for the local commit and the remote commit (synchronous_commit=on, > synchronous_replication=on), (2) wait for the local commit only > (synchronous_commit=on, synchronous_replication=off), or (3) wait for > neither (synchronous_commit=off, synchronous_replication=off). > There's no fourth possible behavior, AFAICS. Currently, no, since as we discussed earlier we currently need to fsync WAL locally before it gets sent to standby. > The question is whether synchronous_commit=off, > synchronous_replication=on should behave like (1) or (3) Yes, that is the right question. > You have it as #1; I'm arguing > it should be #3. I realize it's an arguable point; I'm just arguing > for what makes most sense to me. Various comments follow on thread. We can pick this up once we've committed the main patch. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: > I believe the problem is that the definition of IsOnSyncRepQueue is > bogus, so that the loop in SyncRepWaitOnQueue always takes the first > branch. Sorry, don't see that. Jaime/Yeb fix applied. > It was a little confusing to me setting this up that setting only > synchronous_replication did nothing; I had to also set > synchronous_standby_names. We might need a cross-check there. I'm inclined to make an empty "synchronous_standby_names" mean that any standby can become the sync standby. That seems more useful behaviour and avoids the need for a cross-check (what exactly would we check??). > I > believe the docs for synchronous_replication also need some updating; > this part appears to be out of date: > > + between primary and standby. The commit wait will last until > the > + first reply from any standby. Multiple standby servers allow > + increased availability and possibly increase performance as > well. Agreed > The words "on the primary" in the next sentence may not be necessary > any more either, as I believe this parameter now has no effect > anywhere else. Agreed Docs changed: git://github.com/simon2ndQuadrant/postgres.git -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Feb 28, 2011 at 4:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote: > >> I believe the problem is that the definition of IsOnSyncRepQueue is >> bogus, so that the loop in SyncRepWaitOnQueue always takes the first >> branch. > > Sorry, don't see that. Jaime/Yeb fix applied. > >> It was a little confusing to me setting this up that setting only >> synchronous_replication did nothing; I had to also set >> synchronous_standby_names. We might need a cross-check there. > > I'm inclined to make an empty "synchronous_standby_names" mean that any > standby can become the sync standby. That seems more useful behaviour > and avoids the need for a cross-check (what exactly would we check??). Hmm, that is a little surprising but might be reasonable. My thought was that we would check that if synchronous_replication=on then synchronous_standbys must be non-empty. I think there ought to be some way for the admin to turn synchronous replication *off* though, in a way that an individual user cannot override. How will we do that? > Docs changed: git://github.com/simon2ndQuadrant/postgres.git I'm hoping you're going to post an updated patch once the current rash of updates is all done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > > Docs changed: git://github.com/simon2ndQuadrant/postgres.git > > I'm hoping you're going to post an updated patch once the current rash > of updates is all done. Immediately prior to commit, yes. Everybody else has been nudging me towards developing in public view, commit by commit on a public repo. So that's what I'm doing now, as promised earlier. That should help people object to specific commits if they no likey. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > >> > Docs changed: git://github.com/simon2ndQuadrant/postgres.git >> >> I'm hoping you're going to post an updated patch once the current rash >> of updates is all done. > > Immediately prior to commit, yes. > > Everybody else has been nudging me towards developing in public view, > commit by commit on a public repo. So that's what I'm doing now, as > promised earlier. That should help people object to specific commits if > they no likey. It took a few days for the problems with the last version to shake out. I think you should give people about that much time again. It's not realistic to suppose that everyone will follow your git repo in detail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-02-28 at 16:55 -0500, Robert Haas wrote: > On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote: > > > >> > Docs changed: git://github.com/simon2ndQuadrant/postgres.git > >> > >> I'm hoping you're going to post an updated patch once the current rash > >> of updates is all done. > > > > Immediately prior to commit, yes. > > > > Everybody else has been nudging me towards developing in public view, > > commit by commit on a public repo. So that's what I'm doing now, as > > promised earlier. That should help people object to specific commits if > > they no likey. > > It took a few days for the problems with the last version to shake > out. I think you should give people about that much time again. It's > not realistic to suppose that everyone will follow your git repo in > detail. Yeh, I'm not rushing to commit. And even afterwards I expect comments that will mean I'm editing this for next month at least. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: > > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > > if the standby crashes while a transaction is waiting for replication, > > it waits infinitely. > > Will think on this. The behaviour seems correct to me: If allow_standalone_primary = off then you wish to wait forever (at your request...) If allow_standalone_primary = on then we sit and wait until we hit client timeout, which occurs even after last standby has gone. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, >> > if the standby crashes while a transaction is waiting for replication, >> > it waits infinitely. >> >> Will think on this. > > The behaviour seems correct to me: > > If allow_standalone_primary = off then you wish to wait forever (at your > request...) No, I've never wished wait-forever option for now. I'd like to make the primary work alone when there is no connected standby, for high-availability. > If allow_standalone_primary = on then we sit and wait until we hit > client timeout, which occurs even after last standby has gone. In that case, why do backends need to wait until the timeout occurs? We can make those backends resume their transaction as soon as the last standby has gone. No? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Thanks for update of the patch! On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> SyncRepRemoveFromQueue seems not to be as short-term as we can >> use the spinlock. Instead, LW lock should be used there. You seem to have forgotten to fix the above-mentioned issue. A spinlock can be used only for very short-term operation like read/write of some shared-variables. The operation on the queue is not short, so should be protected by LWLock, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
>> replication as well as COMMIT PREPARED?
>
> PREPARE - Yes
> ROLLBACK - No
>
> Further discussion welcome.

If we don't make ROLLBACK PREPARED wait for replication, we might need to
issue ROLLBACK PREPARED to the new master again after failover, even if
we've already received a success indication of ROLLBACK PREPARED from the
old master. This looks strange to me because, OTOH, in the simple
COMMIT/ROLLBACK case, we don't need to issue that to the new master again
after failover.

>> What if fast shutdown is requested while RecordTransactionCommit
>> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
>> until replication has been successfully done (i.e., until at least one
>> synchronous standby has connected to the master especially if
>> allow_standalone_primary is disabled). Is this OK?
>
> A "behaviour" question - important, though it needs further discussion.

One of the scenarios which I'm concerned about is:

1. The primary is running with allow_standalone_primary = on.
2. While some backends are waiting for replication, the user requests
   fast shutdown.
3. Since the timeout expires, those backends stop waiting and return the
   success indication to the client (but the commit is not replicated to
   the standby).
4. Since there is no backend waiting for replication, fast shutdown
   completes.
5. The clusterware like pacemaker detects the death of the primary and
   triggers the failover.
6. The new primary doesn't have some transactions committed to the
   client, i.e., transaction loss happens!!

To avoid such a transaction loss, we should prevent the primary from
returning the success indication to the client while fast shutdown is
being executed, even if allow_standalone_primary is enabled, I think.
Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: > >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > >> > if the standby crashes while a transaction is waiting for replication, > >> > it waits infinitely. > >> > >> Will think on this. > > > > The behaviour seems correct to me: > > > > If allow_standalone_primary = off then you wish to wait forever (at your > > request...) > > No, I've never wished wait-forever option for now. I'd like to make > the primary work alone when there is no connected standby, for > high-availability. Good news, please excuse that reference. > > If allow_standalone_primary = on then we sit and wait until we hit > > client timeout, which occurs even after last standby has gone. > > In that case, why do backends need to wait until the timeout occurs? > We can make those backends resume their transaction as soon as > the last standby has gone. No? The guarantee provided is that we will wait for up to client timeout for the sync standby to confirm. If we stop waiting right at the point that an "event" occurs, it breaks the whole purpose of the feature. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
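[As a rough picture of that guarantee, the backend-side wait is a loop of the following shape. This is a sketch only: SyncRepGetWaitTimeout and IsOnSyncRepQueue are the patch's helpers mentioned elsewhere in this thread, and pg_usleep stands in for the patch's latch wait.]

/*
 * Sketch of a backend waiting for its commit LSN to be acknowledged.
 * The walsender removes us from the queue once the standby confirms;
 * otherwise we give up after the client timeout.
 */
long        timeout = SyncRepGetWaitTimeout();  /* ms, from the patch */

for (;;)
{
    /* Walsender removed us from the queue: the standby confirmed our LSN */
    if (!IsOnSyncRepQueue())
        break;

    /* Give up waiting once the client timeout has elapsed */
    if (timeout > 0 &&
        TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
                                   GetCurrentTimestamp(), (int) timeout))
    {
        elog(WARNING, "giving up waiting for synchronous replication");
        break;
    }

    pg_usleep(100000L);     /* 100ms; the real patch sleeps on a latch */
}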
On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: > Thanks for update of the patch! > > On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> SyncRepRemoveFromQueue seems not to be as short-term as we can > >> use the spinlock. Instead, LW lock should be used there. > > You seem to have forgotten to fix the above-mentioned issue. Not forgotten. > A spinlock can be used only for very short-term operation like > read/write of some shared-variables. The operation on the queue > is not short, so should be protected by LWLock, I think. There's no need to sleep while holding locks and the operations are very short in most cases. The code around it isn't trivial, but that's no reason to use LWlocks. LWlocks are just spinlocks plus sem sleeps, so I don't see the need for that in the current code. Other views welcome. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-03-01 at 16:28 +0900, Fujii Masao wrote:
> On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
> >> replication as well as COMMIT PREPARED?
> >
> > PREPARE - Yes
> > ROLLBACK - No
> >
> > Further discussion welcome.
>
> If we don't make ROLLBACK PREPARED wait for replication, we might need to
> issue ROLLBACK PREPARED to the new master again after failover, even if
> we've already received a success indication of ROLLBACK PREPARED from the
> old master. This looks strange to me because, OTOH, in the simple
> COMMIT/ROLLBACK case, we don't need to issue that to the new master again
> after failover.

OK

> One of the scenarios which I'm concerned about is:
>
> 1. The primary is running with allow_standalone_primary = on.
> 2. While some backends are waiting for replication, the user requests
>    fast shutdown.
> 3. Since the timeout expires, those backends stop waiting and return the
>    success indication to the client (but the commit is not replicated to
>    the standby).
> 4. Since there is no backend waiting for replication, fast shutdown
>    completes.
> 5. The clusterware like pacemaker detects the death of the primary and
>    triggers the failover.
> 6. The new primary doesn't have some transactions committed to the
>    client, i.e., transaction loss happens!!
>
> To avoid such a transaction loss, we should prevent the primary from
> returning the success indication to the client while fast shutdown is
> being executed, even if allow_standalone_primary is enabled, I think.
> Thought?

Walsenders don't shut down until after they have sent the shutdown
checkpoint. We could make them wait for the reply from the standby at
that point.

I'll think some more, not convinced yet.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Mar 1, 2011 at 5:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > > There's no need to sleep while holding locks and the operations are very > short in most cases. The code around it isn't trivial, but that's no > reason to use LWlocks. > > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > that in the current code. Other views welcome. The following description in in src/backend/storage/lmgr/README leads me to further think that the spinlock should not be used there. Because, in the patch, while the spinlock is being held, whole of the waiting list (if there are many concurrent running transactions, this might be large) can be searched, SetLatch can be called and elog(WARNING) can be called (though this should not happen). ---------------- * Spinlocks. These are intended for *very* short-term locks. If a lock is to be held more than a few dozen instructions, or across any sort of kernel call (or even a call to a nontrivial subroutine), don't use a spinlock. Spinlocks are primarily used as infrastructure for lightweight locks. They are implemented using a hardware atomic-test-and-set instruction, if available. Waiting processes busy-loop until they can get the lock. There is no provision for deadlock detection, automatic release on error, or any other nicety. There is a timeout if the lock cannot be gotten after a minute or so (which is approximately forever in comparison to the intended lock hold time, so this is certainly an error condition). ---------------- Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 1, 2011 at 4:56 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > If allow_standalone_primary = on then we sit and wait until we hit
>> > client timeout, which occurs even after last standby has gone.
>>
>> In that case, why do backends need to wait until the timeout occurs?
>> We can make those backends resume their transaction as soon as
>> the last standby has gone. No?
>
> The guarantee provided is that we will wait for up to client timeout for
> the sync standby to confirm. If we stop waiting right at the point that
> an "event" occurs, it breaks the whole purpose of the feature.

ISTM that at least people (including me) who want to use synchronous
replication for high availability don't like this behavior. Because,
when there is one synchronous standby and it's shut down, transactions
would get blocked until the timeout happens. The system down time gets
longer.

Of course, if the walsender doesn't detect the termination of the
replication connection, I can't complain that transactions wait until
the timeout happens. But, otherwise, releasing the waiting transactions
would help high availability, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Mar 1, 2011 at 5:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> What if fast shutdown is requested while RecordTransactionCommit >> >> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete >> >> until replication has been successfully done (i.e., until at least one >> >> synchronous standby has connected to the master especially if >> >> allow_standalone_primary is disabled). Is this OK? >> > >> > A "behaviour" - important, though needs further discussion. >> >> One of the scenarios which I'm concerned is: >> >> 1. The primary is running with allow_standalone_primary = on. >> 2. While some backends are waiting for replication, the user requests >> fast shutdown. >> 3. Since the timeout expires, those backends stop waiting and return the success >> indication to the client (but not replicated to the standby). >> 4. Since there is no backend waiting for replication, fast shutdown completes. >> 5. The clusterware like pacemaker detects the death of the primary and >> triggers the >> failover. >> 6. New primary doesn't have some transactions committed to the client, i.e., >> transaction lost happens!! >> >> To avoid such a transaction lost, we should prevent the primary from >> returning the >> success indication to the client while fast shutdown is being executed, even if >> allow_standalone_primary is enabled, I think. Thought? > > Walsenders don't shutdown until after they have sent the shutdown > checkpoint. > > We could make them wait for the reply from the standby at that point. Right. But what if the replication connection is closed before #3? In this case, obviously walsender cannot send WAL up to the shutdown checkpoint. > I'll think some more, not convinced yet. Thanks! I'll think more, too, to make sync rep more reliable! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 2011-02-28 20:32, Simon Riggs wrote: > > I have reworded it to see if that improves the explanation > Code available at git://github.com/simon2ndQuadrant/postgres.git > > > If a standby is removed from the list of servers then it will stop > being the synchronous standby, allowing another to take it's place. How can I see at the master which one is the synchronous standby? It would be nice if it was an extra attribute in the pg_stat_replication view. As part of understanding syncrep better, we're going to work on that. regards, Yeb Havinga
On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: >> Thanks for update of the patch! >> >> On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> SyncRepRemoveFromQueue seems not to be as short-term as we can >> >> use the spinlock. Instead, LW lock should be used there. >> >> You seem to have forgotten to fix the above-mentioned issue. > > Not forgotten. > >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > > There's no need to sleep while holding locks and the operations are very > short in most cases. The code around it isn't trivial, but that's no > reason to use LWlocks. > > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > that in the current code. Other views welcome. An LWLock is a lot safer, in general, than a spinlock. A spinlock mustn't do anything that could emit an error or abort (among other things). I doubt that the performance cost of using an LWLock rather than a spin lock here is enough to matter, and the spin lock seems more likely to result in hard-to-find bugs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes: > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: >> A spinlock can be used only for very short-term operation like >> read/write of some shared-variables. The operation on the queue >> is not short, so should be protected by LWLock, I think. > There's no need to sleep while holding locks and the operations are very > short in most cases. The code around it isn't trivial, but that's no > reason to use LWlocks. What does "in most cases" mean? > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > that in the current code. Other views welcome. Simon, that is absolutely NOT acceptable. Spinlocks are to be used only for short straight-line code segments. If the lock has any potential to be held for more than nanoseconds, use an LWLock. The contention costs of the shortcut you propose are too high. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> LWlocks are just spinlocks plus sem sleeps, so I don't see the need for >> that in the current code. Other views welcome. > An LWLock is a lot safer, in general, than a spinlock. A spinlock > mustn't do anything that could emit an error or abort (among other > things). I doubt that the performance cost of using an LWLock rather > than a spin lock here is enough to matter, and the spin lock seems > more likely to result in hard-to-find bugs. Well, stuck spinlocks aren't exactly hard to identify. But I agree that the lack of any release-on-error infrastructure is a killer reason not to use a spinlock for anything but short straight-line code segments. regards, tom lane
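[Concretely, the change Tom and Robert are asking for is small. A sketch of the LWLock version follows; "SyncRepLock" is a hypothetical lock id here that the patch would have to register, not an existing lock.]

/*
 * Sketch: protect wait-queue manipulation with an LWLock instead of a
 * spinlock. "SyncRepLock" is a hypothetical lock id for illustration.
 */
LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

/*
 * The queue traversal can take more than a few instructions and can
 * elog(WARNING). Both are fine under an LWLock, which is released
 * automatically on error - unlike a spinlock, which would be left stuck.
 */
SyncRepRemoveFromQueue();

LWLockRelease(SyncRepLock);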
On Tue, 2011-03-01 at 10:02 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote: > >> A spinlock can be used only for very short-term operation like > >> read/write of some shared-variables. The operation on the queue > >> is not short, so should be protected by LWLock, I think. > > > There's no need to sleep while holding locks and the operations are very > > short in most cases. The code around it isn't trivial, but that's no > > reason to use LWlocks. > > What does "in most cases" mean? > > > LWlocks are just spinlocks plus sem sleeps, so I don't see the need for > > that in the current code. Other views welcome. > > Simon, that is absolutely NOT acceptable. Spinlocks are to be used only > for short straight-line code segments. If the lock has any potential to > be held for more than nanoseconds, use an LWLock. The contention costs > of the shortcut you propose are too high. No problem to change. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:
> No, I've never wished wait-forever option for now. I'd like to make
> the primary work alone when there is no connected standby, for
> high-availability.

allow_standalone_primary seems to need to be better thought through than
it is now, yet neither of us thinks it's worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Mon, 2011-02-28 at 18:40 +0000, Simon Riggs wrote: > >> > SyncRepReleaseWaiters should be called when walsender exits. Otherwise, > >> > if the standby crashes while a transaction is waiting for replication, > >> > it waits infinitely. > >> > >> Will think on this. > > > > The behaviour seems correct to me: > > If allow_standalone_primary = on then we sit and wait until we hit > > client timeout, which occurs even after last standby has gone. > > In that case, why do backends need to wait until the timeout occurs? > We can make those backends resume their transaction as soon as > the last standby has gone. No? I'm getting a little confused as to what you're asking for regarding shutdowns and WALSender disconnects. The current behaviour is that when a reply is received we attempt to wake up backends waiting for an LSN. If that reply is not received within a timeout then we just return to user anyway. You can wait forever, if you choose, by setting the timeout to 0. The WALSender deliberately does *not* wake waiting users if the standby disconnects. Doing so would break the whole reason for having sync rep in the first place. What we do is allow a potential standby to takeover the role of sync standby, if one is available. Or the failing standby can reconnect and then release waiters. If we shutdown, then we wait for the shutdown commit record to be transferred to our standby, so a normal or fast shutdown of the master always leaves all connected standbys up to date. We already do that, so sync rep doesn't touch that behaviour. If a standby is disconnected, then it doesn't receive the shutdown checkpoint record. The wait state for a commit does not persist when we shutdown and restart. Can you restate which bits of the above you think need to be changed? -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
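[In code terms, the reply-driven wakeup described above can be pictured as the following sketch. It is a sketch only: the SyncRepQueue struct, the waitLSN and procLatch fields, and the sorted-by-LSN queue follow the patch fragments and discussion earlier in the thread, not committed code.]

/*
 * Sketch: on receipt of a standby reply, the walsender releases every
 * waiting backend whose commit LSN the standby has confirmed as flushed.
 * Assumes the queue is kept sorted by waitLSN and that the caller holds
 * the lock protecting it.
 */
static void
SyncRepWakeQueueSketch(SyncRepQueue *queue, XLogRecPtr flushLSN)
{
    PGPROC *proc = queue->head;

    while (proc != NULL && XLByteLE(proc->waitLSN, flushLSN))
    {
        PGPROC *next = proc->lwWaitLink;

        /* Unlink the satisfied waiter and wake it */
        proc->lwWaitLink = NULL;
        queue->head = next;
        if (next == NULL)
            queue->tail = NULL;

        SetLatch(&proc->procLatch);
        proc = next;
    }
}

[Because the queue is sorted, the scan stops at the first waiter whose LSN the standby has not yet reached; a disconnect generates no reply, so no waiter is woken - which is exactly the behaviour being debated here.]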
On 2011-03-02 11:40, Simon Riggs wrote:
> allow_standalone_primary seems to need to be better thought through than
> it is now, yet neither of us thinks it's worth having.
>
> If the people that want it can think it through a little better then it
> might make this release, but I propose to remove it from this current
> patch to allow us to commit with greater certainty and fewer bugs.

As somebody who actually thought having it was a good idea: +1 for
removal. I can monitor having one or two sync standbys at all times and
let bells ring when they fail, but especially when things do break and
the standbys are gone, allow_standalone_primary = off is a very
efficient way to limit your options on the master to effectively zero.

regards,
Yeb Havinga
On Wed, Mar 2, 2011 at 6:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The WALSender deliberately does *not* wake waiting users if the standby > disconnects. Doing so would break the whole reason for having sync rep > in the first place. What we do is allow a potential standby to takeover > the role of sync standby, if one is available. Or the failing standby > can reconnect and then release waiters. If the transaction would have been allowed to commit without waiting had the standby not been connected in the first place, then presumably it should also be allowed to commit if the standby disconnects later, too. Otherwise, it doesn't seem very consistent. A commit should either wait for a disconnected standby to reconnect, or it should not wait. It shouldn't wait in some situations but not others, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The WALSender deliberately does *not* wake waiting users if the standby
> disconnects. Doing so would break the whole reason for having sync rep
> in the first place. What we do is allow a potential standby to takeover
> the role of sync standby, if one is available. Or the failing standby
> can reconnect and then release waiters.

If there is a potential standby when the synchronous standby has gone, I
agree that it's not a good idea to release the waiting backends soon. In
this case, those backends should wait for the next synchronous standby.

On the other hand, if there is no potential standby, I think that the
waiting backends should not wait for the timeout and should wake up as
soon as the synchronous standby has gone. Otherwise, those backends are
suspended for a long time (i.e., until the timeout expires), which would
decrease availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is an
idea. But this looks like the behavior for "allow_standalone_primary =
off". If allow_standalone_primary = on, it looks more natural to make
the primary work alone without waiting for the timeout.

> If we shutdown, then we wait for the shutdown commit record to be
> transferred to our standby, so a normal or fast shutdown of the master
> always leaves all connected standbys up to date. We already do that, so
> sync rep doesn't touch that behaviour. If a standby is disconnected,
> then it doesn't receive the shutdown checkpoint record.
>
> The wait state for a commit does not persist when we shutdown and
> restart.
>
> Can you restate which bits of the above you think need to be changed?

What I'm thinking is: when the waiting backends are released because of
the timeout while a fast shutdown is being done on the master, those
backends should not return the success indication to the client. Of
course, in that case, WAL has already been flushed on the master, but I
think that those backends should exit with a FATAL error before
returning the success. This is for avoiding breaking the synchronous
replication rule, i.e., every transaction which the client knows as
committed must be committed on the synchronous standby after failover.

If we allow those backends to return the success in that situation, the
following scenario, which can cause data loss, can happen:

1. The primary is running with allow_standalone_primary = on. There is
   only one (synchronous) standby connected.
2. The replication connection is closed because of a network outage.
3. While some backends are waiting for replication, the user requests
   fast shutdown on the master.
4. Since the timeout expires, those backends stop waiting and return the
   success indication to the client (but the commit is not replicated to
   the standby).
5. Since there is no backend waiting for replication, fast shutdown
   completes.
6. The clusterware like pacemaker detects the death of the primary and
   triggers the failover.
7. The new primary doesn't have some transactions committed to the
   client, i.e., transaction loss happens!!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Mar 2, 2011 at 7:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote: > >> No, I've never wished wait-forever option for now. I'd like to make >> the primary work alone when there is no connected standby, for >> high-availability. > > allow_standalone_primary seems to need to be better through than it is > now, yet neither of us think its worth having. > > If the people that want it can think it through a little better then it > might make this release, but I propose to remove it from this current > patch to allow us to commit with greater certainty and fewer bugs. +1 to remove the "wait-forever" behavior and the parameter for 9.1. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 02.03.2011 12:40, Simon Riggs wrote: > allow_standalone_primary seems to need to be better through than it is > now, yet neither of us think its worth having. > > If the people that want it can think it through a little better then it > might make this release, but I propose to remove it from this current > patch to allow us to commit with greater certainty and fewer bugs. If you leave it out, then let's rename the feature to "semi-synchronous replication" or such. The point of synchronous replication is zero-data-loss, and you don't achieve that with allow_standalone_primary=on. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > What I'm thinking is: when the waiting backends are released because > of the timeout while a fast shutdown is being done on the master, > those backends should not return a success indication to the client. > Of course, in that case, the WAL has already been flushed on the master, > but I think that those backends should exit with a FATAL error before > returning success. This is to avoid breaking the synchronous > replication rule, i.e., every transaction which the client knows as > committed must also be committed on the synchronous standby after failover. That seems like an extremely bad idea. Now any client that assumes that FATAL means his transaction didn't commit is broken. Clients should be entitled to assume that a successful COMMIT means the transaction committed (with whatever the operative durability guarantee is) and that an error means it rolled back. If the connection is closed before either one of those things happens, the client can't assume anything. It might be reasonable to COMMIT but also issue a warning message, or to just close the connection without telling the client what happened, but sending an error seems poor. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
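[To make that client contract concrete, here is a minimal libpq sketch of the three cases Robert describes. The function name and return convention are illustrative only, not part of any existing or proposed API.]

    #include <libpq-fe.h>

    /*
     * Illustrative only: classify the outcome of a COMMIT issued on an
     * already-open transaction.  Returns 1 if definitely committed,
     * 0 if the server reported an error (rolled back), and -1 if the
     * connection dropped before a definite answer arrived, in which
     * case the client may not assume anything either way.
     */
    static int
    commit_outcome(PGconn *conn)
    {
        PGresult   *res = PQexec(conn, "COMMIT");
        int         outcome;

        if (res != NULL && PQresultStatus(res) == PGRES_COMMAND_OK)
            outcome = 1;        /* server confirmed the commit */
        else if (PQstatus(conn) == CONNECTION_BAD)
            outcome = -1;       /* connection lost: outcome unknown */
        else
            outcome = 0;        /* server sent an error: rolled back */

        if (res != NULL)
            PQclear(res);
        return outcome;
    }

[The -1 branch is the case under discussion: a dropped connection forces the application to reconcile state after reconnecting, which is the only honest answer when the server cannot promise the commit was replicated.]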
On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 02.03.2011 12:40, Simon Riggs wrote: >> >> allow_standalone_primary seems to need to be better thought through than it is >> now, yet neither of us thinks it's worth having. >> >> If the people that want it can think it through a little better then it >> might make this release, but I propose to remove it from this current >> patch to allow us to commit with greater certainty and fewer bugs. > > If you leave it out, then let's rename the feature to "semi-synchronous > replication" or some such. The point of synchronous replication is > zero data loss, and you don't achieve that with allow_standalone_primary=on. I think that'd just be adding confusion. Replication will still be synchronous; it'll just be more likely to not be happening when you think it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02.03.2011 17:07, Robert Haas wrote: > On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> What I'm thinking is: when the waiting backends are released because >> of the timeout while a fast shutdown is being done on the master, >> those backends should not return a success indication to the client. >> Of course, in that case, the WAL has already been flushed on the master, >> but I think that those backends should exit with a FATAL error before >> returning success. This is to avoid breaking the synchronous >> replication rule, i.e., every transaction which the client knows as >> committed must also be committed on the synchronous standby after failover. > > That seems like an extremely bad idea. Now any client that assumes > that FATAL means his transaction didn't commit is broken. Clients > should be entitled to assume that a successful COMMIT means the > transaction committed (with whatever the operative durability > guarantee is) and that an error means it rolled back. If the > connection is closed before either one of those things happens, the > client can't assume anything. To achieve the effect Fujii is looking for, we would have to silently drop the connection. That would correctly leave the client not knowing whether the transaction committed or not. > It might be reasonable to COMMIT but also issue a warning message, or > to just close the connection without telling the client what happened, > but sending an error seems poor. Yeah, I guess that would work too, if the client knows to watch out for those warnings. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 02.03.2011 17:07, Robert Haas wrote: > On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 02.03.2011 12:40, Simon Riggs wrote: >>> >>> allow_standalone_primary seems to need to be better thought through than it is >>> now, yet neither of us thinks it's worth having. >>> >>> If the people that want it can think it through a little better then it >>> might make this release, but I propose to remove it from this current >>> patch to allow us to commit with greater certainty and fewer bugs. >> >> If you leave it out, then let's rename the feature to "semi-synchronous >> replication" or some such. The point of synchronous replication is >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > I think that'd just be adding confusion. Replication will still be > synchronous; it'll just be more likely to not be happening when you > think it is. The defining property of synchronous replication is that when a transaction is acknowledged as committed to the client, it has also been replicated to the standby. You don't achieve that with allow_standalone_primary=on, plain and simple. That's fine for a lot of people; most people don't actually want synchronous replication because they're not willing to pay the availability penalty. But IMHO it would be disingenuous to call it synchronous replication if you can't achieve zero data loss with it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > To achieve the effect Fujii is looking for, we would have to silently drop > the connection. That would correctly leave the client not knowing whether > the transaction committed or not. +1 >> It might be reasonable to COMMIT but also issue a warning message, or >> to just close the connection without telling the client what happened, >> but sending an error seems poor. > > Yeah, I guess that would work too, if the client knows to watch out for > those warnings. -1 Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > The defining property of synchronous replication is that when a > transaction is acknowledged as committed to the client, it has > also been replicated to the standby. You don't achieve that with > allow_standalone_primary=on, plain and simple. That's fine for a > lot of people, most people don't actually want synchronous > replication because they're not willing to pay the availability > penalty. But IMHO it would be disingenuous to call it synchronous > replication if you can't achieve zero data loss with it. Right. While there may be more people who favor high availability than the guarantees of synchronous replication, let's not blur the lines by mislabeling things. It's not synchronous replication if a commit returns successfully without the data being persisted on a second server. -Kevin
On Wed, Mar 2, 2011 at 2:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > 1. The primary is running with allow_standalone_primary = on. There > is only one (synchronous) standby connected. OK. Explicitly configured to allow the master to report as committed stuff which isn't on a/any slave. > 7. The new primary doesn't have some of the transactions reported > committed to the client, i.e., transactions are lost!! And this is a surprise? I'm not saying there isn't a better way to sequence/control a shutdown to make this risk less, but isn't that the whole point of the "allow_standalone_primary" debate/option? "If there isn't a sync slave for whatever reason, just march on, I'll deal with the transactions that are committed and not replicated some other way". I guess complaining that it shouldn't be possible to "just march on when no sync slave is available" is one possible way of "dealing with" them ;-) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> To achieve the effect Fujii is looking for, we would have to silently drop >> the connection. That would correctly leave the client not knowing whether >> the transaction committed or not. > > +1 > >>> It might be reasonable to COMMIT but also issue a warning message, or >>> to just close the connection without telling the client what happened, >>> but sending an error seems poor. >> >> Yeah, I guess that would work too, if the client knows to watch out for >> those warnings. > > -1 yeah, unless by warning, you meant 'error'. merlin
On Wed, Mar 2, 2011 at 12:39 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> To achieve the effect Fujii is looking for, we would have to silently drop >>> the connection. That would correctly leave the client not knowing whether >>> the transaction committed or not. >> >> +1 >> >>>> It might be reasonable to COMMIT but also issue a warning message, or >>>> to just close the connection without telling the client what happened, >>>> but sending an error seems poor. >>> >>> Yeah, I guess that would work too, if the client knows to watch out for >>> those warnings. >> >> -1 > > yeah, unless by warning, you meant 'error'. Well, as mentioned upthread, throwing an error when the transaction is actually committed seems poor. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-03-02 at 17:23 +0200, Heikki Linnakangas wrote: > On 02.03.2011 17:07, Robert Haas wrote: > > On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas > > <heikki.linnakangas@enterprisedb.com> wrote: > >> On 02.03.2011 12:40, Simon Riggs wrote: > >>> > >>> allow_standalone_primary seems to need to be better thought through than it is > >>> now, yet neither of us thinks it's worth having. > >>> > >>> If the people that want it can think it through a little better then it > >>> might make this release, but I propose to remove it from this current > >>> patch to allow us to commit with greater certainty and fewer bugs. > >> > >> If you leave it out, then let's rename the feature to "semi-synchronous > >> replication" or some such. The point of synchronous replication is > >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > > > I think that'd just be adding confusion. Replication will still be > > synchronous; it'll just be more likely to not be happening when you > > think it is. > > The defining property of synchronous replication is that when a > transaction is acknowledged as committed to the client, it has also been > replicated to the standby. You don't achieve that with > allow_standalone_primary=on, plain and simple. That's fine for a lot of > people; most people don't actually want synchronous replication because > they're not willing to pay the availability penalty. But IMHO it would > be disingenuous to call it synchronous replication if you can't achieve > zero data loss with it. I agree with some of what you say, but that focuses on one corner case and not on the whole solution. Yes, if we pick the allow_standalone_primary=on behaviour AND you are stupid enough to lose *all* of your sync standbys then you may get data loss. What I've spent a lot of time doing is trying to ensure that we never lose all our sync standbys, via a clear recommendation to use multiple servers AND functionality that allows the standbys to work together to give true high availability without data loss. The current design allows for an arbitrary number of potential standby servers, so your risk of data loss can be as many 9s as you care to make it. Not really bothered what we call it, but if you intend to make a song and dance about whether it is "proper" or not, then I would object to that because you're not presenting the full situation. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: > On 02.03.2011 12:40, Simon Riggs wrote: > > allow_standalone_primary seems to need to be better thought through than it is > > now, yet neither of us thinks it's worth having. > > > > If the people that want it can think it through a little better then it > > might make this release, but I propose to remove it from this current > > patch to allow us to commit with greater certainty and fewer bugs. > > If you leave it out, then let's rename the feature to "semi-synchronous > replication" or some such. The point of synchronous replication is > zero data loss, and you don't achieve that with allow_standalone_primary=on. The reason I have suggested leaving that parameter out is that the behaviour is not fully specified and Yeb has reported cases that don't (yet) make sense. If you want to fully specify it then we could yet add it, assuming we have time. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On 02.03.2011 21:48, Simon Riggs wrote: > On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: >> On 02.03.2011 12:40, Simon Riggs wrote: >>> allow_standalone_primary seems to need to be better thought through than it is >>> now, yet neither of us thinks it's worth having. >>> >>> If the people that want it can think it through a little better then it >>> might make this release, but I propose to remove it from this current >>> patch to allow us to commit with greater certainty and fewer bugs. >> >> If you leave it out, then let's rename the feature to "semi-synchronous >> replication" or some such. The point of synchronous replication is >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > The reason I have suggested leaving that parameter out is that the > behaviour is not fully specified and Yeb has reported cases that don't > (yet) make sense. If you want to fully specify it then we could yet add > it, assuming we have time. Fair enough. All I'm saying is that if we end up shipping without that parameter (implying allow_standalone_primary=on), we need to call the feature something else. The GUCs and code can probably stay as they are, but we shouldn't use the term "synchronous replication" in the documentation, release notes and such. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > All I'm saying is that if we end up shipping without that > parameter (implying allow_standalone_primary=on), we need to call > the feature something else. The GUCs and code can probably stay as > it is, but we shouldn't use the term "synchronous replication" in > the documentation, and release notes and such. I think including "synchronous" is OK as long as it's properly qualified. Off-hand thoughts in no particular order: semi-synchronous conditionally synchronous synchronous with automatic failover to standalone Perhaps the qualifications can be relaxed in some places but not others? The documentation should certainly emphasize that there is no guarantee that a successful commit means that the data is on at least two separate servers, if no such guarantee exists. If we expect that losing all replicas is such a terribly thin long shot that we don't need to worry about this difference, it is such a long shot that we don't need to worry about "wait forever" behavior, either; and we should just implement *that* so that no qualification is needed. I think that is an odd assumption, though; and I think a HA failover to weaker persistence guarantees in exchange for increased up-time would be popular. -Kevin
On Wed, 2011-03-02 at 14:26 -0600, Kevin Grittner wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > > All I'm saying is that if we end up shipping without that > > parameter (implying allow_standalone_primary=on), we need to call > > the feature something else. The GUCs and code can probably stay as > > it is, but we shouldn't use the term "synchronous replication" in > > the documentation, and release notes and such. > > I think including "synchronous" is OK as long as it's properly > qualified. Off-hand thoughts in no particular order: > > semi-synchronous You mean asynchronous > conditionally synchronous You mean asynchronous JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: > On 02.03.2011 21:48, Simon Riggs wrote: > > On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote: > >> On 02.03.2011 12:40, Simon Riggs wrote: > >>> allow_standalone_primary seems to need to be better thought through than it is > >>> now, yet neither of us thinks it's worth having. > >>> > >>> If the people that want it can think it through a little better then it > >>> might make this release, but I propose to remove it from this current > >>> patch to allow us to commit with greater certainty and fewer bugs. > >> > >> If you leave it out, then let's rename the feature to "semi-synchronous > >> replication" or some such. The point of synchronous replication is > >> zero data loss, and you don't achieve that with allow_standalone_primary=on. > > > > The reason I have suggested leaving that parameter out is that the > > behaviour is not fully specified and Yeb has reported cases that don't > > (yet) make sense. If you want to fully specify it then we could yet add > > it, assuming we have time. > > Fair enough. All I'm saying is that if we end up shipping without that > parameter (implying allow_standalone_primary=on), we need to call the > feature something else. The GUCs and code can probably stay as they are, > but we shouldn't use the term "synchronous replication" in the > documentation, release notes and such. allow_standalone_primary=off means "wait forever". It does nothing to reduce data loss since you can't replicate to a server that isn't there. As we discussed it, allow_standalone_primary=off was not a persistent state, so shutting down the database would simply leave the data committed. Which gives the same problem, but implicitly. Truly "synchronous" requires two-phase commit, which this never was. So the absence or presence of the poorly specified parameter called allow_standalone_primary should have no bearing on what we call this feature. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On 03/02/2011 03:39 PM, Simon Riggs wrote: > Truly "synchronous" requires two-phase commit, which this never was. So > the absence or presence of the poorly specified parameter called > allow_standalone_primary should have no bearing on what we call this > feature. > I haven't been following this very closely, but to me this screams out that we simply must not call it "synchronous". Just my $0.02 worth. cheers andrew
Simon Riggs <simon@2ndQuadrant.com> wrote: > allow_standalone_primary=off means "wait forever". It does nothing > to reduce data loss since you can't replicate to a server that > isn't there. Unless you're pulling from some persistent source which will then feel free to discard what you have retrieved. You can't assume otherwise; it's not that rare a pattern for interfaces. We use such a pattern for accepting criminal complaints from district attorneys and sending warrant information to police agencies. Waiting a long time (it won't *actually* be forever) is better than losing information. In other words, making a persistence promise which is not kept can lose data on the client side, if the clients actually trust your guarantees. -Kevin
On Wed, Mar 2, 2011 at 3:45 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > >> allow_standalone_primary=off means "wait forever". It does nothing >> to reduce data loss since you can't replicate to a server that >> isn't there. > > Unless you're pulling from some persistent source which will then > feel free to discard what you have retrieved. You can't assume > otherwise; it's not that rare a pattern for interfaces. We use such > a pattern for accepting criminal complaints from district attorneys > and sending warrant information to police agencies. Waiting a long > time (it won't *actually* be forever) is better than losing > information. > > In other words, making a persistence promise which is not kept can > lose data on the client side, if the clients actually trust your > guarantees. I agree. I assumed that when Simon was talking about removing allow_standalone_primary, he meant making the code always behave as if it were turned OFF. The common scenario here is bound to be: 1. Everything is humming along. 2. The network link between the master and standby drops. 3. Then it comes back up again. After (2) and before (3), what should the behavior of the master be? It seems clear to me that it should WAIT. Otherwise, a crash on the master now leaves you with transactions that were confirmed committed but not actually replicated to the standby. If you were OK with that scenario, you would have used asynchronous replication in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: > I assumed that when Simon was talking about removing > allow_standalone_primary, he meant making the code always behave as if > it were turned OFF. That is the part that is currently not fully specified, so no, that is not currently included in the patch. That isn't double-talk for "and I will not include it". What I mean is I'd rather have something than nothing, whatever we decide to call it. But the people that want it had better come up with a clear definition of how it will actually work, covering all cases, not just "splish splash, it kinda works and we'll let Simon work out the rest". -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> wrote: > On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote: > >> I assumed that when Simon was talking about removing >> allow_standalone_primary, he meant making the code always behave >> as if it were turned OFF. > > That is the part that is currently not fully specified, so no, that > is not currently included in the patch. > > That isn't double-talk for "and I will not include it". > > What I mean is I'd rather have something than nothing, whatever we > decide to call it. +1 on that. > But the people that want it had better come up with a clear > definition of how it will actually work What is ill-defined? I would have thought that the commit request would hang indefinitely until the server was able to provide its usual guarantees. I'm not clear on what cases aren't covered by that. -Kevin
On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: > > On 03/02/2011 03:39 PM, Simon Riggs wrote: > > Truly "synchronous" requires two-phase commit, which this never was. So > > the absence or presence of the poorly specified parameter called > > allow_standalone_primary should have no bearing on what we call this > > feature. > > > > I haven't been following this very closely, but to me this screams out > that we simply must not call it "synchronous". As long as we describe it via its characteristics, then I'll be happy: * significantly reduces the possibility of data loss in a sensibly configured cluster * allows arbitrary N+k resilience that can meet and easily exceed "5 nines" data durability * isn't two-phase commit * isn't a magic bullet that will protect your data even after your hardware fails or is disconnected -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: >> Fair enough. All I'm saying is that if we end up shipping without that >> parameter (implying allow_standalone_primary=on), we need to call the >> feature something else. The GUCs and code can probably stay as they are, >> but we shouldn't use the term "synchronous replication" in the >> documentation, release notes and such. > allow_standalone_primary=off means "wait forever". It does nothing to > reduce data loss since you can't replicate to a server that isn't there. This is irrelevant to the point. The point is that sync rep implies that we will not *tell a client* its data is committed unless the commit is down to disk in two places. I agree with the people saying that not having this parameter makes it not real sync rep. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > 1. Everything is humming along. > 2. The network link between the master and standby drops. > 3. Then it comes back up again. > > After (2) and before (3), what should the behavior of the master be? It > seems clear to me that it should WAIT. Otherwise, a crash on the That just means you want data high availability, not service HA. Some people want the *service* to stay available in such a situation. > master now leaves you with transactions that were confirmed committed > but not actually replicated to the standby. If you were OK with that > scenario, you would have used asynchronous replication in the first > place. What is so hard to understand about "worst case scenario" being different from "expected conditions"? We all know that getting the last percent is more expensive than getting the first 99. We have no reason to force people into building for the last percent whatever their context. So, what cases need to be defined wrt forbidding the primary to continue alone? - in-flight commits are blocked until we can offer the durability asked for, waiting forever - shutdown requests are blocked until the standby acknowledges the final checkpoint; are immediate shutdown requests permitted? what do they do? What other cases are to be fully designed here? Please note that the above list is just a way to start the design, not a definitive proposal from me. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
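[For the first case in that list, a toy C sketch may help anchor the design discussion. This is emphatically not the patch's code: the GUC variables, the stubbed functions, the polling loop, and the exact precedence of the exits are all stand-ins for the behaviour being debated.]

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t XLogRecPtr;    /* stand-in for a WAL position */

    /* Stand-ins for the GUCs under discussion. */
    static bool allow_standalone_primary = true;
    static int  sync_replication_timeout_client = 120000;  /* ms; 0 = none */

    /* Stubs for walsender state; the real code would read shared memory. */
    static bool       sync_standby_connected(void) { return false; }
    static XLogRecPtr standby_flush_lsn(void)      { return 0; }
    static void       sleep_ms(int ms)             { (void) ms; }

    /* Returns true when the commit may be acknowledged to the client. */
    static bool
    wait_for_sync_rep(XLogRecPtr commit_lsn)
    {
        int     waited_ms = 0;

        for (;;)
        {
            if (sync_standby_connected() &&
                standby_flush_lsn() >= commit_lsn)
                return true;    /* replicated: safe to acknowledge */

            if (!sync_standby_connected() && allow_standalone_primary)
                return true;    /* no standby: degrade to standalone */

            if (sync_replication_timeout_client > 0 &&
                waited_ms >= sync_replication_timeout_client)
                return true;    /* released by timeout, acked anyway */

            sleep_ms(10);       /* the real code waits on a latch */
            waited_ms += 10;
        }
    }

[With allow_standalone_primary = off and no client timeout, the loop never exits until a standby catches up; that is the "wait forever" behaviour. Whether the timeout exit should even exist in that mode is exactly the part still to be pinned down.]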
On 03/02/2011 04:13 PM, Simon Riggs wrote: > On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: >> On 03/02/2011 03:39 PM, Simon Riggs wrote: >>> Truly "synchronous" requires two-phase commit, which this never was. So >>> the absence or presence of the poorly specified parameter called >>> allow_standalone_primary should have no bearing on what we call this >>> feature. >>> >> I haven't been following this very closely, but to me this screams out >> that we simply must not call it "synchronous". > As long as we describe it via its characteristics, then I'll be happy: > > * significantly reduces the possibility of data loss in a sensibly > configured cluster > > * allow arbitrary N+k resilience that can meet and easily exceed > "5 nines" data durability > > * isn't two phase commit > > * isn't a magic bullet that will protect your data even after your > hardware fails or is disconnected > Ok, so let's call it "enhanced safety" or something else that isn't a term of art. cheers andrew
On Wed, Mar 2, 2011 at 4:19 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> 1. Everything is humming along. >> 2. The network link between the master and standby drops. >> 3. Then it comes back up again. >> >> After (2) and before (3), what should the behavior of the master be? It >> seems clear to me that it should WAIT. Otherwise, a crash on the > > That just means you want data high availability, not service HA. Some > people want the *service* to stay available in such a situation. > >> master now leaves you with transactions that were confirmed committed >> but not actually replicated to the standby. If you were OK with that >> scenario, you would have used asynchronous replication in the first >> place. > > What is so hard to understand about "worst case scenario" being different > from "expected conditions"? We all know that getting the last percent > is more expensive than getting the first 99. We have no reason to > force people into building for the last percent whatever their context. I don't understand how synchronous replication with allow_standalone_primary=on gives you ANY extra nines. AFAICS, the only point of having synchronous replication is that you wait to acknowledge the commit to the client until the commit record has been replicated. Doing that only when the standby happens to be connected doesn't seem like it helps much. If the master is up, then it doesn't really matter what the standby does; we don't need high availability in that case, because we have just plain regular old availability. If the master goes down, then we need to know that we haven't lost any confirmed-committed transactions. With allow_standalone_primary=on, we don't know that. They might be lost, or they might not be. Even if we have 100 separate standbys, there is no way of knowing whether there was a time period just before the crash during which the master couldn't get out to the Internet, and some commits by clients on the local network went through. Maybe with some careful network engineering you can convince yourself that that isn't very likely, but I sure wouldn't bet on it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2011-03-02 21:26, Kevin Grittner wrote: > > I think including "synchronous" is OK as long as it's properly > qualified. Off-hand thoughts in no particular order: > > semi-synchronous > conditionally synchronous > synchronous with automatic failover to standalone It would be good to name the concept the same way other DBMSes do, if they have a similar concept - I don't know if MySQL's semisynchronous replication is the same, but after a quick read it sounds like it is. regards, Yeb Havinga
On Wed, 2011-03-02 at 16:16 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote: > >> Fair enough. All I'm saying is that if we end up shipping without that > >> parameter (implying allow_standalone_primary=on), we need to call the > >> feature something else. The GUCs and code can probably stay as they are, > >> but we shouldn't use the term "synchronous replication" in the > >> documentation, release notes and such. > > > allow_standalone_primary=off means "wait forever". It does nothing to > > reduce data loss since you can't replicate to a server that isn't there. > > This is irrelevant to the point. The point is that sync rep implies > that we will not *tell a client* its data is committed unless the commit > is down to disk in two places. I agree with the people saying that not > having this parameter makes it not real sync rep. Depends what you think the point is. Your comments go exactly to *my* point, which is that the behaviour I'm looking to commit maximises data durability *and* availability, and is the real choice that people will make. Claiming it "isn't real" will make people scared of it, when it's actually what they have been waiting for. There is nothing half-arsed or unreal about what is being delivered. Let's not get hung up on textbook definitions, let's do the useful thing. And to repeat, I am not against the other way. It has its place, in a few cases. And I'm not against including it in this release either, given we have a good definition. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Yeb Havinga <yebhavinga@gmail.com> wrote: > On 2011-03-02 21:26, Kevin Grittner wrote: >> >> I think including "synchronous" is OK as long as it's properly >> qualified. Off-hand thoughts in no particular order: >> >> semi-synchronous >> conditionally synchronous >> synchronous with automatic failover to standalone > It would be good to name the concept the same way other DBMSes > do, if they have a similar concept - I don't know if MySQL's > semisynchronous replication is the same, but after a quick read it > sounds like it is. I had no idea MySQL used that terminology; it just seemed apt for describing a setup which is synchronous except when it isn't. Using the same terminology for equivalent functionality has its pluses, but might there be trademark or other IP issues here? -Kevin
On Wed, 2011-03-02 at 16:24 -0500, Andrew Dunstan wrote: > > On 03/02/2011 04:13 PM, Simon Riggs wrote: > > On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote: > >> On 03/02/2011 03:39 PM, Simon Riggs wrote: > >>> Truly "synchronous" requires two-phase commit, which this never was. So > >>> the absence or presence of the poorly specified parameter called > >>> allow_standalone_primary should have no bearing on what we call this > >>> feature. > >>> > >> I haven't been following this very closely, but to me this screams out > >> that we simply must not call it "synchronous". > > As long as we describe it via its characteristics, then I'll be happy: > > > > * significantly reduces the possibility of data loss in a sensibly > > configured cluster > > > > * allows arbitrary N+k resilience that can meet and easily exceed > > "5 nines" data durability > > > > * isn't two-phase commit > > > > * isn't a magic bullet that will protect your data even after your > > hardware fails or is disconnected > > > > > Ok, so let's call it "enhanced safety" or something else that isn't a > term of art. Good plan. Oracle avoided the whole issue by referring to the two modes as "maximum availability" and "maximum protection". I'm not sure if that is patented or copyrighted etc., but I'm guessing they had this exact same discussion, just a little less friendly. Perhaps we can coin the term "High Durability". -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Mar 3, 2011 at 5:50 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I agree. I assumed that when Simon was talking about removing > allow_standalone_primary, he meant making the code always behave as if > it were turned OFF. I feel the same way. Despite what he says, the patch implements sync_replication_timeout_client, and if its value is set large enough, the primary behaves like "wait-forever". The primary behaves like "standalone" only when it's first started without any connected standbys, though. So if we have sync_replication_timeout_client, allow_standalone_primary no longer looks useful. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Mar 3, 2011 at 6:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't understand how synchronous replication with > allow_standalone_primary=on gives you ANY extra nines. When you start the primary (or when there is only one connected standby and it crashes), allow_standalone_primary = on allows the database service to proceed. OTOH, setting the parameter to off keeps the service down until a new standby has connected and caught up with the primary. This would cause long service downtime, and decrease availability. Of course, running the primary alone has its risks. If its disk gets corrupted before a new standby appears, some committed transactions are lost. But we can decrease this risk to a certain extent by using RAID or the like for the storage. So I think that some systems can accept the risk and prefer the availability of the database service. Josh explained clearly before why allow_standalone_primary = off is required for his case. http://archives.postgresql.org/message-id/4CAE2488.9020207%40agliodbs.com Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > To achieve the effect Fujii is looking for, we would have to silently drop > the connection. That would correctly leave the client not knowing whether > the transaction committed or not. Yeah, this seems to make more sense. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, 2011-03-03 at 13:35 +0900, Fujii Masao wrote: > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > To achieve the effect Fujii is looking for, we would have to silently drop > > the connection. That would correctly leave the client not knowing whether > > the transaction committed or not. > > Yeah, this seems to make more sense. How do you propose we do that? -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Fujii Masao <masao.fujii@gmail.com> writes: > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> To achieve the effect Fujii is looking for, we would have to silently drop >> the connection. That would correctly leave the client not knowing whether >> the transaction committed or not. > Yeah, this seems to make more sense. It was pointed out that sending an ERROR would not do because it would likely lead to client code assuming the transaction failed, which might or might not be the case. But maybe we could send a WARNING and then close the connection? That would give humans a clue what had happened, but not do anything to the state of automated clients. regards, tom lane
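[If the server did send such a WARNING before closing the connection, a client could surface it with a libpq notice receiver. A sketch under that assumption only: no warning of this kind is actually defined, and the message text in the comment is invented.]

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Print server notices/warnings to stderr, e.g. a hypothetical
     * "transaction committed locally but not known to be replicated"
     * warning sent just before the server drops the connection.
     */
    static void
    notice_handler(void *arg, const PGresult *res)
    {
        (void) arg;                 /* unused */
        fprintf(stderr, "server: %s", PQresultErrorMessage(res));
    }

    /* Install the receiver once, right after connecting. */
    static void
    install_notice_handler(PGconn *conn)
    {
        PQsetNoticeReceiver(conn, notice_handler, NULL);
    }

[An automated client would still have to treat the subsequent connection loss as "commit outcome unknown"; the warning mainly helps humans and log scrapers.]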
On Thu, 2011-03-03 at 02:14 -0500, Tom Lane wrote: > Fujii Masao <masao.fujii@gmail.com> writes: > > On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas > > <heikki.linnakangas@enterprisedb.com> wrote: > >> To achieve the effect Fujii is looking for, we would have to silently drop > >> the connection. That would correctly leave the client not knowing whether > >> the transaction committed or not. > > > Yeah, this seems to make more sense. > > It was pointed out that sending an ERROR would not do because it would > likely lead to client code assuming the transaction failed, which might > or might not be the case. But maybe we could send a WARNING and then > close the connection? That would give humans a clue what had happened, > but not do anything to the state of automated clients. So when we perform a Fast Shutdown we want to do something fairly similar to quickdie()? Please review the attached patch. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > I don't understand how synchronous replication with > allow_standalone_primary=on gives you ANY extra nines. AFAICS, the > only point of having synchronous replication is that you wait to > acknowledge the commit to the client until the commit record has been > replicated. Doing that only when the standby happens to be connected > doesn't seem like it helps much. Because you're still thinking in terms of data availability, rather than in terms of service availability. With the latter in mind, what you want is to be able to continue servicing from the standby should the primary crash, and you want a good guarantee about the standby's data. Of course in such a situation you will have some monitoring to ensure that the standby remains in sync, and you want to know that at failover time. But a standby failure, when you want service availability, should never bring the service down. That's what happens, though, if you've been setting up *data* availability, because then there's no choice. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Mar 2, 2011 at 5:10 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > On 2011-03-02 21:26, Kevin Grittner wrote: >> >> I think including "synchronous" is OK as long as it's properly >> qualified. Off-hand thoughts in no particular order: >> >> semi-synchronous >> conditionally synchronous >> synchronous with automatic failover to standalone > > It would be good to name the concept the same way other DBMSes do, if > they have a similar concept - I don't know if MySQL's semisynchronous > replication is the same, but after a quick read it sounds like it is. Here's the link: http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html I think this is mostly about how many slaves have to ack the commit. It's not entirely clear to me what happens if a slave is set up but not connected at the moment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company