Thread: Transaction-controlled robustness for replication
One of the cool features of 8.3 was the ability to control at the transaction level whether we use synchronous or asynchronous commit. We're planning to add integrated replication features to 8.4, and I think it will be possible to extend the concept of asynchronous commit to a more general form of transaction-level robustness. Note that it has already been established that it's possible to mix asynchronous and synchronous transactions on the same system, so this is just a straight extension of that concept.

Asynchronous commit controls whether we go to disk at time of commit, or whether we defer this slightly. We have the same options with replication: do we replicate at time of commit, or do we defer this slightly for performance reasons. DRBD and other replication systems show us that there is actually another difference when talking about synchronous replication: do we go to disk on the standby before acknowledging the primary?

We can generalise this as three closed questions, answered either Yes (Synchronous) or No (Asynchronous):

* Does WAL get forced to disk on primary at commit time?
* Does WAL get forced across link to standby at commit time?
* Does WAL get forced to disk on standby at commit time?

In code, these are simple if tests: Do we wait, or not?

We could represent this with 3 parameters:

synchronous_commit = on | off
synchronous_standby_transfer = on | off
synchronous_standby_wal_fsync = on | off

If we are able to define these robustness characteristics for each transaction *separately* then it will represent an industry first: no other database has that capability implemented or planned on published roadmaps, nor has it been discussed in research to my knowledge.

Changing the parameter setting at transaction level would be expensive if we had to set three parameters.
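Purely as an illustration of the "simple if tests" point (this is not PostgreSQL source; all names are invented), the three closed questions reduce to three independent waits at commit time:

```python
# Hypothetical sketch: the three yes/no questions as simple if-tests at
# commit time. Parameter names follow the proposal in the mail; nothing
# here is real PostgreSQL code.

def commit_waits(sync_commit, sync_standby_transfer, sync_standby_wal_fsync):
    """Return the events a backend would wait for at COMMIT."""
    waits = []
    if sync_commit:                  # WAL forced to disk on primary?
        waits.append("primary_fsync")
    if sync_standby_transfer:        # WAL forced across the link to standby?
        waits.append("standby_transfer")
    if sync_standby_wal_fsync:       # WAL forced to disk on standby?
        waits.append("standby_fsync")
    return waits
```

With all three off, commit waits for nothing (fully asynchronous); with all three on, it waits for the full chain.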
Or we could use just two parameters:

synchronous_commit = on | off
synchronous_replication = 'AA', 'SA' or 'SS'
with A = Asynchronous, S = Synchronous

which corresponds with DRBD's algorithms like this:

DRBD A = AA
DRBD B = SA
DRBD C = SS

Or we could use just a single parameter:

synchronous_commit = 'AAA', 'SAA', 'SSA', 'SSS', or on | off when no log-based replication is defined

Having the ability to set these at the transaction level would be very cool. Having it set via a *single* parameter would make it much more viable to switch between AAA for bulk, low-importance data and SSS for very low volume, critical data, or somewhere in between, on the same server at the same time.

So the proposal in summary is:

* allow various modes of synchronous replication for perf/robustness
* allow modes to be specified per-transaction
* allow modes to be specified as a single parameter

I think Itagaki may have described similar concepts at PGCon 2008, but this thread has been started to make sure that meme definitely has been released into the wild, and to discuss how we might specify it.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
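A small sketch (illustrative only, not an implementation) of how the single-parameter encoding lines up with the DRBD protocols named above, and of the constraint that only "monotone" combinations make sense — once a later stage is synchronous, every earlier stage must be too:

```python
# Illustrative mapping of the proposed two-letter setting onto DRBD's
# protocols, plus a decoder for the single-parameter form. The mode names
# are from the proposal; the code itself is hypothetical.

DRBD_EQUIVALENT = {"AA": "DRBD A", "SA": "DRBD B", "SS": "DRBD C"}

VALID_MODES = ("AAA", "SAA", "SSA", "SSS")  # the four listed in the thread

def decode(mode):
    """Split e.g. 'SSA' into (primary_fsync, standby_transfer, standby_fsync)."""
    if mode not in VALID_MODES:
        raise ValueError("an 'S' may not follow an 'A': %r" % mode)
    return tuple(ch == "S" for ch in mode)
```

So `decode("SSA")` yields the three yes/no answers directly, and an illogical value such as 'SAS' is rejected up front.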
Hi,

very nice proposal and thoughts. Allow me to compare against Postgres-R again.

Simon Riggs wrote:
> Asynchronous commit controls whether we go to disk at time of commit, or
> whether we defer this slightly. We have the same options with
> replication: do we replicate at time of commit, or do we defer this
> slightly for performance reasons. DRBD and other replication systems
> show us that there is actually another difference when talking about
> synchronous replication: do we go to disk on the standby before
> acknowledging the primary?

Yeah, I was thinking in the same direction for Postgres-R. There already exist three replication levels: sync, eager and lazy. With more than just a primary and a standby server in mind, one can also argue about how many remote nodes need to have written the changes to disc before commit is confirmed in 'sync' mode. At least a majority is required; probably more nodes are wanted.

The eager mode is what the original Postgres-R approach is all about, and is pretty much the only one I've implemented at the moment. It only requires confirmation from the GCS, which means at least a majority of the nodes have received the change set (and will be able to apply it). (This leads to a corner case for a full cluster outage, see [1].)

In async mode, commit is confirmed before sending the change set to other nodes.

> If we are able to define these robustness characteristics for each
> transaction *separately* then it will represent an industry first:

Yeah, that would be pretty cool.

> no other database has that capability implemented or planned on published
> roadmaps, nor has it been discussed in research to my knowledge.

Well, a partial implementation in Postgres-R, if that counts... ;-)

Regards

Markus

[1]: One of the few threads on the Postgres-R-general mailing list:
http://pgfoundry.org/pipermail/postgres-r-general/2006-August/000002.html
On 7/22/08, Simon Riggs <simon@2ndquadrant.com> wrote:
> We could represent this with 3 parameters:
> synchronous_commit = on | off
> synchronous_standby_transfer = on | off
> synchronous_standby_wal_fsync = on | off
>
> If we are able to define these robustness characteristics for each
> transaction *separately* then it will represent an industry first: no
> other database has that capability implemented or planned on published
> roadmaps, nor has it been discussed in research to my knowledge.
>
> Changing the parameter setting at transaction-level would be expensive
> if we had to set three parameters.

How about extending BEGIN with additional keywords?

-- 
marko
On Wed, 2008-07-23 at 01:39 +0300, Marko Kreen wrote:
> On 7/22/08, Simon Riggs <simon@2ndquadrant.com> wrote:
> > We could represent this with 3 parameters:
> > synchronous_commit = on | off
> > synchronous_standby_transfer = on | off
> > synchronous_standby_wal_fsync = on | off
> >
> > If we are able to define these robustness characteristics for each
> > transaction *separately* then it will represent an industry first: no
> > other database has that capability implemented or planned on published
> > roadmaps, nor has it been discussed in research to my knowledge.
> >
> > Changing the parameter setting at transaction-level would be expensive
> > if we had to set three parameters.
>
> How about extending BEGIN with additional keywords?

SQL Standard, plus many interfaces hide BEGIN from you.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> Asynchronous commit controls whether we go to disk at time of commit, or
> whether we defer this slightly. We have the same options with
> replication: do we replicate at time of commit, or do we defer this
> slightly for performance reasons. DRBD and other replication systems
> show us that there is actually another difference when talking about
> synchronous replication: do we go to disk on the standby before
> acknowledging the primary?
>
> We can generalise this as three closed questions, answered either Yes
> (Synchronous) or No (Asynchronous)
>
> * Does WAL get forced to disk on primary at commit time?
> * Does WAL get forced across link to standby at commit time?
> * Does WAL get forced to disk on standby at commit time?

* Does WAL get applied [and synced] to disk on standby at commit time?

This is important if you want to use the standby as a read-only. I am slightly confused about what the fsync setting does to all this, hence the brackets.

I think that questions 2 and 3 are trivially bundled together. Once the user can specify 2, implementing 3 should be trivial, and vice versa. I am not even convinced that these need to be two different parameters. Also please note that an answer of "yes" to 3 means that 2 must also be answered "yes".

> We could represent this with 3 parameters:
> synchronous_commit = on | off
> synchronous_standby_transfer = on | off
> synchronous_standby_wal_fsync = on | off

synchronous_standby_apply = on | off   # just to propose a name

> Changing the parameter setting at transaction-level would be expensive
> if we had to set three parameters.

What exactly does "expensive" mean? All three parameters can probably be set in one TCP packet from client to server.
> Or we could use just a single parameter
> synchronous_commit = 'AAA', 'SAA', 'SSA', 'SSS' or on | off when no
> log-based replication is defined
>
> Having the ability to set these at the transaction-level would be very
> cool. Having it set via a *single* parameter would make it much more
> viable to switch between AAA for bulk, low importance data and SSS for
> very low volume, critical data, or somewhere in between on the same
> server, at the same time.

The problem with a single parameter is that everything becomes position-dependent, and if a new parameter is ever introduced, it's not easy to upgrade old application code.

> So proposal in summary is
> * allow various modes of synchronous replication for perf/robustness
> * allow modes to be specified per-transaction
> * allow modes to be specified as a single parameter

How about creating named modes? This would give the user the ability to define more fine-grained control, especially in larger clusters of fail-over/read-only servers, without totally clogging the parameter space and application code. Whether this should be done SQL-style or in some config file is not so clear to me, although I'd prefer SQL-style like

CREATE SYNCHRONIZING MODE immediate_readonly AS
    LOCAL        SYNCHRONOUS APPLY
    192.168.0.10 SYNCHRONOUS APPLY   -- read-only slave
    192.168.0.11 SYNCHRONOUS APPLY   -- read-only slave
    192.168.0.20 SYNCHRONOUS SHIP    -- backup-server
    192.168.0.21 SYNCHRONOUS SHIP    -- backup-server
    192.168.0.30 SYNCHRONOUS FSYNC   -- backup-server with fast disks
;

and then something like

synchronize_mode = immediate_readonly;

Yeah, I know, give patches not pipe-dreams :)

Regards,
Jens-Wolfhard Schicke
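A rough, purely hypothetical model of the named-mode idea (the mode name, addresses and guarantee names are taken from the example above; none of this is implemented anywhere): a mode maps each node to the strongest guarantee that must hold before COMMIT returns, with APPLY implying FSYNC implying SHIP.

```python
# Hypothetical model of "named synchronizing modes". SHIP = WAL received by
# the node, FSYNC = WAL on the node's disk, APPLY = changes applied/visible.

MODES = {
    "immediate_readonly": {
        "local":        "APPLY",
        "192.168.0.10": "APPLY",   # read-only slave
        "192.168.0.11": "APPLY",   # read-only slave
        "192.168.0.20": "SHIP",    # backup server
        "192.168.0.21": "SHIP",    # backup server
        "192.168.0.30": "FSYNC",   # backup server with fast disks
    },
}

STRENGTH = {"SHIP": 1, "FSYNC": 2, "APPLY": 3}

def must_wait_for(mode_name, node, event):
    """Does a commit under this mode wait for `event` on `node`?"""
    required = MODES[mode_name].get(node)
    return required is not None and STRENGTH[event] <= STRENGTH[required]
```

So a commit in `immediate_readonly` waits for the backup servers only to receive the WAL, but waits for the read-only slaves to actually apply it.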
Hi,

Jens-Wolfhard Schicke wrote:
>> * Does WAL get forced to disk on primary at commit time?
>> * Does WAL get forced across link to standby at commit time?
>> * Does WAL get forced to disk on standby at commit time?
> * Does WAL get applied [and synced] to disk on standby at commit time?

I think that's what Simon means by his question no 3. It wouldn't make much sense to me otherwise. I'm assuming the standby node has its own physical format, so the changes from the remote WAL need to be transformed into a local WAL, which then needs to be written to disc. For Postgres, this pretty much means applying the changes and committing them. You never need to store the remote WAL on physical storage; what would that be good for?

> I think that questions 2 and 3 are trivially bundled together. Once the
> user can specify 2, implementing 3 should be trivial and vice versa.

That might well be, yes. The code to collect changes from a transaction and then apply them remotely is pretty much the same, no matter when it is being executed. But it certainly makes a difference in the balance between performance and availability, which is a decision the user should be able to make for his specific application (or even better, per transaction, as proposed here and in Postgres-R).

> I am not even convinced that these need to be two different parameters.

Consider a standby heavily loaded (i/o) with some OLAP queries. Why should the master wait until the standby has written anything to disk for him?

> Also please note that an answer of "yes" to 3 means that 2 must also
> be answered "yes".

Agreed. There's no 'AS' mode possible, only 'SS', 'SA' and 'AA'.

> How about creating named modes? This would give the user the ability to
> define more fine-grained control especially in larger clusters of
> fail-over/read-only servers without totally clogging the parameter space
> and application code.
> Whether this should be done SQL-style or in some config file is not so
> clear to me, although I'd prefer SQL-style like
>
> CREATE SYNCHRONIZING MODE immediate_readonly AS
>     LOCAL        SYNCHRONOUS APPLY
>     192.168.0.10 SYNCHRONOUS APPLY   -- read-only slave
>     192.168.0.11 SYNCHRONOUS APPLY   -- read-only slave
>     192.168.0.20 SYNCHRONOUS SHIP    -- backup-server
>     192.168.0.21 SYNCHRONOUS SHIP    -- backup-server
>     192.168.0.30 SYNCHRONOUS FSYNC   -- backup-server with fast disks
> ;

Hm, that's an interesting idea, especially considering the number of options that arise with more than two or three nodes, where you maybe also want to specify how many nodes must have written the changes to disk before confirming the commit.

In Postgres-R, I've added a TRANSACTION REPLICATION LEVEL, which can be either SYNC, EAGER or LAZY. Maybe that's not quite sufficient. On the other hand, I don't think any other option here makes any sense. (Above, you yourself doubt that sync is different enough from eager.)

Regards

Markus
On Wed, 2008-07-23 at 10:49 +1000, Jens-Wolfhard Schicke wrote:
> Simon Riggs wrote:
> > Asynchronous commit controls whether we go to disk at time of commit, or
> > whether we defer this slightly. We have the same options with
> > replication: do we replicate at time of commit, or do we defer this
> > slightly for performance reasons. DRBD and other replication systems
> > show us that there is actually another difference when talking about
> > synchronous replication: do we go to disk on the standby before
> > acknowledging the primary?
> >
> > We can generalise this as three closed questions, answered either Yes
> > (Synchronous) or No (Asynchronous)
> >
> > * Does WAL get forced to disk on primary at commit time?
> > * Does WAL get forced across link to standby at commit time?
> > * Does WAL get forced to disk on standby at commit time?
> * Does WAL get applied [and synced] to disk on standby at commit time?
> This is important if you want to use the standby as a read-only.

That's an assumption - I'm not sure it's a requirement in all cases. If a standby query needed to see particular data then the *query* would wait until the correct data has been applied. I certainly wouldn't want to penalise writing transactions on the primary because there *might* be a statement on the standby that wishes to see an updated view.

> I am slightly confused about what the fsync setting does to all this,
> hence the brackets.

There is no sync() during WAL apply when each individual transaction hits commit. This is because there is "no WAL", i.e. changes come from the WAL to the database, so we have no need of a second WAL to protect the changes being made.

> I think that questions 2 and 3 are trivially bundled together. Once the
> user can specify 2, implementing 3 should be trivial and vice versa.
> I am not even convinced that these need to be two different parameters.
> Also please note that an answer of "yes" to 3 means that 2 must also
> be answered "yes".

Yes, they are trivially bundled together, but there is benefit in keeping them separate. The difference between 2 and 3 is about performance and levels of robustness. Waiting for transfer across the link to the standby (only) is much faster than waiting for transfer *and* waiting for fsync - probably twice as fast in a tightly coupled cluster. I.e. option 3 will make your transactions somewhat more robust, but with twice the response time and half the throughput.

> > We could represent this with 3 parameters:
> > synchronous_commit = on | off
> > synchronous_standby_transfer = on | off
> > synchronous_standby_wal_fsync = on | off
> synchronous_standby_apply = on | off   # just to propose a name
>
> > Changing the parameter setting at transaction-level would be expensive
> > if we had to set three parameters.
> What exactly does "expensive" mean? All three parameters can probably be
> set in one TCP packet from client to server.

Expensive as in we need to parse and handle each statement separately. If we have a single parameter then the overhead is much lower.

> > Or we could use just a single parameter
> > synchronous_commit = 'AAA', 'SAA', 'SSA', 'SSS' or on | off when no
> > log-based replication is defined
> >
> > Having the ability to set these at the transaction-level would be very
> > cool. Having it set via a *single* parameter would make it much more
> > viable to switch between AAA for bulk, low importance data and SSS for
> > very low volume, critical data, or somewhere in between on the same
> > server, at the same time.
> The problem with a single parameter is that everything becomes position
> dependent and if a new parameter is ever introduced, it's not easy to
> upgrade old application code.

True, but what new parameter do you imagine?
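To make the "twice the response time" argument concrete, here is a toy latency model. The numbers are invented purely for illustration (they assume one fsync and one network round trip cost about the same in a tightly coupled cluster); only the relative shape of the argument is from the thread:

```python
# Toy model of commit latency per replication mode, assuming (hypothetically)
# 5 ms for an fsync and 5 ms for a network round trip. Real numbers differ.

COSTS_MS = {
    "primary_fsync":    5.0,
    "standby_transfer": 5.0,
    "standby_fsync":    5.0,
}

STEPS = {
    "AAA": [],
    "SAA": ["primary_fsync"],
    "SSA": ["primary_fsync", "standby_transfer"],
    "SSS": ["primary_fsync", "standby_transfer", "standby_fsync"],
}

def commit_latency_ms(mode):
    return sum(COSTS_MS[step] for step in STEPS[mode])
```

Under these assumptions the replication wait added by SSS (transfer plus standby fsync) is twice that added by SSA (transfer only), which is the trade-off being described.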
> > So proposal in summary is
> > * allow various modes of synchronous replication for perf/robustness
> > * allow modes to be specified per-transaction
> > * allow modes to be specified as a single parameter
> How about creating named modes?

Good idea.

> This would give the user the ability to define more fine-grained control,
> especially in larger clusters of fail-over/read-only servers, without
> totally clogging the parameter space and application code. Whether this
> should be done SQL-style or in some config file is not so clear to me,
> although I'd prefer SQL-style like
>
> CREATE SYNCHRONIZING MODE immediate_readonly AS
>     LOCAL        SYNCHRONOUS APPLY
>     192.168.0.10 SYNCHRONOUS APPLY   -- read-only slave
>     192.168.0.11 SYNCHRONOUS APPLY   -- read-only slave
>     192.168.0.20 SYNCHRONOUS SHIP    -- backup-server
>     192.168.0.21 SYNCHRONOUS SHIP    -- backup-server
>     192.168.0.30 SYNCHRONOUS FSYNC   -- backup-server with fast disks
> ;

That's not how we define parameter values, so no.

> and then something like
>
> synchronize_mode = immediate_readonly;
>
> Yeah, I know, give patches not pipe-dreams :)

Ah yes. Of course.

The only sensible options are these four:

AAA
SAA
SSA
SSS

plus the existing on & off. So we give them 4 names and set it using a single parameter value.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

Simon Riggs wrote:
> There is no sync() during WAL apply when each individual transaction
> hits commit. This is because there is "no WAL", i.e. changes come from
> the WAL to the database, so we have no need of a second WAL to protect
> the changes being made.

Aha, that makes much more sense to me now. So you are not just using the WAL to transfer changes; the remote WAL replaces the one on the standby completely. Do you have replication to multiple slaves on the radar?

What I still don't understand is why you are speaking about "logical" replication. It rather sounds like an ordinary log shipping approach, where the complete WAL is sent over the wire. Nothing wrong with that; it certainly fits many needs and I welcome such a solution for Postgres. In another thread, you are saying that application of the WAL on the standby is "logical". This sounds like one could choose to skip certain parts of the WAL on the standby, but still the complete WAL would have to be sent. So this would only save a bit of i/o on the standby, right?

> Expensive as in we need to parse and handle each statement separately.
> If we have a single parameter then the overhead is much lower.

Is that really much of a concern when otherwise caring about network and i/o latency? And what about sane default settings per session and database, so you won't even need to explicitly set them for the majority of transactions?

Regards

Markus
On Sat, 2008-07-26 at 10:17 +0200, Markus Wanner wrote:
> > Expensive as in we need to parse and handle each statement separately.
> > If we have a single parameter then the overhead is much lower.
>
> Is that really much of a concern when otherwise caring about network
> and i/o latency?

I believe so. Jens-Wolfhard has provided the solution, it seems.

> And what about sane default settings per session and
> database, so you won't even need to explicitly set them for the
> majority of transactions?

Session pools.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sat, 2008-07-26 at 10:17 +0200, Markus Wanner wrote:
> What I still don't understand is, why you are speaking about "logical"
> replication. It rather sounds like an ordinary log shipping approach,
> where the complete WAL is sent over the wire. Nothing wrong with that,
> it certainly fits many needs and I welcome such a solution for Postgres.

Yes, first stage is log shipping. Second stage is either physical or logical apply.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> We can generalise this as three closed questions, answered either Yes
> (Synchronous) or No (Asynchronous)
>
> * Does WAL get forced to disk on primary at commit time?
> * Does WAL get forced across link to standby at commit time?
> * Does WAL get forced to disk on standby at commit time?
>
> In code, these are simple if tests: Do we wait, or not?
>
> We could represent this with 3 parameters:
> synchronous_commit = on | off
> synchronous_standby_transfer = on | off
> synchronous_standby_wal_fsync = on | off
> ...
> Or we could use just two parameters:
> synchronous_commit = on | off
> synchronous_replication = 'AA', 'SA' or 'SS'
> with A = Asynchronous, S = Synchronous
> which corresponds with DRBD's algorithms like this
> DRBD A = AA
> DRBD B = SA
> DRBD C = SS
>
> Or we could use just a single parameter
> synchronous_commit = 'AAA', 'SAA', 'SSA', 'SSS' or on | off when no
> log-based replication is defined

First, I love this kind of control.

Second, I think our current controls for the main/master are sufficient, so we don't need to add additional variables for this; we can focus just on whether we wait for the slave.

Third, we can control whether the slave WAL fsyncs by setting things on the slave, so I don't think we need a master setting for that.

Fourth, I don't think the boolean idea will work, because there are boolean combinations that are illogical, like having synchronous_standby_transfer off and synchronous_standby_wal_fsync on. I think you need to make it an enumerated type like log_min_messages; something like:

wal_transfer_wait

with values of:

	nothing:  have network traffic send WAL as needed
	netflush: wait for flush of WAL network packets to slave
	process:  wait for slave to process WAL traffic and optionally fsync

The 'process' option either waits for fsync on the slave or not, depending on how the slave is configured, meaning you could use synchronous_commit off to not wait for the fsync to disk.
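The enumerated-type idea can be sketched as an ordered value where each level implies the weaker ones below it, which is exactly what rules out the illogical boolean combinations. This is an illustrative sketch only, not a GUC implementation; the value names are from the mail:

```python
from enum import IntEnum

# Hypothetical model of the proposed ordered setting. Like log_min_messages,
# it is a single scale: you cannot wait for slave processing without also
# having waited for the network flush.

class WalTransferWait(IntEnum):
    NOTHING = 0    # have network traffic send WAL as needed
    NETFLUSH = 1   # wait for flush of WAL network packets to slave
    PROCESS = 2    # wait for slave to process WAL (and optionally fsync)

def waits_for_network(setting):
    return setting >= WalTransferWait.NETFLUSH

def waits_for_slave_processing(setting):
    return setting >= WalTransferWait.PROCESS
```

By construction, any setting that waits for slave processing also waits for the network flush, so the "transfer off, fsync on" combination simply cannot be expressed.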
I think we can add a table in the documentation to show how to set things up in postgresql.conf for an async master and async slave, and other combinations.

-- 
Bruce Momjian <bruce@momjian.us>    http://momjian.us
EnterpriseDB                        http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote: > > On Sat, 2008-07-26 at 10:17 +0200, Markus Wanner wrote: > > What I still don't understand is, why you are speaking about > > "logical" > > replication. It rather sounds like an ordinary log shipping approach, > > where the complete WAL is sent over the wire. Nothing wrong with > > that, > > it certainly fits many needs and I welcome such a solution for > > Postgres. > > Yes, first stage is log shipping. Second stage is either physical or > logical apply. What is the attraction of logical application of the WAL logs? Transmitting to a server with different architecture? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tue, 2008-08-12 at 11:52 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Sat, 2008-07-26 at 10:17 +0200, Markus Wanner wrote:
> > > What I still don't understand is, why you are speaking about "logical"
> > > replication. It rather sounds like an ordinary log shipping approach,
> > > where the complete WAL is sent over the wire. Nothing wrong with that,
> > > it certainly fits many needs and I welcome such a solution for Postgres.
> >
> > Yes, first stage is log shipping. Second stage is either physical or
> > logical apply.
>
> What is the attraction of logical application of the WAL logs?
> Transmitting to a server with different architecture?

Yes,

* different release
* different encoding
* different CPU architecture
* (with the correct transform) a different DBMS

So logical apply can provide a route for data transfer between applications, not just replication for DR or HA.

Physical apply works and will be more performant, but it will always be more restrictive. So there are arguments for doing it both ways. I believe that we can and should offer both options to provide customer choice. Ideally, it would be nice to be able to switch between the two without significant reconfiguration, but that's definitely not for this release. (Pragmatically, implementation will be limited by my funding.)

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2008-08-12 at 11:51 -0400, Bruce Momjian wrote:
> I think you need to make it an enumerated type like log_min_messages;
> something like:
>
> wal_transfer_wait

Yeh, that way sounds best, and I like the name.

> with values of:
>
>	nothing:  have network traffic send WAL as needed
>	netflush: wait for flush of WAL network packets to slave
>	process:  wait for slave to process WAL traffic and
>		  optionally fsync

Suggest

	async
	syncnet
	syncdisk

> The 'process' option either waits for fsync on the slave or not
> depending on how the slave is configured, meaning you could use
> synchronous_commit off to not wait for the fsync to disk.

Hmmm, not sure that flicking a switch on the standby should cause a loss of performance on the master. That will be an accident waiting to happen. Best to make all things that affect the performance/robustness of the master be configuration options on the master side.

> I think we can add a table in the documentation to show how to set things
> up in postgresql.conf for async master and async slave, and other
> combinations.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> > What is the attraction of logical application of the WAL logs?
> > Transmitting to a server with different architecture?
>
> Yes,
>
> * different release
> * different encoding
> * different CPU architecture
> * (with the correct transform) a different DBMS
>
> So logical apply can provide a route for data transfer between
> applications, not just replication for DR or HA.
>
> Physical apply works and will be more performant, but it will always be
> more restrictive. So there are arguments for doing it both ways.
>
> I believe that we can and should offer both options to provide customer
> choice. Ideally, it would be nice to be able to switch between the two
> without significant reconfiguration, but that's definitely not for this
> release.

I assume you are focusing on physical first, because that is easiest.

-- 
Bruce Momjian <bruce@momjian.us>    http://momjian.us
EnterpriseDB                        http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote:
> On Tue, 2008-08-12 at 11:51 -0400, Bruce Momjian wrote:
> > I think you need to make it an enumerated type like log_min_messages;
> > something like:
> >
> > wal_transfer_wait
>
> Yeh, that way sounds best, and I like the name.
>
> > with values of:
> >
> >	nothing:  have network traffic send WAL as needed
> >	netflush: wait for flush of WAL network packets to slave
> >	process:  wait for slave to process WAL traffic and
> >		  optionally fsync
>
> Suggest
>	async
>	syncnet
>	syncdisk

I think the first two are fine, but 'syncdisk' might be wrong if the slave has 'synchronous_commit = off'. Any ideas?

> > The 'process' option either waits for fsync on the slave or not
> > depending on how the slave is configured, meaning you could use
> > synchronous_commit off to not wait for the fsync to disk.
>
> Hmmm, not sure that flicking a switch on the standby should cause a loss
> of performance on the master. That will be an accident waiting to
> happen. Best to make all things that affect the performance/robustness
> of the master be configuration options on the master side.

Well, we are not removing synchronous_commit, so a slave certainly could be configured to be async, and I don't see that making the master set this makes any sense, unless you want the master to somehow override it. So, I think there are three options: async, syncnet, and perhaps 'commit', meaning the slave committed the transactions. That can be a sync slave or an async slave, based on the configuration of the slave.

-- 
Bruce Momjian <bruce@momjian.us>    http://momjian.us
EnterpriseDB                        http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Tue, 2008-08-12 at 12:54 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Tue, 2008-08-12 at 11:51 -0400, Bruce Momjian wrote:
> > > I think you need to make it an enumerated type like log_min_messages;
> > > something like:
> > >
> > > wal_transfer_wait
> >
> > Yeh, that way sounds best, and I like the name.
> >
> > > with values of:
> > >
> > >	nothing:  have network traffic send WAL as needed
> > >	netflush: wait for flush of WAL network packets to slave
> > >	process:  wait for slave to process WAL traffic and
> > >		  optionally fsync
> >
> > Suggest
> >	async
> >	syncnet
> >	syncdisk
>
> I think the first two are fine, but 'syncdisk' might be wrong if the slave
> has 'synchronous_commit = off'. Any ideas?

Yes, synchronous_commit can be set in the postgresql.conf, but its great advantage is that it is a userset parameter.

The main point of the post is that the parameter would be transaction controlled, so it *must* be set in the transaction and thus *must* be set on the master. Otherwise the capability is not available in the way I am describing.

synchronous_commit applies to transaction commits. The code path would be completely different here, so having the parameter passed as an info byte from the master will not cause code structure problems or performance problems.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> > > > with values of:
> > > >
> > > >	nothing:  have network traffic send WAL as needed
> > > >	netflush: wait for flush of WAL network packets to slave
> > > >	process:  wait for slave to process WAL traffic and
> > > >		  optionally fsync
> > >
> > > Suggest
> > >	async
> > >	syncnet
> > >	syncdisk
> >
> > I think the first two are fine, but 'syncdisk' might be wrong if the
> > slave has 'synchronous_commit = off'. Any ideas?
>
> Yes, synchronous_commit can be set in the postgresql.conf, but its great
> advantage is that it is a userset parameter.
>
> The main point of the post is that the parameter would be transaction
> controlled, so it *must* be set in the transaction and thus *must* be set
> on the master. Otherwise the capability is not available in the way I am
> describing.

Oh, so synchronous_commit would not control WAL sync on the slave? What about our fsync parameter? Because the slave is read-only, I saw no disadvantage in setting synchronous_commit to off in postgresql.conf on the slave.

> synchronous_commit applies to transaction commits. The code path would
> be completely different here, so having the parameter passed as an info
> byte from the master will not cause code structure problems or
> performance problems.

OK, I was just trying to simplify it. The big problem with an async slave is that not only would you have lost data in a failover, but the database might be inconsistent, as with fsync = off, which is something I think we want to try to avoid; that is why I was suggesting synchronous_commit = off.

Or were you thinking of always doing fsync on the slave, no matter what? I am worried the slave might not be able to keep up (being single-threaded) and therefore we should allow a way to async commit on the slave. Certainly if the master is async sending the data, there is no need to do a synchronous_commit on the slave.
--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2008-08-12 at 11:52 -0400, Bruce Momjian wrote:
> > What is the attraction of logical application of the WAL logs?
> > Transmitting to a server with different architecture?

> Yes,
> * different release
> * different encoding
> * different CPU architecture
> * (with the correct transform) a different DBMS

The notion that the WAL logs will ever be portable across such
differences is so ... so ... well, it's barely worth laughing at.

			regards, tom lane
On Tue, 2008-08-12 at 15:40 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2008-08-12 at 11:52 -0400, Bruce Momjian wrote:
> > > What is the attraction of logical application of the WAL logs?
> > > Transmitting to a server with different architecture?
> >
> > Yes,
> > * different release
> > * different encoding
> > * different CPU architecture
> > * (with the correct transform) a different DBMS
>
> The notion that the WAL logs will ever be portable across such
> differences is so ... so ... well, it's barely worth laughing at.

I expect to begin discussion of how that might be achieved in a few days.
I understand the starting place for your thinking, but am not deterred
because I see some solutions. I feel certain you will point out some
difficulties, but I think it is worth doing. Please let's start those
discussions from a neutral point. If you find a real showstopper, then so
be it.

Sorry for the delay. Bruce was asking about why I might want that. If we
can at least agree there is a use case then it helps.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi Tom,

Could you expand on why logical application of WAL records is impractical
in these cases? This is what Oracle does. Moreover, once you are into SQL
a lot of other use cases immediately become practical, such as large
scale master/slave set-ups for read scaling.

Thanks, Robert

On 8/12/08 12:40 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2008-08-12 at 11:52 -0400, Bruce Momjian wrote:
> > > What is the attraction of logical application of the WAL logs?
> > > Transmitting to a server with different architecture?
> >
> > Yes,
> > * different release
> > * different encoding
> > * different CPU architecture
> > * (with the correct transform) a different DBMS
>
> The notion that the WAL logs will ever be portable across such
> differences is so ... so ... well, it's barely worth laughing at.
>
> 			regards, tom lane

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert.hodges@continuent.com
Mobile: +1-510-501-3728  Skype: hodgesrm
Hi,

Robert Hodges wrote:
> Could you expand on why logical application of WAL records is
> impractical in these cases? This is what Oracle does. Moreover once you
> are into SQL a lot of other use cases immediately become practical,
> such as large scale master/slave set-ups for read scaling.

I cannot speak for Tom, but what strikes me as a strange approach here is
using the WAL for "logical application" of changes. That's because the
WAL is quite far away from SQL, and thus from a "logical representation"
of the data. It's rather pretty physical, meaning it's bound to a certain
Postgres release and CPU architecture.

A more "logical" exchange format certainly poses fewer problems across
releases, encodings and CPU architectures. Or even across RDBMSen. But
hey, let's see what Simon comes up with...

Regards

Markus Wanner
Markus Wanner <markus@bluegap.ch> writes:
> Robert Hodges wrote:
> > Could you expand on why logical application of WAL records is
> > impractical in these cases? This is what Oracle does. Moreover once
> > you are into SQL a lot of other use cases immediately become
> > practical, such as large scale master/slave set-ups for read scaling.

> I cannot speak for Tom, but what strikes me as a strange approach here
> is using the WAL for "logical application" of changes. That's because
> the WAL is quite far away from SQL, and thus from a "logical
> representation" of the data. It's rather pretty physical, meaning it's
> bound to a certain Postgres release and CPU architecture.

Right. To take just one example: the value of MAXALIGN affects not only
how many tuples you can put on a heap page (thus changing TIDs of tuples,
which fundamentally breaks most of the current types of WAL records) but
how many tuples you can put on an index page (and thus index page split
decisions, and thereby pretty much every single fact about the contents
of upper btree levels). We need not go into architecture dependencies
that are stronger than that one, though there are many.

As for version independence, who thinks they can WAL-replicate changes of
the system catalogs into a different version with significantly different
system catalogs? You couldn't even begin to make this work with anything
approaching the current level of semantic detail of WAL entries.

What I think Simon was actually driving at was query-shipping, which is
not my idea of "WAL" at all. It has some usefulness, but also a bunch of
downsides of its very own, mostly centered around reproducibility. With
the current WAL design I have some faith that the slaves reproduce the
contents of the master. With any "logical replication" design that
becomes a faith of the religious kind, because it sure isn't provable.

			regards, tom lane
Tom Lane wrote:
> What I think Simon was actually driving at was query-shipping, which is
> not my idea of "WAL" at all. It has some usefulness, but also a bunch
> of downsides of its very own, mostly centered around reproducibility.
> With the current WAL design I have some faith that the slaves reproduce
> the contents of the master. With any "logical replication" design that
> becomes a faith of the religious kind, because it sure isn't provable.

Actually I think the idea here is to take certain WAL records, translate
them into "portable" constructs, ship them, and let the slave handle the
remaining tasks that need to be done with it. For example you would only
ship heap insert, not index insert; the slave needs to take this insert
and derive the appropriate index operations that the slave needs.

What's needed here is a process that is able to read (or receive from the
backend calling XLogInsert) WAL records and at the same time have catalog
access for getting type I/O data, converting relation OIDs into table
names if needed, etc.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
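The translation step Alvaro describes, where a catalog-aware process turns a heap-insert WAL record into a portable record and the slave maintains its own indexes, might look roughly like this (every structure here is hypothetical, for illustration only):

```python
# Illustrative sketch: a catalog-aware reader translates a heap-insert
# WAL record into a portable "logical record"; the slave performs an
# ordinary insert, so index entries are derived locally and index WAL
# never needs to be shipped.  All names are invented for this sketch.

def to_logical(wal_record, relnames, decoders):
    """Translate a heap-insert WAL record using catalog lookups."""
    oid = wal_record["reloid"]
    return {
        "op": "insert",
        "table": relnames[oid],                     # OID -> table name
        "row": decoders[oid](wal_record["tuple"]),  # type output funcs
    }

def apply_logical(rec, slave_tables, slave_indexes):
    """The slave inserts the row and maintains its indexes itself."""
    slave_tables[rec["table"]].append(rec["row"])
    for col, index in slave_indexes.get(rec["table"], []):
        index[rec["row"][col]] = rec["row"]

# Example: ship one insert derived from a master-side WAL record.
relnames = {16384: "accounts"}
decoders = {16384: lambda raw: {"id": raw[0], "balance": raw[1]}}
rec = to_logical({"reloid": 16384, "tuple": (1, 100)}, relnames, decoders)

tables = {"accounts": []}
indexes = {"accounts": [("id", {})]}
apply_logical(rec, tables, indexes)
```

Note that this sketch sidesteps Tom's objection only partially: the slave still evaluates index expressions itself, which is exactly the reproducibility concern he raises in the following message.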
Hi Tom,

Part of this is semantics; I like Simon's logical vs. physical
terminology because it distinguishes neatly between replication that
copies implementation down to OIDs etc. and replication that copies data
content including schema changes but not implementation. It seems a noble
goal to get both to work well, as they are quite complementary.

There are various ways to get information to recapitulate SQL, but
piggy-backing off WAL record generation has a lot of advantages. You at
least have the data structures and don't have to reverse-engineer log
information on disk. Of the multiple ways to build capable logical
replication solutions, this seems to involve the least effort.

My company is currently heads down building a solution for Oracle based
on reading REDO log files. It requires a master of Oracle dark arts to
decode them and is also purely asynchronous. PostgreSQL will eventually
be far better as these discussions boil down into designs.

Thanks, Robert

On 8/12/08 8:51 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Markus Wanner <markus@bluegap.ch> writes:
> > I cannot speak for Tom, but what strikes me as a strange approach
> > here is using the WAL for "logical application" of changes. That's
> > because the WAL is quite far away from SQL, and thus from a "logical
> > representation" of the data. It's rather pretty physical, meaning
> > it's bound to a certain Postgres release and CPU architecture.
>
> Right. To take just one example: the value of MAXALIGN affects not only
> how many tuples you can put on a heap page (thus changing TIDs of
> tuples, which fundamentally breaks most of the current types of WAL
> records) but how many tuples you can put on an index page (and thus
> index page split decisions, and thereby pretty much every single fact
> about the contents of upper btree levels). We need not go into
> architecture dependencies that are stronger than that one, though there
> are many.
>
> As for version independence, who thinks they can WAL-replicate changes
> of the system catalogs into a different version with significantly
> different system catalogs? You couldn't even begin to make this work
> with anything approaching the current level of semantic detail of WAL
> entries.
>
> What I think Simon was actually driving at was query-shipping, which is
> not my idea of "WAL" at all. It has some usefulness, but also a bunch
> of downsides of its very own, mostly centered around reproducibility.
> With the current WAL design I have some faith that the slaves reproduce
> the contents of the master. With any "logical replication" design that
> becomes a faith of the religious kind, because it sure isn't provable.
>
> 			regards, tom lane

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert.hodges@continuent.com
Mobile: +1-510-501-3728  Skype: hodgesrm
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
> > What I think Simon was actually driving at was query-shipping, which
> > is not my idea of "WAL" at all. It has some usefulness, but also a
> > bunch of downsides of its very own, mostly centered around
> > reproducibility.

> Actually I think the idea here is to take certain WAL records,
> translate them into "portable" constructs, ship them, and let the slave
> handle the remaining tasks that need to be done with it. For example
> you would only ship heap insert, not index insert; the slave needs to
> take this insert and derive the appropriate index operations that the
> slave needs.

Oooh, so we'll run user-defined index functions during WAL replay.
Yessir, *that* will be reliable and reproducible.

In any case, you didn't answer the point about heap TIDs not matching
across architectures. That seems at minimum to require that UPDATE and
DELETE identify target tuples by primary key instead of TID. Which
requires for starters that all your tables *have* a primary key, and for
seconds that the replay environment be capable of identifying the pkey
and being able to do lookup operations using it.

Whether or not you'd like to mandate that every table have a pkey, that
certainly takes out any notion that a "WAL" of this species can be used
for crash recovery, because there is 0 chance that it will work reliably
in a crash-corrupted database.

			regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Actually I think the idea here is to take certain WAL records,
> > translate them into "portable" constructs, ship them, and let the
> > slave handle the remaining tasks that need to be done with it. For
> > example you would only ship heap insert, not index insert; the slave
> > needs to take this insert and derive the appropriate index operations
> > that the slave needs.
>
> Oooh, so we'll run user-defined index functions during WAL replay.
> Yessir, *that* will be reliable and reproducible.

Hmm, I don't know what Simon was thinking, but I think it would be
acceptable to both parties to have WAL stay as it currently is (and used
for crash recovery), and only the logical record (derived from this WAL
record) be sent to the slave.

The reason for not having the "logical record" be shipped straight away
at the time of the operation is that this system would need to resume
sending logical records if the shipping system crashed (i.e. some
committed transactions are in WAL but have not been shipped yet.)

> In any case, you didn't answer the point about heap TIDs not matching
> across architectures. That seems at minimum to require that UPDATE
> and DELETE identify target tuples by primary key instead of TID.

Yep -- the PK would be required. Alternatively, one could accept tables
that have no PKs but for which no UPDATEs or DELETEs are allowed on the
shipping system. (This could be useful for log-type tables that you want
to replicate.)

> Which requires for starters that all your tables *have* a primary
> key, and for seconds that the replay environment be capable of
> identifying the pkey and being able to do lookup operations using it.

Possibly the requirement would be that the replay system would have the
same PK as the shipping system.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
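The PK-based replay that Tom and Alvaro converge on, identifying target tuples by primary key rather than TID, amounts to logic along these lines (hypothetical structures, for illustration only):

```python
# Sketch of PK-based replay: UPDATE and DELETE records carry the primary
# key instead of a TID, so the slave looks rows up by key.  The table is
# modeled as a dict keyed by primary key.  Not an actual implementation.

def replay(change, table):
    """Apply one logical change to a table keyed by primary key."""
    if change["op"] == "insert":
        table[change["pk"]] = change["new"]
    elif change["op"] == "update":
        table[change["pk"]].update(change["new"])
    elif change["op"] == "delete":
        del table[change["pk"]]

table = {1: {"id": 1, "val": "a"}, 2: {"id": 2, "val": "b"}}
replay({"op": "update", "pk": 1, "new": {"val": "a2"}}, table)
replay({"op": "delete", "pk": 2}, table)
```

This also makes concrete why tables without a PK can only be replicated if they are insert-only on the shipping side: without a key, the `update` and `delete` branches have no way to find their target row.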
Hi,

Robert Hodges wrote:
> I like Simon's logical vs. physical terminology

So far, it seems to mainly have caused confusion (physical replication,
but logical application? logical replication using WAL shipping?). At
least I still prefer the more meaningful and descriptive terms, like
"log shipping", "statement based replication" or "row based replication".

But maybe what Simon is about to propose just doesn't fit into any of
those categories. I have a similar problem with Postgres-R, which is
somewhere in between synchronous and asynchronous.

Regards

Markus Wanner
Hi,

Alvaro Herrera wrote:
> Actually I think the idea here is to take certain WAL records,
> translate them into "portable" constructs, ship them,

At which point it clearly shouldn't be called a WAL shipping method. What
would it have to do with the WAL at all, then? Why translate from WAL
records at all? Better to use the real tuples right away. (Almost
needless to say here, but obviously Postgres-R does it that way.)

So far, Simon really seems to mean WAL shipping: "it allows WAL to be
used as the replication transport", see [1].

Regards

Markus Wanner

[1]: mail to -hackers from Simon, Subject: "Plans for 8.4":
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01010.php
On Tue, 2008-08-12 at 13:33 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > > > Suggest
> > > >   async
> > > >   syncnet
> > > >   syncdisk
> > >
> > > I think the first two are fine, but 'syncdisk' might be wrong if
> > > the slave has 'synchronous_commit = off'. Any ideas?
> >
> > Yes, synchronous_commit can be set in the postgresql.conf, but its
> > great advantage is it is a userset parameter.
> >
> > The main point of the post is that the parameter would be transaction
> > controlled, so *must* be set in the transaction and thus *must* be
> > set on the master. Otherwise the capability is not available in the
> > way I am describing.
>
> Oh, so synchronous_commit would not control WAL sync on the slave? What
> about our fsync parameter? Because the slave is read-only, I saw no
> disadvantage of setting synchronous_commit to off in postgresql.conf on
> the slave.

The setting of synchronous_commit will be important if the standby
becomes the primary. I can see many cases where we might want "syncnet"
mode (i.e. no fsync of WAL data to disk on standby) and yet want
synchronous_commit = on when it becomes primary.

So if we were to use the same parameters it would be confusing.

> > synchronous_commit applies to transaction commits. The code path
> > would be completely different here, so having parameter passed as an
> > info byte from master will not cause code structure problems or
> > performance problems.
>
> OK, I was just trying to simplify it.

I understood why you've had those thoughts and commend the lateral
thinking. I just don't think that on this occasion we've discovered any
better ways of doing it.

> The big problem with an async slave is that not only would you have
> lost data in a failover, but the database might be inconsistent, like
> fsync = off, which is something I think we want to try to avoid, which
> is why I was suggesting synchronous_commit = off.
>
> Or were you thinking of always doing fsync on the slave, no matter
> what. I am worried the slave might not be able to keep up (being
> single-threaded) and therefore we should allow a way to async commit on
> the slave.

Bit confused here. I've not said I want always async, neither have I said
I want always sync.

The main thing is we agree there will be 3 settings, including two
variants of synchronous replication: one fairly safe and one ultra safe.

For the ultra safe mode we really need to see how synch replication will
work before we comment on where we might introduce fsyncs. I'm presuming
that incoming WAL will be written to WAL files (and optionally fsynced).
You might be talking about applying WAL records to the database and then
fsyncing them, but we do need to allow for crash recovery of the standby
server, so the data must be synced to WAL files before it is synced to
the database.

> Certainly if the master is async sending the data, there is no need to
> do a synchronous_commit on the slave.

Agreed

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
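The ordering Simon insists on for a crash-safe standby, WAL durably written before it is applied to the database, can be shown in miniature (hypothetical code, purely illustrative):

```python
# Miniature of the standby ordering described above: an incoming WAL
# record is written (and optionally fsynced) to a WAL file *before* it
# is applied to the database, so the standby can crash-recover from its
# own WAL.  The event log just records the order of operations.

def standby_receive(record, events, fsync_wal=True):
    events.append(("wal_write", record))     # 1. durably store WAL first
    if fsync_wal:
        events.append(("wal_fsync", record)) # ultra-safe mode only
    events.append(("db_apply", record))      # 2. only then apply

events = []
standby_receive("commit-1", events)
```

Whether the `wal_fsync` step happens per record is exactly the open question in this exchange: Bruce is talking about fsyncing the WAL as each entry arrives from the master, while the fairly-safe ("syncnet") mode would skip it.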
Hi,

Robert Hodges wrote:
> Part of this is semantics; I like Simon's logical vs. physical
> terminology because it distinguishes neatly between replication that
> copies implementation down to OIDs etc. and replication that copies
> data content including schema changes but not implementation.

So far, these terms have mostly caused confusion for me: "logical
replication using WAL shipping", "physical replication, but logical
application"...

As Simon didn't explain in more detail what he has in mind, we all have
our own and quite different interpretations. These terms obviously
haven't helped to clarify the issue until now.

> It seems a noble goal to get both to work well, as they are quite
> complementary.

Agreed.

> There are various ways to get information to recapitulate SQL, but
> piggy-backing off WAL record generation has a lot of advantages. You at
> least have the data structures and don't have to reverse-engineer log
> information on disk. Of the multiple ways to build capable logical
> replication solutions, this seems to involve the least effort.

We even have the real tuple, which is about the closest you can get to a
"logical representation". Using that clearly requires less effort than
converting a WAL record back to a logical tuple. For example, it allows
the optimization of sending only differences from the old tuple for
UPDATEs, instead of always sending full tuples - see Postgres-R for a
partially working implementation.

> My company is currently heads down building a solution for Oracle based
> on reading REDO log files. It requires a master of Oracle dark arts to
> decode them and is also purely asynchronous.

That sounds pretty challenging. Good luck!

Regards

Markus Wanner
On Wed, 2008-08-13 at 11:27 +0200, Markus Wanner wrote:
> Robert Hodges wrote:
> > Part of this is semantics; I like Simon's logical vs. physical
> > terminology because it distinguishes neatly between replication that
> > copies implementation down to OIDs etc. and replication that copies
> > data content including schema changes but not implementation.
>
> So far, these terms have mostly caused confusion for me: "logical
> replication using WAL shipping", "physical replication, but logical
> application"...
>
> As Simon didn't explain in more detail what he has in mind, we all
> have our own and quite different interpretations. These terms obviously
> haven't helped to clarify the issue until now.

Classification of Replication Techniques
----------------------------------------

We can consider that there are two stages to replication:

* transfer or shipping of data to second node(s)
* apply mechanism

The apply mechanism can be either logical, where we execute SQL, or
physical, where we bypass the SQL layer and do this at a lower level. It
is possible to have:

1. logical shipping, logical apply
2. physical shipping, logical apply
3. physical shipping, physical apply

Mammoth and Postgres-R are both type 1 replication systems, since they
stream data to the second node in a form that makes SQL reconstruction
easier. Slony is also type 1, using triggers. So we have another
classification:

1. Logical data shipping
   a) Trigger based
   b) Additional internals-based approaches

"Warm Standby" log shipping is type 3. We ship the WAL and apply it
directly. Easy and good.

Type 2 is where you ship the WAL (efficient), then use it to reconstruct
SQL (flexible) and then apply that to other nodes. It is somewhat harder
than type 1, but requires less infrastructure (IMHO). It definitely
requires less data shipping from the primary node, so is very possibly
more efficient. [Tom is absolutely right to say this is "impossible".
Currently, it is, but that's why we develop.]
Type 1a can't be synchronous by definition, but 1b and other approaches
can be. Otherwise the above definitions are not relevant to sync/async
capability.

Note that DRBD is also a type 3 system, but uses filesystem-level
physical data. Since this is outside the control of Postgres, I discount
this approach because it cannot deliver transaction-controlled synch
replication.

Previously, most RDBMS vendors supported type 1a systems. They have now
moved to type 2 and 3 systems. Both DB2 and Oracle support a type 2 *and*
a type 3 replication system. The reasons they do this are valid for us
also, so I suggest that we do the same. So for me, it is not about
whether we do type 2 or type 3; I think we should do both.

Doing both may not be possible in this release: I have one potential
sponsor for type 2, and have made a public call for work on query access
to type 3 systems. I recognise that some people that like type 3 do not
like type 2, and possibly vice versa. If (not when) I do work on type 2
systems it will be only to provide that as a "transport" option for other
technologies. Not as a full-blown replication system.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
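Simon's taxonomy condenses into a small lookup, pairing each type with the example systems named in this thread (the labels are editorial shorthand, not project terminology):

```python
# The three replication types above as (shipping, apply) pairs, with the
# example systems mentioned in this thread as comments.

REPLICATION_TYPES = {
    1: ("logical shipping", "logical apply"),    # Slony, Mammoth, Postgres-R
    2: ("physical shipping", "logical apply"),   # reconstruct SQL from WAL
    3: ("physical shipping", "physical apply"),  # warm standby, DRBD
}
```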
Hi,

Simon Riggs wrote:
> Classification of Replication Techniques

Thanks for your classifications. It helps a great deal to clarify.

> Type 2 is where you ship the WAL (efficient) then use it to reconstruct
> SQL (flexible) and then apply that to other nodes. It is somewhat
> harder than type 1, but requires less infrastructure (IMHO). Definitely
> requires less data shipping from Primary node, so very possibly more
> efficient.

What leads you to that conclusion? AFAICT a logical format, specifically
designed for replication, is quite certainly more compact than the WAL
(assuming that's what you mean by "less data"). The only efficiency gain
I can see compared to type 1 is that most of the processing work is
offloaded from the master to the slave(s). For setups with multiple
slaves, that's a bad trade-off, IMO.

> Previously, most RDBMS vendors supported type 1a) systems. They have
> now moved to type 2 and 3 systems. Both DB2 and Oracle support a type 2
> *and* a type 3 replication system. The reasons they do this are valid
> for us also, so I suggest that we do the same. So for me, it is not
> about whether we do type 2 or type 3, I think we should do both.

I currently don't think type 2 is doable with any reasonable effort, but
hey, I'm always open for surprises. :-)

Which of IBM's and Oracle's products are you referring to?

Regards

Markus Wanner
On Wed, 2008-08-13 at 15:38 +0200, Markus Wanner wrote:
> Simon Riggs wrote:
> > Classification of Replication Techniques
>
> Thanks for your classifications. It helps a great deal to clarify.
>
> > Type 2 is where you ship the WAL (efficient) then use it to
> > reconstruct SQL (flexible) and then apply that to other nodes. It is
> > somewhat harder than type 1, but requires less infrastructure (IMHO).
> > Definitely requires less data shipping from Primary node, so very
> > possibly more efficient.
>
> What leads you to that conclusion? AFAICT a logical format,
> specifically designed for replication is quite certainly more compact
> than the WAL (assuming that's what you mean by "less data").

Possibly, but since we are generating and writing WAL anyway that's not a
completely fair comparison.

> Which of IBM's and Oracle's products are you referring to?

IBM DB2 HADR, Q Replication.
Oracle Streams 10g+, Data Guard Logical and Physical Standby.

All of which I've personally used, except for Oracle Streams 10g, which I
investigated thoroughly for a client about 4 years ago.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Wed, 2008-08-13 at 15:38 +0200, Markus Wanner wrote:
> > Which of IBM's and Oracle's products are you referring to?
>
> IBM DB2 HADR, QReplication.
> Oracle Streams 10g+, Data Guard Logical and Physical Standby
>
> All of which I've personally used, except for Oracle Streams 10g, which
> I investigated thoroughly for a client about 4 years ago.

I think doing the WAL streaming and allowing a read-only slave is enough
work to keep Simon busy for quite some time. I don't understand why the
logical issue is being discussed at this stage --- let's get the other
stuff done first.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote:
> > > The main point of the post is that the parameter would be
> > > transaction controlled, so *must* be set in the transaction and
> > > thus *must* be set on the master. Otherwise the capability is not
> > > available in the way I am describing.
> >
> > Oh, so synchronous_commit would not control WAL sync on the slave?
> > What about our fsync parameter? Because the slave is read-only, I saw
> > no disadvantage of setting synchronous_commit to off in
> > postgresql.conf on the slave.
>
> The setting of synchronous_commit will be important if the standby
> becomes the primary. I can see many cases where we might want "syncnet"
> mode (i.e. no fsync of WAL data to disk on standby) and yet want
> synchronous_commit = on when it becomes primary.
>
> So if we were to use same parameters it would be confusing.

I disagree. If they make it the master they change the setting. Also, if
we choose an async slave I would expect the consistency of
synchronous_commit rather than fsync = off's database inconsistency.

> > The big problem with an async slave is that not only would you have
> > lost data in a failover, but the database might be inconsistent, like
> > fsync = off, which is something I think we want to try to avoid,
> > which is why I was suggesting synchronous_commit = off.
> >
> > Or were you thinking of always doing fsync on the slave, no matter
> > what. I am worried the slave might not be able to keep up (being
> > single-threaded) and therefore we should allow a way to async commit
> > on the slave.
>
> Bit confused here. I've not said I want always async, neither have I
> said I want always sync.
>
> The main thing is we agree there will be 3 settings, including two
> variants of synchronous replication, one fairly safe and one ultra
> safe.
>
> For the ultra safe mode we really need to see how synch replication
> will work before we comment on where we might introduce fsyncs. I'm
> presuming that incoming WAL will be written to WAL files (and
> optionally fsynced). You might be talking about applying WAL records to
> the database and then fsyncing them, but we do need to allow for crash
> recovery of the standby server, so the data must be synced to WAL files
> before it is synced to the database.

I was talking about fsync'ing the WAL when each entry arrives from the
master.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Wed, 2008-08-13 at 11:17 -0400, Bruce Momjian wrote:
> I think doing the WAL streaming and allowing a read-only slave is
> enough work to keep Simon busy for quite some time. I don't understand
> why the logical issue is being discussed at this stage --- let's get
> the other stuff done first.

I'm not working on WAL streaming. My plans assume someone else is, and
I'll be as worried as you if the current answer is "nobody".

I expect to be involved in reviewing it fairly closely though...

Other than that, my time is limited only by how much funding I have. Or
put another way, the constraining factor is money, not available time.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Wed, 2008-08-13 at 11:27 -0400, Bruce Momjian wrote:
> I disagree. If they make it the master they change the setting.

It's not acceptable to force people to edit a configuration file when
failover occurs. Some people wish to automate this, and fumbling
parameter values at this important time is definitely inappropriate. We
gain nothing by asking people to do things that way.

Plus, as I have said: if it is controlled on the standby then it will not
be transaction-controlled, and transaction control is a useful thing.

I asked myself "where would I like it to be set?" The answer was "on the
master". If you think differently, please say why. Yes, we can set it on
the standby, but I see no reason to do so.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Wed, 2008-08-13 at 11:17 -0400, Bruce Momjian wrote:
> > I think doing the WAL streaming and allowing a read-only slave is
> > enough work to keep Simon busy for quite some time. I don't
> > understand why the logical issue is being discussed at this stage ---
> > let's get the other stuff done first.
>
> I'm not working on WAL streaming. My plans assume someone else is and
> I'll be as worried as you if the current answer is "nobody".

Oh, OK, good. I was unclear on that. NTT is working with EnterpriseDB on
the WAL steaming code to be released to the community.

> I expect to be involved in reviewing it fairly closely though...

Yes, of course.

> Other than that, my time is limited only by how much funding I have. Or
> put another way, the constraining factor is money, not available time.

That is good, I think. ;-)

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote:
> On Wed, 2008-08-13 at 11:27 -0400, Bruce Momjian wrote:
> > I disagree. If they make it the master they change the setting.
>
> It's not acceptable to force people to edit a configuration file when
> failover occurs. Some people wish to automate this and fumbling
> parameter values at this important time is definitely inappropriate. We
> gain nothing by asking people do do things that way.
>
> Plus, as I have said: if it is controlled on the Standby then it will
> not be transaction-controlled and this will be a useful thing.
>
> I asked myself "where would I like to be set?" The answer was "on the
> master". If you think differently, please say why. Yes, we can set it
> on the standby, but I see no reason to do so.

Ah, I can see an advantage for the failover case to have the slave WAL
sync controlled from the master. My original goal was just to reduce GUC
option bloat. Let's get farther down the road on this and see how it
looks.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Wed, 2008-08-13 at 12:55 -0400, Bruce Momjian wrote:
> NTT is working with EnterpriseDB on the WAL steaming code to be
> released to the community.

Hopefully the code isn't steaming... :-)

and that we will be able to see detailed designs and code soon. Might end
up as a big pileup otherwise.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support