Thread: is sync rep stalled?
So we've got two patches that implement synchronous replication, and no agreement on which one, if either, should be committed. We have no agreement on how synchronous replication should be configured, and at most a tenuous agreement that it should involve standby registration. This is bad. This feature is important, and we need to get it done. How do we get the ball rolling again? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas <robertmhaas@gmail.com> wrote: > So we've got two patches that implement synchronous replication, and > no agreement on which one, if either, should be committed. We have no > agreement on how synchronous replication should be configured, and at > most a tenuous agreement that it should involve standby registration. > > This is bad. > > This feature is important, and we need to get it done. How do we get > the ball rolling again? ISTM that it still takes long to make consensus on standby registration. So, how about putting the per-standby parameters in recovery.conf, and focusing on the basic features in synchronous replication at first? During that time, we can deepen discussion on standby registration, and then we can implement that. The basic features that I mean is for most basic use case, that is, one master and one synchronous standby case. In detail, > * Support multiple standbys with various synchronization levels. Not required for that case. > * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever. The wait-forever option is not required for that case. Let's implement the return-immediately at first. > * Per-transaction control. Some transactions are important, others are not. Not required for that case. > * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers can be seen as important special cases of this. Not required for that case. > * async, recv, fsync and replay levels of synchronization. At least one of three synchronous levels should be included in the first commit. I think that either recv or fsync is suitable for first try because those don't require wake-up signaling from startup process to walreceiver and are relatively easy to implement. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Sep 29, 2010 at 3:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> So we've got two patches that implement synchronous replication, and >> no agreement on which one, if either, should be committed. We have no >> agreement on how synchronous replication should be configured, and at >> most a tenuous agreement that it should involve standby registration. >> >> This is bad. >> >> This feature is important, and we need to get it done. How do we get >> the ball rolling again? > > ISTM that it still takes long to make consensus on standby registration. > So, how about putting the per-standby parameters in recovery.conf, and > focusing on the basic features in synchronous replication at first? > During that time, we can deepen discussion on standby registration, and > then we can implement that. > > The basic features that I mean is for most basic use case, that is, one > master and one synchronous standby case. In detail, > >> * Support multiple standbys with various synchronization levels. > > Not required for that case. > >> * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever. > > The wait-forever option is not required for that case. Let's implement > the return-immediately at first. > >> * Per-transaction control. Some transactions are important, others are not. > > Not required for that case. > >> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers can be seen as important special cases of this. > > Not required for that case. > >> * async, recv, fsync and replay levels of synchronization. > > At least one of three synchronous levels should be included in the first > commit. I think that either recv or fsync is suitable for first try > because those don't require wake-up signaling from startup process to > walreceiver and are relatively easy to implement. I'm not sure this really gets us anywhere. 
We already have two patches; writing a third one won't fix anything. We need to decide which patch can be the basis for future work. According to my understanding, the most significant difference between the patches is the way that ACKs get sent from standby to master. Whose idea is better, yours or Simon's? And why? Are there other reasons to prefer one patch to the other? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 29.09.2010 10:56, Fujii Masao wrote: > On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas<robertmhaas@gmail.com> wrote: >> So we've got two patches that implement synchronous replication, and >> no agreement on which one, if either, should be committed. We have no >> agreement on how synchronous replication should be configured, and at >> most a tenuous agreement that it should involve standby registration. >> >> This is bad. >> >> This feature is important, and we need to get it done. How do we get >> the ball rolling again? Agreed. Actually, given the lack of people jumping in and telling us what they'd like to do with the feature, maybe it's not that important after all. > ISTM that it still takes long to make consensus on standby registration. > So, how about putting the per-standby parameters in recovery.conf, and > focusing on the basic features in synchronous replication at first? > During that time, we can deepen discussion on standby registration, and > then we can implement that. > > The basic features that I mean is for most basic use case, that is, one > master and one synchronous standby case. In detail, ISTM the problem is exactly that there is no consensus on what the basic use case is. I'm sure there's several things you can accomplish with synchronous replication, perhaps you could describe what the important use case for you is? >> * Support multiple standbys with various synchronization levels. > > Not required for that case. IMHO at least we'll still need to support asynchronous standbys in the same mix, that's an existing feature. >> * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever. > > The wait-forever option is not required for that case. Let's implement > the return-immediately at first. > > ..- > >> * async, recv, fsync and replay levels of synchronization. > > At least one of three synchronous levels should be included in the first > commit. 
I think that either recv or fsync is suitable for first try > because those don't require wake-up signaling from startup process to > walreceiver and are relatively easy to implement. What is the use case for that combination? For zero data loss, you *must* wait forever if a standby isn't connected. For keeping a hot standby server up-to-date so that you can freely query the standby instead of the master, you need replay level synchronization. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
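To make the four levels under discussion concrete, here is a minimal sketch of what the master would wait for at each one. It is an illustration only: `SyncLevel`, `StandbyProgress`, and `commit_acknowledged` are hypothetical names, not taken from either patch, and WAL positions are modelled as plain integers.

```python
from dataclasses import dataclass
from enum import Enum


class SyncLevel(Enum):
    ASYNC = "async"    # do not wait for the standby at all
    RECV = "recv"      # wait until the standby has received the WAL
    FSYNC = "fsync"    # wait until the standby has flushed it to disk
    REPLAY = "replay"  # wait until the standby has applied it


@dataclass
class StandbyProgress:
    """WAL positions reported back by a standby (hypothetical ACK payload)."""
    received: int
    fsynced: int
    replayed: int


def commit_acknowledged(level: SyncLevel, commit_lsn: int,
                        progress: StandbyProgress) -> bool:
    """Return True once the standby has reached commit_lsn at the given level."""
    if level is SyncLevel.ASYNC:
        return True
    reached = {
        SyncLevel.RECV: progress.received,
        SyncLevel.FSYNC: progress.fsynced,
        SyncLevel.REPLAY: progress.replayed,
    }[level]
    return reached >= commit_lsn
```

Under this model, only `REPLAY` tells the master that a query on the standby would already see the commit; `RECV` and `FSYNC` say the WAL has arrived or is on disk, not that it is visible.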
On Thu, 2010-09-30 at 09:09 +0300, Heikki Linnakangas wrote: > On 29.09.2010 10:56, Fujii Masao wrote: > > On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas<robertmhaas@gmail.com> wrote: > >> This feature is important, and we need to get it done. How do we get > >> the ball rolling again? > > Agreed. Actually, given the lack of people jumping in and telling us > what they'd like to do with the feature, maybe it's not that important > after all. I don't see anything has stalled. I've been busy for a few days, so haven't had a chance to follow up on the use cases, as suggested. I'm busy again today, so cannot reply further. Anyway, taking a few days to let us think some more about the technical comments is no bad thing. I think we need to relax about this feature some more because trying to get something actually done when basic issues need analysis is hard and that creates tension. Between us we can work out the code in a few days, once we know which code to write. What we actually need to do is talk and listen. I'd like to suggest that we have an online "focus day" (onlist) on Sync Rep on Oct 5 and maybe 6 as well? Meeting in person is possible, but probably impractical. But a design sprint, not a code sprint. This is important and I'm sure we'll work something out. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote: > On Thu, 2010-09-30 at 09:09 +0300, Heikki Linnakangas wrote: > > On 29.09.2010 10:56, Fujii Masao wrote: > > > On Wed, Sep 29, 2010 at 11:47 AM, Robert Haas<robertmhaas@gmail.com> wrote: > > > >> This feature is important, and we need to get it done. How do > > >> we get the ball rolling again? > > > > Agreed. Actually, given the lack of people jumping in and telling > > us what they'd like to do with the feature, maybe it's not that > > important after all. > > I don't see anything has stalled. I do. We're half way through this commitfest, so if no one's actually ready to commit one of the patches, I kinda have to bounce them both, at least to the next CF. The very likely outcome of that, given that it's a pretty enormous feature that involves even more enormous amounts of testing on various hardware, networks, etc., is that we don't get SR in 9.1, and you among others will be very unhappy. So yes, it is stalled, and yes, there's a real urgency to actually getting a baseline something in there in the next couple of weeks. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote: >> I don't see anything has stalled. > I do. We're half way through this commitfest, so if no one's actually > ready to commit one of the patches, I kinda have to bounce them both, > at least to the next CF. [ raised eyebrow ] You seem to be in an awfully big hurry to bounce stuff. The CF end is still two weeks away. But while I'm thinking about that... The actual facts on the ground are that practically no CF work has gotten done yet (at least not in my house) due to the git move and the 9.0.0 release and the upcoming back-branch releases. Maybe we shouldn't have started the CF while all that was going on, but that's water over the dam now. What we can do is rethink the scheduled end date. IMHO we should push out the end date by at least a week to reflect the lack of time spent on the CF so far. regards, tom lane
On Thu, Sep 30, 2010 at 2:09 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Agreed. Actually, given the lack of people jumping in and telling us what > they'd like to do with the feature, maybe it's not that important after all. >> The basic features that I mean is for most basic use case, that is, one >> master and one synchronous standby case. In detail, > > ISTM the problem is exactly that there is no consensus on what the basic use > case is. I'm sure there's several things you can accomplish with synchronous > replication, perhaps you could describe what the important use case for you > is? OK, so I'll throw in my ideal use case. I'm starting to play with Magnus's "streaming -> archive". *That's* what I want, with synchronous. Yes, again, I'm looking for "data durability", not "server query-ability", and I'd like to rely on the PG user-space side of things instead of praying that replicated block-devices hold together... If my master flips out, I'm quite happy to do a normal archive restore. Except I don't want that last 16MB (or archive timeout) of transactions lost. The streaming -> archive in its current state gets me pretty close, but I'd love to be able to guarantee that my recovery from that archive has *every* transaction that the master committed... a.
On Thu, Sep 30, 2010 at 09:52:46AM -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote: > >> I don't see anything has stalled. > > > I do. We're half way through this commitfest, so if no one's > > actually ready to commit one of the patches, I kinda have to > > bounce them both, at least to the next CF. > > [ raised eyebrow ] You seem to be in an awfully big hurry to bounce > stuff. The CF end is still two weeks away. If people are still wrangling over the design, I'd say two weeks is a ludicrously short time, not a long one. > But while I'm thinking about that... > > The actual facts on the ground are that practically no CF work has > gotten done yet (at least not in my house) Your non-involvement in the first half or more--I'd say maybe 3 weeks or so--is precisely what commitfests are for. The point is that people who are *not* committers need to do a bunch of QA on patches, review them, get or create new patches as needed. Only then should a committer get involved. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/
Aidan Van Dyk <aidan@highrise.ca> wrote: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> I'm sure there's several things you can accomplish with >> synchronous replication, perhaps you could describe what the >> important use case for you is? > I'm looking for "data durability", not "server query-ability" Same here. If we used synchronous replication, the important thing for us would be to hold up the master for the minimum time required to ensure remote persistence -- not actual application to the remote database. We could tolerate some WAL replay time on recovery better than poor commit performance on the master. -Kevin
On 30.09.2010 17:09, Kevin Grittner wrote: > Aidan Van Dyk<aidan@highrise.ca> wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: > >>> I'm sure there's several things you can accomplish with >>> synchronous replication, perhaps you could describe what the >>> important use case for you is? > >> I'm looking for "data durability", not "server query-ability" > > Same here. If we used synchronous replication, the important thing > for us would be to hold up the master for the minimum time required > to ensure remote persistence -- not actual application to the remote > database. We could tolerate some WAL replay time on recovery better > than poor commit performance on the master. You do realize that to be able to guarantee zero data loss, the master will have to stop committing new transactions if the streaming stops for any reason, like a network glitch. Maybe that's a tradeoff you want, but I'm asking because that point isn't clear to many people. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > On 30.09.2010 17:09, Kevin Grittner wrote: >> Aidan Van Dyk<aidan@highrise.ca> wrote: >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >> >>>> I'm sure there's several things you can accomplish with >>>> synchronous replication, perhaps you could describe what the >>>> important use case for you is? >> >>> I'm looking for "data durability", not "server query-ability" >> >> Same here. If we used synchronous replication, the important thing >> for us would be to hold up the master for the minimum time required >> to ensure remote persistence -- not actual application to the remote >> database. We could tolerate some WAL replay time on recovery better >> than poor commit performance on the master. > > You do realize that to be able to guarantee zero data loss, the master > will have to stop committing new transactions if the streaming stops > for any reason, like a network glitch. Maybe that's a tradeoff you > want, but I'm asking because that point isn't clear to many people. If there's a network glitch, it'd probably affect networked client connections as well, so it would mean no extra degradation of service. -- Yeb
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > You do realize that to be able to guarantee zero data loss, the > master will have to stop committing new transactions if the > streaming stops for any reason, like a network glitch. Maybe > that's a tradeoff you want, but I'm asking because that point > isn't clear to many people. Yeah, I get that. I do think the quorum approach or some simplified special case of it would be important for us -- possibly even a requirement -- for that reason. -Kevin
On Thu, 2010-09-30 at 07:06 -0700, David Fetter wrote: > On Thu, Sep 30, 2010 at 09:52:46AM -0400, Tom Lane wrote: > > David Fetter <david@fetter.org> writes: > > > On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote: > > >> I don't see anything has stalled. > > > > > I do. We're half way through this commitfest, so if no one's > > > actually ready to commit one of the patches, I kinda have to > > > bounce them both, at least to the next CF. > > > > [ raised eyebrow ] You seem to be in an awfully big hurry to bounce > > stuff. The CF end is still two weeks away. > > If people are still wrangling over the design, I'd say two weeks is > a ludicrously short time, not a long one. Yes, there is design work still to do. What purpose would be served by "bouncing" these patches? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Sep 30, 2010 at 12:52 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Thu, 2010-09-30 at 07:06 -0700, David Fetter wrote: >> On Thu, Sep 30, 2010 at 09:52:46AM -0400, Tom Lane wrote: >> > David Fetter <david@fetter.org> writes: >> > > On Thu, Sep 30, 2010 at 09:14:42AM +0100, Simon Riggs wrote: >> > >> I don't see anything has stalled. >> > >> > > I do. We're half way through this commitfest, so if no one's >> > > actually ready to commit one of the patches, I kinda have to >> > > bounce them both, at least to the next CF. >> > >> > [ raised eyebrow ] You seem to be in an awfully big hurry to bounce >> > stuff. The CF end is still two weeks away. >> >> If people are still wrangling over the design, I'd say two weeks is >> a ludicrously short time, not a long one. > > Yes, there is design work still to do. > > What purpose would be served by "bouncing" these patches? None whatsoever, IMHO. That having been said, I would like to see us make some forward progress. I'm open to your ideas expressed up-thread, but I'm not sure whether they'll be sufficient to resolve the problem. Seems worth a try, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Sep 30, 2010 at 3:09 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >>> * Support multiple standbys with various synchronization levels. >> >> Not required for that case. > > IMHO at least we'll still need to support asynchronous standbys in the same > mix, that's an existing feature. My intention is to commit the core part of synchronous replication (which would be used for every use cases) at first. Then we can implement the feature for each use case. I agree that 9.1 should support asynchronous standbys in the same mix, but this seems to be extended feature rather than very core. >>> * What happens if a synchronous standby isn't connected at the moment? >>> Return immediately vs. wait forever. >> >> The wait-forever option is not required for that case. Let's implement >> the return-immediately at first. >> >> ..- >> >>> * async, recv, fsync and replay levels of synchronization. >> >> At least one of three synchronous levels should be included in the first >> commit. I think that either recv or fsync is suitable for first try >> because those don't require wake-up signaling from startup process to >> walreceiver and are relatively easy to implement. > > What is the use case for that combination? For zero data loss, you *must* > wait forever if a standby isn't connected. For keeping a hot standby server > up-to-date so that you can freely query the standby instead of the master, > you need replay level synchronization. For high availability, and zero data loss unless the disk on one of master and standby gets corrupted after the other goes down. It's the same use case that cluster with shared disk covers. I proposed to implement the "return-immediately" at first because it doesn't require standby registration. But if many people think that the "wait-forever" is the core rather than the "return-immediately", I'll follow them. We can implement the "return-immediately" after that. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
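The two behaviours being weighed here differ only in what happens to a commit while no synchronous standby is connected. A minimal sketch, assuming a hypothetical `NoStandbyPolicy` setting and a simplified commit path (neither comes from the patches):

```python
from enum import Enum


class NoStandbyPolicy(Enum):
    RETURN_IMMEDIATELY = "return-immediately"
    WAIT_FOREVER = "wait-forever"


def commit_outcome(standby_connected: bool, standby_acked: bool,
                   policy: NoStandbyPolicy) -> str:
    """Describe what the committing backend does at this moment."""
    if standby_connected:
        # Normal synchronous case: the client hears back only after the ACK.
        return "committed" if standby_acked else "waiting"
    if policy is NoStandbyPolicy.RETURN_IMMEDIATELY:
        # The commit is reported to the client although no standby has it.
        return "committed-unreplicated"
    # Wait-forever: the commit stays blocked until a standby connects and ACKs.
    return "waiting"
```

With `RETURN_IMMEDIATELY` the master stays available but a commit can exist on one node only; with `WAIT_FOREVER` no acknowledged commit can be lost with the master, at the price of blocking writers.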
On Fri, Oct 01, 2010 at 07:48:25PM +0900, Fujii Masao wrote: > I proposed to implement the "return-immediately" at first because it > doesn't require standby registration. But if many people think that > the "wait-forever" is the core rather than the "return-immediately", > I'll follow them. We can implement the "return-immediately" after > that. In my experience, most people who want "synchronous" behavior are willing to put up with "wait forever," especially when asynchronous behavior is already available. In short, +1 for "push 'wait forever' soonest." Anybody who's got a Secret Base, Hidden in a Hollowed-Out Mountain, Making Grand Plans While Stroking a Long-Haired Cat[1], should please to update their public repository, or create a public repository if it doesn't already exist, and in either case keep it current. Cheers, David [1] While the Hollowed-Out Mountain trick worked back in the 60s, it's gotten a little trite. The cool kids are keeping things pretty public these days when they plan to go public. -- David Fetter <david@fetter.org> http://fetter.org/
Fujii Masao <masao.fujii@gmail.com> writes: > I proposed to implement the "return-immediately" at first because it doesn't > require standby registration. But if many people think that the "wait-forever" > is the core rather than the "return-immediately", I'll follow them. We can > implement the "return-immediately" after that. Wait forever can be done without standby registration, with quorum commit. -- dim
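One way to read the quorum argument: the master only needs a count of acknowledgements, not a list of named standbys. A minimal sketch under that assumption; `quorum_reached` and its parameters are hypothetical, and WAL positions are plain integers.

```python
from typing import Dict


def quorum_reached(commit_lsn: int, acked_lsn_by_connection: Dict[str, int],
                   quorum: int) -> bool:
    """True once at least `quorum` currently connected standbys have ACKed commit_lsn.

    The keys are whatever connections exist right now; nothing is registered
    in advance, so an unknown standby that connects later simply adds an entry.
    """
    acks = sum(1 for lsn in acked_lsn_by_connection.values() if lsn >= commit_lsn)
    return acks >= quorum
```

With `quorum=1` and no standby connected the function never returns True, which is the wait-forever behaviour; raising `quorum` to the number of standbys gives the "all servers" special case mentioned earlier in the thread.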
On 09/30/2010 10:52 PM, Tom Lane wrote: > IMHO > we should push out the end date by at least a week to reflect the lack > of time spent on the CF so far. I agree that we should postpone the end of the CF by one week to deal with the distractions people have had. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
> What we actually need to do is talk and listen. I'd like to suggest that > we have an online "focus day" (onlist) on Sync Rep on Oct 5 and maybe 6 > as well? Meeting in person is possible, but probably impractical. But a > design sprint, not a code sprint. I'd suggest something even simpler: (1) Create a wiki page which lists all of the design decisions we need to make in order to finish the specification for sync rep. (2) Link each item to any prior discussion we've had about the item. (3) Invite people to comment on the wiki by leaving per-item comments and suggestions with their own names. I believe that right now only a handful of people (Simon, Heikki, Fujii, Zoltan) are really acquainted with all of the decisions which need to be made. No wonder the rest of us fly off on minutiae like file formats; we really have no sense of scope. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Hi, On 10/03/2010 05:52 AM, Josh Berkus wrote: > (3) Invite people to comment on the wiki by leaving per-item comments > and suggestions with their own names. Please keep discussions on the mailing list. On Wikis, those are very hard to follow (Date or From missing, no offline capabilities, indirect notification, etc..) I like Simon's suggestion, but thought of something *more* direct (maybe IRC), not less (like Wikis). > I believe that right now only a handful of people (Simon, Heikki, Fujii, > Zoltan) are really acquainted with all of the decisions which need to be > made. I at least try to follow. And I actually think we had quite some DBA inputs as well. Regards Markus
On 09/30/2010 04:54 PM, Yeb Havinga wrote: > Heikki Linnakangas wrote: >> You do realize that to be able to guarantee zero data loss, the master >> will have to stop committing new transactions if the streaming stops >> for any reason, like a network glitch. Maybe that's a tradeoff you >> want, but I'm asking because that point isn't clear to many people. > If there's a network glitch, it'd probably affect networked client > connections as well, so it would mean no extra degradation of service. Agreed. I think the network glitch example is too general, it could affect any part of the whole network. Even just the connection between the master and the standby, in which case all client connections would keep up. Let's quickly think about that scenario. AFAIU in such a case, the standby would continue to answer read-only queries, independent of what the master does, right? Or does the standby stop processing read-only queries in case it loses connection to the master? It seems to me the latter is required, if we let the master continue to commit transactions. Otherwise the standby would serve stale data to its clients without knowing. Given that scenario, I'd clearly favor a master that stops committing new transactions, but allows both (i.e. master and standbys) to continue answering read-only queries. Regards Markus Wanner
On 10/01/2010 05:06 PM, Dimitri Fontaine wrote: > Wait forever can be done without standby registration, with quorum commit. Yeah, I also think the only reason for standby registration is ease of configuration (if at all). There's no technical requirement for standby registration, AFAICS. Or does anybody know of a realistic use case that's possible with standby registration, but not with quorum commit? Regards Markus Wanner
On 04.10.2010 10:03, Markus Wanner wrote: > On 09/30/2010 04:54 PM, Yeb Havinga wrote: >> Heikki Linnakangas wrote: >>> You do realize that to be able to guarantee zero data loss, the master >>> will have to stop committing new transactions if the streaming stops >>> for any reason, like a network glitch. Maybe that's a tradeoff you >>> want, but I'm asking because that point isn't clear to many people. >> If there's a network glitch, it'd probably affect networked client >> connections as well, so it would mean no extra degradation of service. > > Agreed. > > I think the network glitch example is too general, it could affect any > part of the whole network. Even just the connection between the master > and the standby, in which case all client connections would keep up. > > Let's quickly think about that scenario. AFAIU in such a case, the > standby would continue to answer read-only queries, independent of what > the master does, right? Right. > Or does the standby stop processing read-only > queries in case it loses connection to the master? As far as the current proposals go, no. > It seems to me the latter is required, if we let the master continue to > commit transactions. Otherwise the standby would serve stale data to its > clients without knowing. Yep. If you want to guarantee that a hot standby doesn't return stale data, then when the connection is lost you need to either stop processing read-only queries in the standby, or stop processing commits in the master. Note that this assumes that you use the 'replay' synchronization level. In the weaker levels, read-only queries can always return stale data. With 'replay' and hot standby combination, you'll want to set max_standby_archive_delay to a very low value, or a read-only query can cause master to stop processing commits (or the standby to stop accepting new queries, if that's preferred). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
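The stale-read rule stated above can be written down as a small truth table. This is only a restatement of the argument, with a hypothetical function name and the levels passed as strings:

```python
def stale_reads_possible(level: str, master_keeps_committing: bool,
                         standby_keeps_answering: bool) -> bool:
    """Can a hot-standby query miss a commit the master already acknowledged?

    The two booleans describe behaviour while the master-standby link is down.
    """
    if level != "replay":
        # recv/fsync/async never wait for replay, so staleness is always possible.
        return True
    # With 'replay', staleness needs both sides to carry on during the outage.
    return master_keeps_committing and standby_keeps_answering
```

So at the `replay` level it is enough for either side to stop during an outage, whether the master stops committing or the standby stops answering queries; at the weaker levels neither choice removes staleness.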
On 10/04/2010 09:18 AM, Heikki Linnakangas wrote: > Note that this assumes that you use the 'replay' synchronization level. > In the weaker levels, read-only queries can always return stale data. I'm not too fond of those various synchronization levels, but IIUC all other levels only allow a rather limited staleness. But with a master that's continuing to commit new transactions and a disconnected standby that happily continues to answer read-only queries, the age of the standby's snapshot can grow without limitation. > With 'replay' and hot standby combination, you'll want to set > max_standby_archive_delay to a very low value, or a read-only query can > cause master to stop processing commits (or the standby to stop > accepting new queries, if that's preferred). Well, given that DML-only transactions aren't prone to such conflicts, I think of this as a corner case. Also note that this requirement seems to apply whether we wait forever on standby failure or not. (Because even if we don't, there must be some kind of timeout on the master from the very first suspicion to actually declare the standby dead - anything else is called async). Regards Markus Wanner
On 04.10.2010 10:49, Markus Wanner wrote: > On 10/04/2010 09:18 AM, Heikki Linnakangas wrote: >> With 'replay' and hot standby combination, you'll want to set >> max_standby_archive_delay to a very low value, or a read-only query can >> cause master to stop processing commits (or the standby to stop >> accepting new queries, if that's preferred). > > Well, given that DML-only transactions aren't prone to such conflicts, I > think of this as a corner case. Yes they are. Any DML operation, and even read-only queries IIRC, can trigger HOT pruning, which can conflict with a read-only query in a hot standby. And then there's autovacuum which can cause conflicts in the standby, even if no user transactions are running in the master. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Oct 1, 2010 at 11:16 PM, David Fetter <david@fetter.org> wrote: > On Fri, Oct 01, 2010 at 07:48:25PM +0900, Fujii Masao wrote: >> I proposed to implement the "return-immediately" at first because it >> doesn't require standby registration. But if many people think that >> the "wait-forever" is the core rather than the "return-immediately", >> I'll follow them. We can implement the "return-immediately" after >> that. > > In my experience, most people who want "synchronous" behavior are > willing to put up with "wait forever," especially when asynchronous > behavior is already available. > > In short, +1 for "push 'wait forever' soonest." I have one question for clarity: If we make all the transactions wait until specified standbys have connected to the master, how do we take a base backup from the master for those standbys? We seem to be unable to do that because pg_start_backup also waits forever. Is this right? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Oct 4, 2010 at 10:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > I have one question for clarity: > > If we make all the transactions wait until specified standbys have > connected to the master, how do we take a base backup from the > master for those standbys? We seem to be unable to do that because > pg_start_backup also waits forever. Is this right? Well, in my *opinion*, if you've told the master to not "commit to" *anything* unless it's synchronously replicated, you should already have a synchronously replicating slave up and running. I'm happy with the docs saying (maybe somewhat more politely): Before configuring your master to be completely, wait-fully-synchronous, make sure you have a slave capable of being synchronous ready. Because if you've told it to never be un-synchronous, it won't be.
On Tue, Oct 5, 2010 at 2:06 AM, Aidan Van Dyk <aidan@highrise.ca> wrote: > On Mon, Oct 4, 2010 at 10:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> I have one question for clarity: >> >> If we make all the transactions wait until specified standbys have >> connected to the master, how do we take a base backup from the >> master for those standbys? We seem to be unable to do that because >> pg_start_backup also waits forever. Is this right? > > Well, in my *opinion*, if you've told the master to not "commit to" > *anything* unless it's synchronously replicated, you should already > have a synchronously replicating slave up and running. > > I'm happy with the docs saying (maybe somewhat more politely): > Before configuring your master to be completely, > wait-fully-synchronous, make sure you have a slave capable of being > synchronous ready. Because if you've told it to never be > un-synchronous, it won't be. How can we take a base backup for that synchronous standby? You mean that we should disable the wait-forever option, start the master, take a base backup, shut down the master, enable the wait-forever option, start the master, and start the standby from that base backup? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> writes: > On Tue, Oct 5, 2010 at 2:06 AM, Aidan Van Dyk <aidan@highrise.ca> wrote: >> I'm happy with the docs saying (maybe some what more politely): >> Before configuring your master to be completly, >> wait-fully-synchronous, make sure you have a slave capable of being >> synchronous ready. Because if you've told it to never be >> un-synchronous, it won't be. > How can we take a base backup for that synchronous standby? You mean > that we should disable the wait-forever option, start the master, take > a base backup, shut down the master, enable the wait-forever option, > start the master, and start the standby from that base backup? I think the point here is that it's possible to have sync-rep configurations in which it's impossible to take a base backup. That doesn't seem to me to be unacceptable in itself. What *is* unacceptable is to be unable to change the configuration to another state in which you could take a base backup. Which is why "keep the config in a system catalog" doesn't work. regards, tom lane
On 04.10.2010 17:22, Fujii Masao wrote: > If we make all the transactions wait until specified standbys have > connected to the master, how do we take a base backup from the > master for those standbys? We seem to be unable to do that because > pg_start_backup also waits forever. Is this right? Hmm, pg_start_backup() writes WAL, but it doesn't commit. Only a commit needs to wait for acknowledgment from the standby, so 'wait forever' behavior doesn't necessarily mean that you can't take a base backup. If you run it outside a transaction you get an implicit commit, though, which will wait, so you might need to do something odd like "begin; select pg_start_backup(); rollback". But I agree with Tom that as long as it's possible to change the configuration on the fly, it's not a show-stopper if you can't take a new base backup while the standby is disconnected. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
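A toy illustration of the point above, that writing WAL does not wait but a commit does. This is a Python sketch, not PostgreSQL code: the function name `session_blocks`, the command strings, and the backup label are all hypothetical, and it assumes that every commit (explicit or implicit) waits while no standby is connected.

```python
def session_blocks(commands, standby_connected):
    """Return True if the session would end up waiting for the standby.

    Rule modelled (an assumption taken from the discussion): writing WAL
    never waits; only a commit does. A statement issued outside an explicit
    BEGIN runs in an implicit transaction that commits when it finishes.
    """
    in_txn = False
    for cmd in commands:
        c = cmd.strip().lower().rstrip(";")
        if c == "begin":
            in_txn = True
        elif c == "rollback":
            in_txn = False              # abort: nothing to wait for
        elif c == "commit":
            in_txn = False
            if not standby_connected:
                return True             # explicit commit waits
        elif not in_txn and not standby_connected:
            return True                 # implicit commit waits
    return False


# Run on its own, the call commits implicitly and therefore waits.
print(session_blocks(["select pg_start_backup('label')"], False))
# Wrapped in begin/rollback, no commit is issued, so nothing waits.
print(session_blocks(["begin", "select pg_start_backup('label')", "rollback"], False))
```

The first call prints `True` and the second prints `False`, which is the whole of the "begin; ...; rollback" trick under these assumptions.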
On Tue, Oct 5, 2010 at 5:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 04.10.2010 17:22, Fujii Masao wrote: >> >> If we make all the transactions wait until specified standbys have >> connected to the master, how do we take a base backup from the >> master for those standbys? We seem to be unable to do that because >> pg_start_backup also waits forever. Is this right? > > Hmm, pg_start_backup() writes WAL, but it doesn't commit. Only a commit > needs to wait for acknowledgment from the standby, so 'wait forever' > behavior doesn't necessarily mean that you can't take a base backup. If you > run it outside a transaction you get an implicit commit, though, which will > wait, so you might need to do something odd like "begin; select > pg_start_backup(); rollback". Yep. Similarly, we would need to enclose also pg_stop_backup with begin and rollback. I have another question: when should the waiting transactions resume? It's a moment the standby has connected to the master? It's a moment the standby has caught up with the master? For no data loss, the latter seems to be required. Right? The third question: if the WAL file is unfortunately recycled when a transaction waits for that WAL file to be shipped forever, how should that transaction behave? Still waiting? Cause PANIC? Give up waiting? For no data loss, ISTM that the second should be chosen. Right? This can happen because we can write WAL to the master without waiting for replication by enclosing a query with begin and rollback, even if all the transaction *commit* are waiting for replication forever. > But I agree with Tom that as long as it's possible to change the > configuration on the fly, it's not a show-stopper if you can't take a new > base backup while the standby is disconnected. Yep. If people who want the "wait-forever" can live with such an odd backup procedure, I have no objection to implement that. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 05.10.2010 12:47, Fujii Masao wrote: > I have another question: when should the waiting transactions resume? > It's a moment the standby has connected to the master? It's a moment > the standby has caught up with the master? For no data loss, the > latter seems to be required. Right? Yep. > The third question: if the WAL file is unfortunately recycled when a > transaction waits for that WAL file to be shipped forever, how should > that transaction behave? Still waiting? Cause PANIC? Give up waiting? > For no data loss, ISTM that the second should be chosen. Right? Right, it should keep waiting. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
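The resume rule agreed above (waiting commits are released once the standby has caught up, not merely when it connects) can be sketched as a toy model. This is Python, not PostgreSQL code: `ToyMaster`, its fields, and the use of small integers as WAL positions are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of 'wait-forever' commits; not PostgreSQL code."""
    flush_lsn: int = 0            # last WAL position written locally
    standby_ack_lsn: int = 0      # last position the standby acknowledged
    waiting: dict = field(default_factory=dict)  # txn name -> commit LSN

    def commit(self, txn):
        # A commit writes one WAL record, then waits for the standby.
        self.flush_lsn += 1
        self.waiting[txn] = self.flush_lsn

    def standby_ack(self, lsn):
        # The standby reports how far it has received WAL. Only commits at
        # or before that position are released; a standby that has merely
        # connected (acknowledging nothing new) releases nothing.
        self.standby_ack_lsn = max(self.standby_ack_lsn, lsn)
        released = [t for t, c in self.waiting.items()
                    if c <= self.standby_ack_lsn]
        for t in released:
            del self.waiting[t]
        return released


m = ToyMaster()
m.commit("a")
m.commit("b")
print(m.standby_ack(0))   # standby connected but has received nothing
print(m.standby_ack(1))   # caught up to the first commit record
print(m.standby_ack(2))   # fully caught up
```

This prints `[]`, then `['a']`, then `['b']`: a connection alone releases no one, and each commit resumes only when the acknowledged position reaches its own commit record.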
On Tue, 2010-10-05 at 18:47 +0900, Fujii Masao wrote: > On Tue, Oct 5, 2010 at 5:49 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > On 04.10.2010 17:22, Fujii Masao wrote: > >> > >> If we make all the transactions wait until specified standbys have > >> connected to the master, how do we take a base backup from the > >> master for those standbys? We seem to be unable to do that because > >> pg_start_backup also waits forever. Is this right? > > > > Hmm, pg_start_backup() writes WAL, but it doesn't commit. Only a commit > > needs to wait for acknowledgment from the standby, so 'wait forever' > > behavior doesn't necessarily mean that you can't take a base backup. If you > > run it outside a transaction you get an implicit commit, though, which will > > wait, so you might need to do something odd like "begin; select > > pg_start_backup(); rollback". > > Yep. Similarly, we would need to enclose also pg_stop_backup with begin > and rollback. Presumably we will have an option to *not* wait forever? So we would be able to set the option prior to running the base backup? So there isn't any need to do this rollback trick suggested. pg_start_backup() and pg_stop_backup() have two use cases: 1) ensuring both are sent through to the standby would make it very easy to allow backups from the standby. 2) make sure we don't wait, so we can take a base backup at any time So there's no argument here to prevent it being in a table. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2010-10-01 at 07:16 -0700, David Fetter wrote: > On Fri, Oct 01, 2010 at 07:48:25PM +0900, Fujii Masao wrote: > > I proposed to implement the "return-immediately" at first because it > > doesn't require standby registration. But if many people think that > > the "wait-forever" is the core rather than the "return-immediately", > > I'll follow them. We can implement the "return-immediately" after > > that. > > In my experience, most people who want "synchronous" behavior are > willing to put up with "wait forever," especially when asynchronous > behavior is already available. > > In short, +1 for "push 'wait forever' soonest." > > Anybody who's got a Secret Base, Hidden in a Hollowed-Out Mountain, > Making Grand Plans While Stroking a Long-Haired Cat[1], should please > to update their public repository, or create a public repository if it > doesn't already exist, and in either case keep it current. You've long held the belief that I code in secret and don't reveal my code to people. Not really sure why, since I've contributed so much, so openly. Strange. I am trying to establish a sensible design based upon public discussion. I'm not working on any code currently; my understanding was that we would discuss what we were going to do and only then do it. I *could* add automatic registration or many other features to my patch. Doing so would take hours or days. How would that help us decide what to do? I'm not treating this as a race between people's patches; is it a race? Or is it a discussion and move forwards by mutual agreement towards something sensible? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
On Fri, 2010-10-01 at 19:48 +0900, Fujii Masao wrote: > My intention is to commit the core part of synchronous replication (which would > be used for every use cases) at first. Then we can implement the > feature for each > use case. I completely agree that we should commit the core part of sync rep, but the question is: what is that? We both have equally valid "cores". > I agree that 9.1 should support asynchronous standbys in the same mix, but this > seems to be extended feature rather than very core. That is trivial, so no need to exclude that. > I proposed to implement the "return-immediately" at first because it doesn't > require standby registration. But if many people think that the "wait-forever" > is the core rather than the "return-immediately", I'll follow them. We can > implement the "return-immediately" after that. I think it's fair to say that many people don't like the specific form of standby registration that has been proposed. I really don't mind if it exists as an option, but it looks way too complex to me to manage for realistic systems. Wait-forever needs to be an option. Nobody actually will wait forever, so if people select it, they will need some form of clusterware to control it and I don't want to see people forced to use clusterware. If people do choose wait-forever, then we could also do standby registration automatically, to give them something to wait for. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Oct 4, 2010 at 11:48 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > How can we take a base backup for that synchronous standby? You mean > that we should disable the wait-forever option, start the master, take > a base backup, shut down the master, enable the wait-forever option, > start the master, and start the standby from that base backup? All I'm saying is that *after* you've configured that everything must be synchronous is *not* the time to start trying to figure out if your PITR backups/archive are working, and starting to try and get a slave replicating synchronously. Yes, High-Durability sync rep has caveats. One of them is that you must have a working synchronous slave before you can enforce synchronousity. a.
On Tue, Oct 5, 2010 at 8:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Presumably we will have an option to *not* wait forever? So we would be > able to set the option prior to running the base backup? So there isn't > any need to do this rollback trick suggested. At the initial setup of the standby, we can easily disable the wait-forever option and take a base backup. I'm concerned about the case where the standby goes down while replication is working. ISTM that we cannot easily disable the wait-forever option for the backup, because disabling it resumes the waiting transactions. In this case, we would need to issue rollback. Or we seem to need to shut down the master, take a cold backup, start the master and start the standby from that cold backup. Though I'm not sure whether this is really the right procedure. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Tom Lane <tgl@sss.pgh.pa.us> writes: > I think the point here is that it's possible to have sync-rep > configurations in which it's impossible to take a base backup. Sorry to be slow. I still don't understand that problem. I can understand why people want "wait forever", but I can't understand when the following strange idea applies: consider my non-ready standby there as a full member of the distributed setup already. I've been making plenty of noise about this topic in the past, at the beginning of plans for SR in 9.0 IIRC, pushing Heikki into having a worked out state machine to figure out what are the known states of a standby and what we can do with each. We've cancelled that and said it would maybe be necessary for Synchronous Replication. Here we go, right? So, first things first, when is it a good idea to consider a standby that's not yet had its base backup, let alone validated that after taking it the master still has enough WAL for the backup to be valid as far as initialising the slave goes, to consider this broken standby as someone we wait forever on? I say a standby is registered when it's currently "attached" and already able to keep up in async. That's a time when you can slow down the master until this new member catches up to full sync or whatever you've set up. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support Lack of google and archives-fu today means no link to those mails. Yet…
Heikki Linnakangas wrote: > On 04.10.2010 10:49, Markus Wanner wrote: > > On 10/04/2010 09:18 AM, Heikki Linnakangas wrote: > >> With 'replay' and hot standby combination, you'll want to set > >> max_standby_archive_delay to a very low value, or a read-only query can > >> cause master to stop processing commits (or the standby to stop > >> accepting new queries, if that's preferred). > > > > Well, given that DML-only transactions aren't prone such to conflicts, I > > think of this as a corner case. > > Yes they are. Any DML operation, and even read-only queries IIRC, can > trigger HOT pruning, which can conflict with a read-only query in a hot > standby. And then there's autovacuum which can cause conflicts in the > standby, even if no user transactions are running in the master. I can confirm that SELECT can trigger HOT pruning, based on research for my PG West MVCC talk. Anything that does a tuple lookup can cause it --- INSERT VALUES does not. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +