Thread: pgsql: Use a latch to make startup process wake up and replay
pgsql: Use a latch to make startup process wake up and replay
From
heikki@postgresql.org (Heikki Linnakangas)
Date:
Log Message: ----------- Use a latch to make startup process wake up and replay immediately when new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive. Modified Files: -------------- pgsql/src/backend/access/transam: xlog.c (r1.434 -> r1.435) (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.434&r2=1.435) pgsql/src/backend/replication: walreceiver.c (r1.16 -> r1.17) (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/replication/walreceiver.c?r1=1.16&r2=1.17) pgsql/src/include/access: xlog.h (r1.116 -> r1.117) (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.116&r2=1.117)
On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas <heikki@postgresql.org> wrote: > Log Message: > ----------- > Use a latch to make startup process wake up and replay immediately when > new WAL arrives via streaming replication. This reduces the latency, and > also allows us to use a longer polling interval, which is good for energy > efficiency. > > We still need to poll to check for the appearance of a trigger file, but > the interval is now 5 seconds (instead of 100ms), like when waiting for > a new WAL segment to appear in WAL archive. Good work! + /* + * Take ownership of the wakup latch if we're going to sleep during + * recovery. + */ I found a typo: s/wakup/wakeup/ Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 15 September 2010 11:35, Heikki Linnakangas <heikki@postgresql.org> wrote: > Log Message: > ----------- > Use a latch to make startup process wake up and replay immediately when > new WAL arrives via streaming replication. This reduces the latency, and > also allows us to use a longer polling interval, which is good for energy > efficiency. > > We still need to poll to check for the appearance of a trigger file, but > the interval is now 5 seconds (instead of 100ms), like when waiting for > a new WAL segment to appear in WAL archive. > > Modified Files: > -------------- > pgsql/src/backend/access/transam: > xlog.c (r1.434 -> r1.435) > (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.434&r2=1.435) > pgsql/src/backend/replication: > walreceiver.c (r1.16 -> r1.17) > (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/replication/walreceiver.c?r1=1.16&r2=1.17) > pgsql/src/include/access: > xlog.h (r1.116 -> r1.117) > (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.116&r2=1.117) > > -- + * We don't need the latch anymore. It's not strictly necessary to disown + * it, but let's do it for the sake of tidyness. + */ s/tidyness/tidiness/ -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On 15/09/10 14:14, Fujii Masao wrote: > + /* > + * Take ownership of the wakup latch if we're going to sleep during > + * recovery. > + */ > > I found a typo: s/wakup/wakeup/ > On 15/09/10 14:48, Thom Brown wrote: > + * We don't need the latch anymore. It's not strictly necessary to disown > + * it, but let's do it for the sake of tidyness. > + */ > > s/tidyness/tidiness/ Fixed both of these. Thank you. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote: > On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas > <heikki@postgresql.org> wrote: > > Log Message: > > ----------- > > Use a latch to make startup process wake up and replay immediately when > > new WAL arrives via streaming replication. This reduces the latency, and > > also allows us to use a longer polling interval, which is good for energy > > efficiency. > > > > We still need to poll to check for the appearance of a trigger file, but > > the interval is now 5 seconds (instead of 100ms), like when waiting for > > a new WAL segment to appear in WAL archive. > > Good work! No, not good work. You both know very well that I'm working on this area also and these commits are not agreed... yet. They might not be contended but they are very likely to break my patch, again. Please desist while we resolve which are the good ideas and which are not. We won't know that if you keep breaking other people's patches in a stream of commits that prevent anybody completing other options. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
David Fetter
Date:
On Wed, Sep 15, 2010 at 03:35:30PM +0100, Simon Riggs wrote: > On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote: > > On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas > > <heikki@postgresql.org> wrote: > > > Log Message: > > > ----------- > > > Use a latch to make startup process wake up and replay immediately when > > > new WAL arrives via streaming replication. This reduces the latency, and > > > also allows us to use a longer polling interval, which is good for energy > > > efficiency. > > > > > > We still need to poll to check for the appearance of a trigger file, but > > > the interval is now 5 seconds (instead of 100ms), like when waiting for > > > a new WAL segment to appear in WAL archive. > > > > Good work! > > No, not good work. > > You both know very well that I'm working on this area also and these > commits are not agreed... yet. They might not be contended but they are > very likely to break my patch, again. > > Please desist while we resolve which are the good ideas and which are > not. We won't know that if you keep breaking other people's patches in a > stream of commits that prevent anybody completing other options. Simon, No matter how many times you try, you are not going to get a license to stop all work on anything you might chance to think about. It is quite simply never going to happen, so you need to back off. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Simon Riggs
Date:
On Wed, 2010-09-15 at 07:59 -0700, David Fetter wrote: > On Wed, Sep 15, 2010 at 03:35:30PM +0100, Simon Riggs wrote: > > Please desist while we resolve which are the good ideas and which are > > not. We won't know that if you keep breaking other people's patches in a > > stream of commits that prevent anybody completing other options. > No matter how many times you try, you are not going to get a license > to stop all work on anything you might chance to think about. It is > quite simply never going to happen, so you need to back off. I agree that asking people to stop work is not OK. However, I haven't asked for development work to stop, only that commits into that area stop until proper debate has taken place. Those might be minor commits, but they might not. Had I made those commits, they would have been called premature by others also. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote: >> Good work! > No, not good work. > You both know very well that I'm working on this area also and these > commits are not agreed... yet. They might not be contended but they are > very likely to break my patch, again. Simon, you can't just command the tide to stop coming in while you do something that no one else can see. This patch is perfectly reasonable, not at all invasive, and a clear improvement over the way the code was before. If you can improve on it further, fine, but we're not going to declare the code off-limits while waiting for an unspecified patch with no firm delivery date. regards, tom lane
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I agree that asking people to stop work is not OK. However, I haven't > asked for development work to stop, only that commits into that area > stop until proper debate has taken place. Those might be minor commits, > but they might not. Had I made those commits, they would have been > called premature by others also. I do not believe that Heikki has done anything inappropriate. We've spent weeks discussing the latch facility and its various applications. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Simon Riggs
Date:
On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote: > On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > I agree that asking people to stop work is not OK. However, I haven't > > asked for development work to stop, only that commits into that area > > stop until proper debate has taken place. Those might be minor commits, > > but they might not. Had I made those commits, they would have been > > called premature by others also. > > I do not believe that Heikki has done anything inappropriate. We've > spent weeks discussing the latch facility and its various > applications. Sounds reasonable, but my comments were about this commit, not the one that happened on Saturday. This patch was posted about 32 hours ago, and the commit need not have taken place yet. If I had posted such a patch and committed it knowing other work is happening in that area we both know that you would have objected. It's not actually a major issue, but at some point I have to ask for no more commits, so Fujii and I can finish our patches, compare and contrast, so the best ideas can get into Postgres. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 1:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote: >> On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > I agree that asking people to stop work is not OK. However, I haven't >> > asked for development work to stop, only that commits into that area >> > stop until proper debate has taken place. Those might be minor commits, >> > but they might not. Had I made those commits, they would have been >> > called premature by others also. >> >> I do not believe that Heikki has done anything inappropriate. We've >> spent weeks discussing the latch facility and its various >> applications. > > Sounds reasonable, but my comments were about this commit, not the one > that happened on Saturday. This patch was posted about 32 hours ago, and > the commit need not have taken place yet. If I had posted such a patch > and committed it knowing other work is happening in that area we both > know that you would have objected. I've often felt that we ought to have a bit more delay between when committers post patches and when they commit them. I was told 24 hours and I've seen cases where people haven't even waited that long. On the other hand, if we get to strict about it, it can easily get to the point where it just gets in the way of progress, and certainly some patches are far more controversial than others. So I don't know what the best thing to do is. Still, I have to admit that I feel fairly positive about the direction we're going with this particular patch. Clearing away these peripheral issues should make it easier for us to have a rational discussion about the core issues around how this is going to be configured and actually work at the protocol level. > It's not actually a major issue, but at some point I have to ask for no > more commits, so Fujii and I can finish our patches, compare and > contrast, so the best ideas can get into Postgres. I don't think anyone is prepared to agree to that. I think that everyone is prepared to accept a limited amount of further delay in pressing forward with the main part of sync rep, but I expect that no one will be willing to freeze out incremental improvements in the meantime, even if it does induce a certain amount of rebasing. It's also worth noting that Fujii Masao's patch has been around for months, and yours isn't finished yet. That's not to say that we don't want to consider your ideas, because we do: and you've had more than your share of good ones. At the same time, it would be unfair and unreasonable to expect work on a patch that is done, and has been done for some time, to wait on one that isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Heikki Linnakangas
Date:
On 15/09/10 20:58, Robert Haas wrote: > On Wed, Sep 15, 2010 at 1:30 PM, Simon Riggs<simon@2ndquadrant.com> wrote: >> On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote: >>> On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs<simon@2ndquadrant.com> wrote: >>>> I agree that asking people to stop work is not OK. However, I haven't >>>> asked for development work to stop, only that commits into that area >>>> stop until proper debate has taken place. Those might be minor commits, >>>> but they might not. Had I made those commits, they would have been >>>> called premature by others also. >>> >>> I do not believe that Heikki has done anything inappropriate. We've >>> spent weeks discussing the latch facility and its various >>> applications. >> >> Sounds reasonable, but my comments were about this commit, not the one >> that happened on Saturday. This patch was posted about 32 hours ago, and >> the commit need not have taken place yet. If I had posted such a patch >> and committed it knowing other work is happening in that area we both >> know that you would have objected. > > I've often felt that we ought to have a bit more delay between when > committers post patches and when they commit them. I was told 24 > hours and I've seen cases where people haven't even waited that long. > On the other hand, if we get to strict about it, it can easily get to > the point where it just gets in the way of progress, and certainly > some patches are far more controversial than others. So I don't know > what the best thing to do is. With anything non-trivial, I try to "sleep on it" before committing. More with complicated patches, but it's really up to your own comfort level with the patch, and whether you think anyone might have different opinions on it. I don't mind quick commits if it's something that has been discussed in the past and the committer thinks it's non-controversial. There's always the option of complaining afterwards. If it comes to that, though, it wasn't really ripe for committing yet. (That doesn't apply to gripes about typos or something like that, because that happens to me way too often ;-) ) > Still, I have to admit that I feel > fairly positive about the direction we're going with this particular > patch. Clearing away these peripheral issues should make it easier > for us to have a rational discussion about the core issues around how > this is going to be configured and actually work at the protocol > level. Yeah, I don't think anyone has any qualms about the substance of these patches. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Simon Riggs
Date:
On Wed, 2010-09-15 at 13:58 -0400, Robert Haas wrote: > > It's not actually a major issue, but at some point I have to ask for > no > > more commits, so Fujii and I can finish our patches, compare and > > contrast, so the best ideas can get into Postgres. > > I don't think anyone is prepared to agree to that. I think that > everyone is prepared to accept a limited amount of further delay in > pressing forward with the main part of sync rep, but I expect that no > one will be willing to freeze out incremental improvements in the > meantime, even if it does induce a certain amount of rebasing. > It's > also worth noting that Fujii Masao's patch has been around for months, > and yours isn't finished yet. That's not to say that we don't want to > consider your ideas, because we do: and you've had more than your > share of good ones. At the same time, it would be unfair and > unreasonable to expect work on a patch that is done, and has been done > for some time, to wait on one that isn't. I understand your viewpoint there. I'm sure we all agree sync rep is a very important feature that must get into the next release. The only reason my patch exists is because debate around my ideas was ruled out on various grounds. One of those was it would take so long to develop we shouldn't risk not getting sync rep in this release. I am amenable to such arguments (and I make the same one on MERGE, btw, where I am getting seriously worried) but the reality is that there is actually very little code here and we can definitely do this, whatever ideas we pick. I've shown this by providing an almost working version in about 4 days work. Will finishing it help? We definitely have the time, so the question is, what are the best ideas? We must discuss the ideas properly, not just plough forwards claiming time pressure when it isn't actually an issue at all. We *need* to put the tools down and talk in detail about the best way forwards. Before, I had no patch. Now mine "isn't finished". At what point will my ideas be reviewed without instant dismissal? If we accept your seniority argument, then "never" because even if I finish it you'll say "Fujii was there first". If who mentioned it first was important, then I'd say I've been discussing this for literally years (late 2006) and have regularly explained the benefits of the master-side approach I've outlined on list every time this has come up (every few months). I have also explained the implementation details many times as well an I'm happy to say that latches are pretty much exactly what I described earlier. (I called them LSN queues, similar to lwlocks, IIRC). But thats not the whole deal. If we simply wanted a patch that was "done" we would have gone with Zoltan's wouldn't we, based on the seniority argument you use above? Zoltan's patch didn't perform well at all. Fujii's performs much better. However, my proposed approach offers even better performance, so whatever argument you use to include Fujii's also applies to mine doesn't it? But that's silly and divisive, its not about who's patch "wins" is it? Do we have to benchmark multiple patches to prove which is best? If that's the criteria I'll finish my patch and demonstrate that. But it doesn't make sense to start committing pieces of Fujii's patch, so that I can't ever keep up and as a result "Simon never finished his patch, but it sounded good". Next steps should be: tools down, discuss what to do. Then go forwards. We have time, so lets discuss all of the ideas on the table not just some of them. For me this is not about the number or names of parameters, its about master-side control of sync rep and having very good performance. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 3:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Will finishing it help? Yes, I expect that to help a lot. > Before, I had no patch. Now mine "isn't finished". At what point will my > ideas be reviewed without instant dismissal? If we accept your seniority > argument, then "never" because even if I finish it you'll say "Fujii was > there first". I said very clearly in my previous email that "I think that everyone is prepared to accept a limited amount of further delay in pressing forward with the main part of sync rep". In other words, I think everyone is willing to consider your ideas provided that they are submitted in a form which everyone can understand and think through sometime soon. I am not, nor do I think anyone is, saying that we don't wish to consider your ideas. I'm actually really pleased that you are only a day or two from having a working patch. It can be much easier to conceptualize a patch than to find the time to finish it (unfortunately, this problem has overtaken me rather badly in the last few weeks, which is why I have no new patches in this CommitFest) and if you can finish it up and get it out in front of everyone I expect that to be a good thing for this feature and our community. > Do we have to benchmark multiple patches to prove which is best? If > that's the criteria I'll finish my patch and demonstrate that. I was thinking about that earlier today. I think it's definitely possible that we'll need to do some benchmarking, although I expect that people will want to read the code first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay
From
Fujii Masao
Date:
On Thu, Sep 16, 2010 at 4:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > We definitely have the time, so the question is, what are the best > ideas? Before advancing the review of each patch, we must determine what should be committed in 9.1, and what's in this CF. "Synchronization level on per-transaction" feature is included in Simon's patch, but not in mine. This is most important difference, which would have wide-reaching impact on the implementation, e.g., protocol between walsender and walreceiver. So, at first we should determine whether we'll commit the feature in 9.1. Then we need to determine how far we should implement in this CF. Thought? Each patch provides "synchronization level on per-standby" feature. In Simon's patch, that level is specified in the standbys's recovery.conf. In mine, it's in the master's standbys.conf. I think that the former is simpler. But if we support the capability to register the standbys, the latter would be required. Which is the best? Simon's patch seems to include simple quorum commit feature (correct me if I'm wrong). That is, when there are multiple synchronous standbys, the master waits until ACK has arrived from at least one standby. OTOH, in my patch, the master waits until ACK has arrived from all the synchronous standbys. Which should we choose? I think that we should commit my straightforward approach first, and enable the quorum commit on that. Thought? Simon proposes to invoke walwriter in the standby. This is not included in my patch, but looks good idea. ISTM that this is not essential feature for synchronous replication, so how about detachmenting of the walwriter part from the patch and reviewing it independently? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center