Thread: Replication server timeout patch
Hello list, I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. -- fdr
Attachment
On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfarina@acm.org> wrote: > I split this out of the synchronous replication patch for independent > review. I'm dashing out the door, so I haven't put it on the CF yet or > anything, but I just wanted to get it out there...I'll be around in > Not Too Long to finish any other details. This looks like a useful and separately committable change. However, it looks to me like this renders wal_sender_delay aka WalSndDelay completely unused. If we don't need that GUC any more, we should rip it out completely. The comment in WalSndHandshake should have a tab at the beginning of every line. Right now the first line has a tab and the rest have spaces. The first hunk in WalSndLoop is a meaningless whitespace change. I wonder if we ought to just call this replication_timeout, rather than replication_timeout_server. Simon's patch (from which this extracted) also has replication_timeout_client, but the two aren't symmetrical. The replication_timeout_client in this patch is the amount of time after which the master acknowledges the commit even though the synchronous standby hasn't acked yet. So it only applies to the synchronous replication case, whereas this is useful for both synchronous and asynchronous replication. I'm inclined to think that knob is utterly useless anyway; surely waiting more than zero will reduce the throughput of the system to some minute fraction of its normal value, while waiting less than infinity throws out the data guarantee that made you pick synchronous replication in the first place. Even if we do decide to keep that knob, I don't think we'll want the value to be symmetric with this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11.02.2011 22:11, Robert Haas wrote: > On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina@acm.org> wrote: >> I split this out of the synchronous replication patch for independent >> review. I'm dashing out the door, so I haven't put it on the CF yet or >> anything, but I just wanted to get it out there...I'll be around in >> Not Too Long to finish any other details. > > This looks like a useful and separately committable change. Hmm, so this patch implements a watchdog, where the master disconnects the standby if the heartbeat from the standby stops for more than 'replication_[server]_timeout' seconds. The standby sends the heartbeat every wal_receiver_status_interval seconds. It would be nice if the master and standby could negotiate those settings. As the patch stands, it's easy to have a pathological configuration where replication_server_timeout < wal_receiver_status_interval, so that the master repeatedly disconnects the standby because it doesn't reply in time. Maybe the standby should report how often it's going to send a heartbeat, and master should wait for that long + some safety margin. Or maybe the master should tell the standby how often it should send the heartbeat? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 11.02.2011 22:11, Robert Haas wrote: >> >> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina@acm.org> wrote: >>> >>> I split this out of the synchronous replication patch for independent >>> review. I'm dashing out the door, so I haven't put it on the CF yet or >>> anything, but I just wanted to get it out there...I'll be around in >>> Not Too Long to finish any other details. >> >> This looks like a useful and separately committable change. > > Hmm, so this patch implements a watchdog, where the master disconnects the > standby if the heartbeat from the standby stops for more than > 'replication_[server]_timeout' seconds. The standby sends the heartbeat > every wal_receiver_status_interval seconds. > > It would be nice if the master and standby could negotiate those settings. > As the patch stands, it's easy to have a pathological configuration where > replication_server_timeout < wal_receiver_status_interval, so that the > master repeatedly disconnects the standby because it doesn't reply in time. > Maybe the standby should report how often it's going to send a heartbeat, > and master should wait for that long + some safety margin. Or maybe the > master should tell the standby how often it should send the heartbeat? I guess the biggest use case for that behavior would be in a case where you have two standbys, one of which doesn't send a heartbeat and the other of which does. Then you really can't rely on a single timeout. Maybe we could change the server parameter to indicate what multiple of wal_receiver_status_interval causes a hangup, and then change the client to notify the server what value it's using. But that gets complicated, because the value could be changed while the standby is running. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 11, 2011 at 12:11 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfarina@acm.org> wrote: >> I split this out of the synchronous replication patch for independent >> review. I'm dashing out the door, so I haven't put it on the CF yet or >> anything, but I just wanted to get it out there...I'll be around in >> Not Too Long to finish any other details. > > This looks like a useful and separately committable change. > > However, it looks to me like this renders wal_sender_delay aka > WalSndDelay completely unused. If we don't need that GUC any more, we > should rip it out completely. Indeed; I have cleaned this up. > The comment in WalSndHandshake should have a tab at the beginning of > every line. Right now the first line has a tab and the rest have > spaces. > Also correct. Done. > The first hunk in WalSndLoop is a meaningless whitespace change. I was trying to get it under 80 columns wide, but yes, it is unnecessary. I think this closes out the small fry. I have rebased my splitorific branch to reflect these changes: https://github.com/fdr/postgres/commits/splitorific Context diff equivalent attached. -- fdr
Attachment
On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 11.02.2011 22:11, Robert Haas wrote: >>> >>> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina@acm.org> wrote: >>>> >>>> I split this out of the synchronous replication patch for independent >>>> review. I'm dashing out the door, so I haven't put it on the CF yet or >>>> anything, but I just wanted to get it out there...I'll be around in >>>> Not Too Long to finish any other details. >>> >>> This looks like a useful and separately committable change. >> >> Hmm, so this patch implements a watchdog, where the master disconnects the >> standby if the heartbeat from the standby stops for more than >> 'replication_[server]_timeout' seconds. The standby sends the heartbeat >> every wal_receiver_status_interval seconds. >> >> It would be nice if the master and standby could negotiate those settings. >> As the patch stands, it's easy to have a pathological configuration where >> replication_server_timeout < wal_receiver_status_interval, so that the >> master repeatedly disconnects the standby because it doesn't reply in time. >> Maybe the standby should report how often it's going to send a heartbeat, >> and master should wait for that long + some safety margin. Or maybe the >> master should tell the standby how often it should send the heartbeat? > > I guess the biggest use case for that behavior would be in a case > where you have two standbys, one of which doesn't send a heartbeat and > the other of which does. Then you really can't rely on a single > timeout. > > Maybe we could change the server parameter to indicate what multiple > of wal_receiver_status_interval causes a hangup, and then change the > client to notify the server what value it's using. But that gets > complicated, because the value could be changed while the standby is > running. On reflection I'm deeply uncertain this is a good idea. It's pretty hopeless to suppose that we can keep the user from choosing parameter settings which will cause them problems, and there are certainly far stupider things they could do then set replication_timeout < wal_receiver_status_interval. They could, for example, set fsync=off or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that last one out of the box). Any of those settings have the potential to thoroughly destroy their system in one way or another, and there's not a darn thing we can do about it. Setting up some kind of handshake system based on a multiple of the wal_receiver_status_interval is going to be complex, and it's not necessarily going to deliver the behavior someone wants anyway. If someone has wal_receiver_status_interval=10 on one system and =30 on another system, does it therefore follow that the timeouts should also be different by 3X? Perhaps, but it's non-obvious. There are two things that I think are pretty clear. If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. And also we need to make sure that the replication_timeout can't kill off a connection that is in the middle of streaming a large base backup. Maybe we should try to get those two cases right and not worry about the rest. Dan, can you check whether the base backup thing is a problem with this as implemented? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<p>On Feb 11, 2011 8:20 PM, "Robert Haas" <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> wrote:<br/> ><br /> > On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br /> > > On Fri, Feb 11, 2011 at 4:30 PM,Heikki Linnakangas<br /> > > <<a href="mailto:heikki.linnakangas@enterprisedb.com">heikki.linnakangas@enterprisedb.com</a>>wrote:<br /> > >> On11.02.2011 22:11, Robert Haas wrote:<br /> > >>><br /> > >>> On Fri, Feb 11, 2011 at 2:02 PM, DanielFarina<<a href="mailto:drfarina@acm.org">drfarina@acm.org</a>> wrote:<br /> > >>>><br /> >>>>> I split this out of the synchronous replication patch for independent<br /> > >>>> review.I'm dashing out the door, so I haven't put it on the CF yet or<br /> > >>>> anything, but I just wantedto get it out there...I'll be around in<br /> > >>>> Not Too Long to finish any other details.<br />> >>><br /> > >>> This looks like a useful and separately committable change.<br /> > >><br/> > >> Hmm, so this patch implements a watchdog, where the master disconnects the<br /> > >>standby if the heartbeat from the standby stops for more than<br /> > >> 'replication_[server]_timeout'seconds. The standby sends the heartbeat<br /> > >> every wal_receiver_status_intervalseconds.<br /> > >><br /> > >> It would be nice if the master and standby couldnegotiate those settings.<br /> > >> As the patch stands, it's easy to have a pathological configuration where<br/> > >> replication_server_timeout < wal_receiver_status_interval, so that the<br /> > >> masterrepeatedly disconnects the standby because it doesn't reply in time.<br /> > >> Maybe the standby should reporthow often it's going to send a heartbeat,<br /> > >> and master should wait for that long + some safety margin.Or maybe the<br /> > >> master should tell the standby how often it should send the heartbeat?<br /> >><br /> > > I guess the biggest use case for that behavior would be in a case<br /> > > where you havetwo standbys, one of which doesn't send a heartbeat and<br /> > > the other of which does. Then you really can'trely on a single<br /> > > timeout.<br /> > ><br /> > > Maybe we could change the server parameterto indicate what multiple<br /> > > of wal_receiver_status_interval causes a hangup, and then change the<br/> > > client to notify the server what value it's using. But that gets<br /> > > complicated, becausethe value could be changed while the standby is<br /> > > running.<br /> ><br /> > On reflection I'm deeplyuncertain this is a good idea. It's pretty<br /> > hopeless to suppose that we can keep the user from choosingparameter<br /> > settings which will cause them problems, and there are certainly far<br /> > stupider thingsthey could do then set replication_timeout <<br /> > wal_receiver_status_interval. They could, for example,set fsync=off<br /> > or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that<br /> > last oneout of the box). Any of those settings have the potential to<br /> > thoroughly destroy their system in one way oranother, and there's not<br /> > a darn thing we can do about it. Setting up some kind of handshake<br /> > systembased on a multiple of the wal_receiver_status_interval is<br /> > going to be complex, and it's not necessarilygoing to deliver the<br /> > behavior someone wants anyway. If someone has<br /> > wal_receiver_status_interval=10on one system and =30 on another<br /> > system, does it therefore follow that the timeoutsshould also be<br /> > different by 3X? Perhaps, but it's non-obvious.<br /> ><br /> > There are two thingsthat I think are pretty clear. If the receiver<br /> > has wal_receiver_status_interval=0, then we should ignore<br/> > replication_timeout for that connection. And also we need to make<br /> > sure that the replication_timeoutcan't kill off a connection that is<br /> > in the middle of streaming a large base backup. Maybewe should try<br /> > to get those two cases right and not worry about the rest. Dan, can<br /> > you checkwhether the base backup thing is a problem with this as<br /> > implemented?<p>Yes, I will have something to saycome Saturday.<p>--<br /> fdr
On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: > Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a non-blocking write function as an infrastructure of the replication timeout, I think. http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: >> Context diff equivalent attached. > > Thanks for the patch! > > As I said before, the timeout which this patch provides doesn't work well > when the walsender gets blocked in sending WAL. At first, we would > need to implement a non-blocking write function as an infrastructure > of the replication timeout, I think. > http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com Interesting point...if that's accepted as required-for-commit, what are the perceptions of the odds that, presuming I can write the code quickly enough, that there's enough infrastructure/ports already in postgres to allow for a non-blocking write on all our supported platforms? -- fdr
On Mon, 2011-02-14 at 14:13 -0800, Daniel Farina wrote: > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > > On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: > >> Context diff equivalent attached. > > > > Thanks for the patch! > > > > As I said before, the timeout which this patch provides doesn't work well > > when the walsender gets blocked in sending WAL. At first, we would > > need to implement a non-blocking write function as an infrastructure > > of the replication timeout, I think. > > http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com I wasn't aware that had been raised before. Thanks for noting it again. I guess that's why you thought "wait forever" was a good idea ;-) > Interesting point...if that's accepted as required-for-commit, what > are the perceptions of the odds that, presuming I can write the code > quickly enough, that there's enough infrastructure/ports already in > postgres to allow for a non-blocking write on all our supported > platforms? I'd like to see what you come up with. I would rate that as important, though not essential for sync replication. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Feb 14, 2011 at 5:13 PM, Daniel Farina <daniel@heroku.com> wrote: > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: >>> Context diff equivalent attached. >> >> Thanks for the patch! >> >> As I said before, the timeout which this patch provides doesn't work well >> when the walsender gets blocked in sending WAL. At first, we would >> need to implement a non-blocking write function as an infrastructure >> of the replication timeout, I think. >> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com > > Interesting point...if that's accepted as required-for-commit, what > are the perceptions of the odds that, presuming I can write the code > quickly enough, that there's enough infrastructure/ports already in > postgres to allow for a non-blocking write on all our supported > platforms? Are you working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina <daniel@heroku.com> wrote: > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: >>> Context diff equivalent attached. >> >> Thanks for the patch! >> >> As I said before, the timeout which this patch provides doesn't work well >> when the walsender gets blocked in sending WAL. At first, we would >> need to implement a non-blocking write function as an infrastructure >> of the replication timeout, I think. >> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com > > Interesting point...if that's accepted as required-for-commit, what > are the perceptions of the odds that, presuming I can write the code > quickly enough, that there's enough infrastructure/ports already in > postgres to allow for a non-blocking write on all our supported > platforms? I'm not sure if there's already enough infrastructure for a non-blocking write. But the patch which I submitted before might help to implement that. http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote: > On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina <daniel@heroku.com> wrote: > > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: > >>> Context diff equivalent attached. > >> > >> Thanks for the patch! > >> > >> As I said before, the timeout which this patch provides doesn't work well > >> when the walsender gets blocked in sending WAL. At first, we would > >> need to implement a non-blocking write function as an infrastructure > >> of the replication timeout, I think. > >> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com > > > > Interesting point...if that's accepted as required-for-commit, what > > are the perceptions of the odds that, presuming I can write the code > > quickly enough, that there's enough infrastructure/ports already in > > postgres to allow for a non-blocking write on all our supported > > platforms? > > I'm not sure if there's already enough infrastructure for a non-blocking > write. But the patch which I submitted before might help to implement that. > http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say "let's commit". Not committing gives us nothing at all, which is as much use as a chocolate teapot. I will be looking to commit this tomorrow morning, unless I hear some clear No comments, with reasons. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Feb 17, 2011 at 4:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote: >> On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina <daniel@heroku.com> wrote: >> > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote: >> >>> Context diff equivalent attached. >> >> >> >> Thanks for the patch! >> >> >> >> As I said before, the timeout which this patch provides doesn't work well >> >> when the walsender gets blocked in sending WAL. At first, we would >> >> need to implement a non-blocking write function as an infrastructure >> >> of the replication timeout, I think. >> >> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com >> > >> > Interesting point...if that's accepted as required-for-commit, what >> > are the perceptions of the odds that, presuming I can write the code >> > quickly enough, that there's enough infrastructure/ports already in >> > postgres to allow for a non-blocking write on all our supported >> > platforms? >> >> I'm not sure if there's already enough infrastructure for a non-blocking >> write. But the patch which I submitted before might help to implement that. >> http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com > > So, in summary, the position is that we have a timeout, but that timeout > doesn't work in all cases. But it does work in some, so that seems > enough for me to say "let's commit". Not committing gives us nothing at > all, which is as much use as a chocolate teapot. > > I will be looking to commit this tomorrow morning, unless I hear some > clear No comments, with reasons. I guess the question is whether it works in 10% of cases or 95% of cases. In the first case there's probably no point in pretending we have a feature if it doesn't really work. In the second case, it might make sense. But I don't have a good feeling for which it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> So, in summary, the position is that we have a timeout, but that timeout > doesn't work in all cases. But it does work in some, so that seems > enough for me to say "let's commit". Not committing gives us nothing at > all, which is as much use as a chocolate teapot. Can someone summarize the cases where it does and doesn't work? There's been a longish gap in this thread. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, 2011-02-17 at 16:42 -0500, Robert Haas wrote: > > > > So, in summary, the position is that we have a timeout, but that timeout > > doesn't work in all cases. But it does work in some, so that seems > > enough for me to say "let's commit". Not committing gives us nothing at > > all, which is as much use as a chocolate teapot. > > > > I will be looking to commit this tomorrow morning, unless I hear some > > clear No comments, with reasons. > > I guess the question is whether it works in 10% of cases or 95% of > cases. In the first case there's probably no point in pretending we > have a feature if it doesn't really work. In the second case, it > might make sense. But I don't have a good feeling for which it is. Well, I guess the people that wanted to wait forever may get their wish. For sync rep, I intend to put in place a client timeout, which we do have code for. The server side timeout still makes sense, but it's not a requirement for sync rep. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <josh@agliodbs.com> wrote: >> So, in summary, the position is that we have a timeout, but that timeout >> doesn't work in all cases. But it does work in some, so that seems >> enough for me to say "let's commit". Not committing gives us nothing at >> all, which is as much use as a chocolate teapot. > > Can someone summarize the cases where it does and doesn't work? > There's been a longish gap in this thread. The timeout doesn't work when walsender gets blocked during sending the WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't work when the standby becomes unresponsive while WAL is generated on the master one after another. Since walsender tries to continue sending the WAL while the standby is unresponsive, the send buffer gets filled up and the blocking send function (e.g., pq_flush) blocks the walsender. OTOH, if the standby becomes unresponsive when there is no workload which causes WAL, the timeout would work. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <josh@agliodbs.com> wrote: >>> So, in summary, the position is that we have a timeout, but that timeout >>> doesn't work in all cases. But it does work in some, so that seems >>> enough for me to say "let's commit". Not committing gives us nothing at >>> all, which is as much use as a chocolate teapot. >> >> Can someone summarize the cases where it does and doesn't work? >> There's been a longish gap in this thread. > > The timeout doesn't work when walsender gets blocked during sending the > WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't > work when the standby becomes unresponsive while WAL is generated on > the master one after another. Since walsender tries to continue sending the > WAL while the standby is unresponsive, the send buffer gets filled up and > the blocking send function (e.g., pq_flush) blocks the walsender. > > OTOH, if the standby becomes unresponsive when there is no workload > which causes WAL, the timeout would work. IMHO, that's so broken as to be useless. I would really like to have a solution to this problem, though. Relying on TCP keepalives is weak. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 18, 2011 at 12:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > IMHO, that's so broken as to be useless. > > I would really like to have a solution to this problem, though. > Relying on TCP keepalives is weak. Agreed. I updated the replication timeout patch which I submitted before. http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com Since the patch implements also non-blocking send functions, the timeout can work properly even when the send buffer has been filled up. > There are two things that I think are pretty clear. If the receiver > has wal_receiver_status_interval=0, then we should ignore > replication_timeout for that connection. The patch still doesn't check that wal_receiver_status_interval is set up properly. I'll implement that later. Regards, Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> There are two things that I think are pretty clear. If the receiver >> has wal_receiver_status_interval=0, then we should ignore >> replication_timeout for that connection. > > The patch still doesn't check that wal_receiver_status_interval > is set up properly. I'll implement that later. Done. I attached the updated patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> There are two things that I think are pretty clear. If the receiver >>> has wal_receiver_status_interval=0, then we should ignore >>> replication_timeout for that connection. >> >> The patch still doesn't check that wal_receiver_status_interval >> is set up properly. I'll implement that later. > > Done. I attached the updated patch. Why does internal_flush_if_writable compute bufptr differently from internal_flush? And shouldn't it be static? It seems to me that this ought to be refactored so that you don't duplicate so much code. Maybe static int internal_flush(bool nonblocking). I don't think that the while (bufptr < bufend) loop needs to contain the code to set and clear the nonblocking state. You could do the whole loop with nonblocking mode turned on and then reenable it just once at the end. Besides possibly being clearer, that would be more efficient and leave less room for unexpected failures. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Mar 6, 2011 at 3:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> There are two things that I think are pretty clear. If the receiver >>>> has wal_receiver_status_interval=0, then we should ignore >>>> replication_timeout for that connection. >>> >>> The patch still doesn't check that wal_receiver_status_interval >>> is set up properly. I'll implement that later. >> >> Done. I attached the updated patch. > > Why does internal_flush_if_writable compute bufptr differently from > internal_flush? And shouldn't it be static? > > It seems to me that this ought to be refactored so that you don't > duplicate so much code. Maybe static int internal_flush(bool > nonblocking). > > I don't think that the while (bufptr < bufend) loop needs to contain > the code to set and clear the nonblocking state. You could do the > whole loop with nonblocking mode turned on and then reenable it just > once at the end. Besides possibly being clearer, that would be more > efficient and leave less room for unexpected failures. All these comments seem to make sense. Will fix. Thanks! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Why does internal_flush_if_writable compute bufptr differently from >> internal_flush? And shouldn't it be static? >> >> It seems to me that this ought to be refactored so that you don't >> duplicate so much code. Maybe static int internal_flush(bool >> nonblocking). >> >> I don't think that the while (bufptr < bufend) loop needs to contain >> the code to set and clear the nonblocking state. You could do the >> whole loop with nonblocking mode turned on and then reenable it just >> once at the end. Besides possibly being clearer, that would be more >> efficient and leave less room for unexpected failures. > > All these comments seem to make sense. Will fix. Thanks! Done. I attached the updated patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> Why does internal_flush_if_writable compute bufptr differently from >>> internal_flush? And shouldn't it be static? >>> >>> It seems to me that this ought to be refactored so that you don't >>> duplicate so much code. Maybe static int internal_flush(bool >>> nonblocking). >>> >>> I don't think that the while (bufptr < bufend) loop needs to contain >>> the code to set and clear the nonblocking state. You could do the >>> whole loop with nonblocking mode turned on and then reenable it just >>> once at the end. Besides possibly being clearer, that would be more >>> efficient and leave less room for unexpected failures. >> >> All these comments seem to make sense. Will fix. Thanks! > > Done. I attached the updated patch. I rebased the patch against current git master. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> Why does internal_flush_if_writable compute bufptr differently from >>>> internal_flush? And shouldn't it be static? >>>> >>>> It seems to me that this ought to be refactored so that you don't >>>> duplicate so much code. Maybe static int internal_flush(bool >>>> nonblocking). >>>> >>>> I don't think that the while (bufptr < bufend) loop needs to contain >>>> the code to set and clear the nonblocking state. You could do the >>>> whole loop with nonblocking mode turned on and then reenable it just >>>> once at the end. Besides possibly being clearer, that would be more >>>> efficient and leave less room for unexpected failures. >>> >>> All these comments seem to make sense. Will fix. Thanks! >> >> Done. I attached the updated patch. > > I rebased the patch against current git master. I added this replication timeout patch into next CF. I explain why this feature is required for the future review; Without this feature, walsender might unexpectedly remain for a while when the standby crashes or the network outage happens. TCP keepalive can improve this situation to a certain extent, but it's not perfect. Remaining walsender can cause some problems. For example, when hot_standby_feedback is enabled, such a remaining walsender would prevent oldest xmin from advancing and interfere with vacuuming on the master. For example, when you use synchronous replication and walsender in SYNC mode gets stuck, any synchronous standby candidate cannot switch to SYNC mode until that walsender exits, and all the transactions would pause. This feature causes walsender to exit when there is no reply from the standby before the replication timeout expires. Then we can avoid the above problems. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Mar 11, 2011 at 8:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>> Why does internal_flush_if_writable compute bufptr differently from >>>>> internal_flush? And shouldn't it be static? >>>>> >>>>> It seems to me that this ought to be refactored so that you don't >>>>> duplicate so much code. Maybe static int internal_flush(bool >>>>> nonblocking). >>>>> >>>>> I don't think that the while (bufptr < bufend) loop needs to contain >>>>> the code to set and clear the nonblocking state. You could do the >>>>> whole loop with nonblocking mode turned on and then reenable it just >>>>> once at the end. Besides possibly being clearer, that would be more >>>>> efficient and leave less room for unexpected failures. >>>> >>>> All these comments seem to make sense. Will fix. Thanks! >>> >>> Done. I attached the updated patch. >> >> I rebased the patch against current git master. > > I added this replication timeout patch into next CF. > > I explain why this feature is required for the future review; > > Without this feature, walsender might unexpectedly remain for a while when > the standby crashes or the network outage happens. TCP keepalive can > improve this situation to a certain extent, but it's not perfect. Remaining > walsender can cause some problems. > > For example, when hot_standby_feedback is enabled, such a remaining > walsender would prevent oldest xmin from advancing and interfere with > vacuuming on the master. For example, when you use synchronous > replication and walsender in SYNC mode gets stuck, any synchronous > standby candidate cannot switch to SYNC mode until that walsender exits, > and all the transactions would pause. > > This feature causes walsender to exit when there is no reply from the > standby before the replication timeout expires. Then we can avoid the > above problems. I think we should consider making this change for 9.1. This is a real wart, and it's going to become even more of a problem with sync rep, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I added this replication timeout patch into next CF. >> >> I explain why this feature is required for the future review; >> >> Without this feature, walsender might unexpectedly remain for a while when >> the standby crashes or the network outage happens. TCP keepalive can >> improve this situation to a certain extent, but it's not perfect. Remaining >> walsender can cause some problems. >> >> For example, when hot_standby_feedback is enabled, such a remaining >> walsender would prevent oldest xmin from advancing and interfere with >> vacuuming on the master. For example, when you use synchronous >> replication and walsender in SYNC mode gets stuck, any synchronous >> standby candidate cannot switch to SYNC mode until that walsender exits, >> and all the transactions would pause. >> >> This feature causes walsender to exit when there is no reply from the >> standby before the replication timeout expires. Then we can avoid the >> above problems. > > I think we should consider making this change for 9.1. This is a real > wart, and it's going to become even more of a problem with sync rep, I > think. Yeah, that's a welcome! Please feel free to review the patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> I added this replication timeout patch into next CF. > >> > >> I explain why this feature is required for the future review; > >> > >> Without this feature, walsender might unexpectedly remain for a while when > >> the standby crashes or the network outage happens. TCP keepalive can > >> improve this situation to?a certain extent, but it's not perfect. Remaining > >> walsender can cause some problems. > >> > >> For example, when hot_standby_feedback is enabled, such a remaining > >> walsender would prevent oldest xmin from advancing and interfere with > >> vacuuming on the master. For example, when you use synchronous > >> replication and walsender in SYNC mode gets stuck, any synchronous > >> standby candidate cannot switch to SYNC mode until that walsender exits, > >> and all the transactions would pause. > >> > >> This feature causes walsender to exit when there is no reply from the > >> standby before the replication timeout expires. Then we can avoid the > >> above problems. > > > > I think we should consider making this change for 9.1. ?This is a real > > wart, and it's going to become even more of a problem with sync rep, I > > think. > > Yeah, that's a welcome! Please feel free to review the patch. It is already in the next commitfest, so if someone wants to add it as an open 9.1 item, go ahead. I am unclear of this so I am not adding it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> I think we should consider making this change for 9.1. This is a real >> wart, and it's going to become even more of a problem with sync rep, I >> think. > > Yeah, that's a welcome! Please feel free to review the patch. I discussed this with Heikki on IM. I think we should rip all the GUC change stuff out of this patch and just decree that if you set a timeout, you get a timeout. If you set this inconsistently with wal_receiver_status_interval, then you'll get lots of disconnects. But that's your problem. This may seem a little unfriendly, but the logic in here is quite complex and still isn't going to really provide that much protection against bad configurations. The only realistic alternative I see is to define replication_timeout as a multiple of the client's wal_receiver_status_interval, but that seems quite annoyingly unfriendly. A single replication_timeout that applies to all slaves doesn't cover every configuration someone might want, but it's simple and easy to understand and should cover 95% of cases. If we find that it's really necessary to be able to customize it further, then we might go the route of adding the much-discussed standby registration stuff, where there's a separate config file or system table where you can stipulate that when a walsender with application_name=foo connects, you want it to get wal_receiver_status_interval=$FOO. But I think that complexity can certainly wait until 9.2 or later. I also think that the default for replication_timeout should not be 0.Something like 60s seems about right. That way, ifyou just use the default settings, you'll get pretty sane behavior - a connectivity hiccup that lasts more than a minute will bounce the client. We've already gotten reports of people who thought they were replicating when they really weren't, and had to fiddle with settings and struggle to try to make it robust. This should make things a lot nicer for people out of the box, but it won't if it's disabled out of the box. On another note, there doesn't appear to be any need to change the return value of WaitLatchOrSocket(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> I think we should consider making this change for 9.1. This is a real >>> wart, and it's going to become even more of a problem with sync rep, I >>> think. >> >> Yeah, that's a welcome! Please feel free to review the patch. > > I discussed this with Heikki on IM. > > I think we should rip all the GUC change stuff out of this patch and > just decree that if you set a timeout, you get a timeout. If you set > this inconsistently with wal_receiver_status_interval, then you'll get > lots of disconnects. But that's your problem. This may seem a little > unfriendly, but the logic in here is quite complex and still isn't > going to really provide that much protection against bad > configurations. The only realistic alternative I see is to define > replication_timeout as a multiple of the client's > wal_receiver_status_interval, but that seems quite annoyingly > unfriendly. A single replication_timeout that applies to all slaves > doesn't cover every configuration someone might want, but it's simple > and easy to understand and should cover 95% of cases. If we find that > it's really necessary to be able to customize it further, then we > might go the route of adding the much-discussed standby registration > stuff, where there's a separate config file or system table where you > can stipulate that when a walsender with application_name=foo > connects, you want it to get wal_receiver_status_interval=$FOO. But I > think that complexity can certainly wait until 9.2 or later. > > I also think that the default for replication_timeout should not be 0. > Something like 60s seems about right. That way, if you just use the > default settings, you'll get pretty sane behavior - a connectivity > hiccup that lasts more than a minute will bounce the client. We've > already gotten reports of people who thought they were replicating > when they really weren't, and had to fiddle with settings and struggle > to try to make it robust. This should make things a lot nicer for > people out of the box, but it won't if it's disabled out of the box. > > On another note, there doesn't appear to be any need to change the > return value of WaitLatchOrSocket(). Agreed. I'll change the patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > Agreed. I'll change the patch. Done. I attached the updated patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On 16.03.2011 11:11, Fujii Masao wrote: > On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >> Agreed. I'll change the patch. > > Done. I attached the updated patch. I don't much like the API for this. Walsender shouldn't need to know about the details of the FE/BE protocol, pq_putbytes_if_available() seems too low level to be useful. I think a better API would be to have a non-blocking version of pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so that when the message doesn't fit in the output buffer in pq_putmessage(), the buffer is enlarged instead of trying to flush it. Attached is a patch using that approach. This is a much smaller patch, and easier to understand. I'm not totally happy with the walsender main loop, it seems to work as it is, but the logic has become quite complicated. Ideas welcome on how to simplify that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I don't much like the API for this. Walsender shouldn't need to know about > the details of the FE/BE protocol, pq_putbytes_if_available() seems too low > level to be useful. > > I think a better API would be to have a non-blocking version of > pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so > that when the message doesn't fit in the output buffer in pq_putmessage(), > the buffer is enlarged instead of trying to flush it. > > Attached is a patch using that approach. This is a much smaller patch, and > easier to understand. Agreed. Thanks for improving the patch. pq_flush_if_writable() calls internal_flush() without using PG_TRY block. This seems unsafe because for example pgwin32_waitforsinglesocket() called by secure_write() can throw ERROR. > I'm not totally happy with the walsender main loop, it > seems to work as it is, but the logic has become quite complicated. Ideas > welcome on how to simplify that. As the patch I proposed before did, how about leaving XLogSend() instead of WalSndLoop() to call pq_flush_if_writable() when there is pending data in output buffer? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 23, 2011 at 6:33 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 16.03.2011 11:11, Fujii Masao wrote: >> >> On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao<masao.fujii@gmail.com> >> wrote: >>> >>> Agreed. I'll change the patch. >> >> Done. I attached the updated patch. > > I don't much like the API for this. Walsender shouldn't need to know about > the details of the FE/BE protocol, pq_putbytes_if_available() seems too low > level to be useful. > > I think a better API would be to have a non-blocking version of > pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so > that when the message doesn't fit in the output buffer in pq_putmessage(), > the buffer is enlarged instead of trying to flush it. > > Attached is a patch using that approach. This is a much smaller patch, and > easier to understand. I'm not totally happy with the walsender main loop, it > seems to work as it is, but the logic has become quite complicated. Ideas > welcome on how to simplify that. Heikki, are you planning to commit this, either with or without further revisions? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 24.03.2011 15:24, Fujii Masao wrote: > On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> I don't much like the API for this. Walsender shouldn't need to know about >> the details of the FE/BE protocol, pq_putbytes_if_available() seems too low >> level to be useful. >> >> I think a better API would be to have a non-blocking version of >> pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so >> that when the message doesn't fit in the output buffer in pq_putmessage(), >> the buffer is enlarged instead of trying to flush it. >> >> Attached is a patch using that approach. This is a much smaller patch, and >> easier to understand. > > Agreed. Thanks for improving the patch. > > pq_flush_if_writable() calls internal_flush() without using PG_TRY block. > This seems unsafe because for example pgwin32_waitforsinglesocket() > called by secure_write() can throw ERROR. Perhaps it's time to give up on the assumption that the socket is in blocking mode except within those two functions. Attached patch adds the pq_set_nonblocking() function from your patch, and adds calls to it before all secure_read/write operations to put the socket in the right mode. There's only a few of those operations. Should we use COMMERROR instead of ERROR if we fail to put the socket in the right mode? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> pq_flush_if_writable() calls internal_flush() without using PG_TRY block. >> This seems unsafe because for example pgwin32_waitforsinglesocket() >> called by secure_write() can throw ERROR. > > Perhaps it's time to give up on the assumption that the socket is in > blocking mode except within those two functions. Attached patch adds the > pq_set_nonblocking() function from your patch, and adds calls to it before > all secure_read/write operations to put the socket in the right mode. > There's only a few of those operations. Sounds good. + pq_set_nonblocking(false); /* XXX: Is this required? */ No. Since secure_close and close_SSL don't use MyProcPort->sock and MyProcPort->noblock which can be changed in pq_set_nonblocking, I don't think that is required. + pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes); Don't we need to check the return value of pq_putmessage_noblock? That can return EOF when trouble happens (for example the send system call fails). > Should we use COMMERROR instead of ERROR if we fail to put the socket in the > right mode? Maybe. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> writes: > On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas >> Should we use COMMERROR instead of ERROR if we fail to put the socket in the >> right mode? > Maybe. COMMERROR exists to keep us from trying to send an error report down a failed socket. I would assume (perhaps wrongly) that walsender/walreceiver don't try to push error reports across the socket anyway, only to the postmaster log. If correct, there is no need for COMMERROR, and using it just muddies the code. regards, tom lane
On Tue, Mar 29, 2011 at 9:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas >>> Should we use COMMERROR instead of ERROR if we fail to put the socket in the >>> right mode? > >> Maybe. > > COMMERROR exists to keep us from trying to send an error report down a > failed socket. I would assume (perhaps wrongly) that > walsender/walreceiver don't try to push error reports across the socket > anyway, only to the postmaster log. If correct, there is no need for > COMMERROR, and using it just muddies the code. I don't think that's how it works. The error the server sends is copied into some of the messages in the client log, which is really useful for debugging. ERROR: can't connect to the server (server said: you're not authorized) ...or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 30, 2011 at 1:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> COMMERROR exists to keep us from trying to send an error report down a >> failed socket. I would assume (perhaps wrongly) that >> walsender/walreceiver don't try to push error reports across the socket >> anyway, only to the postmaster log. If correct, there is no need for >> COMMERROR, and using it just muddies the code. > > I don't think that's how it works. The error the server sends is > copied into some of the messages in the client log, which is really > useful for debugging. > > ERROR: can't connect to the server (server said: you're not authorized) > > ...or something like that. Yes. Walsender sends its error message to walreceiver, and walreceiver writes it down to the server log. For example; FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000010000000000000016 has already been removed The second FATAL message is sent from walsender. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 29.03.2011 07:55, Fujii Masao wrote: > On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >>> pq_flush_if_writable() calls internal_flush() without using PG_TRY block. >>> This seems unsafe because for example pgwin32_waitforsinglesocket() >>> called by secure_write() can throw ERROR. >> >> Perhaps it's time to give up on the assumption that the socket is in >> blocking mode except within those two functions. Attached patch adds the >> pq_set_nonblocking() function from your patch, and adds calls to it before >> all secure_read/write operations to put the socket in the right mode. >> There's only a few of those operations. > > Sounds good. > > + pq_set_nonblocking(false); /* XXX: Is this required? */ > > No. Since secure_close and close_SSL don't use MyProcPort->sock and > MyProcPort->noblock which can be changed in pq_set_nonblocking, > I don't think that is required. Ok, I took that out. > + pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes); > > Don't we need to check the return value of pq_putmessage_noblock? That > can return EOF when trouble happens (for example the send system call fails). No, pq_putmessage_noblock doesn't call send() because it enlarges the buffer to make sure the message fits, and it doesn't anything else that could fail else. I changed its return type to void, and added an Assert() to check that the pq_putmessage() call it does internally indeed doesn't fail. >> Should we use COMMERROR instead of ERROR if we fail to put the socket in the >> right mode? > > Maybe. I made it COMMERROR. ERRORs are sent to the client, and you could get into infinite recursion if sending the ERROR requires setting the blocking mode again. Committed with those changes. I also reworded the docs a bit. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> + pq_putmessage_noblock('d', msgbuf, 1 + >> sizeof(WalDataMessageHeader) + nbytes); >> >> Don't we need to check the return value of pq_putmessage_noblock? That >> can return EOF when trouble happens (for example the send system call >> fails). > > No, pq_putmessage_noblock doesn't call send() because it enlarges the buffer > to make sure the message fits, and it doesn't anything else that could fail > else. I changed its return type to void, and added an Assert() to check that > the pq_putmessage() call it does internally indeed doesn't fail. Oh, you're right. > Committed with those changes. I also reworded the docs a bit. Thanks a lot! + A value of zero means wait forever. This parameter can only be set in The first sentence sounds misleading. Even if you set the parameter to zero, replication connections can be terminated because of keepalive or socket error. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 30.03.2011 10:58, Fujii Masao wrote: > On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > + A value of zero means wait forever. This parameter can only be set in > > The first sentence sounds misleading. Even if you set the parameter to zero, > replication connections can be terminated because of keepalive or socket error. Hmm, should I change it back to "A value of zero disables the timeout" ? Any better suggestions? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 30.03.2011 10:58, Fujii Masao wrote: >> >> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >> + A value of zero means wait forever. This parameter can only be >> set in >> >> The first sentence sounds misleading. Even if you set the parameter to >> zero, >> replication connections can be terminated because of keepalive or socket >> error. > > Hmm, should I change it back to "A value of zero disables the timeout" ? Any > better suggestions? I like that. But I appreciate if anyone suggests the better. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 30.03.2011 10:58, Fujii Masao wrote: >>> >>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>> + A value of zero means wait forever. This parameter can only be >>> set in >>> >>> The first sentence sounds misleading. Even if you set the parameter to >>> zero, >>> replication connections can be terminated because of keepalive or socket >>> error. >> >> Hmm, should I change it back to "A value of zero disables the timeout" ? Any >> better suggestions? > > I like that. But I appreciate if anyone suggests the better. Maybe sticking the word "mechanism" in there would be a bit better. "A value of zero disables the timeout mechanism"? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> On 30.03.2011 10:58, Fujii Masao wrote: >>>> >>>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas >>>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> + A value of zero means wait forever. This parameter can only be >>>> set in >>>> >>>> The first sentence sounds misleading. Even if you set the parameter to >>>> zero, >>>> replication connections can be terminated because of keepalive or socket >>>> error. >>> >>> Hmm, should I change it back to "A value of zero disables the timeout" ? Any >>> better suggestions? >> >> I like that. But I appreciate if anyone suggests the better. > > Maybe sticking the word "mechanism" in there would be a bit better. > "A value of zero disables the timeout mechanism"? I'm OK with that. Or, what about "A value of zero turns this off" which is used in statement_timeout for the sake of consistency? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 31.03.2011 05:46, Fujii Masao wrote: > On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas<robertmhaas@gmail.com> wrote: >> On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao<masao.fujii@gmail.com> wrote: >>> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> On 30.03.2011 10:58, Fujii Masao wrote: >>>>> >>>>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas >>>>> <heikki.linnakangas@enterprisedb.com> wrote: >>>>> + A value of zero means wait forever. This parameter can only be >>>>> set in >>>>> >>>>> The first sentence sounds misleading. Even if you set the parameter to >>>>> zero, >>>>> replication connections can be terminated because of keepalive or socket >>>>> error. >>>> >>>> Hmm, should I change it back to "A value of zero disables the timeout" ? Any >>>> better suggestions? >>> >>> I like that. But I appreciate if anyone suggests the better. >> >> Maybe sticking the word "mechanism" in there would be a bit better. >> "A value of zero disables the timeout mechanism"? > > I'm OK with that. Or, what about "A value of zero turns this off" which is > used in statement_timeout for the sake of consistency? Committed Robert's suggestion. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com