Thread: Replication server timeout patch

Replication server timeout patch

From

Daniel Farina

Date:

11 February 2011, 15:03:21

Hello list,

I split this out of the synchronous replication patch for independent
review. I'm dashing out the door, so I haven't put it on the CF yet or
anything, but I just wanted to get it out there...I'll be around in
Not Too Long to finish any other details.

--
fdr

Attachment

0001-Split-and-rename-out-server-timeouts.patch

Re: Replication server timeout patch

From

Robert Haas

Date:

11 February 2011, 16:11:47

On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfarina@acm.org> wrote:
> I split this out of the synchronous replication patch for independent
> review. I'm dashing out the door, so I haven't put it on the CF yet or
> anything, but I just wanted to get it out there...I'll be around in
> Not Too Long to finish any other details.

This looks like a useful and separately committable change.

However, it looks to me like this renders wal_sender_delay aka
WalSndDelay completely unused.  If we don't need that GUC any more, we
should rip it out completely.

The comment in WalSndHandshake should have a tab at the beginning of
every line.  Right now the first line has a tab and the rest have
spaces.

The first hunk in WalSndLoop is a meaningless whitespace change.

I wonder if we ought to just call this replication_timeout, rather
than replication_timeout_server.  Simon's patch (from which this
extracted) also has replication_timeout_client, but the two aren't
symmetrical.  The replication_timeout_client in this patch is the
amount of time after which the master acknowledges the commit even
though the synchronous standby hasn't acked yet.  So it only applies
to the synchronous replication case, whereas this is useful for both
synchronous and asynchronous replication.  I'm inclined to think that
knob is utterly useless anyway; surely waiting more than zero will
reduce the throughput of the system to some minute fraction of its
normal value, while waiting less than infinity throws out the data
guarantee that made you pick synchronous replication in the first
place.  Even if we do decide to keep that knob, I don't think we'll
want the value to be symmetric with this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Heikki Linnakangas

Date:

11 February 2011, 17:30:49

On 11.02.2011 22:11, Robert Haas wrote:
> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina@acm.org>  wrote:
>> I split this out of the synchronous replication patch for independent
>> review. I'm dashing out the door, so I haven't put it on the CF yet or
>> anything, but I just wanted to get it out there...I'll be around in
>> Not Too Long to finish any other details.
>
> This looks like a useful and separately committable change.

Hmm, so this patch implements a watchdog, where the master disconnects 
the standby if the heartbeat from the standby stops for more than 
'replication_[server]_timeout' seconds. The standby sends the heartbeat 
every wal_receiver_status_interval seconds.

It would be nice if the master and standby could negotiate those 
settings. As the patch stands, it's easy to have a pathological 
configuration where replication_server_timeout < 
wal_receiver_status_interval, so that the master repeatedly disconnects 
the standby because it doesn't reply in time. Maybe the standby should 
report how often it's going to send a heartbeat, and master should wait 
for that long + some safety margin. Or maybe the master should tell the 
standby how often it should send the heartbeat?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Replication server timeout patch

From

Robert Haas

Date:

11 February 2011, 17:38:38

On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 11.02.2011 22:11, Robert Haas wrote:
>>
>> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina@acm.org>  wrote:
>>>
>>> I split this out of the synchronous replication patch for independent
>>> review. I'm dashing out the door, so I haven't put it on the CF yet or
>>> anything, but I just wanted to get it out there...I'll be around in
>>> Not Too Long to finish any other details.
>>
>> This looks like a useful and separately committable change.
>
> Hmm, so this patch implements a watchdog, where the master disconnects the
> standby if the heartbeat from the standby stops for more than
> 'replication_[server]_timeout' seconds. The standby sends the heartbeat
> every wal_receiver_status_interval seconds.
>
> It would be nice if the master and standby could negotiate those settings.
> As the patch stands, it's easy to have a pathological configuration where
> replication_server_timeout < wal_receiver_status_interval, so that the
> master repeatedly disconnects the standby because it doesn't reply in time.
> Maybe the standby should report how often it's going to send a heartbeat,
> and master should wait for that long + some safety margin. Or maybe the
> master should tell the standby how often it should send the heartbeat?

I guess the biggest use case for that behavior would be in a case
where you have two standbys, one of which doesn't send a heartbeat and
the other of which does.  Then you really can't rely on a single
timeout.

Maybe we could change the server parameter to indicate what multiple
of wal_receiver_status_interval causes a hangup, and then change the
client to notify the server what value it's using.  But that gets
complicated, because the value could be changed while the standby is
running.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Daniel Farina

Date:

11 February 2011, 19:59:28

On Fri, Feb 11, 2011 at 12:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina <drfarina@acm.org> wrote:
>> I split this out of the synchronous replication patch for independent
>> review. I'm dashing out the door, so I haven't put it on the CF yet or
>> anything, but I just wanted to get it out there...I'll be around in
>> Not Too Long to finish any other details.
>
> This looks like a useful and separately committable change.
>
> However, it looks to me like this renders wal_sender_delay aka
> WalSndDelay completely unused.  If we don't need that GUC any more, we
> should rip it out completely.

Indeed; I have cleaned this up.

> The comment in WalSndHandshake should have a tab at the beginning of
> every line.  Right now the first line has a tab and the rest have
> spaces.
>
Also correct. Done.

> The first hunk in WalSndLoop is a meaningless whitespace change.

I was trying to get it under 80 columns wide, but yes, it is unnecessary.

I think this closes out the small fry.

I have rebased my splitorific branch to reflect these changes:

https://github.com/fdr/postgres/commits/splitorific

Context diff equivalent attached.

--
fdr

Attachment

0001-Split-and-rename-out-server-timeout-of-clients-2.patch

Re: Replication server timeout patch

From

Robert Haas

Date:

12 February 2011, 00:20:57

On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 11.02.2011 22:11, Robert Haas wrote:
>>>
>>> On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina<drfarina@acm.org>  wrote:
>>>>
>>>> I split this out of the synchronous replication patch for independent
>>>> review. I'm dashing out the door, so I haven't put it on the CF yet or
>>>> anything, but I just wanted to get it out there...I'll be around in
>>>> Not Too Long to finish any other details.
>>>
>>> This looks like a useful and separately committable change.
>>
>> Hmm, so this patch implements a watchdog, where the master disconnects the
>> standby if the heartbeat from the standby stops for more than
>> 'replication_[server]_timeout' seconds. The standby sends the heartbeat
>> every wal_receiver_status_interval seconds.
>>
>> It would be nice if the master and standby could negotiate those settings.
>> As the patch stands, it's easy to have a pathological configuration where
>> replication_server_timeout < wal_receiver_status_interval, so that the
>> master repeatedly disconnects the standby because it doesn't reply in time.
>> Maybe the standby should report how often it's going to send a heartbeat,
>> and master should wait for that long + some safety margin. Or maybe the
>> master should tell the standby how often it should send the heartbeat?
>
> I guess the biggest use case for that behavior would be in a case
> where you have two standbys, one of which doesn't send a heartbeat and
> the other of which does.  Then you really can't rely on a single
> timeout.
>
> Maybe we could change the server parameter to indicate what multiple
> of wal_receiver_status_interval causes a hangup, and then change the
> client to notify the server what value it's using.  But that gets
> complicated, because the value could be changed while the standby is
> running.

On reflection I'm deeply uncertain this is a good idea.  It's pretty
hopeless to suppose that we can keep the user from choosing parameter
settings which will cause them problems, and there are certainly far
stupider things they could do then set replication_timeout <
wal_receiver_status_interval.  They could, for example, set fsync=off
or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that
last one out of the box).  Any of those settings have the potential to
thoroughly destroy their system in one way or another, and there's not
a darn thing we can do about it.  Setting up some kind of handshake
system based on a multiple of the wal_receiver_status_interval is
going to be complex, and it's not necessarily going to deliver the
behavior someone wants anyway.  If someone has
wal_receiver_status_interval=10 on one system and =30 on another
system, does it therefore follow that the timeouts should also be
different by 3X?  Perhaps, but it's non-obvious.

There are two things that I think are pretty clear.  If the receiver
has wal_receiver_status_interval=0, then we should ignore
replication_timeout for that connection.  And also we need to make
sure that the replication_timeout can't kill off a connection that is
in the middle of streaming a large base backup.  Maybe we should try
to get those two cases right and not worry about the rest.  Dan, can
you check whether the base backup thing is a problem with this as
implemented?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Daniel Farina

Date:

12 February 2011, 00:51:14

<p>On Feb 11, 2011 8:20 PM, "Robert Haas" <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>
wrote:<br/> ><br /> > On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas <<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br /> > > On Fri, Feb 11, 2011 at 4:30
PM,Heikki Linnakangas<br /> > > <<a
href="mailto:heikki.linnakangas@enterprisedb.com">heikki.linnakangas@enterprisedb.com</a>>wrote:<br /> > >>
On11.02.2011 22:11, Robert Haas wrote:<br /> > >>><br /> > >>> On Fri, Feb 11, 2011 at 2:02 PM,
DanielFarina<<a href="mailto:drfarina@acm.org">drfarina@acm.org</a>>  wrote:<br /> > >>>><br />
>>>>> I split this out of the synchronous replication patch for independent<br /> > >>>>
review.I'm dashing out the door, so I haven't put it on the CF yet or<br /> > >>>> anything, but I just
wantedto get it out there...I'll be around in<br /> > >>>> Not Too Long to finish any other details.<br
/>> >>><br /> > >>> This looks like a useful and separately committable change.<br /> >
>><br/> > >> Hmm, so this patch implements a watchdog, where the master disconnects the<br /> >
>>standby if the heartbeat from the standby stops for more than<br /> > >>
'replication_[server]_timeout'seconds. The standby sends the heartbeat<br /> > >> every
wal_receiver_status_intervalseconds.<br /> > >><br /> > >> It would be nice if the master and standby
couldnegotiate those settings.<br /> > >> As the patch stands, it's easy to have a pathological configuration
where<br/> > >> replication_server_timeout < wal_receiver_status_interval, so that the<br /> > >>
masterrepeatedly disconnects the standby because it doesn't reply in time.<br /> > >> Maybe the standby should
reporthow often it's going to send a heartbeat,<br /> > >> and master should wait for that long + some safety
margin.Or maybe the<br /> > >> master should tell the standby how often it should send the heartbeat?<br />
>><br /> > > I guess the biggest use case for that behavior would be in a case<br /> > > where you
havetwo standbys, one of which doesn't send a heartbeat and<br /> > > the other of which does.  Then you really
can'trely on a single<br /> > > timeout.<br /> > ><br /> > > Maybe we could change the server
parameterto indicate what multiple<br /> > > of wal_receiver_status_interval causes a hangup, and then change
the<br/> > > client to notify the server what value it's using.  But that gets<br /> > > complicated,
becausethe value could be changed while the standby is<br /> > > running.<br /> ><br /> > On reflection I'm
deeplyuncertain this is a good idea.  It's pretty<br /> > hopeless to suppose that we can keep the user from
choosingparameter<br /> > settings which will cause them problems, and there are certainly far<br /> > stupider
thingsthey could do then set replication_timeout <<br /> > wal_receiver_status_interval.  They could, for
example,set fsync=off<br /> > or work_mem=4GB or checkpoint_segments=3 (never mind that we ship that<br /> > last
oneout of the box).  Any of those settings have the potential to<br /> > thoroughly destroy their system in one way
oranother, and there's not<br /> > a darn thing we can do about it.  Setting up some kind of handshake<br /> >
systembased on a multiple of the wal_receiver_status_interval is<br /> > going to be complex, and it's not
necessarilygoing to deliver the<br /> > behavior someone wants anyway.  If someone has<br /> >
wal_receiver_status_interval=10on one system and =30 on another<br /> > system, does it therefore follow that the
timeoutsshould also be<br /> > different by 3X?  Perhaps, but it's non-obvious.<br /> ><br /> > There are two
thingsthat I think are pretty clear.  If the receiver<br /> > has wal_receiver_status_interval=0, then we should
ignore<br/> > replication_timeout for that connection.  And also we need to make<br /> > sure that the
replication_timeoutcan't kill off a connection that is<br /> > in the middle of streaming a large base backup.
 Maybewe should try<br /> > to get those two cases right and not worry about the rest.  Dan, can<br /> > you
checkwhether the base backup thing is a problem with this as<br /> > implemented?<p>Yes, I will have something to
saycome Saturday.<p>--<br /> fdr

Re: Replication server timeout patch

From

Fujii Masao

Date:

14 February 2011, 04:48:50

On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
> Context diff equivalent attached.

Thanks for the patch!

As I said before, the timeout which this patch provides doesn't work well
when the walsender gets blocked in sending WAL. At first, we would
need to implement a non-blocking write function as an infrastructure
of the replication timeout, I think.
http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Daniel Farina

Date:

14 February 2011, 18:13:59

On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
>> Context diff equivalent attached.
>
> Thanks for the patch!
>
> As I said before, the timeout which this patch provides doesn't work well
> when the walsender gets blocked in sending WAL. At first, we would
> need to implement a non-blocking write function as an infrastructure
> of the replication timeout, I think.
> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

Interesting point...if that's accepted as required-for-commit, what
are the perceptions of the odds that, presuming I can write the code
quickly enough, that there's enough infrastructure/ports already in
postgres to allow for a non-blocking write on all our supported
platforms?

--
fdr

Re: Replication server timeout patch

From

Simon Riggs

Date:

14 February 2011, 21:29:37

On Mon, 2011-02-14 at 14:13 -0800, Daniel Farina wrote:
> On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
> >> Context diff equivalent attached.
> >
> > Thanks for the patch!
> >
> > As I said before, the timeout which this patch provides doesn't work well
> > when the walsender gets blocked in sending WAL. At first, we would
> > need to implement a non-blocking write function as an infrastructure
> > of the replication timeout, I think.
> > http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com

I wasn't aware that had been raised before. Thanks for noting it again.

I guess that's why you thought "wait forever" was a good idea ;-)

> Interesting point...if that's accepted as required-for-commit, what
> are the perceptions of the odds that, presuming I can write the code
> quickly enough, that there's enough infrastructure/ports already in
> postgres to allow for a non-blocking write on all our supported
> platforms?

I'd like to see what you come up with. I would rate that as important,
though not essential for sync replication.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Replication server timeout patch

From

Robert Haas

Date:

15 February 2011, 13:21:54

On Mon, Feb 14, 2011 at 5:13 PM, Daniel Farina <daniel@heroku.com> wrote:
> On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
>>> Context diff equivalent attached.
>>
>> Thanks for the patch!
>>
>> As I said before, the timeout which this patch provides doesn't work well
>> when the walsender gets blocked in sending WAL. At first, we would
>> need to implement a non-blocking write function as an infrastructure
>> of the replication timeout, I think.
>> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com
>
> Interesting point...if that's accepted as required-for-commit, what
> are the perceptions of the odds that, presuming I can write the code
> quickly enough, that there's enough infrastructure/ports already in
> postgres to allow for a non-blocking write on all our supported
> platforms?

Are you working on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

15 February 2011, 22:34:56

On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina <daniel@heroku.com> wrote:
> On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
>>> Context diff equivalent attached.
>>
>> Thanks for the patch!
>>
>> As I said before, the timeout which this patch provides doesn't work well
>> when the walsender gets blocked in sending WAL. At first, we would
>> need to implement a non-blocking write function as an infrastructure
>> of the replication timeout, I think.
>> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com
>
> Interesting point...if that's accepted as required-for-commit, what
> are the perceptions of the odds that, presuming I can write the code
> quickly enough, that there's enough infrastructure/ports already in
> postgres to allow for a non-blocking write on all our supported
> platforms?

I'm not sure if there's already enough infrastructure for a non-blocking
write. But the patch which I submitted before might help to implement that.
http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Simon Riggs

Date:

17 February 2011, 17:21:38

On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote:
> On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina <daniel@heroku.com> wrote:
> > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
> >>> Context diff equivalent attached.
> >>
> >> Thanks for the patch!
> >>
> >> As I said before, the timeout which this patch provides doesn't work well
> >> when the walsender gets blocked in sending WAL. At first, we would
> >> need to implement a non-blocking write function as an infrastructure
> >> of the replication timeout, I think.
> >> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com
> >
> > Interesting point...if that's accepted as required-for-commit, what
> > are the perceptions of the odds that, presuming I can write the code
> > quickly enough, that there's enough infrastructure/ports already in
> > postgres to allow for a non-blocking write on all our supported
> > platforms?
> 
> I'm not sure if there's already enough infrastructure for a non-blocking
> write. But the patch which I submitted before might help to implement that.
> http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

So, in summary, the position is that we have a timeout, but that timeout
doesn't work in all cases. But it does work in some, so that seems
enough for me to say "let's commit". Not committing gives us nothing at
all, which is as much use as a chocolate teapot.

I will be looking to commit this tomorrow morning, unless I hear some
clear No comments, with reasons.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Replication server timeout patch

From

Robert Haas

Date:

17 February 2011, 17:42:41

On Thu, Feb 17, 2011 at 4:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote:
>> On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina <daniel@heroku.com> wrote:
>> > On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> >> On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina <daniel@heroku.com> wrote:
>> >>> Context diff equivalent attached.
>> >>
>> >> Thanks for the patch!
>> >>
>> >> As I said before, the timeout which this patch provides doesn't work well
>> >> when the walsender gets blocked in sending WAL. At first, we would
>> >> need to implement a non-blocking write function as an infrastructure
>> >> of the replication timeout, I think.
>> >> http://archives.postgresql.org/message-id/AANLkTi%3DPu2ne%3DVO-%2BCLMXLQh9y85qumLCbBP15CjnyUS%40mail.gmail.com
>> >
>> > Interesting point...if that's accepted as required-for-commit, what
>> > are the perceptions of the odds that, presuming I can write the code
>> > quickly enough, that there's enough infrastructure/ports already in
>> > postgres to allow for a non-blocking write on all our supported
>> > platforms?
>>
>> I'm not sure if there's already enough infrastructure for a non-blocking
>> write. But the patch which I submitted before might help to implement that.
>> http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com
>
> So, in summary, the position is that we have a timeout, but that timeout
> doesn't work in all cases. But it does work in some, so that seems
> enough for me to say "let's commit". Not committing gives us nothing at
> all, which is as much use as a chocolate teapot.
>
> I will be looking to commit this tomorrow morning, unless I hear some
> clear No comments, with reasons.

I guess the question is whether it works in 10% of cases or 95% of
cases.  In the first case there's probably no point in pretending we
have a feature if it doesn't really work.  In the second case, it
might make sense.  But I don't have a good feeling for which it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Josh Berkus

Date:

17 February 2011, 18:55:26

> So, in summary, the position is that we have a timeout, but that timeout
> doesn't work in all cases. But it does work in some, so that seems
> enough for me to say "let's commit". Not committing gives us nothing at
> all, which is as much use as a chocolate teapot.

Can someone summarize the cases where it does and doesn't work?
There's been a longish gap in this thread.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: Replication server timeout patch

From

Simon Riggs

Date:

17 February 2011, 18:56:34

On Thu, 2011-02-17 at 16:42 -0500, Robert Haas wrote:
> >
> > So, in summary, the position is that we have a timeout, but that timeout
> > doesn't work in all cases. But it does work in some, so that seems
> > enough for me to say "let's commit". Not committing gives us nothing at
> > all, which is as much use as a chocolate teapot.
> >
> > I will be looking to commit this tomorrow morning, unless I hear some
> > clear No comments, with reasons.
> 
> I guess the question is whether it works in 10% of cases or 95% of
> cases.  In the first case there's probably no point in pretending we
> have a feature if it doesn't really work.  In the second case, it
> might make sense.  But I don't have a good feeling for which it is.

Well, I guess the people that wanted to wait forever may get their wish.

For sync rep, I intend to put in place a client timeout, which we do
have code for. The server side timeout still makes sense, but it's not a
requirement for sync rep.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Replication server timeout patch

From

Fujii Masao

Date:

17 February 2011, 22:10:55

On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> So, in summary, the position is that we have a timeout, but that timeout
>> doesn't work in all cases. But it does work in some, so that seems
>> enough for me to say "let's commit". Not committing gives us nothing at
>> all, which is as much use as a chocolate teapot.
>
> Can someone summarize the cases where it does and doesn't work?
> There's been a longish gap in this thread.

The timeout doesn't work when walsender gets blocked during sending the
WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't
work when the standby becomes unresponsive while WAL is generated on
the master one after another. Since walsender tries to continue sending the
WAL while the standby is unresponsive, the send buffer gets filled up and
the blocking send function (e.g., pq_flush) blocks the walsender.

OTOH, if the standby becomes unresponsive when there is no workload
which causes WAL, the timeout would work.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Robert Haas

Date:

17 February 2011, 23:11:06

On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> So, in summary, the position is that we have a timeout, but that timeout
>>> doesn't work in all cases. But it does work in some, so that seems
>>> enough for me to say "let's commit". Not committing gives us nothing at
>>> all, which is as much use as a chocolate teapot.
>>
>> Can someone summarize the cases where it does and doesn't work?
>> There's been a longish gap in this thread.
>
> The timeout doesn't work when walsender gets blocked during sending the
> WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't
> work when the standby becomes unresponsive while WAL is generated on
> the master one after another. Since walsender tries to continue sending the
> WAL while the standby is unresponsive, the send buffer gets filled up and
> the blocking send function (e.g., pq_flush) blocks the walsender.
>
> OTOH, if the standby becomes unresponsive when there is no workload
> which causes WAL, the timeout would work.

IMHO, that's so broken as to be useless.

I would really like to have a solution to this problem, though.
Relying on TCP keepalives is weak.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

26 February 2011, 22:52:56

On Fri, Feb 18, 2011 at 12:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> IMHO, that's so broken as to be useless.
>
> I would really like to have a solution to this problem, though.
> Relying on TCP keepalives is weak.

Agreed.

I updated the replication timeout patch which I submitted before.
http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

Since the patch implements also non-blocking send functions,
the timeout can work properly even when the send buffer has
been filled up.

> There are two things that I think are pretty clear.  If the receiver
> has wal_receiver_status_interval=0, then we should ignore
> replication_timeout for that connection.

The patch still doesn't check that wal_receiver_status_interval
is set up properly. I'll implement that later.

Regards,

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

replication_timeout_v2.patch

Re: Replication server timeout patch

From

Fujii Masao

Date:

28 February 2011, 09:08:23

On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> There are two things that I think are pretty clear.  If the receiver
>> has wal_receiver_status_interval=0, then we should ignore
>> replication_timeout for that connection.
>
> The patch still doesn't check that wal_receiver_status_interval
> is set up properly. I'll implement that later.

Done. I attached the updated patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

replication_timeout_v3.patch

Re: Replication server timeout patch

From

Robert Haas

Date:

05 March 2011, 14:23:50

On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> There are two things that I think are pretty clear.  If the receiver
>>> has wal_receiver_status_interval=0, then we should ignore
>>> replication_timeout for that connection.
>>
>> The patch still doesn't check that wal_receiver_status_interval
>> is set up properly. I'll implement that later.
>
> Done. I attached the updated patch.

Why does internal_flush_if_writable compute bufptr differently from
internal_flush?  And shouldn't it be static?

It seems to me that this ought to be refactored so that you don't
duplicate so much code.  Maybe static int internal_flush(bool
nonblocking).

I don't think that the while (bufptr < bufend) loop needs to contain
the code to set and clear the nonblocking state.  You could do the
whole loop with nonblocking mode turned on and then reenable it just
once at the end.  Besides possibly being clearer, that would be more
efficient and leave less room for unexpected failures.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

06 March 2011, 04:03:31

On Sun, Mar 6, 2011 at 3:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> There are two things that I think are pretty clear.  If the receiver
>>>> has wal_receiver_status_interval=0, then we should ignore
>>>> replication_timeout for that connection.
>>>
>>> The patch still doesn't check that wal_receiver_status_interval
>>> is set up properly. I'll implement that later.
>>
>> Done. I attached the updated patch.
>
> Why does internal_flush_if_writable compute bufptr differently from
> internal_flush?  And shouldn't it be static?
>
> It seems to me that this ought to be refactored so that you don't
> duplicate so much code.  Maybe static int internal_flush(bool
> nonblocking).
>
> I don't think that the while (bufptr < bufend) loop needs to contain
> the code to set and clear the nonblocking state.  You could do the
> whole loop with nonblocking mode turned on and then reenable it just
> once at the end.  Besides possibly being clearer, that would be more
> efficient and leave less room for unexpected failures.

All these comments seem to make sense. Will fix. Thanks!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Fujii Masao

Date:

06 March 2011, 10:11:05

On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Why does internal_flush_if_writable compute bufptr differently from
>> internal_flush?  And shouldn't it be static?
>>
>> It seems to me that this ought to be refactored so that you don't
>> duplicate so much code.  Maybe static int internal_flush(bool
>> nonblocking).
>>
>> I don't think that the while (bufptr < bufend) loop needs to contain
>> the code to set and clear the nonblocking state.  You could do the
>> whole loop with nonblocking mode turned on and then reenable it just
>> once at the end.  Besides possibly being clearer, that would be more
>> efficient and leave less room for unexpected failures.
>
> All these comments seem to make sense. Will fix. Thanks!

Done. I attached the updated patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

replication_timeout_v4.patch

Re: Replication server timeout patch

From

Fujii Masao

Date:

07 March 2011, 07:47:12

On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Why does internal_flush_if_writable compute bufptr differently from
>>> internal_flush?  And shouldn't it be static?
>>>
>>> It seems to me that this ought to be refactored so that you don't
>>> duplicate so much code.  Maybe static int internal_flush(bool
>>> nonblocking).
>>>
>>> I don't think that the while (bufptr < bufend) loop needs to contain
>>> the code to set and clear the nonblocking state.  You could do the
>>> whole loop with nonblocking mode turned on and then reenable it just
>>> once at the end.  Besides possibly being clearer, that would be more
>>> efficient and leave less room for unexpected failures.
>>
>> All these comments seem to make sense. Will fix. Thanks!
>
> Done. I attached the updated patch.

I rebased the patch against current git master.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

replication_timeout_v5.patch

Re: Replication server timeout patch

From

Fujii Masao

Date:

11 March 2011, 09:15:04

On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> Why does internal_flush_if_writable compute bufptr differently from
>>>> internal_flush?  And shouldn't it be static?
>>>>
>>>> It seems to me that this ought to be refactored so that you don't
>>>> duplicate so much code.  Maybe static int internal_flush(bool
>>>> nonblocking).
>>>>
>>>> I don't think that the while (bufptr < bufend) loop needs to contain
>>>> the code to set and clear the nonblocking state.  You could do the
>>>> whole loop with nonblocking mode turned on and then reenable it just
>>>> once at the end.  Besides possibly being clearer, that would be more
>>>> efficient and leave less room for unexpected failures.
>>>
>>> All these comments seem to make sense. Will fix. Thanks!
>>
>> Done. I attached the updated patch.
>
> I rebased the patch against current git master.

I added this replication timeout patch into next CF.

I explain why this feature is required for the future review;

Without this feature, walsender might unexpectedly remain for a while when
the standby crashes or the network outage happens. TCP keepalive can
improve this situation to　a certain extent, but it's not perfect. Remaining
walsender can cause some problems.

For example, when hot_standby_feedback is enabled, such a remaining
walsender would prevent oldest xmin from advancing and interfere with
vacuuming on the master. For example, when you use synchronous
replication and walsender in SYNC mode gets stuck, any synchronous
standby candidate cannot switch to SYNC mode until that walsender exits,
and all the transactions would pause.

This feature causes walsender to exit when there is no reply from the
standby before the replication timeout expires. Then we can avoid the
above problems.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Robert Haas

Date:

11 March 2011, 09:18:30

On Fri, Mar 11, 2011 at 8:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> Why does internal_flush_if_writable compute bufptr differently from
>>>>> internal_flush?  And shouldn't it be static?
>>>>>
>>>>> It seems to me that this ought to be refactored so that you don't
>>>>> duplicate so much code.  Maybe static int internal_flush(bool
>>>>> nonblocking).
>>>>>
>>>>> I don't think that the while (bufptr < bufend) loop needs to contain
>>>>> the code to set and clear the nonblocking state.  You could do the
>>>>> whole loop with nonblocking mode turned on and then reenable it just
>>>>> once at the end.  Besides possibly being clearer, that would be more
>>>>> efficient and leave less room for unexpected failures.
>>>>
>>>> All these comments seem to make sense. Will fix. Thanks!
>>>
>>> Done. I attached the updated patch.
>>
>> I rebased the patch against current git master.
>
> I added this replication timeout patch into next CF.
>
> I explain why this feature is required for the future review;
>
> Without this feature, walsender might unexpectedly remain for a while when
> the standby crashes or the network outage happens. TCP keepalive can
> improve this situation to　a certain extent, but it's not perfect. Remaining
> walsender can cause some problems.
>
> For example, when hot_standby_feedback is enabled, such a remaining
> walsender would prevent oldest xmin from advancing and interfere with
> vacuuming on the master. For example, when you use synchronous
> replication and walsender in SYNC mode gets stuck, any synchronous
> standby candidate cannot switch to SYNC mode until that walsender exits,
> and all the transactions would pause.
>
> This feature causes walsender to exit when there is no reply from the
> standby before the replication timeout expires. Then we can avoid the
> above problems.

I think we should consider making this change for 9.1.  This is a real
wart, and it's going to become even more of a problem with sync rep, I
think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

11 March 2011, 09:29:52

On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I added this replication timeout patch into next CF.
>>
>> I explain why this feature is required for the future review;
>>
>> Without this feature, walsender might unexpectedly remain for a while when
>> the standby crashes or the network outage happens. TCP keepalive can
>> improve this situation to　a certain extent, but it's not perfect. Remaining
>> walsender can cause some problems.
>>
>> For example, when hot_standby_feedback is enabled, such a remaining
>> walsender would prevent oldest xmin from advancing and interfere with
>> vacuuming on the master. For example, when you use synchronous
>> replication and walsender in SYNC mode gets stuck, any synchronous
>> standby candidate cannot switch to SYNC mode until that walsender exits,
>> and all the transactions would pause.
>>
>> This feature causes walsender to exit when there is no reply from the
>> standby before the replication timeout expires. Then we can avoid the
>> above problems.
>
> I think we should consider making this change for 9.1.  This is a real
> wart, and it's going to become even more of a problem with sync rep, I
> think.

Yeah, that's a welcome! Please feel free to review the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Bruce Momjian

Date:

11 March 2011, 10:27:57

Fujii Masao wrote:
> On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> I added this replication timeout patch into next CF.
> >>
> >> I explain why this feature is required for the future review;
> >>
> >> Without this feature, walsender might unexpectedly remain for a while when
> >> the standby crashes or the network outage happens. TCP keepalive can
> >> improve this situation to?a certain extent, but it's not perfect. Remaining
> >> walsender can cause some problems.
> >>
> >> For example, when hot_standby_feedback is enabled, such a remaining
> >> walsender would prevent oldest xmin from advancing and interfere with
> >> vacuuming on the master. For example, when you use synchronous
> >> replication and walsender in SYNC mode gets stuck, any synchronous
> >> standby candidate cannot switch to SYNC mode until that walsender exits,
> >> and all the transactions would pause.
> >>
> >> This feature causes walsender to exit when there is no reply from the
> >> standby before the replication timeout expires. Then we can avoid the
> >> above problems.
> >
> > I think we should consider making this change for 9.1. ?This is a real
> > wart, and it's going to become even more of a problem with sync rep, I
> > think.
> 
> Yeah, that's a welcome! Please feel free to review the patch.

It is already in the next commitfest, so if someone wants to add it as
an open 9.1 item, go ahead.  I am unclear of this so I am not adding it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Replication server timeout patch

From

Robert Haas

Date:

11 March 2011, 15:34:34

On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I think we should consider making this change for 9.1.  This is a real
>> wart, and it's going to become even more of a problem with sync rep, I
>> think.
>
> Yeah, that's a welcome! Please feel free to review the patch.

I discussed this with Heikki on IM.

I think we should rip all the GUC change stuff out of this patch and
just decree that if you set a timeout, you get a timeout.  If you set
this inconsistently with wal_receiver_status_interval, then you'll get
lots of disconnects.  But that's your problem.  This may seem a little
unfriendly, but the logic in here is quite complex and still isn't
going to really provide that much protection against bad
configurations.  The only realistic alternative I see is to define
replication_timeout as a multiple of the client's
wal_receiver_status_interval, but that seems quite annoyingly
unfriendly.  A single replication_timeout that applies to all slaves
doesn't cover every configuration someone might want, but it's simple
and easy to understand and should cover 95% of cases.  If we find that
it's really necessary to be able to customize it further, then we
might go the route of adding the much-discussed standby registration
stuff, where there's a separate config file or system table where you
can stipulate that when a walsender with application_name=foo
connects, you want it to get wal_receiver_status_interval=$FOO.  But I
think that complexity can certainly wait until 9.2 or later.

I also think that the default for replication_timeout should not be 0.Something like 60s seems about right.  That way,
ifyou just use the 
default settings, you'll get pretty sane behavior - a connectivity
hiccup that lasts more than a minute will bounce the client.  We've
already gotten reports of people who thought they were replicating
when they really weren't, and had to fiddle with settings and struggle
to try to make it robust.  This should make things a lot nicer for
people out of the box, but it won't if it's disabled out of the box.

On another note, there doesn't appear to be any need to change the
return value of WaitLatchOrSocket().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

16 March 2011, 04:49:38

On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> I think we should consider making this change for 9.1.  This is a real
>>> wart, and it's going to become even more of a problem with sync rep, I
>>> think.
>>
>> Yeah, that's a welcome! Please feel free to review the patch.
>
> I discussed this with Heikki on IM.
>
> I think we should rip all the GUC change stuff out of this patch and
> just decree that if you set a timeout, you get a timeout.  If you set
> this inconsistently with wal_receiver_status_interval, then you'll get
> lots of disconnects.  But that's your problem.  This may seem a little
> unfriendly, but the logic in here is quite complex and still isn't
> going to really provide that much protection against bad
> configurations.  The only realistic alternative I see is to define
> replication_timeout as a multiple of the client's
> wal_receiver_status_interval, but that seems quite annoyingly
> unfriendly.  A single replication_timeout that applies to all slaves
> doesn't cover every configuration someone might want, but it's simple
> and easy to understand and should cover 95% of cases.  If we find that
> it's really necessary to be able to customize it further, then we
> might go the route of adding the much-discussed standby registration
> stuff, where there's a separate config file or system table where you
> can stipulate that when a walsender with application_name=foo
> connects, you want it to get wal_receiver_status_interval=$FOO.  But I
> think that complexity can certainly wait until 9.2 or later.
>
> I also think that the default for replication_timeout should not be 0.
>  Something like 60s seems about right.  That way, if you just use the
> default settings, you'll get pretty sane behavior - a connectivity
> hiccup that lasts more than a minute will bounce the client.  We've
> already gotten reports of people who thought they were replicating
> when they really weren't, and had to fiddle with settings and struggle
> to try to make it robust.  This should make things a lot nicer for
> people out of the box, but it won't if it's disabled out of the box.
>
> On another note, there doesn't appear to be any need to change the
> return value of WaitLatchOrSocket().

Agreed. I'll change the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Fujii Masao

Date:

16 March 2011, 06:12:01

On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Agreed. I'll change the patch.

Done. I attached the updated patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

replication_timeout_v6.patch

Re: Replication server timeout patch

From

Heikki Linnakangas

Date:

23 March 2011, 07:33:29

On 16.03.2011 11:11, Fujii Masao wrote:
> On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> Agreed. I'll change the patch.
>
> Done. I attached the updated patch.

I don't much like the API for this. Walsender shouldn't need to know
about the details of the FE/BE protocol, pq_putbytes_if_available()
seems too low level to be useful.

I think a better API would be to have a non-blocking version of
pq_putmessage(). We can make the output buffer in pqcomm.c resizeable,
so that when the message doesn't fit in the output buffer in
pq_putmessage(), the buffer is enlarged instead of trying to flush it.

Attached is a patch using that approach. This is a much smaller patch,
and easier to understand. I'm not totally happy with the walsender main
loop, it seems to work as it is, but the logic has become quite
complicated. Ideas welcome on how to simplify that.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

replication_timeout_v7.patch

Re: Replication server timeout patch

From

Fujii Masao

Date:

24 March 2011, 10:25:19

On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I don't much like the API for this. Walsender shouldn't need to know about
> the details of the FE/BE protocol, pq_putbytes_if_available() seems too low
> level to be useful.
>
> I think a better API would be to have a non-blocking version of
> pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so
> that when the message doesn't fit in the output buffer in pq_putmessage(),
> the buffer is enlarged instead of trying to flush it.
>
> Attached is a patch using that approach. This is a much smaller patch, and
> easier to understand.

Agreed. Thanks for improving the patch.

pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
This seems unsafe because for example pgwin32_waitforsinglesocket()
called by secure_write() can throw ERROR.

> I'm not totally happy with the walsender main loop, it
> seems to work as it is, but the logic has become quite complicated. Ideas
> welcome on how to simplify that.

As the patch I proposed before did, how about leaving XLogSend() instead
of WalSndLoop() to call pq_flush_if_writable() when there is pending data
in output buffer?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Robert Haas

Date:

25 March 2011, 22:42:09

On Wed, Mar 23, 2011 at 6:33 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 16.03.2011 11:11, Fujii Masao wrote:
>>
>> On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao<masao.fujii@gmail.com>
>>  wrote:
>>>
>>> Agreed. I'll change the patch.
>>
>> Done. I attached the updated patch.
>
> I don't much like the API for this. Walsender shouldn't need to know about
> the details of the FE/BE protocol, pq_putbytes_if_available() seems too low
> level to be useful.
>
> I think a better API would be to have a non-blocking version of
> pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so
> that when the message doesn't fit in the output buffer in pq_putmessage(),
> the buffer is enlarged instead of trying to flush it.
>
> Attached is a patch using that approach. This is a much smaller patch, and
> easier to understand. I'm not totally happy with the walsender main loop, it
> seems to work as it is, but the logic has become quite complicated. Ideas
> welcome on how to simplify that.

Heikki, are you planning to commit this, either with or without
further revisions?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Heikki Linnakangas

Date:

28 March 2011, 07:49:41

On 24.03.2011 15:24, Fujii Masao wrote:
> On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> I don't much like the API for this. Walsender shouldn't need to know about
>> the details of the FE/BE protocol, pq_putbytes_if_available() seems too low
>> level to be useful.
>>
>> I think a better API would be to have a non-blocking version of
>> pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so
>> that when the message doesn't fit in the output buffer in pq_putmessage(),
>> the buffer is enlarged instead of trying to flush it.
>>
>> Attached is a patch using that approach. This is a much smaller patch, and
>> easier to understand.
>
> Agreed. Thanks for improving the patch.
>
> pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
> This seems unsafe because for example pgwin32_waitforsinglesocket()
> called by secure_write() can throw ERROR.

Perhaps it's time to give up on the assumption that the socket is in
blocking mode except within those two functions. Attached patch adds the
pq_set_nonblocking() function from your patch, and adds calls to it
before all secure_read/write operations to put the socket in the right
mode. There's only a few of those operations.

Should we use COMMERROR instead of ERROR if we fail to put the socket in
the right mode?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

replication_timeout_v8.patch

Re: Replication server timeout patch

From

Fujii Masao

Date:

29 March 2011, 01:56:10

On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
>> This seems unsafe because for example pgwin32_waitforsinglesocket()
>> called by secure_write() can throw ERROR.
>
> Perhaps it's time to give up on the assumption that the socket is in
> blocking mode except within those two functions. Attached patch adds the
> pq_set_nonblocking() function from your patch, and adds calls to it before
> all secure_read/write operations to put the socket in the right mode.
> There's only a few of those operations.

Sounds good.

+        pq_set_nonblocking(false); /* XXX: Is this required? */

No. Since secure_close and close_SSL don't use MyProcPort->sock and
MyProcPort->noblock which can be changed in pq_set_nonblocking,
I don't think that is required.

+    pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes);

Don't we need to check the return value of pq_putmessage_noblock? That
can return EOF when trouble happens (for example the send system call fails).

> Should we use COMMERROR instead of ERROR if we fail to put the socket in the
> right mode?

Maybe.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Tom Lane

Date:

29 March 2011, 10:25:16

Fujii Masao <masao.fujii@gmail.com> writes:
> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
>> Should we use COMMERROR instead of ERROR if we fail to put the socket in the
>> right mode?

> Maybe.

COMMERROR exists to keep us from trying to send an error report down a
failed socket.  I would assume (perhaps wrongly) that
walsender/walreceiver don't try to push error reports across the socket
anyway, only to the postmaster log.  If correct, there is no need for
COMMERROR, and using it just muddies the code.
        regards, tom lane

Re: Replication server timeout patch

From

Robert Haas

Date:

29 March 2011, 13:04:51

On Tue, Mar 29, 2011 at 9:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
>>> Should we use COMMERROR instead of ERROR if we fail to put the socket in the
>>> right mode?
>
>> Maybe.
>
> COMMERROR exists to keep us from trying to send an error report down a
> failed socket.  I would assume (perhaps wrongly) that
> walsender/walreceiver don't try to push error reports across the socket
> anyway, only to the postmaster log.  If correct, there is no need for
> COMMERROR, and using it just muddies the code.

I don't think that's how it works.  The error the server sends is
copied into some of the messages in the client log, which is really
useful for debugging.

ERROR: can't connect to the server (server said: you're not authorized)

...or something like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

29 March 2011, 22:31:04

On Wed, Mar 30, 2011 at 1:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> COMMERROR exists to keep us from trying to send an error report down a
>> failed socket.  I would assume (perhaps wrongly) that
>> walsender/walreceiver don't try to push error reports across the socket
>> anyway, only to the postmaster log.  If correct, there is no need for
>> COMMERROR, and using it just muddies the code.
>
> I don't think that's how it works.  The error the server sends is
> copied into some of the messages in the client log, which is really
> useful for debugging.
>
> ERROR: can't connect to the server (server said: you're not authorized)
>
> ...or something like that.

Yes. Walsender sends its error message to walreceiver, and walreceiver
writes it down to the server log. For example;

FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
segment 000000010000000000000016 has already been removed

The second FATAL message is sent from walsender.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Heikki Linnakangas

Date:

30 March 2011, 04:25:09

On 29.03.2011 07:55, Fujii Masao wrote:
> On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>>> pq_flush_if_writable() calls internal_flush() without using PG_TRY block.
>>> This seems unsafe because for example pgwin32_waitforsinglesocket()
>>> called by secure_write() can throw ERROR.
>>
>> Perhaps it's time to give up on the assumption that the socket is in
>> blocking mode except within those two functions. Attached patch adds the
>> pq_set_nonblocking() function from your patch, and adds calls to it before
>> all secure_read/write operations to put the socket in the right mode.
>> There's only a few of those operations.
>
> Sounds good.
>
> +        pq_set_nonblocking(false); /* XXX: Is this required? */
>
> No. Since secure_close and close_SSL don't use MyProcPort->sock and
> MyProcPort->noblock which can be changed in pq_set_nonblocking,
> I don't think that is required.

Ok, I took that out.

> +    pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes);
>
> Don't we need to check the return value of pq_putmessage_noblock? That
> can return EOF when trouble happens (for example the send system call fails).

No, pq_putmessage_noblock doesn't call send() because it enlarges the 
buffer to make sure the message fits, and it doesn't anything else that 
could fail else. I changed its return type to void, and added an 
Assert() to check that the pq_putmessage() call it does internally 
indeed doesn't fail.

>> Should we use COMMERROR instead of ERROR if we fail to put the socket in the
>> right mode?
>
> Maybe.

I made it COMMERROR. ERRORs are sent to the client, and you could get 
into infinite recursion if sending the ERROR requires setting the 
blocking mode again.

Committed with those changes. I also reworded the docs a bit.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Replication server timeout patch

From

Fujii Masao

Date:

30 March 2011, 04:58:49

On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> +       pq_putmessage_noblock('d', msgbuf, 1 +
>> sizeof(WalDataMessageHeader) + nbytes);
>>
>> Don't we need to check the return value of pq_putmessage_noblock? That
>> can return EOF when trouble happens (for example the send system call
>> fails).
>
> No, pq_putmessage_noblock doesn't call send() because it enlarges the buffer
> to make sure the message fits, and it doesn't anything else that could fail
> else. I changed its return type to void, and added an Assert() to check that
> the pq_putmessage() call it does internally indeed doesn't fail.

Oh, you're right.

> Committed with those changes. I also reworded the docs a bit.

Thanks a lot!

+        A value of zero means wait forever.  This parameter can only be set in

The first sentence sounds misleading. Even if you set the parameter to zero,
replication connections can be terminated because of keepalive or socket error.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Heikki Linnakangas

Date:

30 March 2011, 05:03:48

On 30.03.2011 10:58, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
> +        A value of zero means wait forever.  This parameter can only be set in
>
> The first sentence sounds misleading. Even if you set the parameter to zero,
> replication connections can be terminated because of keepalive or socket error.

Hmm, should I change it back to "A value of zero disables the timeout" ? 
Any better suggestions?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Replication server timeout patch

From

Fujii Masao

Date:

30 March 2011, 05:08:32

On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.03.2011 10:58, Fujii Masao wrote:
>>
>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com>  wrote:
>> +        A value of zero means wait forever.  This parameter can only be
>> set in
>>
>> The first sentence sounds misleading. Even if you set the parameter to
>> zero,
>> replication connections can be terminated because of keepalive or socket
>> error.
>
> Hmm, should I change it back to "A value of zero disables the timeout" ? Any
> better suggestions?

I like that. But I appreciate if anyone suggests the better.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Robert Haas

Date:

30 March 2011, 10:54:16

On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 30.03.2011 10:58, Fujii Masao wrote:
>>>
>>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com>  wrote:
>>> +        A value of zero means wait forever.  This parameter can only be
>>> set in
>>>
>>> The first sentence sounds misleading. Even if you set the parameter to
>>> zero,
>>> replication connections can be terminated because of keepalive or socket
>>> error.
>>
>> Hmm, should I change it back to "A value of zero disables the timeout" ? Any
>> better suggestions?
>
> I like that. But I appreciate if anyone suggests the better.

Maybe sticking the word "mechanism" in there would be a bit better.
"A value of zero disables the timeout mechanism"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Replication server timeout patch

From

Fujii Masao

Date:

30 March 2011, 23:47:02

On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> On 30.03.2011 10:58, Fujii Masao wrote:
>>>>
>>>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
>>>> <heikki.linnakangas@enterprisedb.com>  wrote:
>>>> +        A value of zero means wait forever.  This parameter can only be
>>>> set in
>>>>
>>>> The first sentence sounds misleading. Even if you set the parameter to
>>>> zero,
>>>> replication connections can be terminated because of keepalive or socket
>>>> error.
>>>
>>> Hmm, should I change it back to "A value of zero disables the timeout" ? Any
>>> better suggestions?
>>
>> I like that. But I appreciate if anyone suggests the better.
>
> Maybe sticking the word "mechanism" in there would be a bit better.
> "A value of zero disables the timeout mechanism"?

I'm OK with that. Or, what about "A value of zero turns this off" which is
used in statement_timeout for the sake of consistency?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Replication server timeout patch

From

Heikki Linnakangas

Date:

31 March 2011, 04:21:00

On 31.03.2011 05:46, Fujii Masao wrote:
> On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>>> On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com>  wrote:
>>>> On 30.03.2011 10:58, Fujii Masao wrote:
>>>>>
>>>>> On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas
>>>>> <heikki.linnakangas@enterprisedb.com>    wrote:
>>>>> +        A value of zero means wait forever.  This parameter can only be
>>>>> set in
>>>>>
>>>>> The first sentence sounds misleading. Even if you set the parameter to
>>>>> zero,
>>>>> replication connections can be terminated because of keepalive or socket
>>>>> error.
>>>>
>>>> Hmm, should I change it back to "A value of zero disables the timeout" ? Any
>>>> better suggestions?
>>>
>>> I like that. But I appreciate if anyone suggests the better.
>>
>> Maybe sticking the word "mechanism" in there would be a bit better.
>> "A value of zero disables the timeout mechanism"?
>
> I'm OK with that. Or, what about "A value of zero turns this off" which is
> used in statement_timeout for the sake of consistency?

Committed Robert's suggestion.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com