Thread: walsender doesn't send keepalives when writes are pending

walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
Hi,

In WalSndLoop() we do:

    wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
        WL_SOCKET_READABLE;

    if (pq_is_send_pending())
        wakeEvents |= WL_SOCKET_WRITEABLE;
    else if (wal_sender_timeout > 0 && !ping_sent)
    {
...
        if (GetCurrentTimestamp() >= timeout)
            WalSndKeepalive(true);
...

I think those two if's should simply be separate. There's no reason not
to ask for a ping when we're writing. On a busy server that might be the
case most of the time.
The if (pq_is_send_pending()) should also be *after* sending the
keepalive, after all, it might not immediately have been flushed as
unlikely as that is.

The attached patch is unsurprisingly simple.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: walsender doesn't send keepalives when writes are pending

From
Greg Stark
Date:
On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> There's no reason not
> to ask for a ping when we're writing.


Is there a reason to ask for a ping? The point of keepalives is to
ensure there's some traffic on idle connections so that if the
connection is dead it doesn't linger forever and so that any on-demand
links (or more recently NAT routers or stateful firewalls) don't time
out and disconnect and have to reconnect (or more recently just fail
outright).

By analogy TCP doesn't send any keepalives if there is any traffic on
the link. On the other hand I happen to know (the hard way) that
typical PPPoE routers *do* send LCP pings even when the link is busy
so there's precedent for either behaviour. I'm guess it comes down to
why you want the keepalives.


-- 
greg



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-14 12:55:06 +0000, Greg Stark wrote:
> On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > There's no reason not
> > to ask for a ping when we're writing.

> Is there a reason to ask for a ping? The point of keepalives is to
> ensure there's some traffic on idle connections so that if the
> connection is dead it doesn't linger forever and so that any on-demand
> links (or more recently NAT routers or stateful firewalls) don't time
> out and disconnect and have to reconnect (or more recently just fail
> outright).

This ain't TCP keepalives. The reason is that we want to kill walsenders
if they haven't responded to a ping inside wal_sender_timeout. That's
rather important e.g. for sychronous replication, so we can quickly fall
over to the next standby. In such scenarios you'll usually want a
timeout *far* below anything TCP provides.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-14 13:58:59 +0100, Andres Freund wrote:
> On 2014-02-14 12:55:06 +0000, Greg Stark wrote:
> > On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > > There's no reason not
> > > to ask for a ping when we're writing.
> 
> > Is there a reason to ask for a ping? The point of keepalives is to
> > ensure there's some traffic on idle connections so that if the
> > connection is dead it doesn't linger forever and so that any on-demand
> > links (or more recently NAT routers or stateful firewalls) don't time
> > out and disconnect and have to reconnect (or more recently just fail
> > outright).
> 
> This ain't TCP keepalives. The reason is that we want to kill walsenders
> if they haven't responded to a ping inside wal_sender_timeout. That's
> rather important e.g. for sychronous replication, so we can quickly fall
> over to the next standby. In such scenarios you'll usually want a
> timeout *far* below anything TCP provides.

walreceiver sends pings everytime it receives a 'w' message, so it's
probably not an issue there, but pg_receivexlog/basebackup don't; they
use their own configured intervarl. So this might be an explanation of
the latter two being disconnected too early. I've seen reports of
that...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Amit Kapila
Date:
On Fri, Feb 14, 2014 at 5:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> In WalSndLoop() we do:
>
>     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
>         WL_SOCKET_READABLE;
>
>     if (pq_is_send_pending())
>         wakeEvents |= WL_SOCKET_WRITEABLE;
>     else if (wal_sender_timeout > 0 && !ping_sent)
>     {
> ...
>         if (GetCurrentTimestamp() >= timeout)
>             WalSndKeepalive(true);
> ...
>
> I think those two if's should simply be separate. There's no reason not
> to ask for a ping when we're writing. On a busy server that might be the
> case most of the time.

I think the main reason of ping is to detect n/w break sooner.
On a busy server, wouldn't WALSender can detect it when next time it
will try to send the remaining data?

Each time in below loop, it sleeps for some time and then will again
try to send data and at that time it can detect n/w failure.
if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
{
..

if (wal_sender_timeout > 0)
{
..
sleeptime = 1 + (wal_sender_timeout / 10);
}
..
WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents, MyProcPort->sock, sleeptime);
}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-21 10:08:44 +0530, Amit Kapila wrote:
> On Fri, Feb 14, 2014 at 5:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Hi,
> >
> > In WalSndLoop() we do:
> >
> >     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
> >         WL_SOCKET_READABLE;
> >
> >     if (pq_is_send_pending())
> >         wakeEvents |= WL_SOCKET_WRITEABLE;
> >     else if (wal_sender_timeout > 0 && !ping_sent)
> >     {
> > ...
> >         if (GetCurrentTimestamp() >= timeout)
> >             WalSndKeepalive(true);
> > ...
> >
> > I think those two if's should simply be separate. There's no reason not
> > to ask for a ping when we're writing. On a busy server that might be the
> > case most of the time.
> 
> I think the main reason of ping is to detect n/w break sooner.
> On a busy server, wouldn't WALSender can detect it when next time it
> will try to send the remaining data?

Well, especially on a pipelined connection, that can take a fair
bit. TCP timeouts aren't fun. There's a reason we have the keepalives,
and that they measure application to application performance. And
detecting systems as down is important for e.g. synchronous rep.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Amit Kapila
Date:
On Fri, Feb 21, 2014 at 2:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-21 10:08:44 +0530, Amit Kapila wrote:
>>
>> I think the main reason of ping is to detect n/w break sooner.
>> On a busy server, wouldn't WALSender can detect it when next time it
>> will try to send the remaining data?
>
> Well, especially on a pipelined connection, that can take a fair
> bit. TCP timeouts aren't fun.

Okay, but the behaviour should be same for both keepalive message
and wal data it needs to send. What I mean to say here is that if n/w
is down, wal sender will detect it early by sending some data (either
keepalive or wal data). Also the ping is sent only after
wal_sender_timeout/2 w.r.t last reply time and wal data will be
sent after each sleeptime (1 + wal_sender_timeout/10) which
I think is should be lesser than the time to send ping.


> There's a reason we have the keepalives,
> and that they measure application to application performance.

Do you mean to say the info about receiver (uphill what it has flushed..)?
If yes, then won't we get this information in other reply message or
sending keepalive will achieve any other purpose (may be it can get
info more quickly)?

> And
> detecting systems as down is important for e.g. synchronous rep.

If you are right, then I am missing some point which is how on a
busy server sending keepalive can detect n/w break more quickly
then current way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-21 19:03:29 +0530, Amit Kapila wrote:
> On Fri, Feb 21, 2014 at 2:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-02-21 10:08:44 +0530, Amit Kapila wrote:
> >>
> >> I think the main reason of ping is to detect n/w break sooner.
> >> On a busy server, wouldn't WALSender can detect it when next time it
> >> will try to send the remaining data?
> >
> > Well, especially on a pipelined connection, that can take a fair
> > bit. TCP timeouts aren't fun.
> 
> Okay, but the behaviour should be same for both keepalive message
> and wal data it needs to send. What I mean to say here is that if n/w
> is down, wal sender will detect it early by sending some data (either
> keepalive or wal data). Also the ping is sent only after
> wal_sender_timeout/2 w.r.t last reply time and wal data will be
> sent after each sleeptime (1 + wal_sender_timeout/10) which
> I think is should be lesser than the time to send ping.

The danger is rather that *no* keepalive is sent (with requestReply =
true triggering a reply by the client) by the walsender. Try to run
pg_receivexlog against a busy server with a low walsender timeout. Since
we'll never get to sending a keepalive we'll not trigger a reply by the
receiver. Now

> > There's a reason we have the keepalives,
> > and that they measure application to application performance.
> 
> Do you mean to say the info about receiver (uphill what it has
> flushed..)?

The important bit is updating walsender.c's last_reply_timestamp so we
know that the standby has updated and we won't kill it.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Amit Kapila
Date:
On Fri, Feb 21, 2014 at 7:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-21 19:03:29 +0530, Amit Kapila wrote:
>> On Fri, Feb 21, 2014 at 2:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > Well, especially on a pipelined connection, that can take a fair
>> > bit. TCP timeouts aren't fun.
>>
>> Okay, but the behaviour should be same for both keepalive message
>> and wal data it needs to send. What I mean to say here is that if n/w
>> is down, wal sender will detect it early by sending some data (either
>> keepalive or wal data). Also the ping is sent only after
>> wal_sender_timeout/2 w.r.t last reply time and wal data will be
>> sent after each sleeptime (1 + wal_sender_timeout/10) which
>> I think is should be lesser than the time to send ping.
>
> The danger is rather that *no* keepalive is sent (with requestReply =
> true triggering a reply by the client) by the walsender. Try to run
> pg_receivexlog against a busy server with a low walsender timeout. Since
> we'll never get to sending a keepalive we'll not trigger a reply by the
> receiver. Now

Looking at code of pg_receivexlog in function HandleCopyStream(),
it seems that it can only happen if user has not configured
--status-interval in pg_receivexlog. Code referred is as below:

HandleCopyStream()
{
..
/*
* Potentially send a status message to the master
*/
now = localGetCurrentTimestamp();
if (still_sending && standby_message_timeout > 0 &&
localTimestampDifferenceExceeds(last_status, now,
standby_message_timeout))
{
/* Time to send feedback! */
if (!sendFeedback(conn, blockpos, now, false))
goto error;
last_status = now;
}


Even if this is not happening due to some reason, shouldn't this be
anyway the responsibility of pg_receivexlog to send the status from time
to time rather than sending when server asked for it?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-22 09:08:39 +0530, Amit Kapila wrote:
> > The danger is rather that *no* keepalive is sent (with requestReply =
> > true triggering a reply by the client) by the walsender. Try to run
> > pg_receivexlog against a busy server with a low walsender timeout. Since
> > we'll never get to sending a keepalive we'll not trigger a reply by the
> > receiver. Now
> 
> Looking at code of pg_receivexlog in function HandleCopyStream(),
> it seems that it can only happen if user has not configured
> --status-interval in pg_receivexlog. Code referred is as below:

The interval interval is configured independently from the primary and
pg_receivexlog doesn't tune it automatically to the one configured for
the walsender.

> Even if this is not happening due to some reason, shouldn't this be
> anyway the responsibility of pg_receivexlog to send the status from time
> to time rather than sending when server asked for it?

It does. At it's own interval. I don't see what's to discuss here,
sorry. There's really barely any cost to doing the keepalive correctly,
otherwise it'd be problematic in the half dozen cases where *we* do send
it. The keepalive mechanism doesn't work in one edgecase. So, let's fix
it, and not discuss why we think the entire mechanism might be useless.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Robert Haas
Date:
On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> In WalSndLoop() we do:
>
>     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
>         WL_SOCKET_READABLE;
>
>     if (pq_is_send_pending())
>         wakeEvents |= WL_SOCKET_WRITEABLE;
>     else if (wal_sender_timeout > 0 && !ping_sent)
>     {
> ...
>         if (GetCurrentTimestamp() >= timeout)
>             WalSndKeepalive(true);
> ...
>
> I think those two if's should simply be separate. There's no reason not
> to ask for a ping when we're writing. On a busy server that might be the
> case most of the time.
> The if (pq_is_send_pending()) should also be *after* sending the
> keepalive, after all, it might not immediately have been flushed as
> unlikely as that is.ackers

I spent an inordinate amount of time looking at this patch.  I'm not
sure your proposed change will accomplish the desired effect.  The
thing is that the code you quote is within an if-block gated by
(caughtup && !streamingDoneSending) || pq_is_send_pending().
Currently, therefore, the behavior is that we wait on the latch
*either when* we're caught up (and thus need to wait for more WAL) or
when the outbound queue is full (and thus we need to wait for it to
drain), but we send a keep-alive only in the former case (namely,
we're caught up).

Your proposed change would cause us to send keep-alives when we're
caught up, or when we're not caught up but the write queue happens to
be full.  That doesn't make much sense.  There might be a reason to
start sending keep-alives when we're not caught up, but even if we
want that behavior change there's no reason to do it only when the
write queue is full.

At least, that's how it looks to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-25 10:45:55 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > In WalSndLoop() we do:
> >
> >     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
> >         WL_SOCKET_READABLE;
> >
> >     if (pq_is_send_pending())
> >         wakeEvents |= WL_SOCKET_WRITEABLE;
> >     else if (wal_sender_timeout > 0 && !ping_sent)
> >     {
> > ...
> >         if (GetCurrentTimestamp() >= timeout)
> >             WalSndKeepalive(true);
> > ...
> >
> > I think those two if's should simply be separate. There's no reason not
> > to ask for a ping when we're writing. On a busy server that might be the
> > case most of the time.
> > The if (pq_is_send_pending()) should also be *after* sending the
> > keepalive, after all, it might not immediately have been flushed as
> > unlikely as that is.ackers
> 
> I spent an inordinate amount of time looking at this patch.  I'm not
> sure your proposed change will accomplish the desired effect.  The
> thing is that the code you quote is within an if-block gated by
> (caughtup && !streamingDoneSending) || pq_is_send_pending().
> Currently, therefore, the behavior is that we wait on the latch
> *either when* we're caught up (and thus need to wait for more WAL) or
> when the outbound queue is full (and thus we need to wait for it to
> drain), but we send a keep-alive only in the former case (namely,
> we're caught up).
>
> Your proposed change would cause us to send keep-alives when we're
> caught up, or when we're not caught up but the write queue happens to
> be full.  That doesn't make much sense.  There might be a reason to
> start sending keep-alives when we're not caught up, but even if we
> want that behavior change there's no reason to do it only when the
> write queue is full.

I am not sure I can follow. Why doesn't it make sense to send out the
keepalive (with replyRequested = true) when we're busy sending stuff
(which will be the case most of the time on a busy server)?

The point is that clients like pg_receivexlog and pg_basebackup don't
send an keepalive response all the time they receive something (which is
perfectly fine), but just when their internal timer says it's time to do
so, or if the walsender asks them to do so. So, if the server has a
*lower* timeout than configured for pg_receivexlog and the server is a
busy one, you'll get timeouts relatively often. This is a pretty
frequent situation in synchronous replication setups because the default
timeout is *way* too high for that.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Robert Haas
Date:
On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I am not sure I can follow. Why doesn't it make sense to send out the
> keepalive (with replyRequested = true) when we're busy sending stuff
> (which will be the case most of the time on a busy server)?

It may very well make sense, but your patch won't generally have that
effect, because with the patch you proposed, the keep-alive can only
be sent when the server is busy if the write queue is also full.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-02-25 11:15:46 -0500, Robert Haas wrote:
> On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I am not sure I can follow. Why doesn't it make sense to send out the
> > keepalive (with replyRequested = true) when we're busy sending stuff
> > (which will be the case most of the time on a busy server)?
> 
> It may very well make sense, but your patch won't generally have that
> effect, because with the patch you proposed, the keep-alive can only
> be sent when the server is busy if the write queue is also full.

Well, it either needs to be caughtup *or*/and have a busy write queue,
right? Usually that state will be reached very quickly because before
that we're writing data to the network as fast as it can be read from
disk.
Also, there's no timeout checks outside that if (caughtup ||
send_pending()) block, so there's not much of a window to hit problems when
that loop isn't entered.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: walsender doesn't send keepalives when writes are pending

From
Robert Haas
Date:
On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-25 11:15:46 -0500, Robert Haas wrote:
>> On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > I am not sure I can follow. Why doesn't it make sense to send out the
>> > keepalive (with replyRequested = true) when we're busy sending stuff
>> > (which will be the case most of the time on a busy server)?
>>
>> It may very well make sense, but your patch won't generally have that
>> effect, because with the patch you proposed, the keep-alive can only
>> be sent when the server is busy if the write queue is also full.
>
> Well, it either needs to be caughtup *or*/and have a busy write queue,
> right?

Right.

> Usually that state will be reached very quickly because before
> that we're writing data to the network as fast as it can be read from
> disk.

I'm unimpressed.  Even if that is in practice true, making the code
self-consistent is a goal of non-trivial value.  The timing of sending
keep-alives has no business depending on the state of the write queue,
and right now it doesn't.  Your patch would make it depend on that,
mostly by accident AFAICS.

> Also, there's no timeout checks outside that if (caughtup ||
> send_pending()) block, so there's not much of a window to hit problems when
> that loop isn't entered.

That might be true, but waiting until the write queue is full to send
a ping and then checking whether we've timed out before the queue has
drained isn't a sane design pattern.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: walsender doesn't send keepalives when writes are pending

From
Heikki Linnakangas
Date:
On 02/25/2014 06:41 PM, Robert Haas wrote:
> On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Usually that state will be reached very quickly because before
>> that we're writing data to the network as fast as it can be read from
>> disk.
>
> I'm unimpressed.  Even if that is in practice true, making the code
> self-consistent is a goal of non-trivial value.  The timing of sending
> keep-alives has no business depending on the state of the write queue,
> and right now it doesn't.  Your patch would make it depend on that,
> mostly by accident AFAICS.

Yeah, the timeout should should be checked regardless of the status of
the write queue. Per the attached patch.

While looking at this, I noticed that the time we sleep between each
iteration is calculated in a funny way. It's not this patch's fault, but
moving the code made it more glaring. After the patch, it looks like this:

>     TimestampTz timeout;
>     long        sleeptime = 10000;        /* 10 s */
> ...
>     /*
>      * If wal_sender_timeout is active, sleep in smaller increments
>      * to not go over the timeout too much. XXX: Why not just sleep
>      * until the timeout has elapsed?
>      */
>     if (wal_sender_timeout > 0)
>         sleeptime = 1 + (wal_sender_timeout / 10);
>
>     /* Sleep until something happens or we time out */
> ...
>     WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
>               MyProcPort->sock, sleeptime);
> ...
>     /*
>      * Check for replication timeout.  Note we ignore the corner case
>      * possibility that the client replied just as we reached the
>      * timeout ... he's supposed to reply *before* that.
>      */
>     timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
>                       wal_sender_timeout);
>     if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
>     {
>         ...
>     }

The logic was the same before the patch, but I added the XXX comment
above. Why do we sleep in increments of 1/10 of wal_sender_timeout?
Originally, the code calculated when the next wakeup should happen, by
adding wal_sender_timeout (or replication_timeout, as it was called back
then) to the time of the last reply. Why don't we do that?

The code seems to have just slowly devolved into that. It was changed
from sleep-until-timeout to sleep 1/10th of wal_sender_timeout by a
patch that made walsender send a keep-alive message to the client every
time it wakes up, and I guess the idea was to send a message at an
interval that's 1/10th of the timeout. But that idea is long gone by
now; first, commit da4efa13d801ccc179f1d2c24d8a60c4a2f8ede9 turned off
sending the keep-alive messages by default, and then
6f60fdd7015b032bf49273c99f80913d57eac284 turned them back on, but so
that it's only sent once when 1/2 of wal_sender_timeout has elapsed.

I think that should be changed back to sleep until the next deadline of
when something needs to happen, instead of polling. But that can be a
separate patch.

- Heikki

Attachment

Re: walsender doesn't send keepalives when writes are pending

From
Andres Freund
Date:
On 2014-03-05 18:26:13 +0200, Heikki Linnakangas wrote:
> On 02/25/2014 06:41 PM, Robert Haas wrote:
> >On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >>Usually that state will be reached very quickly because before
> >>that we're writing data to the network as fast as it can be read from
> >>disk.
> >
> >I'm unimpressed.  Even if that is in practice true, making the code
> >self-consistent is a goal of non-trivial value.  The timing of sending
> >keep-alives has no business depending on the state of the write queue,
> >and right now it doesn't.  Your patch would make it depend on that,
> >mostly by accident AFAICS.

I still don't see how my proposed patch increases the dependency, rather
the contrary, it's less dependant on the flushing behaviour.

But I agree that a more sweeping change is a good idea.

> The logic was the same before the patch, but I added the XXX comment above.
> Why do we sleep in increments of 1/10 of wal_sender_timeout? Originally, the
> code calculated when the next wakeup should happen, by adding
> wal_sender_timeout (or replication_timeout, as it was called back then) to
> the time of the last reply. Why don't we do that?
> [ archeology ]

It imo makes sense to wakeup after last_reply + wal_sender_timeout/2, so
a requested reply actually has time to arrive, but otherwise I agree.

I think your patch makes sense. Additionally imo the timeout checking
should be moved outside the if (caughtup || pq_is_send_pending()), but
that's probably a separate patch.

Any chance you could apply your patch soon? I've a patch pending that'll
surely conflict with this and it seems better to fix it first.

Greetings,

Andres Freund



Re: walsender doesn't send keepalives when writes are pending

From
Heikki Linnakangas
Date:
On 03/05/2014 10:57 PM, Andres Freund wrote:
> On 2014-03-05 18:26:13 +0200, Heikki Linnakangas wrote:
>> The logic was the same before the patch, but I added the XXX comment above.
>> Why do we sleep in increments of 1/10 of wal_sender_timeout? Originally, the
>> code calculated when the next wakeup should happen, by adding
>> wal_sender_timeout (or replication_timeout, as it was called back then) to
>> the time of the last reply. Why don't we do that?
>> [ archeology ]
>
> It imo makes sense to wakeup after last_reply + wal_sender_timeout/2, so
> a requested reply actually has time to arrive, but otherwise I agree.
>
> I think your patch makes sense. Additionally imo the timeout checking
> should be moved outside the if (caughtup || pq_is_send_pending()), but
> that's probably a separate patch.
>
> Any chance you could apply your patch soon? I've a patch pending that'll
> surely conflict with this and it seems better to fix it first.

Ok, pushed. I left the polling-style sleep in place for now.

Thanks!

- Heikki