Re: walsender doesn't send keepalives when writes are pending - Mailing list pgsql-hackers

From Andres Freund
Subject Re: walsender doesn't send keepalives when writes are pending
Date
Msg-id 20140225155447.GQ6718@awork2.anarazel.de
Whole thread Raw
In response to Re: walsender doesn't send keepalives when writes are pending  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: walsender doesn't send keepalives when writes are pending  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-02-25 10:45:55 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > In WalSndLoop() we do:
> >
> >     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
> >         WL_SOCKET_READABLE;
> >
> >     if (pq_is_send_pending())
> >         wakeEvents |= WL_SOCKET_WRITEABLE;
> >     else if (wal_sender_timeout > 0 && !ping_sent)
> >     {
> > ...
> >         if (GetCurrentTimestamp() >= timeout)
> >             WalSndKeepalive(true);
> > ...
> >
> > I think those two if's should simply be separate. There's no reason not
> > to ask for a ping when we're writing. On a busy server that might be the
> > case most of the time.
> > The if (pq_is_send_pending()) should also be *after* sending the
> > keepalive, after all, it might not immediately have been flushed as
> > unlikely as that is.ackers
> 
> I spent an inordinate amount of time looking at this patch.  I'm not
> sure your proposed change will accomplish the desired effect.  The
> thing is that the code you quote is within an if-block gated by
> (caughtup && !streamingDoneSending) || pq_is_send_pending().
> Currently, therefore, the behavior is that we wait on the latch
> *either when* we're caught up (and thus need to wait for more WAL) or
> when the outbound queue is full (and thus we need to wait for it to
> drain), but we send a keep-alive only in the former case (namely,
> we're caught up).
>
> Your proposed change would cause us to send keep-alives when we're
> caught up, or when we're not caught up but the write queue happens to
> be full.  That doesn't make much sense.  There might be a reason to
> start sending keep-alives when we're not caught up, but even if we
> want that behavior change there's no reason to do it only when the
> write queue is full.

I am not sure I can follow. Why doesn't it make sense to send out the
keepalive (with replyRequested = true) when we're busy sending stuff
(which will be the case most of the time on a busy server)?

The point is that clients like pg_receivexlog and pg_basebackup don't
send an keepalive response all the time they receive something (which is
perfectly fine), but just when their internal timer says it's time to do
so, or if the walsender asks them to do so. So, if the server has a
*lower* timeout than configured for pg_receivexlog and the server is a
busy one, you'll get timeouts relatively often. This is a pretty
frequent situation in synchronous replication setups because the default
timeout is *way* too high for that.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PATCH] Use MAP_HUGETLB where supported (v3)
Next
From: Andres Freund
Date:
Subject: Re: [PATCH] Use MAP_HUGETLB where supported (v3)