Re: walsender doesn't send keepalives when writes are pending - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: walsender doesn't send keepalives when writes are pending |
Date | |
Msg-id | 53175025.9040805@vmware.com Whole thread Raw |
In response to | Re: walsender doesn't send keepalives when writes are pending (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: walsender doesn't send keepalives when writes are
pending
|
List | pgsql-hackers |
On 02/25/2014 06:41 PM, Robert Haas wrote: > On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Usually that state will be reached very quickly because before >> that we're writing data to the network as fast as it can be read from >> disk. > > I'm unimpressed. Even if that is in practice true, making the code > self-consistent is a goal of non-trivial value. The timing of sending > keep-alives has no business depending on the state of the write queue, > and right now it doesn't. Your patch would make it depend on that, > mostly by accident AFAICS. Yeah, the timeout should should be checked regardless of the status of the write queue. Per the attached patch. While looking at this, I noticed that the time we sleep between each iteration is calculated in a funny way. It's not this patch's fault, but moving the code made it more glaring. After the patch, it looks like this: > TimestampTz timeout; > long sleeptime = 10000; /* 10 s */ > ... > /* > * If wal_sender_timeout is active, sleep in smaller increments > * to not go over the timeout too much. XXX: Why not just sleep > * until the timeout has elapsed? > */ > if (wal_sender_timeout > 0) > sleeptime = 1 + (wal_sender_timeout / 10); > > /* Sleep until something happens or we time out */ > ... > WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents, > MyProcPort->sock, sleeptime); > ... > /* > * Check for replication timeout. Note we ignore the corner case > * possibility that the client replied just as we reached the > * timeout ... he's supposed to reply *before* that. > */ > timeout = TimestampTzPlusMilliseconds(last_reply_timestamp, > wal_sender_timeout); > if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout) > { > ... > } The logic was the same before the patch, but I added the XXX comment above. Why do we sleep in increments of 1/10 of wal_sender_timeout? Originally, the code calculated when the next wakeup should happen, by adding wal_sender_timeout (or replication_timeout, as it was called back then) to the time of the last reply. Why don't we do that? The code seems to have just slowly devolved into that. It was changed from sleep-until-timeout to sleep 1/10th of wal_sender_timeout by a patch that made walsender send a keep-alive message to the client every time it wakes up, and I guess the idea was to send a message at an interval that's 1/10th of the timeout. But that idea is long gone by now; first, commit da4efa13d801ccc179f1d2c24d8a60c4a2f8ede9 turned off sending the keep-alive messages by default, and then 6f60fdd7015b032bf49273c99f80913d57eac284 turned them back on, but so that it's only sent once when 1/2 of wal_sender_timeout has elapsed. I think that should be changed back to sleep until the next deadline of when something needs to happen, instead of polling. But that can be a separate patch. - Heikki
Attachment
pgsql-hackers by date: