Re: pg_basebackup may fail to send feedbacks. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: pg_basebackup may fail to send feedbacks.
Date
Msg-id 20150217.194519.58137941.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: pg_basebackup may fail to send feedbacks.  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_basebackup may fail to send feedbacks.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Thank you for the comment. Three new patches are attached. I
forgot to give a revision number on the previous patch, but I
think this is the 2nd version.

1. walrcv_reply_fix_94_v2.patch  Walreceiver patch applyable on master/REL9_4_STBLE/REL9_3_STABLE

2. walrcv_reply_fix_92_v2.patch  Walreceiver patch applyable on REL9_2_STABLE

3. walrcv_reply_fix_91_v2.patch  Walreceiver patch applyable on REL9_1_STABLE


At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwHZ_4yWyokA+5ks9s_698adjH8+0haMCSC9WYFh37GY7g@mail.gmail.com>
> On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> Introduce the maximum number of records that we can receive and
> >> process between feedbacks? For example, we can change
> >> XLogWalRcvSendHSFeedback() so that it's called per at least
> >> 8 records. I'm not sure what number is good, though...
> >
> > At the beginning of the "while(len > 0){if (len > 0){" block,
> > last_recv_timestamp is always kept up to date (by using
> > gettimeotda():). So we can use the variable instead of
> > gettimeofday() in my previous proposal.
> 
> Yes, but something like last_recv_timestamp doesn't exist in
> REL9_1_STABLE. So you don't think that your proposed fix
> should be back-patched to all supported versions. Right?

Back to 9.3 has the loop with the same structure. 9.2 is in a bit
differenct shape but looks to have the same issue. 9.1 and 9.0
has the same staps with 9.2 but 9.0 doesn't has wal sender
timeout and 9.1 does not have a variable having current time.

9.3 and later: the previous patch works, but revised as your comment.

9.2: The time of the last reply is stored in the file-scope variable reply_message, and the current time is stored in
walrcv->lastMsgReceiptTime.The timeout is determined using theses variables.
 

9.1: walrcv doesn't have lastMsgReceiptTime. The current time should be acquired explicitly in the innermost loop. I
triedcalling gettimeofday() once per several loops. The skip count is determined by anticipated worst duration to
processa wal record and wal_receiver_status_interval. However, I doubt the necessity to do so.. The value of the worst
durationis simply a random guess.
 

9.0: No point to do this kind of fix.


> > The start time of the timeout could be last_recv_timestamp at the
> > beginning of the while loop, since it is a bit earlier than the
> > actual time at the point.
> 
> Sounds strange to me. I think that last_recv_timestamp should be
> compared with the time when the last feedback message was sent,
> e.g., maybe you can expose sendTime in XLogWalRcvSendReplay()
> as global variable, and use it to compare with last_recv_timestamp.

You're right to do exactly right thing, but, as you mentioned,
exposing sendTime is requied to do so. The solution I proposed is
the second-best assuming that wal_sender_timeout is recommended
to be longer enough (several times longer) than
wal_receiver_status_interval. If no one minds to expose sendTime,
it is the best solution. The attached patch does it.

# The added comment in the patch was wrong, though.

> When we get out of the WAL receive loop due to the timeout check
> which your patch added, XLogWalRcvFlush() is always executed.
> I don't think that current WAL file needs to be flushed at that time.

Thank you for pointing it out. Run XLogWalRcvSendReply(force =
true) immediately instead of breaking from the loop.

> > The another solution would be using
> > RegisterTimeout/enable_timeout_after and it seemed to be work for
> > me. It can throw out the time counting stuff in the loop we are
> > talking about and that of XLogWalRcvSendHSFeedback and
> > XLogWalRcvSendReply, but it might be a bit too large for the
> > gain.
> 
> Yes, sounds overkill.

However, gettimeofday() for each recv is also a overkill for most
cases. I'll post the patches for receivelog.c based on the patch
for 9.1 wal receiver.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

pgsql-hackers by date:

Previous
From: "happy times"
Date:
Subject: Re: restrict global access to be readonly
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: How about to have relnamespace and relrole?