Re: pg_basebackup may fail to send feedbacks. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: pg_basebackup may fail to send feedbacks.
Date
Msg-id 20150218.173441.13047480.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: pg_basebackup may fail to send feedbacks.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: pg_basebackup may fail to send feedbacks.  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hello, this is the last patch for pg_basebackup/pg_receivexlog on
master (9.5). Preor versions don't have this issue.

4. basebackup_reply_fix_mst_v2.patch receivelog.c patch applyable on master.

This is based on the same design with
walrcv_reply_fix_91_v2.patch in the aspect of gettimeofday().

regards,

At Tue, 17 Feb 2015 19:45:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20150217.194519.58137941.horiguchi.kyotaro@lab.ntt.co.jp>
> Thank you for the comment. Three new patches are attached. I
> forgot to give a revision number on the previous patch, but I
> think this is the 2nd version.
> 
> 1. walrcv_reply_fix_94_v2.patch
>    Walreceiver patch applyable on master/REL9_4_STBLE/REL9_3_STABLE
> 
> 2. walrcv_reply_fix_92_v2.patch
>    Walreceiver patch applyable on REL9_2_STABLE
> 
> 3. walrcv_reply_fix_91_v2.patch
>    Walreceiver patch applyable on REL9_1_STABLE
> 
> 
> At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwHZ_4yWyokA+5ks9s_698adjH8+0haMCSC9WYFh37GY7g@mail.gmail.com>
> > On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >> Introduce the maximum number of records that we can receive and
> > >> process between feedbacks? For example, we can change
> > >> XLogWalRcvSendHSFeedback() so that it's called per at least
> > >> 8 records. I'm not sure what number is good, though...
> > >
> > > At the beginning of the "while(len > 0){if (len > 0){" block,
> > > last_recv_timestamp is always kept up to date (by using
> > > gettimeotda():). So we can use the variable instead of
> > > gettimeofday() in my previous proposal.
> > 
> > Yes, but something like last_recv_timestamp doesn't exist in
> > REL9_1_STABLE. So you don't think that your proposed fix
> > should be back-patched to all supported versions. Right?
> 
> Back to 9.3 has the loop with the same structure. 9.2 is in a bit
> differenct shape but looks to have the same issue. 9.1 and 9.0
> has the same staps with 9.2 but 9.0 doesn't has wal sender
> timeout and 9.1 does not have a variable having current time.
> 
> 9.3 and later: the previous patch works, but revised as your
>   comment.
> 
> 9.2: The time of the last reply is stored in the file-scope
>   variable reply_message, and the current time is stored in
>   walrcv->lastMsgReceiptTime. The timeout is determined using
>   theses variables.
> 
> 9.1: walrcv doesn't have lastMsgReceiptTime. The current time
>   should be acquired explicitly in the innermost loop. I tried
>   calling gettimeofday() once per several loops. The skip count
>   is determined by anticipated worst duration to process a wal
>   record and wal_receiver_status_interval. However, I doubt the
>   necessity to do so.. The value of the worst duration is simply
>   a random guess.
> 
> 9.0: No point to do this kind of fix.
> 
> 
> > > The start time of the timeout could be last_recv_timestamp at the
> > > beginning of the while loop, since it is a bit earlier than the
> > > actual time at the point.
> > 
> > Sounds strange to me. I think that last_recv_timestamp should be
> > compared with the time when the last feedback message was sent,
> > e.g., maybe you can expose sendTime in XLogWalRcvSendReplay()
> > as global variable, and use it to compare with last_recv_timestamp.
> 
> You're right to do exactly right thing, but, as you mentioned,
> exposing sendTime is requied to do so. The solution I proposed is
> the second-best assuming that wal_sender_timeout is recommended
> to be longer enough (several times longer) than
> wal_receiver_status_interval. If no one minds to expose sendTime,
> it is the best solution. The attached patch does it.
> 
> # The added comment in the patch was wrong, though.
> 
> > When we get out of the WAL receive loop due to the timeout check
> > which your patch added, XLogWalRcvFlush() is always executed.
> > I don't think that current WAL file needs to be flushed at that time.
> 
> Thank you for pointing it out. Run XLogWalRcvSendReply(force =
> true) immediately instead of breaking from the loop.
> 
> > > The another solution would be using
> > > RegisterTimeout/enable_timeout_after and it seemed to be work for
> > > me. It can throw out the time counting stuff in the loop we are
> > > talking about and that of XLogWalRcvSendHSFeedback and
> > > XLogWalRcvSendReply, but it might be a bit too large for the
> > > gain.
> > 
> > Yes, sounds overkill.
> 
> However, gettimeofday() for each recv is also a overkill for most
> cases. I'll post the patches for receivelog.c based on the patch
> for 9.1 wal receiver.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: How about to have relnamespace and relrole?