Re: [BUG] False indication in pg_stat_replication.sync_state - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [BUG] False indication in pg_stat_replication.sync_state
Date
Msg-id CAHGQGwGGjYH2yPFwftZFi7hHTsqjVytbPjGmB530+xudz4VonQ@mail.gmail.com
Whole thread Raw
In response to [BUG] False indication in pg_stat_replication.sync_state  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [BUG] False indication in pg_stat_replication.sync_state  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Thu, Oct 18, 2012 at 5:42 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello. My colleague found that pg_stat_replication.sync_state
> shows false state for some condition.
>
> This prevents Pacemaker from completing fail-over that could
> safely be done.
>
> The point is in walsender.c, pg_stat_get_wal_senders() below, (as
> of REL9_2_1)
>
>   1555:    if (walsnd->pid != 0)
>   1556:    {
>   1557:      /*
>   1558:       * Treat a standby such as a pg_basebackup background process
>   1559:       * which always returns an invalid flush location, as an
>   1560:       * asynchronous standby.
>   1561:       */
> ! 1562:       sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
>   1563:          0 : walsnd->sync_standby_priority;
>
> Here, XLogRecPtrIsInvalid(walsnd->flush) is defined as
> (walsnd->flush.xrecoff == 0) which becomes true as usual at every
> WAL 'file's (not segments) boundary. xrecoff == 0 is certainly
> invalid for the start point of WAL *RECORD*, but should be
> considered valid in replication stream. This check was introduced
> at 9.2.0 and the version up between 9.1.4 and 9.1.5.
>
>  | DEBUG:  write 4/0 flush 3/FEFFEC68 apply 3/FEFFEC68
>  | DEBUG:  write 4/0 flush 4/0 apply 3/FEFFEC68
>  | DEBUG:  HOGE: flush = 3/FEFFEC68 sync_priority[0] = 1
>  | DEBUG:  write 4/111C0 flush 4/0 apply 3/FEFFECC0
> !| DEBUG:  HOGE: flush = 4/0 sync_priority[0] = 0
>
> This value zero of sync_priority[0] makes sync_status 'async'
> errorneously and confuses Pacemaker.
>
> # The log line marked with 'HOGE' above printed by applying the
> # patch at the bottom of this message and invoking 'select
> # sync_state from pg_stat_replication' periodically. To increase
> # the chance to see the symptom, sleep 1 second for 'file'
> # boundaries :-)
>
> The Heikki's recent(?) commit
> 0ab9d1c4b31622e9176472b4276f3e9831e3d6ba which changes the format
> of XLogRecPtr from logid:xrecoff struct to 64 bit linear address
> would fix the false indication. But I suppose this patch won't be
> applied to existing 9.1.x and 9.2.x because of the modification
> onto streaming protocol.
>
> As far as I see the patch, it would'nt change the meaning of
> XLogRecPtr to change XLogRecPtrIsInvalid from (xrecoff == 0) to
> (xrecoff == 0 && xlogid == 0). But this change affects rather
> wide portion where handling WAL nevertheless what is needed here
> is only to stop the false indication.
>
> On the other hand, pg_basebackup seems return 0/0 as flush and
> apply positions so it seems enough only to add xlogid == 0 into
> the condition. The patch attached for REL9_2_1 does this and
> yields the result following.
>
>  | DEBUG:  write 2/FEFFFD48 flush 2/FEFFFD48 apply 2/FEFF7AB0
>  | DEBUG:  write 3/0 flush 2/FEFFFD48 apply 2/FEFF7E88
>  | DEBUG:  write 3/0 flush 3/0 apply 2/FEFFFD48
>  | DEBUG:  HOGE: flush = 2/FEFFFD48 sync_priority[0] = 1
>  | DEBUG:  write 3/E338 flush 3/0 apply 2/FEFFFF80
> !| DEBUG:  HOGE: flush = 3/0 sync_priority[0] = 1
>
> I think this patch should be applied for 9.2.2 and 9.1.7.

Looks good to me, though I don't think the source code comment needs
to be updated in the way the patch does.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: September 2012 commitfest
Next
From: Robert Haas
Date:
Subject: Re: First draft of snapshot snapshot building design document