[BUG] False indication in pg_stat_replication.sync_state - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | [BUG] False indication in pg_stat_replication.sync_state |
Date | |
Msg-id | 20121018.174208.239638814.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Reduce palloc's in numeric operations. (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: [BUG] False indication in pg_stat_replication.sync_state
(Fujii Masao <masao.fujii@gmail.com>)
|
List | pgsql-hackers |
Hello. My colleague found that pg_stat_replication.sync_state shows false state for some condition. This prevents Pacemaker from completing fail-over that could safely be done. The point is in walsender.c, pg_stat_get_wal_senders() below, (as of REL9_2_1) 1555: if (walsnd->pid != 0) 1556: { 1557: /* 1558: * Treat a standby such as a pg_basebackup backgroundprocess 1559: * which always returns an invalid flush location, as an 1560: * asynchronous standby.1561: */ ! 1562: sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? 1563: 0 : walsnd->sync_standby_priority; Here, XLogRecPtrIsInvalid(walsnd->flush) is defined as (walsnd->flush.xrecoff == 0) which becomes true as usual at every WAL 'file's (not segments) boundary. xrecoff == 0 is certainly invalid for the start point of WAL *RECORD*, but should be considered valid in replication stream. This check was introduced at 9.2.0 and the version up between 9.1.4 and 9.1.5. | DEBUG: write 4/0 flush 3/FEFFEC68 apply 3/FEFFEC68| DEBUG: write 4/0 flush 4/0 apply 3/FEFFEC68| DEBUG: HOGE: flush= 3/FEFFEC68 sync_priority[0] = 1| DEBUG: write 4/111C0 flush 4/0 apply 3/FEFFECC0 !| DEBUG: HOGE: flush = 4/0 sync_priority[0] = 0 This value zero of sync_priority[0] makes sync_status 'async' errorneously and confuses Pacemaker. # The log line marked with 'HOGE' above printed by applying the # patch at the bottom of this message and invoking 'select # sync_state from pg_stat_replication' periodically. To increase # the chance to see the symptom, sleep 1 second for 'file' # boundaries :-) The Heikki's recent(?) commit 0ab9d1c4b31622e9176472b4276f3e9831e3d6ba which changes the format of XLogRecPtr from logid:xrecoff struct to 64 bit linear address would fix the false indication. But I suppose this patch won't be applied to existing 9.1.x and 9.2.x because of the modification onto streaming protocol. As far as I see the patch, it would'nt change the meaning of XLogRecPtr to change XLogRecPtrIsInvalid from (xrecoff == 0) to (xrecoff == 0 && xlogid == 0). But this change affects rather wide portion where handling WAL nevertheless what is needed here is only to stop the false indication. On the other hand, pg_basebackup seems return 0/0 as flush and apply positions so it seems enough only to add xlogid == 0 into the condition. The patch attached for REL9_2_1 does this and yields the result following. | DEBUG: write 2/FEFFFD48 flush 2/FEFFFD48 apply 2/FEFF7AB0| DEBUG: write 3/0 flush 2/FEFFFD48 apply 2/FEFF7E88| DEBUG: write 3/0 flush 3/0 apply 2/FEFFFD48| DEBUG: HOGE: flush = 2/FEFFFD48 sync_priority[0] = 1| DEBUG: write 3/E338flush 3/0 apply 2/FEFFFF80 !| DEBUG: HOGE: flush = 3/0 sync_priority[0] = 1 I think this patch should be applied for 9.2.2 and 9.1.7. regards, -- Kyotaro Horiguchi NTT Open Source Software Center ============================================================ ===== The patch for this test. ============================================================ diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 064ddd5..19f79d1 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -618,6 +618,10 @@ ProcessStandbyReplyMessage(void) reply.flush.xlogid, reply.flush.xrecoff, reply.apply.xlogid,reply.apply.xrecoff); + if (reply.write.xrecoff == 0 || + reply.flush.xrecoff == 0) + sleep(1); + /* * Update shared state for this WalSender process based on reply data from * standby. @@ -1561,7 +1565,10 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS) */ sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush)? 0 : walsnd->sync_standby_priority; - + elog(DEBUG1, "HOGE: flush = %X/%X sync_priority[%d] = %d", + walsnd->flush.xlogid, walsnd->flush.xrecoff, + i, sync_priority[i]); + if (walsnd->state == WALSNDSTATE_STREAMING && walsnd->sync_standby_priority > 0 && (priority == 0 || diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 064ddd5..1d4cbc4 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1555,11 +1555,11 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS) if (walsnd->pid != 0) { /* - * Treat a standby such as a pg_basebackup background process - * which always returns an invalid flush location, as an + * Treat a standby such as a pg_basebackup background process which + * always returns 0/0 (InvalidXLogRecPtr) as flush location, as an * asynchronous standby. */ - sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? + sync_priority[i] = XLByteEQ(walsnd->flush, InvalidXLogRecPtr) ? 0 : walsnd->sync_standby_priority; if (walsnd->state == WALSNDSTATE_STREAMING &&
pgsql-hackers by date: