Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24 - Mailing list pgsql-bugs

From Tom Lane
Subject Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24
Date
Msg-id 15920.1507562110@sss.pgh.pa.us
In response to Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24  (Marko Tiikkaja <marko@joh.to>)
List pgsql-bugs
Marko Tiikkaja <marko@joh.to> writes:
> After running it for a few days I start getting logged messages such as:

>   out of order notification Q_97882353: 97882353 != 97882349 + 1 (prefix Q)
>   out of order notification F_97947433: 97947433 != 97947429 + 1 (prefix F)
>   out of order notification F_97947439: 97947439 != 97947436 + 1 (prefix F)

> I did it on both 9.1.24 and 9.6.5 and they both exhibit the same behavior:
> it takes days to get into this state, but then notifications are missed all
> the time.  I currently have both systems in this state, so any idea what to
> look at to try and debug this further?

You might try gdb'ing the recipient and stepping through
asyncQueueProcessPageEntries to see what happens.  Are the missing
entries present in the queue but it decides to ignore them for some
reason, or are they just not there?

An interesting black-box test might be to do this with two receiver
processes and see if they miss identical sets of messages.  That
would be a different way of triangulating on question number 1,
which is whether the sender or the recipient is at fault.
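A rough sketch of how the two-receiver comparison could be scored, assuming each receiver simply logs the payloads it gets and that payloads look like the "<prefix>_<counter>" strings in the log lines quoted above. The function names and the report layout are made up for illustration and are not part of any PostgreSQL client API.

```python
import re
from collections import defaultdict

# Assumed payload format, taken from the quoted log lines: "<prefix>_<counter>".
PAYLOAD_RE = re.compile(r"^([A-Za-z]+)_(\d+)$")


def missing_by_prefix(payloads):
    """Return {prefix: set of counters skipped between consecutive payloads}."""
    last = {}
    missing = defaultdict(set)
    for payload in payloads:
        match = PAYLOAD_RE.match(payload)
        if match is None:
            continue  # not one of the test notifications
        prefix, counter = match.group(1), int(match.group(2))
        if prefix in last and counter > last[prefix] + 1:
            # Everything strictly between the previous and current counter was skipped.
            missing[prefix].update(range(last[prefix] + 1, counter))
        last[prefix] = max(last.get(prefix, counter), counter)
    return dict(missing)


def compare_receivers(payloads_a, payloads_b):
    """Split the skipped counters into those missed by both receivers or by only one."""
    miss_a = missing_by_prefix(payloads_a)
    miss_b = missing_by_prefix(payloads_b)
    report = {}
    for prefix in set(miss_a) | set(miss_b):
        set_a = miss_a.get(prefix, set())
        set_b = miss_b.get(prefix, set())
        report[prefix] = {
            "both": set_a & set_b,
            "only_a": set_a - set_b,
            "only_b": set_b - set_a,
        }
    return report
```

If nearly everything lands in "both", that would suggest the entries never reached the queue (or were dropped from it) on the sending side; large "only_a"/"only_b" sets would point toward the receiving backends instead.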

I wonder whether the long ramp-up time indicates that you have to
wrap around some counter somewhere before things go south.  The only
obvious candidate, though, is wrapping the pg_notify SLRU queue,
and I'd think that would have happened many times already.
        regards, tom lane


