On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> >> After looking into the code around recovery, ISTM that the
> >> cause is related to clearing the relation cache.
> >> In heap_xlog_visible, if the standby server receives a WAL record, then
> >> the relation cache is eventually cleared in vm_extend, but if the standby
> >> server receives an FPI, then the relation cache is not cleared.
> >> For example, after I applied the attached patch to HEAD (it might not be
> >> the right way), this problem seems to be resolved.
> >>
> >> Is this a bug or not?
> >
> > It's a bug. I don't expect it causes queries to return wrong answers, because
> > visibilitymap.c says "it's always safe to clear a bit in the map from
> > correctness point of view." (The bug makes a visibility map bit temporarily
> > appear to have been cleared.) I still call it a bug, because recovery
> > behavior becomes too difficult to verify when xlog replay produces conditions
> > that don't happen outside of recovery. Even if there's no way to get a wrong
> > query answer today, this would be too easy to break later. I wonder if we
> > make the same omission in other xlog replay functions. Similar omissions may
> > cause wrong query answers, even if this particular one does not.
> >
> > Would you like to bisect for the commit, or at least the major release, at
> > which the bug first appeared?
> >
> > I wonder if your discovery has any relationship to this recently-reported case
> > of insufficient smgr invalidation:
> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
> >
>
> I'm not sure whether this bug is related to the other issue you mentioned,
> but after further investigation, it seems to be reproducible even on
> older versions.
> At least, I reproduced it on 9.0.0.

Would you try PostgreSQL 9.2.16? The visibility map was not crash-safe and
had no correctness implications until 9.2. If 9.2 behaves this way, it's
almost certainly not a recent regression.
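
For anyone following along, the failure mode described upthread can be
illustrated with a toy model. This is only a sketch under stated assumptions,
not PostgreSQL code: the class and method names below are hypothetical, the
"cache" is a single cached page count, and the two replay paths merely mimic
the distinction reported above (one path invalidates the cached size when the
map grows, the FPI path does not).

```python
class ToyVisibilityMap:
    """Toy stand-in for a visibility map fork plus one backend's cached size.

    All names here are hypothetical; none of this mirrors PostgreSQL's API.
    """

    def __init__(self):
        self.pages = []            # "on-disk" pages: sets of all-visible heap blocks
        self.cached_npages = None  # backend's cached page count; None = not cached

    def _grow_to(self, vm_page):
        # Make sure the "file" is long enough to hold vm_page.
        while len(self.pages) <= vm_page:
            self.pages.append(set())

    def replay_regular_record(self, vm_page, heap_block):
        # Assumed behaviour of the ordinary path: extend, then invalidate
        # the cached size so readers re-check the real length.
        self._grow_to(vm_page)
        self.cached_npages = None
        self.pages[vm_page].add(heap_block)

    def replay_full_page_image(self, vm_page, image):
        # Assumed behaviour of the FPI path: the page is restored wholesale
        # and the cached size is left untouched (the omission in question).
        self._grow_to(vm_page)
        self.pages[vm_page] = set(image)

    def backend_sees_all_visible(self, vm_page, heap_block):
        if self.cached_npages is None:
            self.cached_npages = len(self.pages)
        if vm_page >= self.cached_npages:
            return False  # past the cached end: bit is treated as not set
        return heap_block in self.pages[vm_page]


def demo():
    stale = ToyVisibilityMap()
    stale.backend_sees_all_visible(0, 7)      # backend caches a size of 0 pages
    stale.replay_full_page_image(0, {7})      # map grows, cache not invalidated
    fpi_result = stale.backend_sees_all_visible(0, 7)

    fresh = ToyVisibilityMap()
    fresh.backend_sees_all_visible(0, 7)      # same starting point
    fresh.replay_regular_record(0, 7)         # map grows, cache invalidated
    regular_result = fresh.backend_sees_all_visible(0, 7)
    return fpi_result, regular_result
```

In this model the bit set through the FPI path reads as cleared until
something else resets the cached size, which is the "temporarily appears to
have been cleared" symptom; the bit set through the other path is visible
immediately.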