Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
Msg-id CAD21AoAEpxeYu6z0Pg6=qC12D5bL1dORwXEQMHGek48ShDcsgg@mail.gmail.com
In response to Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Fri, Apr 1, 2016 at 9:10 AM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
>> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
>> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> >> After looking into the code around recovery, ISTM that the
>> >> cause is related to clearing the relation cache.
>> >> In heap_xlog_visible, if the standby server receives a WAL record, the
>> >> relation cache is eventually cleared in vm_extend, but if the standby
>> >> server receives an FPI, the relation cache is not cleared.
>> >> For example, after I applied the attached patch to HEAD (it might not be
>> >> the right way, but), this problem seems to be resolved.
>> >>
>> >> Is this a bug or not?
>> >
>> > It's a bug.  I don't expect it causes queries to return wrong answers, because
>> > visibilitymap.c says "it's always safe to clear a bit in the map from
>> > correctness point of view."  (The bug makes a visibility map bit temporarily
>> > appear to have been cleared.)  I still call it a bug, because recovery
>> > behavior becomes too difficult to verify when xlog replay produces conditions
>> > that don't happen outside of recovery.  Even if there's no way to get a wrong
>> > query answer today, this would be too easy to break later.  I wonder if we
>> > make the same omission in other xlog replay functions.  Similar omissions may
>> > cause wrong query answers, even if this particular one does not.
>> >
>> > Would you like to bisect for the commit, or at least the major release, at
>> > which the bug first appeared?
>> >
>> > I wonder if your discovery has any relationship to this recently-reported case
>> > of insufficient smgr invalidation:
>> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
>> >
>>
>> I'm not sure whether this bug is related to the other issue you mentioned,
>> but after further investigation, this bug seems to be reproducible even
>> on much older versions.
>> At least, I reproduced it on 9.0.0.
>
> Would you try PostgreSQL 9.2.16?  The visibility map was not crash safe and
> had no correctness implications until 9.2.  If 9.2 behaves this way, it's
> almost certainly not a recent regression.

Yeah, I reproduced it on 9.2.0 and 9.2.16, so it's not a recent regression.
The commit is 503c7305a1e379f95649eef1a694d0c1dbdc674a, which
introduced the crash-safe visibility map.
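
To make the mechanism discussed upthread easier to follow, here is a toy
model in C. It is not PostgreSQL code: every name in it (ToyVmFork,
ToyBackendCache, toy_replay_plain_record, and so on) is hypothetical, and it
assumes the simplified picture from this thread, in which a reader caches the
length of the visibility map and only re-checks it after an invalidation.
One replay path extends the map and invalidates the reader; the other
restores a full-page image without invalidating anything.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_NBLOCKS (-1)

/* Toy stand-in for the visibility map fork (hypothetical, not a PostgreSQL type). */
typedef struct {
    int  nblocks;       /* actual length of the fork, in pages */
    bool all_visible;   /* the single bit we track; it lives on page 0 */
} ToyVmFork;

/* Toy stand-in for one backend's cached view of the fork length. */
typedef struct {
    int cached_nblocks; /* INVALID_NBLOCKS means "ask storage again" */
} ToyBackendCache;

/* Drop the cached length so the next read re-checks the fork. */
static void toy_invalidate(ToyBackendCache *cache)
{
    cache->cached_nblocks = INVALID_NBLOCKS;
}

/* Replay without a full-page image: extend the map if needed and invalidate. */
static void toy_replay_plain_record(ToyVmFork *fork, ToyBackendCache *reader)
{
    if (fork->nblocks == 0) {
        fork->nblocks = 1;
        toy_invalidate(reader);
    }
    fork->all_visible = true;
}

/* Replay from a full-page image: the page is restored, nobody is told. */
static void toy_replay_full_page_image(ToyVmFork *fork, ToyBackendCache *reader)
{
    (void) reader;      /* deliberately untouched: this is the omission */
    fork->nblocks = 1;
    fork->all_visible = true;
}

/* A reader treats a page beyond its cached length as "bit not set". */
static bool toy_reader_sees_all_visible(const ToyVmFork *fork,
                                        ToyBackendCache *reader)
{
    if (reader->cached_nblocks == INVALID_NBLOCKS)
        reader->cached_nblocks = fork->nblocks;
    if (reader->cached_nblocks == 0)
        return false;
    return fork->all_visible;
}
```

In this model, a reader that cached a length of 0 before replay sees the bit
after toy_replay_plain_record, but not after toy_replay_full_page_image until
something else invalidates its cache. That matches the "temporarily appears
to have been cleared" behaviour described above, under the stated
assumptions.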

Regards,

--
Masahiko Sawada


