On Wed, Apr 18, 2018 at 10:36 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Pavan Deolasee wrote:
>> On Wed, Apr 18, 2018 at 7:37 AM, Wood, Dan <hexpert@amazon.com> wrote:
>
>> > My analysis is that heap_prepare_freeze_tuple->FreezeMultiXactId()
>> > returns FRM_NOOP if the MultiXACT locked rows haven't committed. This
>> > results in changed=false and totally_frozen=true(as initialized). When
>> > this returns to lazy_scan_heap(), no rows are added to the frozen[] array.
>> > Yet, tuple_totally_frozen is true. This means the page is marked frozen in
>> > the VM, even though the MultiXACT row wasn't left untouch.
>> >
>> > A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
>> > else
>> > {
>> > Assert(flags & FRM_NOOP);
>> > + totally_frozen = false;
>> > }
>> >
>>
>> That's a great find!
>
> Indeed.
>
> This family of bugs (introduced by freeze map changes in 9.6) was
> initially fixed in 38e9f90a227d but this spot was missed in that fix.
>
> IMO the cause is the totally_frozen variable, which starts life in a
> bogus state (true) and the different code paths can set it to the right
> state, or by inaction end up deciding that the initial bogus state was
> correct in the first place. Errors of omission are far too easy in that
> kind of model, ISTM, so I propose this slightly different patch, which
> albeit yet untested seems easier to verify and easier to get right.
>
Thank you for the patch!
Agreed. The patch looks to fix this issue correctly.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center