Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING
Date
Msg-id CAFiTN-sNVcCv5=CBA0JeFTGH1k16a+AtjTDSB3Yj35S2r-3cBQ@mail.gmail.com
In response to Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Sun, Jul 19, 2020 at 4:56 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi Dilip!
>
>
> > On 17 July 2020, at 15:46, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > The attached patch allows vacuum to continue by emitting a WARNING
> > for a corrupted tuple instead of immediately erroring out, as discussed
> > at [1].
> >
> > Basically, it provides a new GUC called vacuum_tolerate_damage to
> > control whether vacuum continues or stops when it encounters a
> > corrupted tuple.  If vacuum_tolerate_damage is set, then in every
> > case in heap_prepare_freeze_tuple where a corrupted xid is detected,
> > it emits a warning and returns false to indicate that nothing was
> > changed in the tuple, and 'tuple_totally_frozen' is also set to false.
> > Since we return false, the caller will not try to freeze such a
> > tuple, and because tuple_totally_frozen is false, the page will not
> > be marked all-frozen even if all other tuples on the page are
> > frozen.
> >
> > Alternatively, we could try to freeze the other XIDs in the tuple that
> > are not corrupted, but I don't think we would gain anything from this,
> > because if either xmin or xmax is wrong, then the next time we run
> > vacuum we will get the same WARNING or ERROR.
> > Are there any other opinions on this?
>
> FWIW AFAIK this ERROR was the reason why we had to use older versions of heap_prepare_freeze_tuple() in our recovery kit[0].
> So +1 from me.

Thanks for showing interest in this patch.

> But I do not think that just ignoring corruption here is sufficient. Soon after this freeze problem the user will, probably, have to deal with absent CLOG.
> I think this GUC is only a part of an incomplete solution.
> Personally I'd be happy if this is backported - our recovery kit would be much smaller. But this does not seem like a valid reason.

I agree that this solves only one part of the problem, and in some
cases it may not work if the CLOG itself is corrupted, i.e. it does not
exist for the xids that are not yet frozen.
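
For readers following the thread, the behaviour described in the quoted
proposal can be sketched roughly as below. This is a standalone toy model
under stated assumptions, not the patch itself: toy_prepare_freeze, its
parameters, and the raised_error flag are hypothetical names invented for
illustration, only xmin is checked, special XIDs are ignored, and the ERROR
path is modelled with a flag because the real ereport(ERROR) does not return
to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Stand-in for the GUC proposed in the thread (off by default here). */
static bool vacuum_tolerate_damage = false;

/* Modulo-2^32 XID comparison; special XIDs are ignored in this sketch. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical, heavily simplified model of the freeze decision for xmin.
 * Returns true if the tuple would be changed (xmin frozen), false otherwise.
 * *raised_error models the ERROR path, which in PostgreSQL would abort
 * the vacuum instead of returning.
 */
static bool
toy_prepare_freeze(TransactionId xmin,
                   TransactionId relfrozenxid,
                   TransactionId cutoff_xid,
                   bool *tuple_totally_frozen,
                   bool *raised_error)
{
    assert(tuple_totally_frozen != NULL && raised_error != NULL);
    *raised_error = false;

    if (xid_precedes(xmin, relfrozenxid))
    {
        /* Corrupted xid: never report the tuple as totally frozen. */
        *tuple_totally_frozen = false;

        if (vacuum_tolerate_damage)
        {
            fprintf(stderr, "WARNING: found xmin %u from before relfrozenxid %u\n",
                    (unsigned) xmin, (unsigned) relfrozenxid);
            return false;       /* nothing changed; vacuum continues */
        }

        fprintf(stderr, "ERROR: found xmin %u from before relfrozenxid %u\n",
                (unsigned) xmin, (unsigned) relfrozenxid);
        *raised_error = true;   /* vacuum would stop here */
        return false;
    }

    if (xid_precedes(xmin, cutoff_xid))
    {
        *tuple_totally_frozen = true;   /* xmin old enough to freeze */
        return true;
    }

    *tuple_totally_frozen = false;      /* xmin too recent, left as is */
    return false;
}
```

With the flag set, a call such as toy_prepare_freeze(50, 100, 200, ...)
returns false and leaves tuple_totally_frozen false, so in this model the
page holding that tuple would not be treated as all-frozen.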

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


