Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CAE9k0P=wxSw7+LRSCGQi_fbi6RYvZghE8t0_VNq6eh52uRjm-A@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
On Thu, Aug 20, 2020 at 11:04 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 19 Aug 2020 at 20:45, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > On Wed, Aug 19, 2020 at 3:55 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 19 Aug 2020 at 15:09, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 9:27 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Mon, 17 Aug 2020 at 15:05, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > > > > >
> > > > > > > pg_force_freeze() can revival a tuple that is already deleted but not
> > > > > > > vacuumed yet. Therefore, the user might need to reindex indexes after
> > > > > > > using that function. For instance, with the following script, the last
> > > > > > > two queries: index scan and seq scan, will return different results.
> > > > > > >
> > > > > > > set enable_seqscan to off;
> > > > > > > set enable_bitmapscan to off;
> > > > > > > set enable_indexonlyscan to off;
> > > > > > > create table tbl (a int primary key);
> > > > > > > insert into tbl values (1);
> > > > > > >
> > > > > > > update tbl set a = a + 100 where a = 1;
> > > > > > >
> > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > >
> > > > > > > -- revive deleted tuple on heap
> > > > > > > select heap_force_freeze('tbl', array['(0,1)'::tid]);
> > > > > > >
> > > > > > > -- index scan returns 2 tuples
> > > > > > > explain analyze select * from tbl where a < 200;
> > > > > > >
> > > > > > > -- seq scan returns 1 tuple
> > > > > > > set enable_seqscan to on;
> > > > > > > explain analyze select * from tbl;
> > > > > > >
> > > > > >
> > > > > > I am not sure if this is the right use-case of pg_force_freeze
> > > > > > function. I think we should only be running pg_force_freeze function
> > > > > > on a tuple for which VACUUM reports "found xmin ABC from before
> > > > > > relfrozenxid PQR" sort of error otherwise it might worsen the things
> > > > > > instead of making it better.
> > > > >
> > > > > Should this also be documented? I think that it's hard to force the
> > > > > user to always use this module in the right situation but we need to
> > > > > show at least when to use.
> > > > >
> > > >
> > > > I've already added some examples in the documentation explaining the
> > > > use-case of force_freeze function. If required, I will also add a note
> > > > about it.
> > > >
> > > > > > > Also, if a tuple updated and moved to another partition is revived by
> > > > > > > heap_force_freeze(), its ctid still has special values:
> > > > > > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't
> > > > > > > see a problem yet caused by a visible tuple having the special ctid
> > > > > > > value, but it might be worth considering either to reset ctid value as
> > > > > > > well or to not freezing already-deleted tuple.
> > > > > > >
> > > > > >
> > > > > > For this as well, the answer remains the same as above.
> > > > >
> > > > > Perhaps the same is true when a tuple header is corrupted including
> > > > > xmin and ctid for some reason and the user wants to fix it? I'm
> > > > > concerned that a live tuple having the wrong ctid will cause SEGV or
> > > > > PANIC error in the future.
> > > > >
> > > >
> > > > If a tuple header itself is corrupted, then I think we must kill that
> > > > tuple. If only xmin and t_ctid fields are corrupted, then probably we
> > > > can think of resetting the ctid value of that tuple. However, it won't
> > > > be always possible to detect the corrupted ctid value. It's quite
> > > > possible that the corrupted ctid field has valid values for block
> > > > number and offset number in it, but it's actually corrupted and it
> > > > would be difficult to consider such ctid as corrupted. Hence, we can't
> > > > do anything about such types of corruption. Probably in such cases we
> > > > need to run VACUUM FULL on such tables so that new ctid gets generated
> > > > for each tuple in the table.
> > >
> > > Understood.
> > >
> > > Perhaps such corruption will be able to be detected by new heapam
> > > check functions discussed on another thread. My point was that it
> > > might be better to attempt making the tuple header sane state as much
> > > as possible when fixing a live tuple in order to prevent further
> > > problems such as databases crash by malicious attackers.
> > >
> >
> > Agreed. So, to handle the ctid related concern that you raised, I'm
> > planning to do the following changes to ensure that the tuple being
> > marked as frozen contains the correct item pointer value. Please let
> > me know if you are okay with these changes.
>
> Given that a live tuple never indicates to ve moved partitions, I
> guess the first condition in the if statement is not necessary. The
> rest looks good to me, although other hackers might think differently.
>

Okay, thanks for confirming. I am planning to go ahead with this
approach. Will later see what others have to say about it.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: remove spurious CREATE INDEX CONCURRENTLY wait
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Implement UNLOGGED clause for COPY FROM