Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From MBeena Emerson
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CANPX-3WnCi64W9n1Gesxhh66rmdCyaSHMc1TrodbrS1M5CrMSg@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
Hello Ashutosh,

On Fri, 24 Jul 2020 at 14:35, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi All,
>
> Attached is the patch that adds heap_force_kill(regclass, tid[]) and heap_force_freeze(regclass, tid[]) functions
whichRobert mentioned in the first email in this thread. The patch basically adds an extension named pg_surgery that
containsthese functions.  Please have a look and let me know your feedback. Thank you. 
>

Thanks for the patch.

1. We would be marking buffer dirty and writing wal even if we have
not done any changes( ex if we pass invalid/dead tids). Maybe we can
handle this better?

cosmetic changes
1. Maybe "HTupleSurgicalOption" instead of "HTupleForceOption" and
also the variable names could use surgery instead.
2. extension comment pg_surgery.control "extension to perform surgery
the damaged heap table" -> "extension to perform surgery on the
damaged heap table"

> On Thu, Jul 16, 2020 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Jul 16, 2020 at 10:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> > I see your point, though: the tuple has to be able to survive
>> > HOT-pruning in order to cause a problem when we check whether it needs
>> > freezing.
>>
>> Here's an example where the new sanity checks fail on an invisible
>> tuple without any concurrent transactions:
>>
>> $ initdb
>> $ pg_ctl start -l ~/logfile
>> $ createdb
>> $ psql
>>
>> create table simpsons (a int, b text);
>> vacuum freeze;
>>
>> $ cat > txid.sql
>> select txid_current();
>> $ pgbench -t 131072 -c 8 -j 8 -n -f txid.sql
>> $ psql
>>
>> insert into simpsons values (1, 'homer');
>>
>> $ pg_ctl stop
>> $ pg_resetwal -x 1000 $PGDATA
>> $ pg_ctl start -l ~/logfile
>> $ psql
>>
>> update pg_class set relfrozenxid = (relfrozenxid::text::integer +
>> 2000000)::text::xid where relname = 'simpsons';
>>
>> rhaas=# select * from simpsons;
>>  a | b
>> ---+---
>> (0 rows)
>>
>> rhaas=# vacuum simpsons;
>> ERROR:  found xmin 1049082 from before relfrozenxid 2000506
>> CONTEXT:  while scanning block 0 of relation "public.simpsons"
>>
>> This is a fairly insane situation, because we should have relfrozenxid
>> < tuple xid < xid counter, but instead we have xid counter < tuple xid
>> < relfrozenxid, but it demonstrates that it's possible to have a
>> database which is sufficiently corrupt that you can't escape from the
>> new sanity checks using only INSERT, UPDATE, and DELETE.
>>
>> Now, an even easier way to create a table with a tuple that prevents
>> vacuuming and also can't just be deleted is to simply remove a
>> required pg_clog file (and maybe restart the server to clear out any
>> cached data in the SLRUs). What we typically do with customers who
>> need to recover from that situation today is give them a script to
>> fabricate a bogus CLOG file that shows all transactions as committed
>> (or, perhaps, aborted). But I think that the tools proposed on this
>> thread might be a better approach in certain cases. If the problem is
>> that a pg_clog file vanished, then recreating it with whatever content
>> you think is closest to what was probably there before is likely the
>> best you can do. But if you've got some individual tuples with crazy
>> xmin values, you don't really want to drop matching files in pg_clog;
>> it's better to fix the tuples.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>


--

M Beena Emerson

Sr. Software Engineer


edbpostgres.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Next
From: Peter Geoghegan
Date:
Subject: Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal