Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers
From | MBeena Emerson |
---|---|
Subject | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date | |
Msg-id | CANPX-3WnCi64W9n1Gesxhh66rmdCyaSHMc1TrodbrS1M5CrMSg@mail.gmail.com Whole thread Raw |
In response to | Re: recovering from "found xmin ... from before relfrozenxid ..." (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: recovering from "found xmin ... from before relfrozenxid ..."
(Ashutosh Sharma <ashu.coek88@gmail.com>)
|
List | pgsql-hackers |
Hello Ashutosh, On Fri, 24 Jul 2020 at 14:35, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Hi All, > > Attached is the patch that adds heap_force_kill(regclass, tid[]) and heap_force_freeze(regclass, tid[]) functions whichRobert mentioned in the first email in this thread. The patch basically adds an extension named pg_surgery that containsthese functions. Please have a look and let me know your feedback. Thank you. > Thanks for the patch. 1. We would be marking buffer dirty and writing wal even if we have not done any changes( ex if we pass invalid/dead tids). Maybe we can handle this better? cosmetic changes 1. Maybe "HTupleSurgicalOption" instead of "HTupleForceOption" and also the variable names could use surgery instead. 2. extension comment pg_surgery.control "extension to perform surgery the damaged heap table" -> "extension to perform surgery on the damaged heap table" > On Thu, Jul 16, 2020 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Thu, Jul 16, 2020 at 10:00 AM Robert Haas <robertmhaas@gmail.com> wrote: >> > I see your point, though: the tuple has to be able to survive >> > HOT-pruning in order to cause a problem when we check whether it needs >> > freezing. >> >> Here's an example where the new sanity checks fail on an invisible >> tuple without any concurrent transactions: >> >> $ initdb >> $ pg_ctl start -l ~/logfile >> $ createdb >> $ psql >> >> create table simpsons (a int, b text); >> vacuum freeze; >> >> $ cat > txid.sql >> select txid_current(); >> $ pgbench -t 131072 -c 8 -j 8 -n -f txid.sql >> $ psql >> >> insert into simpsons values (1, 'homer'); >> >> $ pg_ctl stop >> $ pg_resetwal -x 1000 $PGDATA >> $ pg_ctl start -l ~/logfile >> $ psql >> >> update pg_class set relfrozenxid = (relfrozenxid::text::integer + >> 2000000)::text::xid where relname = 'simpsons'; >> >> rhaas=# select * from simpsons; >> a | b >> ---+--- >> (0 rows) >> >> rhaas=# vacuum simpsons; >> ERROR: found xmin 1049082 from before relfrozenxid 2000506 >> CONTEXT: while scanning block 0 of relation "public.simpsons" >> >> This is a fairly insane situation, because we should have relfrozenxid >> < tuple xid < xid counter, but instead we have xid counter < tuple xid >> < relfrozenxid, but it demonstrates that it's possible to have a >> database which is sufficiently corrupt that you can't escape from the >> new sanity checks using only INSERT, UPDATE, and DELETE. >> >> Now, an even easier way to create a table with a tuple that prevents >> vacuuming and also can't just be deleted is to simply remove a >> required pg_clog file (and maybe restart the server to clear out any >> cached data in the SLRUs). What we typically do with customers who >> need to recover from that situation today is give them a script to >> fabricate a bogus CLOG file that shows all transactions as committed >> (or, perhaps, aborted). But I think that the tools proposed on this >> thread might be a better approach in certain cases. If the problem is >> that a pg_clog file vanished, then recreating it with whatever content >> you think is closest to what was probably there before is likely the >> best you can do. But if you've got some individual tuples with crazy >> xmin values, you don't really want to drop matching files in pg_clog; >> it's better to fix the tuples. >> >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> >> -- M Beena Emerson Sr. Software Engineer edbpostgres.com
pgsql-hackers by date: