Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Tom Lane
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Msg-id 665524.1599948007@sss.pgh.pa.us
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> I have committed this version.

This failure says that the test case is not entirely stable:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-12%2005%3A13%3A12

diff -U3 /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out
--- /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out    2020-09-11 06:31:36.000000000 +0000
+++ /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out    2020-09-12 11:40:26.000000000 +0000
@@ -116,7 +116,6 @@
  vacuum freeze htab2;
  -- unused TIDs should be skipped
  select heap_force_kill('htab2'::regclass, ARRAY['(0, 2)']::tid[]);
- NOTICE:  skipping tid (0, 2) for relation "htab2" because it is marked unused
   heap_force_kill
  -----------------


sungazer's first run after pg_surgery went in was successful, so it's
not a hard failure.  I'm guessing that it's timing dependent.

The most obvious theory for the cause is that what VACUUM does with
a tuple depends on whether the tuple's xmin is below global xmin,
and a concurrent autovacuum could very easily be holding back global
xmin.  While I can't easily get autovac to run at just the right
time, I did verify that a concurrent regular session holding back
global xmin produces the symptom seen above.  (To replicate, insert
"select pg_sleep(...)" in heap_surgery.sql before "-- now create an unused
line pointer"; run make installcheck; and use the delay to connect
to the database manually, start a serializable transaction, and do
any query to acquire a snapshot.)
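
For concreteness, the manual session could look roughly like the
following.  This is only a sketch: any query that takes a snapshot
will do, and the pg_class query here is an arbitrary choice.

```sql
-- Run in a separate psql session while the test is sleeping.
begin isolation level serializable;
-- The first query in the transaction acquires the snapshot, which is then
-- held until the transaction ends and so holds back global xmin.
select count(*) from pg_class;
-- Leave the transaction open until the regression test has finished, then:
rollback;
```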

I suggest that the easiest way to make this test reliable is to
make the test tables be temp tables (which allows dropping the
autovacuum_enabled = off property, too).  In the wake of commit
a7212be8b, that should guarantee that vacuum has stable tuple-level
behavior regardless of what is happening concurrently.
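
A sketch of that change might look like this.  The table name and the
heap_force_kill call are taken from the test output above; the column
definition and row values are illustrative assumptions rather than the
actual contents of heap_surgery.sql, and the pg_surgery extension is
assumed to be installed already.

```sql
-- Temp table: per commit a7212be8b, concurrent sessions should no longer
-- influence what VACUUM does here, and autovacuum does not process temp
-- tables, so autovacuum_enabled = off is not needed.
create temp table htab2 (a int);
insert into htab2 values (100), (200), (300);

-- now create an unused line pointer (the middle one, so that it is not
-- at the end of the line pointer array)
delete from htab2 where a = 200;
vacuum freeze htab2;

-- unused TIDs should be skipped
select heap_force_kill('htab2'::regclass, ARRAY['(0, 2)']::tid[]);
```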

            regards, tom lane


