Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CAE9k0P=9Lu6GWFsWmBDEz4H6sfn=cfRmHCBgxzeihpKn4CNYpA@mail.gmail.com
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Sep 13, 2020 at 3:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > I have committed this version.
>
> This failure says that the test case is not entirely stable:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-12%2005%3A13%3A12
>
> diff -U3 /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out
> --- /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out   2020-09-11 06:31:36.000000000 +0000
> +++ /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out    2020-09-12 11:40:26.000000000 +0000
> @@ -116,7 +116,6 @@
>   vacuum freeze htab2;
>   -- unused TIDs should be skipped
>   select heap_force_kill('htab2'::regclass, ARRAY['(0, 2)']::tid[]);
> - NOTICE:  skipping tid (0, 2) for relation "htab2" because it is marked unused
>    heap_force_kill
>   -----------------
>
>
> sungazer's first run after pg_surgery went in was successful, so it's
> not a hard failure.  I'm guessing that it's timing dependent.
>
> The most obvious theory for the cause is that what VACUUM does with
> a tuple depends on whether the tuple's xmin is below global xmin,
> and a concurrent autovacuum could very easily be holding back global
> xmin.  While I can't easily get autovac to run at just the right
> time, I did verify that a concurrent regular session holding back
> global xmin produces the symptom seen above.  (To replicate, insert
> "select pg_sleep(...)" in heap_surgery.sql before "-- now create an unused
> line pointer"; run make installcheck; and use the delay to connect
> to the database manually, start a serializable transaction, and do
> any query to acquire a snapshot.)
>

Thanks for reporting. I can reproduce the issue by adding a delay just
before "-- now create an unused line pointer" and using that delay to
start a new session at either the repeatable read or the serializable
transaction isolation level and run a query on the test table. To fix
this, as you suggested, I've converted the test table to a temp table.
Attached is a patch with the changes. Please have a look and let me
know if you have any concerns.
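
For anyone who wants to repeat the reproduction, the second session can
be sketched roughly as below. This is only an illustration of the steps
described above, not part of the patch. It assumes the regression run is
currently paused in the added delay and that you are connected to the
same database. The "SELECT 1" is a stand-in I picked, since any query
that acquires a snapshot should do.

```sql
-- Run in a separate session while the regression script is paused.
-- Either REPEATABLE READ or SERIALIZABLE works here, because both keep
-- the transaction's first snapshot for the whole transaction.
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Any query acquires the snapshot; "SELECT 1" is an assumed placeholder.
SELECT 1;

-- Leave the transaction open until the regression run has finished.
-- While it is open it holds back the global xmin.
-- End the transaction only after the test has completed:
COMMIT;
```

With that session held open, the NOTICE about the unused TID goes
missing from the output, as in the buildfarm diff quoted above.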

Thanks,

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
