Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Tom Lane
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id 686987.1600527170@sss.pgh.pa.us
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
> I think our assumption that changing the tests to have temp tables
> will make them safe w.r.t concurrent activity doesn't seem to be
> correct. We do set OldestXmin for temp tables aggressive enough that
> it allows us to remove all dead tuples but the test case behavior relies
> on whether we are able to prune the chain. AFAICS, we are using
> different cut-offs in heap_page_prune when it is called via
> lazy_scan_heap. So that seems to be causing both the failures.

Hm, reasonable theory.

I was able to partially reproduce whelk's failure here.  I got a
couple of cases of "cannot freeze committed xmax", which then leads
to the second NOTICE diff; but I couldn't reproduce the first
NOTICE diff.  That was out of about a thousand tries :-( so it's not
looking like a promising thing to reproduce without modifying the test.

I wonder whether "cannot freeze committed xmax" doesn't represent an
actual bug, i.e., is a7212be8b setting the cutoff *too* aggressively?
But if so, why's it so hard to reproduce?
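
To make the theory concrete, here is a toy model (not PostgreSQL code) of
how two different cutoffs could produce "cannot freeze committed xmax".
Everything in it is an assumption made for illustration: the Tuple class,
the prune() and freeze_check() helpers, and the XID values are hypothetical,
XIDs are compared as plain integers (no wraparound), and the real pruning
and freezing logic has many more cases. The idea is only that a deleted
tuple whose committed xmax falls between the pruning cutoff and the
freezing cutoff survives pruning and then trips the freeze-time check.

```python
from dataclasses import dataclass


@dataclass
class Tuple:
    xmin: int
    xmax: int             # 0 means "never deleted" in this toy model
    xmax_committed: bool


def prune(tuples, prune_cutoff):
    """Keep every tuple except those whose deleter committed before prune_cutoff."""
    return [
        t for t in tuples
        if not (t.xmax and t.xmax_committed and t.xmax < prune_cutoff)
    ]


def freeze_check(tuples, freeze_cutoff):
    """Return the xmax values that would hit 'cannot freeze committed xmax'."""
    return [
        t.xmax for t in tuples
        if t.xmax and t.xmax_committed and t.xmax < freeze_cutoff
    ]


# One deleted tuple (deleter XID 105, committed) and one live tuple.
tuples = [Tuple(100, 105, True), Tuple(100, 0, False)]

# Same cutoff for both steps: the dead tuple is pruned before freezing sees it.
same = freeze_check(prune(tuples, 110), 110)

# Older cutoff for pruning than for freezing: the dead tuple survives
# pruning, and its committed xmax is below the freeze cutoff.
skewed = freeze_check(prune(tuples, 103), 110)

print(same, skewed)
```

In this model the problem only shows up when some deleter's XID lands in the
window between the two cutoffs, which would be consistent with the failure
being rare, though the sketch says nothing about how wide that window is in
practice.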

            regards, tom lane


