Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple - Mailing list pgsql-committers

From Andres Freund
Subject Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple
Date
Msg-id 20171109222431.qwj6dtl5chumc6lt@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updatedtuple  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-committers
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation.   The
> committed test does *not* actually trigger that.
> 
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving.  I don't see any need for letting this run
with arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to go over
multiple lines and allow comments. I was wondering about adding that as
a feature just to discover it's already there ;)


What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing.  There's no way we can fix hot chains after the fact, but
preventing dead rows from reapparing seems important.  A minimal version
of that is fairly easy - we slap a bunch of if if
!TransactionIdDidCommit() elog(ERROR) at various code paths. But that'll
often trigger clog access errors when the problem occurred - if we want
to do better we need to pass down relfrozenxid/relminmxid to a few
functions.  I'm inclined to do so, but it'll make the patch larger...

Comments?

Greetings,

Andres Freund

-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Attachment

pgsql-committers by date:

Previous
From: Peter Eisentraut
Date:
Subject: [COMMITTERS] pgsql: Remove junk left from DSSSL to XSL conversion
Next
From: Robert Haas
Date:
Subject: [COMMITTERS] pgsql: Add hash partitioning.