On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund <andres@anarazel.de> wrote:
> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakining this, we should be fixing the pruning to not have the alive
> predecessor.
Excellent catch.
> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple. I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.
I didn't even know that that was safe.
> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they they're
> from entirely independent hot chains. Especially when upgrading 9.3 post
> your fix, to current releases.
Frankly, I'm relieved that you got to this. I was highly suspicious of
a5736bf754c82d8b86674e199e232096c679201d, even beyond my specific,
actionable concern about how it failed to handle the
9.3/FrozenTransactionId xmin case as special. As I went into in the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, these commits left us with a situation where there didn't
seem to be a reliable way of knowing whether or not it is safe to
interrogate clog for a given heap tuple using a tool like amcheck.
And, it wasn't obvious that you couldn't have a codepath that failed
to account for pre-cutoff non-frozen tuples -- codepaths that call
TransactionIdDidCommit() despite it actually being unsafe.
If I'm not mistaken, your proposed fix restores sanity there.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers