preserving forensic information when we freeze - Mailing list pgsql-hackers

From Robert Haas
Subject preserving forensic information when we freeze
Msg-id CA+TgmoaEmnoLZmVbb8gvY69NA8zw9BWpiZ9+TLz-LnaBOZi7JA@mail.gmail.com
Responses Re: preserving forensic information when we freeze
Re: preserving forensic information when we freeze
Re: preserving forensic information when we freeze
List pgsql-hackers
Various people, including at least Heikki, Andres, and myself, have
proposed various schemes for avoiding freezing that amount to doing it
sooner, when we're already writing WAL anyway, or at least when the
buffer is already dirty anyway, or at least while the buffer is
already in shared_buffers anyway.  Various people, including at least
Tom and Andres, have raised the point that this would lose
possibly-useful forensic information that they have found to be of
tangible value when debugging databases that have
somehow gotten messed up.  I don't know who originally proposed it,
but I've had many conversations about how we could address this
concern: instead of replacing xmin when we freeze, just set an
infomask bit that means "xmin is frozen" and leave the old, literal
xmin in place.  FrozenTransactionId would still exist and still be
understood, of course, but new freezing operations wouldn't use it.
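As a minimal sketch of the difference, here are the two freezing styles and the check readers would need. It assumes a simplified, hypothetical tuple header and a hypothetical XMIN_FROZEN_FLAG bit whose value is arbitrary, since the real infomask has no free bit. The special XID values match PostgreSQL's transam.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Special XIDs as numbered in PostgreSQL's transam.h. */
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical "xmin is frozen" bit; the value is arbitrary here. */
#define XMIN_FROZEN_FLAG 0x0001

/* Hypothetical, simplified tuple header: just xmin and an infomask. */
typedef struct {
    TransactionId xmin;
    uint16_t infomask;
} TupleHeader;

/* Traditional freeze: overwrite xmin, discarding the inserting XID. */
static void freeze_overwrite(TupleHeader *tup)
{
    assert(tup->xmin >= FirstNormalTransactionId);
    tup->xmin = FrozenTransactionId;
}

/* Proposed freeze: flag the tuple, keep the inserting XID for forensics. */
static void freeze_preserving(TupleHeader *tup)
{
    assert(tup->xmin >= FirstNormalTransactionId);
    tup->infomask |= XMIN_FROZEN_FLAG;
}

/* Readers accept both forms, since old-style frozen tuples stay on disk. */
static bool xmin_is_frozen(const TupleHeader *tup)
{
    return (tup->infomask & XMIN_FROZEN_FLAG) != 0 ||
           tup->xmin == FrozenTransactionId;
}
```

After freeze_preserving(), the tuple still reports the XID that inserted it, which is exactly the forensic information the overwrite approach throws away.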

I have attempted to implement this.  Trouble is, we're out of infomask
bits.  Using an infomask2 bit might work but we don't have many of
them left either, so it's worth casting about a bit for a better
solution.   Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
this purpose, but I think we're better off trying to reclaim those
bits in a future release.  Exactly how to do that is a topic for
another email, but I believe it's very possible.  What I'd like to
propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
indicate that xmin is frozen.  This bit pattern isn't used for
anything else, so there's no confusion possible with existing data
already on disk, but it's necessary to audit all code that checks
HEAP_XMIN_INVALID to make sure it doesn't get confused.  I've done
this, and there's little enough of it that it seems pretty easy to
handle.
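The bit-pattern idea, and the kind of audit it requires, can be sketched as follows. The two bit values match those in PostgreSQL's htup_details.h. HEAP_XMIN_FROZEN and the helper functions are illustrative names for this sketch, not necessarily what the attached patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Infomask bit values as defined in PostgreSQL's htup_details.h. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
/* Proposed: both bits set together mean "xmin is frozen". */
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

static bool xmin_frozen(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}

/* A frozen xmin is necessarily committed, so the plain bit test still works. */
static bool xmin_committed(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_COMMITTED) != 0;
}

/* Audited check: "invalid" only if INVALID is set without COMMITTED. */
static bool xmin_invalid(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID;
}

/* Unaudited check: wrongly reports a frozen tuple's inserter as aborted. */
static bool xmin_invalid_naive(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_INVALID) != 0;
}

/* Freezing sets both bits; an aborted inserter can never be frozen. */
static uint16_t freeze_infomask(uint16_t infomask)
{
    assert(!xmin_invalid(infomask));
    return (uint16_t) (infomask | HEAP_XMIN_FROZEN);
}
```

The naive test is the sort of check the audit has to find: on a frozen tuple it returns true, even though the inserting transaction committed.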

A somewhat larger problem is that this requires auditing every place
that looks at a tuple xmin and deciding whether any changes are needed
to handle the possibility that the tuple may be frozen even though
xmin != FrozenTransactionId.  This is a more invasive change, but it
doesn't seem too bad.  However, a couple of cases are tricky enough
that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by
matching the xmax of each tuple with the xmin of the next tuple.  If
they don't match, we conclude that the HOT chain has ended.  I
initially thought this logic might be buggy even as things stand today
if the latest tuple in the chain is frozen, but as Andres pointed out
to me, that's not so.  If the newest tuple in the chain is all-visible
(the earliest point at which we can theoretically freeze it), all
earlier tuples are dead altogether.  And since heap_page_prune() is
always called after locking the buffer and before freezing, those dead
tuples are gone by the time we freeze, so any tuple we freeze must be
the first in its HOT chain.  For the same reason, this
logic doesn't need any adjustment for the new freezing system: it's
never looking at anything old enough to be frozen in the first place.
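To make the HOT-chain argument concrete, here is a simplified sketch of the xmax/xmin matching. It assumes a hypothetical in-memory page of tuples linked by array index and is not PostgreSQL's actual code, which must also deal with details this ignores. Note that the comparison is skipped for the chain's root, and by the reasoning above the root is the only position a frozen tuple can occupy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, simplified tuple slot on one heap page. */
typedef struct {
    TransactionId xmin;
    TransactionId xmax;   /* InvalidTransactionId if not updated */
    int next;             /* index of the next HOT version, or -1 */
} PageTuple;

/* Follow a HOT chain from `root` and return the number of members.
 * The chain ends at a tuple that was never updated, or when the next
 * tuple's xmin does not match the current tuple's xmax (the slot now
 * holds an unrelated tuple). */
static size_t hot_chain_length(const PageTuple *page, size_t ntuples, int root)
{
    size_t len = 0;
    int cur = root;
    TransactionId prior_xmax = InvalidTransactionId;

    assert(root >= 0 && (size_t) root < ntuples);
    /* len < ntuples bounds the walk in case of a corrupt cycle. */
    while (len < ntuples && cur >= 0 && (size_t) cur < ntuples) {
        const PageTuple *tup = &page[cur];

        /* Skipped for the root; only later members are matched. */
        if (len > 0 && tup->xmin != prior_xmax)
            break;
        len++;
        prior_xmax = tup->xmax;
        if (prior_xmax == InvalidTransactionId)
            break;
        cur = tup->next;
    }
    return len;
}
```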

- Various procedural languages use the combination of TID and XMIN to
determine whether a function needs to be recompiled.  Although the
possibility of this doing the wrong thing seems quite remote, it's not
obvious to me why it's theoretically correct even as things stand
today.  Suppose that a previously-frozen tuple is vacuumed away and
another tuple is placed at the same TID and then frozen.  Then, we
check whether the cache is still valid and, behold, it is.  This would
actually get better with this patch, since it wouldn't be enough
merely for the old and new tuples to both be frozen; they'd have to
have had the same XID prior to freezing.  I think that could only
happen if a backend sticks around for at least 2^32 transactions, but
I don't know what would prevent it in that case.
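A minimal sketch of a TID-plus-xmin validity test shows why the old freezing scheme can produce a false match that preserving the raw xmin avoids. The cache key layout and function names are hypothetical; each procedural language keeps its own version of this state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId ((TransactionId) 2)

/* Hypothetical simplified TID: block number plus line-pointer offset. */
typedef struct {
    uint32_t block;
    uint16_t offset;
} ItemPointer;

/* What a function cache might remember about the pg_proc tuple it was
 * compiled from (hypothetical layout). */
typedef struct {
    ItemPointer tid;
    TransactionId xmin;
} FnCacheKey;

static bool same_tid(ItemPointer a, ItemPointer b)
{
    return a.block == b.block && a.offset == b.offset;
}

/* The cache is considered valid if the tuple currently found for the
 * function has the remembered TID and the remembered xmin. */
static bool cache_still_valid(FnCacheKey cached, ItemPointer cur_tid,
                              TransactionId cur_xmin)
{
    return same_tid(cached.tid, cur_tid) && cached.xmin == cur_xmin;
}
```

With the old scheme, two unrelated tuples at the same TID that were both frozen compare equal, because both carry FrozenTransactionId. With raw xmins preserved, a false match also requires the two original XIDs to be equal, which is the 2^32-transaction case mentioned above.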

- heap_get_latest_tid() appears broken even without this patch.  It's
only used on user-specified TIDs, either in a TID scan, or due to the
use of currtid_byreloid() and currtid_byrelname().  It attempts to find
the latest version of the tuple referenced by the given TID by
following the CTID links.  Like HOT, it uses XMAX/XMIN matching to
detect when the chain is broken.  However, unlike HOT, update chains
can in general span multiple blocks.  It is not now nor has it ever
been safe to assume that the next tuple in the chain can't be frozen
before the previous one is vacuumed away.  Andres came up with the
best example: suppose the tuple to be frozen is stored physically
earlier in the relation than its predecessor in the update chain; then,
an in-progress vacuum might reach the to-be-frozen tuple before it
reaches (and removes) the previous row version.  In
newer releases, the same problem could be caused by vacuum's
occasional page-skipping behavior.  As with the previous point, the
"don't actually change xmin when we freeze" approach actually makes it
harder for a chain to get "broken" when it shouldn't, but I suspect
it's just moving us from one set of extremely-obscure failure cases to
another.
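The cross-block failure can be sketched the same way. This assumes a hypothetical flattened heap in which ctid links are plain slot indexes, so a link can point into a different block. It is not the real heap_get_latest_tid(), only the xmax/xmin matching step described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId  ((TransactionId) 2)

/* Hypothetical heap slot; slots may live in different blocks. */
typedef struct {
    TransactionId xmin;   /* raw XID, or FrozenTransactionId if frozen old-style */
    TransactionId xmax;   /* InvalidTransactionId if not updated */
    int ctid;             /* slot of the next version; self-link if latest */
} HeapSlot;

/* Follow ctid links from `start`, stopping when the successor's xmin
 * does not match the current xmax.  Returns the last slot reached. */
static int latest_version(const HeapSlot *heap, size_t nslots, int start)
{
    int cur = start;

    assert(start >= 0 && (size_t) start < nslots);
    for (size_t steps = 0; steps < nslots; steps++) {
        const HeapSlot *tup = &heap[cur];
        int next = tup->ctid;

        if (tup->xmax == InvalidTransactionId || next == cur)
            return cur;                 /* no newer version */
        if (next < 0 || (size_t) next >= nslots)
            return cur;                 /* dangling link */
        if (heap[next].xmin != tup->xmax)
            return cur;                 /* chain looks broken here */
        cur = next;
    }
    return cur;                         /* step limit guards against cycles */
}
```

If the successor in slot 1 was frozen old-style before vacuum removed its predecessor in slot 0, its xmin reads 2 instead of 500, so the walk stops at the stale version in slot 0. If freezing left the raw xmin of 500 in place, the match would succeed and the walk would reach slot 1.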

This patch probably needs some documentation updates.  Suggestions as
to where would be appreciated.

As a general statement, I view this work as likely needed no matter
which of the proposed "remove freezing" approaches we choose to adopt.
It does not fix anything in
and of itself, but it (hopefully) removes an objection to the entire
line of inquiry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
