Re: preserving forensic information when we freeze - Mailing list pgsql-hackers

From Andres Freund
Subject Re: preserving forensic information when we freeze
Msg-id 20130529000042.GC818@awork2.anarazel.de
In response to preserving forensic information when we freeze  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
> I have attempted to implement this.  Trouble is, we're out of infomask
> bits.  Using an infomask2 bit might work but we don't have many of
> them left either, so it's worth casting about a bit for a better
> solution.   Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
> this purpose, but I think we're better off trying to reclaim those
> bits in a future release.  Exactly how to do that is a topic for
> another email, but I believe it's very possible.  What I'd like to
> propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
> indicate that xmin is frozen.  This bit pattern isn't used for
> anything else, so there's no confusion possible with existing data
> already on disk, but it's necessary to audit all code that checks
> HEAP_XMIN_INVALID to make sure it doesn't get confused.  I've done
> this, and there's little enough of it that it seems pretty easy to
> handle.

I only suggested MOVED_IN/OFF because I didn't remember that
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
combination could have been produced in the past in some way I haven't
found so far, I am all for it. I don't see a problem with breaking
backward compatibility on that level, and I don't think there's any way
to get there without some level of low-level compat break.

> - When we follow HOT chains, we determine where the HOT chain ends by
> matching the xmax of each tuple with the xmin of the next tuple.  If
> they don't match, we conclude that the HOT chain has ended.  I
> initially thought this logic might be buggy even as things stand today
> if the latest tuple in the chain is frozen, but as Andres pointed out
> to me, that's not so.  If the newest tuple in the chain is all-visible
> (the earliest point at which we can theoretically freeze it), all
> earlier tuples are dead altogether, and heap_page_prune() is always
> called after locking the buffer and before freezing, so any tuple we
> freeze must be the first in its HOT chain.  For the same reason, this
> logic doesn't need any adjustment for the new freezing system: it's
> never looking at anything old enough to be frozen in the first place.

I am all for adding a comment why this is safe though. We thought about
it for some while before we were convinced...

> - Various procedural languages use the combination of TID and XMIN to
> determine whether a function needs to be recompiled.  Although the
> possibility of this doing the wrong thing seems quite remote, it's not
> obvious to me why it's theoretically correct even as things stand
> today.  Suppose that previously-frozen tuple is vacuumed away and
> another tuple is placed at the same TID and then frozen.  Then, we
> check whether the cache is still valid and, behold, it is.  This would
> actually get better with this patch, since it wouldn't be enough
> merely for the old and new tuples to both be frozen; they'd have to
> have had the same XID prior to freezing.  I think that could only
> happen if a backend sticks around for at least 2^32 transactions, but
> I don't know what would prevent it in that case.

Hm. As previously said, I am less than convinced by those ad-hoc
mechanisms, and I think this should get properly integrated into the
normal cache invalidation mechanisms.
But: I think this is safe, since we compare the stored/cached xmin/tid
with one gotten from SearchSysCache just before, which will point to
the correct location as of the last AcceptInvalidationMessages(). I
can't think of a way where this would allow the case you describe.

> - heap_get_latest_tid() appears broken even without this patch.  It's
> only used on user-specified TIDs, either in a TID scan, or due to the
> use of currtid_byreloid() and currtid_byrelname().  It attempts to find
> the latest version of the tuple referenced by the given TID by
> following the CTID links.  Like HOT, it uses XMAX/XMIN matching to
> detect when the chain is broken.  However, unlike HOT, update chains
> can in general span multiple blocks.  It is not now nor has it ever
> been safe to assume that the next tuple in the chain can't be frozen
> before the previous one is vacuumed away.  Andres came up with the
> best example: suppose the tuple to be frozen physically precedes its
> predecessor; then, an in-progress vacuum might reach the to-be-frozen
> tuple before it reaches (and removes) the previous row version.  In
> newer releases, the same problem could be caused by vacuum's
> occasional page-skipping behavior.  As with the previous point, the
> "don't actually change xmin when we freeze" approach actually makes it
> harder for a chain to get "broken" when it shouldn't, but I suspect
> it's just moving us from one set of extremely-obscure failure cases to
> another.

I don't think this is especially problematic though. If you do a tidscan
starting from a tid that is so old that it can be removed, you're doing
it wrong; the tid could have been reused for something else anyway. I
think the ctid chaining is only meaningful if the tuple got updated very
recently, i.e. you hold a snapshot that prevents the removal of the
root tuple.

Nice work!

Andres

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


