Some thoughts on heaps and freezing - Mailing list pgsql-hackers

From Thomas Munro
Subject Some thoughts on heaps and freezing
Msg-id CA+hUKGJKHJsdHaxZC8UWDka98GL54VEOo_93=8NPqEoCMv+DNA@mail.gmail.com
Hello hackers,

Here's a set of ideas that I think could get rid of wraparound freezes
from the traditional heap, using undo log technology.  They're
inspired by things that Heikki has said over various threads, adapted
to our proposed undo infrastructure.

1.  Don't freeze committed xids by brute force search.  Keep using the
same tuple header as today, but add a pair of 64 bit FullTransactionIds
to the page header (low fxid, high fxid) so that xids are not
ambiguous even after they wrap around.  If you ever find yourself
updating high fxid to a value that is too far ahead of low fxid, you
need to do a micro-freeze of the page, but you were already writing to
the page so that's cool.

2.  Get rid of aborted xids eagerly, instead of relying on brute force
scans to move the horizon.  Remove the xid references at rollback time
with the undo machinery we've built for zheap.  While zheap uses undo
records to roll back the effects of a transaction (reversing in-place
updates etc), these would be very simple undo records that simply
remove item pointers relating to aborted transactions, so their xids
vanish from the heap.  Now the horizon for oldest aborted xid that you
can find anywhere in the system is oldest-xid-having-undo, which is
tracked by the undo machinery.  You don't need to keep more clog than
that AFAIK, other than to support the txid_status() function.

3.  Don't freeze multixacts by brute force search.  Instead, invent 64
bit multixacts and track (low fmxid, high fmxid) and do micro-freezing
on the page when the range would be too wide, as we did in point 1 for
xids.

4.  Get rid of multixacts eagerly.  Move the contents of
pg_multixact/members into undo logs, using the new UNDO_SHARED records
that we invented at PGCon[1] for essentially the same purpose in
zheap.  This is storage that is automatically cleaned up by a "discard
worker" when every member of a set of xids is no longer running (and
it's a bit like the "TED" storage that Heikki talked about a few years
back[2]).  Keep pg_multixact/offsets, but change it to contain undo
record pointers that point to UNDO_SHARED records holding the members.
It is a map of multixact ID -> undo holding the members, and it needs
to exist only to preserve the 32 bit size of multixact IDs; it'd be
nicer to use the undo rec ptr directly, but the goal in this thought
experiment is to make minimal format changes to kill freezing (if you
want more drastic changes, see zheap).  Now you just have to figure
out how to trim pg_multixact/offsets, and I think that could be done
periodically by testing the oldest multixact it holds: has the undo
record it points to been discarded?  If so we can trim this multixact.
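
To make the trimming test in point 4 concrete, here is a rough sketch in
C.  Everything in it is hypothetical: OffsetsMapSketch, UndoPtrSketch,
undo_is_discarded() and trim_offsets() are invented names rather than
PostgreSQL APIs, and it pretends there is a single undo log whose
discard horizon only moves forward, which is a big simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an undo record pointer. */
typedef uint64_t UndoPtrSketch;

/* Hypothetical in-memory model of pg_multixact/offsets: multixact ID -> undo pointer. */
typedef struct OffsetsMapSketch
{
    uint32_t       oldest_multi;  /* oldest multixact ID still mapped */
    uint32_t       next_multi;    /* next multixact ID to be assigned */
    UndoPtrSketch *undo_ptrs;     /* ring buffer indexed by multi % capacity */
    size_t         capacity;      /* assumed to be a power of two */
} OffsetsMapSketch;

/*
 * Assumption: one undo log, and every record below discard_horizon has
 * already been discarded by the discard worker.
 */
static bool
undo_is_discarded(UndoPtrSketch ptr, UndoPtrSketch discard_horizon)
{
    return ptr < discard_horizon;
}

/*
 * Trim from the oldest end for as long as the oldest entry points to a
 * discarded undo record.  Returns the number of entries trimmed.
 */
static uint32_t
trim_offsets(OffsetsMapSketch *map, UndoPtrSketch discard_horizon)
{
    uint32_t trimmed = 0;

    assert(map->capacity > 0);

    while (map->oldest_multi != map->next_multi)
    {
        UndoPtrSketch ptr = map->undo_ptrs[map->oldest_multi % map->capacity];

        if (!undo_is_discarded(ptr, discard_horizon))
            break;              /* members may still be needed; stop here */

        map->oldest_multi++;    /* 32 bit counter wraps naturally */
        trimmed++;
    }
    return trimmed;
}
```

The loop only ever looks at the oldest entry, so a periodic call does
work proportional to what it trims and stops at the first multixact
whose members are still in undo.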

Finding room for four 64 bit values on the page header is of course
tricky and incompatible with pg_upgrade, and hard to support
incrementally.  I also don't know exactly at which point you'd
consider high fxid in visibility computations, considering that in
places where you have a tuple pointer, you can't easily find the high
fxid you need.  One cute but scary idea is that when you're scanning
the heap you'd non-durably clobber xmin and xmax with
FrozenTransactionId if appropriate.
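
For what it's worth, once you do have the page's (low fxid, high fxid)
at hand, widening a 32 bit xid from a tuple header is simple modular
arithmetic.  A minimal sketch follows, with invented names
(PageXidRangeSketch, widen_xid, needs_micro_freeze) and an assumed span
limit of 2^31; it ignores the special xids below
FirstNormalTransactionId and is not taken from any actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page-level xid range; not an actual PostgreSQL struct. */
typedef struct PageXidRangeSketch
{
    uint64_t low_fxid;   /* oldest 64 bit xid referenced on the page */
    uint64_t high_fxid;  /* newest 64 bit xid referenced on the page */
} PageXidRangeSketch;

/* Assumed limit on (high - low); any value below 2^32 keeps xids unambiguous. */
#define MAX_PAGE_XID_SPAN ((uint64_t) 1 << 31)

/*
 * Widen a 32 bit xid found in a tuple header, using the page's low fxid
 * as the reference point.  Valid only while every xid on the page lies
 * in [low_fxid, high_fxid] and that range is narrower than 2^32.
 */
static uint64_t
widen_xid(const PageXidRangeSketch *range, uint32_t xid)
{
    /* Distance from low fxid, computed modulo 2^32. */
    uint32_t delta = xid - (uint32_t) range->low_fxid;
    uint64_t fxid = range->low_fxid + delta;

    assert(fxid <= range->high_fxid);
    return fxid;
}

/*
 * Would storing new_fxid on this page stretch the range too far?  If so,
 * the writer has to micro-freeze old tuples and advance low_fxid first.
 */
static bool
needs_micro_freeze(const PageXidRangeSketch *range, uint64_t new_fxid)
{
    return new_fxid > range->low_fxid &&
        new_fxid - range->low_fxid >= MAX_PAGE_XID_SPAN;
}
```

This doesn't answer where in the visibility code the page range would
be available; it only shows that the computation itself is cheap once
it is.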

[1] https://www.postgresql.org/message-id/CA+hUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/55511D1F.7050902%40iki.fi

-- 
Thomas Munro
https://enterprisedb.com