Re: Multixid hindsight design - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Multixid hindsight design
Date
Msg-id CA+TgmoY8NMWnr8TaEnATV56y3NwyRZ0WFaAA9gSBz2Y61D7rxA@mail.gmail.com
In response to Re: Multixid hindsight design  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Multixid hindsight design  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Jun 5, 2015 at 10:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> It would be a great deal nicer if we didn't have to keep ANY of the
> transactional data for a tuple around once it's all-visible.  Heikki
> defined ephemeral as "only needed when xmin or xmax is in-progress",
> but if we extended that definition slightly to "only needed when xmin
> or xmax is in-progress or committed but not all-visible" then the
> amount of non-ephemeral data in the tuple header is 5 bytes (infomasks +
> t_hoff).

OK, I was wrong here: if you only have that stuff, you can't
distinguish between a tuple that is visible to everyone and a tuple
that is visible to no one.  I think the minimal amount of data we need
in order to distinguish visibility once no relevant transactions are
in progress is one XID: either XMIN, if the tuple was never updated at
all or only by the inserting transaction or one of its subxacts; or
XMAX, if the inserting transaction committed.  The other visibility
information -- including (1) the other of XMIN and XMAX, (2) CMIN and
CMAX, and (3) the CTID -- is only interesting until the transactions
involved are no longer running and, if they committed, are visible to
all running transactions.
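
To make the one-XID argument concrete, here is a toy Python model (not
PostgreSQL code).  It assumes no relevant transaction is still in progress,
so every XID is either committed or aborted, and it assumes a hypothetical
flag recording which XID was kept.  The names `Kept` and `visible_to_all`
are invented for illustration, and the sketch ignores the case where the
inserting transaction or one of its subxacts also updated the tuple.

```python
from enum import Enum


class Kept(Enum):
    """Which single XID the tuple still carries (hypothetical flag)."""
    XMIN = "xmin"  # tuple was never updated or deleted
    XMAX = "xmax"  # inserter is known committed; someone tried to delete


def visible_to_all(kept, xid, committed):
    """Decide visibility once no relevant transaction is in progress.

    `committed` is the set of committed XIDs; any other XID is assumed
    to have aborted.
    """
    if kept is Kept.XMIN:
        # Never deleted: visible to everyone iff the inserter committed.
        return xid in committed
    # Inserter committed: visible to everyone iff the deleter aborted.
    return xid not in committed
```

With no XID at all, both branches collapse into one state, which is the
problem described above: "visible to everyone" and "visible to no one"
become indistinguishable.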

Heikki's proposal is basically to merge the 4-byte CID field and the
first 4 bytes of the CTID that currently store the block number into
one 8-byte field that can store a pointer into this new TED structure.
And after mulling it over, that sounds pretty good to me.  It's true
(as has been pointed out by several people) that the TED will need to
be persistent because of prepared transactions.  But it would still be
a big improvement over the status quo, because:

(1) We would no longer need to freeze MultiXacts.  TED wouldn't need
to be frozen either.  You'd just truncate it whenever RecentGlobalXmin
advances.

(2) If the TED becomes horribly corrupted, you can recover by
committing or aborting any prepared transactions, shutting the system
down, and truncating it, with no loss of data integrity.  Nothing in
the TED is required to determine whether tuples are visible to an
unrelated transaction - you only need it (a) to determine whether
tuples are visible to a particular command within a transaction that
has inserted, updated, or deleted the tuple and (b) determine whether
tuples are locked.

(3) As a bonus, we'd eliminate combo CIDs, because the TED could have
space to separately store CMIN and CMAX.  Combo CIDs required special
handling for logical decoding, and they are one of the nastier
barriers to making parallelism support writes (because they are stored
in backend-local memory of unbounded size and therefore can't easily
be shared with workers), so it wouldn't be very sad if they went away.
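
The field merge and points (1) and (3) can be sketched with a toy Python
model.  This is only an illustration under stated assumptions, not the
proposed on-disk format: `TedEntry`, `ToyTed`, the little-endian layout, and
the rule "drop an entry once all of its XIDs precede the horizon" are
invented here, and XID wraparound is ignored.

```python
import struct
from dataclasses import dataclass

# Today (simplified): a 4-byte command ID plus the 4-byte block-number
# part of the CTID.  Proposed: the same 8 bytes hold one TED pointer.
CID_AND_BLOCK = struct.Struct("<II")
TED_POINTER = struct.Struct("<Q")


@dataclass
class TedEntry:
    xmin: int
    xmax: int  # 0 means "no deleter or locker"
    cmin: int  # stored separately, so no combo CID is needed
    cmax: int


class ToyTed:
    """In-memory stand-in for the TED (hypothetical structure)."""

    def __init__(self):
        self.entries = {}  # offset -> TedEntry
        self.next_offset = 1

    def add(self, entry):
        """Store an entry and return the 8-byte pointer for the tuple."""
        offset = self.next_offset
        self.entries[offset] = entry
        self.next_offset += 1
        return TED_POINTER.pack(offset)

    def lookup(self, raw_pointer):
        """Return the entry for a tuple's pointer, or None if truncated."""
        (offset,) = TED_POINTER.unpack(raw_pointer)
        return self.entries.get(offset)

    def truncate(self, global_xmin):
        """Drop entries whose XIDs all precede the global xmin horizon."""
        self.entries = {
            offset: entry
            for offset, entry in self.entries.items()
            if max(entry.xmin, entry.xmax) >= global_xmin
        }
```

In this model nothing is ever frozen: an entry disappears once the horizon
passes its XIDs, and a lookup that returns None means the ephemeral data
is no longer needed.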

I'm not quite sure how to decide whether something like this is worth
(a) the work and (b) the risk of creating new bugs, but the more I think
about it, the more the principle of the thing seems sound to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
