Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Add 64-bit XIDs into PostgreSQL 15
Date
Msg-id a2b83417-b4ed-47ff-b312-025174424dfd@iki.fi
In response to Re: Add 64-bit XIDs into PostgreSQL 15  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Add 64-bit XIDs into PostgreSQL 15
List pgsql-hackers
On 07/02/2026 17:54, Robert Haas wrote:
> It's worth considering why this patch set hasn't made more progress up
> until this point. It could be simply that the patch set is big and
> nobody quite has time to review it thoroughly. However, it's my
> observation that when there's a patch set floating around for years
> that fixes a problem that other committers know to be important and
> yet it doesn't get committed, it's often a sign that some committers
> have taken a peek at the patch set and don't believe that it's taking
> the right approach, or don't believe the code is viable, or believe
> that fixing the code to be viable would require way more work than
> they can justify putting into someone else's patch. I've seen cases
> where committers have explicitly said this on list and the patch
> authors just keep posting new versions anyway -- but I'm sure it also
> happens that people don't want to offend the patch author or get into
> an argument and just quietly move on to other things. It would be nice
> to hear from some other committers or senior hackers whether they've
> studied this patch set and whether they believe it to be something
> that is in a shape that we could consider proceeding with it or not.
> For myself, I have not studied it.

FWIW I've looked at this several times in the past. It's a big patch and 
I haven't had the energy to review it thoroughly enough to actually 
commit. As you said, it's scary because of the risk of corruption. But I 
think this is viable and basically the right approach.

The thing I like least about this is how the upgrade works, i.e. the 
conversion code and the "double xmax" hack. This would look much nicer 
if we could start from a clean slate and just add the fields we need to 
the page header. However, upgrade is important; that point has been 
discussed a lot on the list, and I don't have any better ideas. I think 
it's as good as it gets at the high level.

(I'm talking about the high-level design here. There are a lot of small 
issues here and there and tons of cleanup needed, like all the stuff 
that you just pointed out.)

> +At first read of a heap page after pg_upgrade from 32-bit XID PostgreSQL
> +version pd_special area with a size of 16 bytes should be added to a page.
> +Though a page may not have space for this. Then it can be converted to a
> +temporary format called "double XMAX".
> 
> This section generally makes sense to me but fails to explain how we
> know that a given tuple is in double XMAX format. Does that use an
> infomask bit or what? Consuming an infomask bit might be objectionable
> to some hackers, as they are a precious and *extremely* limited
> resource.
> 
> I am not at all convinced that we should use 16 bytes for this. It
> seems to me that it would be a lot simpler to just store the epoch in
> the page header (and the equivalent thing for MXIDs). I think actually
> using exactly 8 bytes is not appealing, because if we insist that
> every tuple on a page has to be from the same epoch, then that means
> that when the epoch changes, the next change to every single page in
> the system will have to rejigger the whole page, which might suck.

Moreover, you simply cannot insist that every tuple on the page is from 
the same epoch. If the page contains a tuple with an xmin from the 
previous epoch that is not yet visible to all snapshots, and you want to 
insert a new tuple into it with the new epoch, what do you do? You can't 
freeze the existing tuple's xmin yet.

> But what I have discussed in the past (and I think on the list) is
> the idea of using half-epochs: the page header says something like
> "epoch 1234, first half" or "epoch 1234, second half". In the first
> case, all XIDs observed on the page are in epoch 1234. In the second
> case, any XIDs >= 2^31 are in epoch 1234 and any < 2^31 are in epoch
> 1235. That way, pages where all tuples are relatively recent will
> almost never need updating when we advance into a new half-epoch:
> most of the time, we'll just be able to bump to the next half-epoch
> without changing any tuples.

Hmm, so that's equivalent to storing a 33-bit base XID, instead of a 
64-bit base. Yeah, I think that works: it can represent any two XIDs 
that are less than 2^31 XIDs apart.
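
To make the 33-bit arithmetic concrete, here is a minimal sketch that is not taken from the patch set. It assumes the page header holds a 33-bit half-epoch counter, that the page's window is the 2^32 XIDs starting at half_epoch * 2^31, and that each tuple keeps the low 32 bits of its full XID. The names FullXid and page_xid_to_full are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; nothing here comes from the patch set. */
typedef uint64_t FullXid;

/*
 * half_epoch is the assumed 33-bit page-level value: the page's window
 * starts at half_epoch * 2^31 and spans 2^32 XIDs.  stored_xid is the
 * 32-bit value in the tuple header, assumed to be the low 32 bits of
 * the full XID.
 */
static FullXid
page_xid_to_full(uint64_t half_epoch, uint32_t stored_xid)
{
    uint64_t base = half_epoch << 31;
    uint32_t offset;

    assert(half_epoch < (UINT64_C(1) << 33));

    /* Distance from the base, computed modulo 2^32. */
    offset = stored_xid - (uint32_t) base;

    return base + offset;
}
```

With an even half_epoch (a "first half" page) every stored value decodes into the same epoch. With an odd half_epoch the window starts at the midpoint of the epoch, so stored values at or above 2^31 decode into that epoch and smaller ones into the next.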

> Another way of thinking about this is that we don't really need a
> 64-bit base XID, because we don't need that much precision. It's fine
> to say that the base XID always has to be a multiple of 2^31 or 2^32,
> meaning that we only need 32 or 33 bits for it. We can save a number
> of bytes in every page this way, and I think the logic will be simpler
> as well.

Right. 32 bits is not enough, but 33 is.
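
A small sketch of why the extra bit matters, under the assumption that per-tuple XIDs are 32-bit offsets from a page base that must be a multiple of 2^32 (32-bit base) or of 2^31 (33-bit base). The function name xids_fit_on_page is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Can both XIDs live on one page whose base must be a multiple of
 * 2^granularity_bits, given 32-bit per-tuple offsets from that base?
 * (Hypothetical helper, for illustration only.)
 */
static bool
xids_fit_on_page(uint64_t xid_a, uint64_t xid_b, unsigned granularity_bits)
{
    uint64_t lo = xid_a < xid_b ? xid_a : xid_b;
    uint64_t hi = xid_a < xid_b ? xid_b : xid_a;
    uint64_t base;

    assert(granularity_bits == 31 || granularity_bits == 32);

    /* The largest allowed base that does not exceed the older XID. */
    base = (lo >> granularity_bits) << granularity_bits;

    /* Both fit if the newer XID is within 2^32 of that base. */
    return hi - base < (UINT64_C(1) << 32);
}
```

Two XIDs only 20 apart but on opposite sides of an epoch boundary do not fit with a base aligned to 2^32, but do fit with a base aligned to 2^31. With the finer alignment, any pair less than 2^31 apart fits.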

> I assumed that our plan would be to continue to restrict the range
> of *running* transactions to no more than 2^31 XIDs from oldest to 
> newest. On disk, we'd allow older XIDs to exist, but any time we
> move the page-level base XID, we know that those older tuples are
> eligible to be frozen, and we freeze them. That way, nothing in
> memory needs any changes from the way it works now.

That's been my assumption too.
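
As a rough sketch of that rule, assuming a simplified in-memory stand-in for tuples rather than the real heap page layout: when the page base moves forward, every unfrozen xmin below the new base gets frozen. SketchTuple and advance_page_base are hypothetical names, and the caller is assumed to pick a new_base no later than the oldest XID any snapshot could still need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple's xmin; not a real PostgreSQL struct. */
typedef struct
{
    uint64_t full_xmin;  /* decoded 64-bit XID */
    int      frozen;     /* nonzero once xmin has been frozen */
} SketchTuple;

/*
 * Move the page base forward to new_base.  An unfrozen xmin older than
 * new_base can no longer be represented relative to the base, so it is
 * frozen.  This relies on the stated assumption that new_base is not
 * newer than the oldest XID still needed by any snapshot.
 * Returns the number of tuples frozen.
 */
static size_t
advance_page_base(SketchTuple *tuples, size_t ntuples, uint64_t new_base)
{
    size_t nfrozen = 0;

    for (size_t i = 0; i < ntuples; i++)
    {
        if (!tuples[i].frozen && tuples[i].full_xmin < new_base)
        {
            tuples[i].frozen = 1;
            nfrozen++;
        }
    }
    return nfrozen;
}
```

If the stored 32-bit values are the low bits of the full XID, the tuples at or above the new base need no rewriting at all; only the page-level base changes.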

- Heikki


