Re: 64-bit XIDs again - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: 64-bit XIDs again
Date
Msg-id CAPpHfdtZ_gQBc0dy5raoudJOjN_Vo90HUSR3G=rMoxSKySr3=Q@mail.gmail.com
In response to Re: 64-bit XIDs again  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 64-bit XIDs again
List pgsql-hackers
On Fri, Jul 31, 2015 at 12:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
But the elephant in the room is on-disk compatibility.  There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break.

That seems problematic. But I'm not yet convinced that there is absolutely no way to do this.
 
However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
 
If a user upgrades a database cluster with pg_upgrade, they stop the old postmaster, run pg_upgrade, and start the new postmaster. That means we start from a point where there are no running transactions. Thus, tuples in the old format fall into two kinds: visible to everybody and invisible to everybody. When we update or delete an old tuple of the first kind, we don't actually need to store its xmin anymore. The 32-bit xmin and 32-bit xmax together occupy 8 bytes, so we can store a 64-bit xmax in their place.
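To make the space argument concrete, here is a minimal C sketch of reusing the xmin/xmax slot. The struct and function names (OldXidSlot, slot_store_xmax64, slot_load_xmax64) are hypothetical and only model the two adjacent 32-bit fields; this is not the real HeapTupleHeaderData, and the byte layout of the stored value is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the 8 bytes holding xmin and xmax in an
 * old-format tuple header (hypothetical, not the real struct). */
typedef struct OldXidSlot
{
    uint32_t xmin;
    uint32_t xmax;
} OldXidSlot;

/* Overwrite the whole slot with a 64-bit xmax. The old xmin is lost,
 * which is acceptable only for tuples already visible to everybody. */
void
slot_store_xmax64(OldXidSlot *slot, uint64_t xmax64)
{
    assert(sizeof(*slot) == sizeof(xmax64));
    memcpy(slot, &xmax64, sizeof(xmax64));
}

/* Read the slot back as a single 64-bit xmax. */
uint64_t
slot_load_xmax64(const OldXidSlot *slot)
{
    uint64_t xmax64;

    memcpy(&xmax64, slot, sizeof(xmax64));
    return xmax64;
}
```

A reader of such a tuple would need a flag telling it which interpretation of the 8 bytes applies, which is what the format bits discussed next are for.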

So, to switch to 64-bit xmin/xmax, we have to take both free bits from t_infomask2. They should indicate one of 3 possible tuple formats:
1) Old format: both xmin and xmax are 32-bit.
2) Intermediate format: xmax is 64-bit, xmin is frozen.
3) New format: both xmin and xmax are 64-bit.
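The three formats above could be encoded roughly as follows. This is only a sketch: the SKETCH_* names and helper functions are hypothetical, and the mask value 0x1800 assumes the two unused t_infomask2 bits sit directly above the 11-bit attribute count, as I believe they do in htup_details.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the two free t_infomask2 bits (hypothetical names). */
#define SKETCH_XID_FORMAT_MASK    0x1800
#define SKETCH_XID_FORMAT_OLD     0x0000  /* 1) xmin and xmax are 32-bit */
#define SKETCH_XID_FORMAT_XMAX64  0x0800  /* 2) 64-bit xmax, xmin frozen */
#define SKETCH_XID_FORMAT_NEW     0x1000  /* 3) xmin and xmax are 64-bit */

/* Extract the tuple format from an infomask2 value. */
uint16_t
xid_format_get(uint16_t infomask2)
{
    return (uint16_t) (infomask2 & SKETCH_XID_FORMAT_MASK);
}

/* Return infomask2 with the format bits replaced; other bits are kept. */
uint16_t
xid_format_set(uint16_t infomask2, uint16_t format)
{
    /* Only the three defined encodings are valid; 0x1800 is unused. */
    assert((format & ~SKETCH_XID_FORMAT_MASK) == 0);
    assert(format != SKETCH_XID_FORMAT_MASK);
    return (uint16_t) ((infomask2 & ~SKETCH_XID_FORMAT_MASK) | format);
}
```

Since pages written by older versions have both bits clear, every existing tuple would read as format 1 without being rewritten.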

But we can use the same idea to implement an epoch in the heap page header as well. If the new page header doesn't fit in the page, then we don't have to insert anything into this page; we just need to set xmax and flags on existing tuples. Then we only need two of the formats listed above, #1 and #2, and can take a single free bit from t_infomask2 to indicate the format.
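The decision this implies when setting xmax on a tuple might look like the sketch below. Everything here is an assumption for illustration: the XmaxAction enum, choose_xmax_action, and the header_growth parameter (the extra bytes a new-format page header would need) are hypothetical, and the sketch ignores the case where xmax belongs to a different epoch than the one already stored in a new-format header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum XmaxAction
{
    XMAX_STORE_32BIT,       /* format #1: 32-bit xmax, epoch from the page */
    XMAX_CONVERT_HEADER,    /* old page has room: switch header to new format */
    XMAX_STORE_64BIT_TUPLE  /* old page is full: format #2 on this tuple */
} XmaxAction;

/* Pick how to record xmax on a tuple of the given page (hypothetical). */
XmaxAction
choose_xmax_action(bool page_has_epoch, size_t free_bytes,
                   size_t header_growth, uint64_t xmax)
{
    assert(header_growth > 0);

    if (page_has_epoch)
        return XMAX_STORE_32BIT;        /* epoch comes from the page header */
    if (xmax <= UINT32_MAX)
        return XMAX_STORE_32BIT;        /* old page is implicitly epoch 0 */
    if (free_bytes >= header_growth)
        return XMAX_CONVERT_HEADER;     /* then store the low 32 bits */
    return XMAX_STORE_64BIT_TUPLE;      /* overwrite xmin/xmax, set one flag */
}
```

The last branch is the case of a totally full old page that needs an xmax past 4G: only the tuple being deleted or updated changes, and the page header stays in the old format.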

Probably I'm missing something, but I think keeping on-disk compatibility should be somehow possible.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
