On Fri, Jul 31, 2015 at 12:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
But the elephant in the room is on-disk compatibility. There is absolutely no way that we can just change xmin/xmax to 64 bits without a disk format break.
That seems problematic. But I'm not yet convinced that there is absolutely no way to do this.
However, if we do something like what Heikki is suggesting, it's at least conceivable that we could convert incrementally (ie, if you find a page with the old header format, assume all tuples in it are part of epoch 0; and do not insert new tuples into it unless there is room to convert the header to new format ... but I'm not sure what we do about tuple deletion if the old page is totally full and we need to write an xmax that's past 4G).
If a user upgrades a database cluster with pg_upgrade, they stop the old postmaster, run pg_upgrade, and start the new postmaster. That means we start from a point where there are no running transactions. Thus, tuples in the old format are of two kinds: visible to everybody and invisible to everybody. When we update or delete an old tuple of the first kind, we don't actually need to store its xmin anymore. We can store a 64-bit xmax in the space occupied by xmin/xmax.
So, in order to switch to 64-bit xmin/xmax, we would have to take both free bits from t_infomask2. They should indicate one of 3 possible tuple formats:
1) Old format: both xmin/xmax are 32-bit.
2) Intermediate format: xmax is 64-bit, xmin is frozen.
3) New format: both xmin/xmax are 64-bit.
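
To make the intermediate format concrete, here is a rough sketch in C. It is only an illustration under assumptions: the struct, the flag values and the function names are all hypothetical and do not match the real HeapTupleHeaderData layout or the actual free bits in t_infomask2. It shows an old-format tuple whose xmin is visible to everybody being switched to format 2 by reusing the two 32-bit slots for one 64-bit xmax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of the two free t_infomask2 bits. */
#define XID_FMT_MASK 0x3u

typedef enum
{
    XID_FMT_OLD = 0,          /* 1) xmin and xmax are both 32-bit */
    XID_FMT_INTERMEDIATE = 1, /* 2) xmin frozen, 64-bit xmax in the xmin/xmax space */
    XID_FMT_NEW = 2           /* 3) both 64-bit; needs a larger header, not modeled here */
} XidFormat;

/* Simplified stand-in for the old tuple header: two 32-bit slots plus flags. */
typedef struct
{
    uint32_t xmin;
    uint32_t xmax;
    uint16_t flags;
} OldTupleHeader;

static XidFormat
tuple_format(const OldTupleHeader *tup)
{
    return (XidFormat) (tup->flags & XID_FMT_MASK);
}

/*
 * Store a 64-bit xmax over the xmin/xmax slots.  The caller must know that
 * xmin is visible to everybody, because its value is overwritten.
 */
static void
set_xmax64(OldTupleHeader *tup, uint64_t xmax64)
{
    assert(tuple_format(tup) != XID_FMT_NEW);
    tup->xmin = (uint32_t) (xmax64 >> 32);          /* high half */
    tup->xmax = (uint32_t) (xmax64 & 0xFFFFFFFFu);  /* low half */
    tup->flags = (uint16_t) ((tup->flags & ~XID_FMT_MASK) | XID_FMT_INTERMEDIATE);
}

/* Read xmax as 64 bits; old-format tuples are treated as epoch 0. */
static uint64_t
get_xmax64(const OldTupleHeader *tup)
{
    assert(tuple_format(tup) != XID_FMT_NEW);
    if (tuple_format(tup) == XID_FMT_INTERMEDIATE)
        return ((uint64_t) tup->xmin << 32) | tup->xmax;
    return tup->xmax;
}
```

The point of the sketch is that the tuple does not grow: deleting an old tuple on a completely full page only rewrites 8 bytes that are already there and flips the flag bits.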
But we can use the same idea to implement the epoch in the heap page header as well. If the new page header doesn't fit on the page, then we don't have to insert anything into this page; we just need to set xmax and flags on the existing tuples. Then we can use two of the formats listed above, #1 and #2, and take one free bit from t_infomask2 to indicate the format.
Probably I'm missing something, but I think keeping on-disk compatibility should be somehow possible.