Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From Heikki Linnakangas
Subject Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?
Msg-id 4ff4d6d3-7b3e-584a-8aea-e4c59ae95588@iki.fi
In response to Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>
>>> What happens when the epoch is so low that the rest of the XID does
>>> not fit in 32bits of tuple header? Or such a case should never arise?
>>
>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>> more than one epoch. So if you're updating/deleting an extremely old
>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>> isn't already, so you can set a new epoch and xmax.
>
> If the page has multiple such tuples, updating one tuple will mean
> updating headers of other tuples as well? This means that those tuples
> need to be locked for concurrent scans? May be not, since such tuples
> will be anyway visible to any concurrent scans and updating xmin/xmax
> doesn't change the visibility. But we might have to prevent multiple
> updates to the xmin/xmax because of concurrent updates on the same
> page.

"Store the epoch in the page header" is a simpler-to-visualize, but
incorrect, version of what we actually need to do. If you only store
the epoch, then all the XIDs on a page need to belong to the same
epoch, which causes trouble when the current epoch changes. Just after
the epoch changes, you cannot necessarily freeze all the tuples from
the previous epoch, because they would not yet be visible to everyone.

The full picture is that we need to store one 64-bit XID "base" value in 
the page header, and all the xmin/xmax values in the tuple headers are 
offsets relative to that base. With that, you effectively have 64-bit 
XIDs, as long as the *difference* between any two XIDs on a page is not 
greater than 2^32. That can be guaranteed, as long as we don't allow a 
transaction to be in-progress for more than 2^32 XIDs. That seems like a 
reasonable limitation.
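
To make the base-plus-offset idea concrete, here is a minimal sketch in
C. All type and function names are hypothetical and are not PostgreSQL's
actual structures; special values such as FrozenTransactionId are
ignored. It only shows how a 64-bit XID could be recovered from a
per-page base and a 32-bit field in the tuple header, and when an XID
cannot be represented on a given page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef uint64_t FullXid;    /* logical 64-bit transaction ID */
typedef uint32_t XidOffset;  /* what a tuple header would store */

typedef struct PageXidHeader {
    FullXid xid_base;        /* one 64-bit base value per page */
} PageXidHeader;

/* True if xid can be expressed on this page as base + 32-bit offset. */
static bool xid_fits_on_page(const PageXidHeader *page, FullXid xid)
{
    return xid >= page->xid_base && xid - page->xid_base <= UINT32_MAX;
}

/* Convert a 64-bit XID to the offset stored in a tuple header. */
static XidOffset xid_to_offset(const PageXidHeader *page, FullXid xid)
{
    assert(xid_fits_on_page(page, xid));
    return (XidOffset) (xid - page->xid_base);
}

/* Recover the 64-bit XID from a stored offset. */
static FullXid offset_to_xid(const PageXidHeader *page, XidOffset off)
{
    return page->xid_base + off;
}
```

With a base of 5000000000, XID 5000000123 would be stored as offset 123,
while an XID 2^32 or more above the base does not fit and would force
the page to be rebased first.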

But yes, when the "current XID - base XID in page header" becomes 
greater than 2^32, and you need to update a tuple on that page, you need 
to first freeze the page, update the base XID on the page header to a 
more recent value, and update the XID offsets on every tuple on the page 
accordingly. And to do that, you need to hold a lock on the page. If
you don't move any tuples around at the same time, but just update the
XID fields, an exclusive lock on the page is enough, i.e. you don't
need to take a super-exclusive or vacuum lock. In any case, it happens
so infrequently that it should not become a serious burden.
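
The rebase step could look roughly like the sketch below. It is a
simplified illustration under several assumptions, not a proposed
implementation: the structures and the page_rebase() function are
hypothetical, "frozen" and "no xmax" are modeled as explicit flags
rather than reserved XID values, the caller is assumed to already hold
the exclusive page lock, and new_base is assumed to be no newer than the
horizon below which every XID is visible to all transactions. A tuple
whose xmax is older than the new base is treated as something to be
pruned beforehand, so the sketch simply refuses to rebase in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical, not PostgreSQL's real structures. */
typedef struct SketchTuple {
    bool     xmin_frozen;  /* xmin visible to everyone; xmin_off unused */
    bool     xmax_valid;   /* false if the tuple has no xmax set */
    uint32_t xmin_off;     /* xmin = page base + xmin_off */
    uint32_t xmax_off;     /* xmax = page base + xmax_off */
} SketchTuple;

typedef struct SketchPage {
    uint64_t    xid_base;
    size_t      ntuples;
    SketchTuple tuples[4];
} SketchPage;

/*
 * Move the page's base forward to new_base and adjust every offset.
 * Returns false, leaving the page untouched, if some tuple still has
 * an xmax older than new_base.
 */
static bool page_rebase(SketchPage *page, uint64_t new_base)
{
    assert(new_base >= page->xid_base);
    uint64_t shift = new_base - page->xid_base;

    /* First pass: check that every valid xmax survives the shift. */
    for (size_t i = 0; i < page->ntuples; i++) {
        const SketchTuple *t = &page->tuples[i];
        if (t->xmax_valid && t->xmax_off < shift)
            return false;
    }

    /* Second pass: freeze xmins that are too old, shift the rest. */
    for (size_t i = 0; i < page->ntuples; i++) {
        SketchTuple *t = &page->tuples[i];
        if (!t->xmin_frozen) {
            if (t->xmin_off < shift)
                t->xmin_frozen = true;
            else
                t->xmin_off -= (uint32_t) shift;
        }
        if (t->xmax_valid)
            t->xmax_off -= (uint32_t) shift;
    }

    page->xid_base = new_base;
    return true;
}
```

Only the XID fields and the base change here; no tuple is moved, which
is the case where an ordinary exclusive lock suffices.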

- Heikki