Thread: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

[HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

From
Tianzhou Chen
Date:
Hi Pg Hackers,

    XID wraparound seems to be quite a big concern and we introduce changes like “adding another frozen bit to each page” [http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html] to tackle this. I am just wondering what challenges are preventing us from moving to 64-bit XIDs?  This is the previous thread I found https://www.postgresql.org/message-id/CAEYLb_UfC%2BHZ4RAP7XuoFZr%2B2_ktQmS9xqcQgE-rNf5UCqEt5A%40mail.gmail.com, and the only answer there is:

“ The most obvious reason for not using 64-bit xid values is that they
require more storage than 32-bit values. There is a patch floating
around that makes it safe to not forcibly safety shutdown the server
where currently it is necessary, but it doesn't work by making xids
64-bit.

"
   
    I am personally not quite convinced that this is the main reason, since I feel that for databases hitting this issue, the schema is mostly non-trivial and 8 more bytes would not matter so much. Could some Postgres experts share more insights about the challenges?


Thanks
Tianzhou

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Heikki Linnakangas
Date:
On 06/05/2017 11:49 AM, Tianzhou Chen wrote:
> Hi Pg Hackers,
>
> XID wraparound seems to be quite a big concern and we introduce
> changes like “adding another frozen bit to each page”
> [http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html
> <http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html>
> to tackle this. I am just wondering what’s the challenges preventing
> us from moving to 64 bit xid?  This is the previous thread I find
> https://www.postgresql.org/message-id/CAEYLb_UfC%2BHZ4RAP7XuoFZr%2B2_ktQmS9xqcQgE-rNf5UCqEt5A%40mail.gmail.com
> <https://www.postgresql.org/message-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com>,
> the only answer there is:
>
> “ The most obvious reason for not using 64-bit xid values is that
> they require more storage than 32-bit values. There is a patch
> floating around that makes it safe to not forcibly safety shutdown
> the server where currently it is necessary, but it doesn't work by
> making xids 64-bit.
> "
>
> I am personally not quite convinced that is the main reason, since I
> feel for database hitting this issue, the schema is mostly
> non-trivial and doesn’t matter so much with 8 more bytes. Could some
> postgres experts share more insights about the challenges?

That quote is accurate. We don't want to just expand XIDs to 64 bits, 
because it would significantly bloat the tuple header. PostgreSQL's 
per-tuple overhead is already quite large, compared to many other systems.

The most promising approach to tackle this is to switch to 64-bit XIDs 
in in-memory structures, and add some kind of an extra epoch field to 
the page header. That would effectively give you 64-bit XIDs, but would 
only add one field to each page, not to every tuple.

- Heikki
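
As a rough illustration of the epoch idea above, the sketch below combines a page-level epoch with a tuple's 32-bit XID to form a 64-bit value. It is not PostgreSQL code: the names `full_xid` and `split_xid` are hypothetical, and placing the epoch in the high 32 bits is an assumption made for the example.

```python
from typing import Tuple

XID_BITS = 32
XID_MASK = (1 << XID_BITS) - 1


def full_xid(page_epoch: int, tuple_xid: int) -> int:
    """Combine a per-page epoch (assumed high bits) with a 32-bit tuple XID."""
    if not 0 <= tuple_xid <= XID_MASK:
        raise ValueError("tuple XID must fit in 32 bits")
    return (page_epoch << XID_BITS) | tuple_xid


def split_xid(xid64: int) -> Tuple[int, int]:
    """Split a 64-bit XID back into (epoch, 32-bit XID)."""
    return xid64 >> XID_BITS, xid64 & XID_MASK
```

With this layout the tuple header keeps its 32-bit field, and only the page carries the extra 32 bits.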



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Ashutosh Bapat
Date:
On Mon, Jun 5, 2017 at 2:38 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 06/05/2017 11:49 AM, Tianzhou Chen wrote:
>>
>> Hi Pg Hackers,
>>
>> XID wraparound seems to be quite a big concern and we introduce
>> changes like “adding another frozen bit to each page”
>> [http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html
>> <http://rhaas.blogspot.com/2016/03/no-more-full-table-vacuums.html>
>> to tackle this. I am just wondering what’s the challenges preventing
>> us from moving to 64 bit xid?  This is the previous thread I find
>>
>> https://www.postgresql.org/message-id/CAEYLb_UfC%2BHZ4RAP7XuoFZr%2B2_ktQmS9xqcQgE-rNf5UCqEt5A%40mail.gmail.com
>>
>> <https://www.postgresql.org/message-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com>,
>> the only answer there is:
>>
>> “ The most obvious reason for not using 64-bit xid values is that
>> they require more storage than 32-bit values. There is a patch
>> floating around that makes it safe to not forcibly safety shutdown
>> the server where currently it is necessary, but it doesn't work by
>> making xids 64-bit.
>> "
>>
>> I am personally not quite convinced that is the main reason, since I
>> feel for database hitting this issue, the schema is mostly
>> non-trivial and doesn’t matter so much with 8 more bytes. Could some
>> postgres experts share more insights about the challenges?
>
>
> That quote is accurate. We don't want to just expand XIDs to 64 bits,
> because it would significantly bloat the tuple header. PostgreSQL's
> per-tuple overhead is already quite large, compared to many other systems.
>
> The most promising approach to tackle this is to switch to 64-bit XIDs in
> in-memory structures, and add some kind of an extra epoch field to the page
> header. That would effectively give you 64-bit XIDs, but would only add one
> a field to each page, not every tuple.
>

What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

> What happens when the epoch is so low that the rest of the XID does
> not fit in 32bits of tuple header? Or such a case should never arise?

Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Ashutosh Bapat
Date:
On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
>> What happens when the epoch is so low that the rest of the XID does
>> not fit in 32bits of tuple header? Or such a case should never arise?
>
> Storing an epoch implies that rows can't have (xmin,xmax) different by
> more than one epoch. So if you're updating/deleting an extremely old
> tuple you'll presumably have to set xmin to FrozenTransactionId if it
> isn't already, so you can set a new epoch and xmax.

If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans? Maybe not, since such tuples
will be anyway visible to any concurrent scans and updating xmin/xmax
doesn't change the visibility. But we might have to prevent multiple
updates to the xmin/xmax because of concurrent updates on the same
page.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>> more than one epoch. So if you're updating/deleting an extremely old
>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>> isn't already, so you can set a new epoch and xmax.

> If the page has multiple such tuples, updating one tuple will mean
> updating headers of other tuples as well? This means that those tuples
> need to be locked for concurrent scans?

Locks for tuple header updates are taken at page level anyway, so in
principle you could run around and freeze other tuples on the page
anytime you had to change the page's high-order-XID value.  Holding
the lock for long enough to do that is slightly annoying, but it
should happen so seldom as to not represent a real performance problem.

In my mind the harder problem is where to find another 32 bits for the
new page header field.  You could convert the header format on-the-fly
if there's free space in the page, but what if there isn't?
        regards, tom lane
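
The free-space concern can be made concrete with a small sketch. `pd_lower` and `pd_upper` are the real page header fields that bound a page's free space; everything else here (`PageLayout`, `can_convert_in_place`, the 4-byte constant) is hypothetical and only models the check, not the conversion itself.

```python
from dataclasses import dataclass

EXTRA_HEADER_BYTES = 4  # the "another 32 bits" for the new header field


@dataclass
class PageLayout:
    pd_lower: int  # offset of the end of the line pointer array
    pd_upper: int  # offset of the start of tuple data

    def free_space(self) -> int:
        """Bytes available between the line pointers and the tuples."""
        return self.pd_upper - self.pd_lower


def can_convert_in_place(page: PageLayout, extra: int = EXTRA_HEADER_BYTES) -> bool:
    """True if the page has room to grow its header by `extra` bytes."""
    return page.free_space() >= extra
```

A page for which this returns False is exactly the case the question is about: some fallback is needed before the new header can be written.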



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Ashutosh Bapat
Date:
On Tue, Jun 6, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>>> more than one epoch. So if you're updating/deleting an extremely old
>>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>>> isn't already, so you can set a new epoch and xmax.
>
>> If the page has multiple such tuples, updating one tuple will mean
>> updating headers of other tuples as well? This means that those tuples
>> need to be locked for concurrent scans?
>
> Locks for tuple header updates are taken at page level anyway, so in
> principle you could run around and freeze other tuples on the page
> anytime you had to change the page's high-order-XID value.  Holding
> the lock for long enough to do that is slightly annoying, but it
> should happen so seldom as to not represent a real performance problem.
>
> In my mind the harder problem is where to find another 32 bits for the
> new page header field.  You could convert the header format on-the-fly
> if there's free space in the page, but what if there isn't?

I guess, we will have to reserve 32 bits in the header. That's much
better than increasing tuple header by 32 bits.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



On 6 June 2017 at 12:38, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> On Tue, Jun 6, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>>> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>>>> more than one epoch. So if you're updating/deleting an extremely old
>>>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>>>> isn't already, so you can set a new epoch and xmax.
>>
>>> If the page has multiple such tuples, updating one tuple will mean
>>> updating headers of other tuples as well? This means that those tuples
>>> need to be locked for concurrent scans?
>>
>> Locks for tuple header updates are taken at page level anyway, so in
>> principle you could run around and freeze other tuples on the page
>> anytime you had to change the page's high-order-XID value.  Holding
>> the lock for long enough to do that is slightly annoying, but it
>> should happen so seldom as to not represent a real performance problem.
>>
>> In my mind the harder problem is where to find another 32 bits for the
>> new page header field.  You could convert the header format on-the-fly
>> if there's free space in the page, but what if there isn't?
>
> I guess, we will have to reserve 32 bits in the header. That's much
> better than increasing tuple header by 32 bits.

Tom's point is, I think, that we'll want to stay pg_upgrade
compatible. So when we see a pg10 tuple and want to add a new page
with a new page header that has an epoch, but the whole page is full
so there isn't 32 bits left to move tuples "down" the page, what do we
do?





-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Bruce Momjian
Date:
On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
> On 6 June 2017 at 12:38, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> > On Tue, Jun 6, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> In my mind the harder problem is where to find another 32 bits for the
> >> new page header field.  You could convert the header format on-the-fly
> >> if there's free space in the page, but what if there isn't?
> >
> > I guess, we will have to reserve 32 bits in the header. That's much
> > better than increasing tuple header by 32 bits.
> 
> Tom's point is, I think, that we'll want to stay pg_upgrade
> compatible. So when we see a pg10 tuple and want to add a new page
> with a new page header that has an epoch, but the whole page is full
> so there isn't 32 bits left to move tuples "down" the page, what do we
> do?

I guess I am missing something.  If you see an old page version number,
you know none of the tuples are from running transactions so you can
just freeze them all, after consulting the pg_clog.  What am I missing?
If the page is full, why are you trying to add to the page?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Peter Eisentraut
Date:
On 6/6/17 08:29, Bruce Momjian wrote:
> On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
>> Tom's point is, I think, that we'll want to stay pg_upgrade
>> compatible. So when we see a pg10 tuple and want to add a new page
>> with a new page header that has an epoch, but the whole page is full
>> so there isn't 32 bits left to move tuples "down" the page, what do we
>> do?
> 
> I guess I am missing something.  If you see an old page version number,
> you know none of the tuples are from running transactions so you can
> just freeze them all, after consulting the pg_clog.  What am I missing?
> If the page is full, why are you trying to add to the page?

The problem is if you want to delete from such a page.  Then you need to
update the tuple's xmax and stick the new xid epoch somewhere.

We had an unconference session at PGCon about this.  These issues were
all discussed and some ideas were thrown around.  We can expect a patch
to appear soon, I think.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Bruce Momjian
Date:
On Tue, Jun  6, 2017 at 09:05:03AM -0400, Peter Eisentraut wrote:
> On 6/6/17 08:29, Bruce Momjian wrote:
> > On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
> >> Tom's point is, I think, that we'll want to stay pg_upgrade
> >> compatible. So when we see a pg10 tuple and want to add a new page
> >> with a new page header that has an epoch, but the whole page is full
> >> so there isn't 32 bits left to move tuples "down" the page, what do we
> >> do?
> > 
> > I guess I am missing something.  If you see an old page version number,
> > you know none of the tuples are from running transactions so you can
> > just freeze them all, after consulting the pg_clog.  What am I missing?
> > If the page is full, why are you trying to add to the page?
> 
> The problem is if you want to delete from such a page.  Then you need to
> update the tuple's xmax and stick the new xid epoch somewhere.
> 
> We had an unconference session at PGCon about this.  These issues were
> all discussed and some ideas were thrown around.  We can expect a patch
> to appear soon, I think.

Sorry I missed the unconference session.

OK, crazy idea.  Since we know the creation is frozen can we put the
epoch in the xmin and set some tuple bit that only has meaning on old
page versions?  Yeah, I said crazy.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Heikki Linnakangas
Date:
On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>
>>> What happens when the epoch is so low that the rest of the XID does
>>> not fit in 32bits of tuple header? Or such a case should never arise?
>>
>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>> more than one epoch. So if you're updating/deleting an extremely old
>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>> isn't already, so you can set a new epoch and xmax.
>
> If the page has multiple such tuples, updating one tuple will mean
> updating headers of other tuples as well? This means that those tuples
> need to be locked for concurrent scans? May be not, since such tuples
> will be anyway visible to any concurrent scans and updating xmin/xmax
> doesn't change the visibility. But we might have to prevent multiple
> updates to the xmin/xmax because of concurrent updates on the same
> page.

"Store the epoch in the page header" is actually a slightly 
simpler-to-visualize, but incorrect, version of what we actually need to 
do. If you only store the epoch, then all the XIDs on a page need to 
belong to the same epoch, which causes trouble when the current epoch 
changes. Just after the epoch changes, you cannot necessarily freeze all 
the tuples from the previous epoch, because they would not yet be 
visible to everyone.

The full picture is that we need to store one 64-bit XID "base" value in 
the page header, and all the xmin/xmax values in the tuple headers are 
offsets relative to that base. With that, you effectively have 64-bit 
XIDs, as long as the *difference* between any two XIDs on a page is not 
greater than 2^32. That can be guaranteed, as long as we don't allow a 
transaction to be in-progress for more than 2^32 XIDs. That seems like a 
reasonable limitation.

But yes, when the "current XID - base XID in page header" becomes 
greater than 2^32, and you need to update a tuple on that page, you need 
to first freeze the page, update the base XID on the page header to a 
more recent value, and update the XID offsets on every tuple on the page 
accordingly. And to do that, you need to hold a lock on the page. If you 
don't move any tuples around at the same time, but just update the XID 
fields, an exclusive lock on the page is enough, i.e. you don't need to 
take a super-exclusive or vacuum lock. In any case, it happens so 
infrequently that it should not become a serious burden.

- Heikki
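
The base-plus-offset scheme described above can be sketched as a toy model. This is not the proposed patch: the class and method names are hypothetical, `None` stands in for a frozen xmin, and the model assumes every XID below `horizon` has committed and is visible to everyone.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OFFSET_LIMIT = 1 << 32  # tuple headers keep 32-bit offsets from the page base


@dataclass
class TupleXids:
    xmin_off: Optional[int]         # None stands in for a frozen xmin
    xmax_off: Optional[int] = None  # None means the tuple has no deleter
    dead: bool = False              # deleted before the horizon


@dataclass
class HeapPage:
    xid_base: int                   # 64-bit base kept once per page
    tuples: List[TupleXids] = field(default_factory=list)

    def absolute(self, off: Optional[int]) -> Optional[int]:
        """Turn a stored offset back into a 64-bit XID."""
        return None if off is None else self.xid_base + off

    def fits(self, xid64: int) -> bool:
        return 0 <= xid64 - self.xid_base < OFFSET_LIMIT

    def prepare_for_xid(self, xid64: int, horizon: int) -> None:
        """Rebase the page to `horizon`, freezing old xmins, if xid64 does not fit."""
        if self.fits(xid64):
            return
        if not 0 <= xid64 - horizon < OFFSET_LIMIT:
            raise ValueError("XID is 2^32 or more past the horizon")
        for t in self.tuples:  # tuples stay in place; only XID fields change
            xmin, xmax = self.absolute(t.xmin_off), self.absolute(t.xmax_off)
            if xmax is not None and xmax < horizon:
                t.xmin_off, t.xmax_off, t.dead = None, None, True
                continue
            t.xmin_off = None if xmin is None or xmin < horizon else xmin - horizon
            t.xmax_off = None if xmax is None else xmax - horizon
        self.xid_base = horizon

    def delete(self, index: int, xid64: int, horizon: int) -> None:
        """Record xid64 as the deleter (xmax) of one tuple."""
        self.prepare_for_xid(xid64, horizon)
        self.tuples[index].xmax_off = xid64 - self.xid_base
```

Deleting a tuple with an XID more than 2^32 past the page base triggers the single-page freeze and rebase; any XID that still fits leaves the page untouched apart from the new xmax.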




Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 6/6/17 08:29, Bruce Momjian wrote:
> > On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
> >> Tom's point is, I think, that we'll want to stay pg_upgrade
> >> compatible. So when we see a pg10 tuple and want to add a new page
> >> with a new page header that has an epoch, but the whole page is full
> >> so there isn't 32 bits left to move tuples "down" the page, what do we
> >> do?
> >
> > I guess I am missing something.  If you see an old page version number,
> > you know none of the tuples are from running transactions so you can
> > just freeze them all, after consulting the pg_clog.  What am I missing?
> > If the page is full, why are you trying to add to the page?
>
> The problem is if you want to delete from such a page.  Then you need to
> update the tuple's xmax and stick the new xid epoch somewhere.
>
> We had an unconference session at PGCon about this.  These issues were
> all discussed and some ideas were thrown around.  We can expect a patch
> to appear soon, I think.

Right.  I'm now working on splitting my large patch for 64-bit xids into a patchset.
I'm planning to post the patchset at the beginning of next week.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
On Wed, Jun 7, 2017 at 10:47 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
>> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>>
>>>> What happens when the epoch is so low that the rest of the XID does
>>>> not fit in 32bits of tuple header? Or such a case should never arise?
>>>
>>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>>> more than one epoch. So if you're updating/deleting an extremely old
>>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>>> isn't already, so you can set a new epoch and xmax.
>>
>> If the page has multiple such tuples, updating one tuple will mean
>> updating headers of other tuples as well? This means that those tuples
>> need to be locked for concurrent scans? May be not, since such tuples
>> will be anyway visible to any concurrent scans and updating xmin/xmax
>> doesn't change the visibility. But we might have to prevent multiple
>> updates to the xmin/xmax because of concurrent updates on the same
>> page.
>
> "Store the epoch in the page header" is actually a slightly
> simpler-to-visualize, but incorrect, version of what we actually need to
> do. If you only store the epoch, then all the XIDs on a page need to
> belong to the same epoch, which causes trouble when the current epoch
> changes. Just after the epoch changes, you cannot necessarily freeze all
> the tuples from the previous epoch, because they would not yet be
> visible to everyone.
>
> The full picture is that we need to store one 64-bit XID "base" value in
> the page header, and all the xmin/xmax values in the tuple headers are
> offsets relative to that base. With that, you effectively have 64-bit
> XIDs, as long as the *difference* between any two XIDs on a page is not
> greater than 2^32. That can be guaranteed, as long as we don't allow a
> transaction to be in-progress for more than 2^32 XIDs. That seems like a
> reasonable limitation.
 
Right.  I used the term "64-bit epoch" during the developer unconference, but that was ambiguous.  It would be more correct to call it a "64-bit base".
BTW, we will have to store two 64-bit bases: one for xids and one for multixacts, because they are completely independent counters.

> But yes, when the "current XID - base XID in page header" becomes
> greater than 2^32, and you need to update a tuple on that page, you need
> to first freeze the page, update the base XID on the page header to a
> more recent value, and update the XID offsets on every tuple on the page
> accordingly. And to do that, you need to hold a lock on the page. If you
> don't move any tuples around at the same time, but just update the XID
> fields, and exclusive lock on the page is enough, i.e. you don't need to
> take a super-exclusive or vacuum lock. In any case, it happens so
> infrequently that it should not become a serious burden.

Yes, exclusive lock seems to be enough for single page freeze.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alvaro Herrera
Date:
Alexander Korotkov wrote:

> Right.  I used the term "64-bit epoch" during developer unconference, but
> that was ambiguous.  It would be more correct to call it a "64-bit base".
> BTW, we will have to store two 64-bit bases: for xids and for multixacts,
> because they are completely independent counters.

So this takes us from 4 additional bytes per page, to 16 additional
bytes per page.  With the proposal to require 4 free bytes it seemed
quite unlikely that many pages would fail to comply (so whatever
fallback mechanism was needed during page upgrade would be seldom used),
but now that they are 16, the likelihood of needing to run that page
upgrade seems a tad high.

Instead of adding a second 64 bit counter for multixacts, how about
first implementing something like TED which gets rid of multixacts (and
freezing thereof) altogether?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
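
For scale, the space cost of the per-page options can be worked out directly. The sketch assumes PostgreSQL's default 8192-byte block size; the function names and the 8-bytes-per-tuple figure for widening both xmin and xmax are illustrative assumptions, not numbers from a patch.

```python
BLCKSZ = 8192  # PostgreSQL's default block size in bytes


def page_overhead_fraction(extra_bytes: int, block_size: int = BLCKSZ) -> float:
    """Fraction of a page consumed by extra per-page header bytes."""
    return extra_bytes / block_size


def tuple_widening_bytes(tuples_per_page: int, extra_per_tuple: int = 8) -> int:
    """Bytes added per page if xmin and xmax each grew by 4 bytes in every tuple."""
    return tuples_per_page * extra_per_tuple
```

Sixteen bytes is about 0.2% of a default page, so the worry raised above is less about the space itself than about how often an existing full page lacks that much free room.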



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Masahiko Sawada
Date:
On Wed, Jun 7, 2017 at 4:47 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
>>
>> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer <craig@2ndquadrant.com>
>> wrote:
>>>
>>> On 6 June 2017 at 12:13, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
>>> wrote:
>>>
>>>> What happens when the epoch is so low that the rest of the XID does
>>>> not fit in 32bits of tuple header? Or such a case should never arise?
>>>
>>>
>>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>>> more than one epoch. So if you're updating/deleting an extremely old
>>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>>> isn't already, so you can set a new epoch and xmax.
>>
>>
>> If the page has multiple such tuples, updating one tuple will mean
>> updating headers of other tuples as well? This means that those tuples
>> need to be locked for concurrent scans? May be not, since such tuples
>> will be anyway visible to any concurrent scans and updating xmin/xmax
>> doesn't change the visibility. But we might have to prevent multiple
>> updates to the xmin/xmax because of concurrent updates on the same
>> page.
>
>
> "Store the epoch in the page header" is actually a slightly
> simpler-to-visualize, but incorrect, version of what we actually need to do.
> If you only store the epoch, then all the XIDs on a page need to belong to
> the same epoch, which causes trouble when the current epoch changes. Just
> after the epoch changes, you cannot necessarily freeze all the tuples from
> the previous epoch, because they would not yet be visible to everyone.
>
> The full picture is that we need to store one 64-bit XID "base" value in the
> page header, and all the xmin/xmax values in the tuple headers are offsets
> relative to that base. With that, you effectively have 64-bit XIDs, as long
> as the *difference* between any two XIDs on a page is not greater than 2^32.
> That can be guaranteed, as long as we don't allow a transaction to be
> in-progress for more than 2^32 XIDs. That seems like a reasonable
> limitation.
>
> But yes, when the "current XID - base XID in page header" becomes greater
> than 2^32, and you need to update a tuple on that page, you need to first
> freeze the page, update the base XID on the page header to a more recent
> value, and update the XID offsets on every tuple on the page accordingly.
> And to do that, you need to hold a lock on the page. If you don't move any
> tuples around at the same time, but just update the XID fields, and
> exclusive lock on the page is enough, i.e. you don't need to take a
> super-exclusive or vacuum lock. In any case, it happens so infrequently that
> it should not become a serious burden.
>

Freezing a page is required when modifying a tuple on the page by a
transaction with greater than 2^32 XID. Is that right?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Andres Freund
Date:
On 2017-06-07 07:49:00 -0300, Alvaro Herrera wrote:
> Instead of adding a second 64 bit counter for multixacts, how about
> first implementing something like TED which gets rid of multixacts (and
> freezing thereof) altogether?

-1 - that seems like a too high barrier. We've punted on improvements on
this because of CSN, xid-lsn ranges, and at some point we're going to
have to make pragmatic choices, rather than strive for something more ideal.

- Andres



On Wed, Jun 7, 2017 at 12:49 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-07 07:49:00 -0300, Alvaro Herrera wrote:
>> Instead of adding a second 64 bit counter for multixacts, how about
>> first implementing something like TED which gets rid of multixacts (and
>> freezing thereof) altogether?
>
> -1 - that seems like a too high barrier. We've punted on improvements on
> this because of CSN, xid-lsn ranges, and at some point we're going to
> have to make pragmatic choices, rather than strive for something more ideal.

What is the problem that we are trying to solve with this change?  Is
there a practical use case for setting autovacuum_freeze_max_age >
2000000000, or is this just so that when autovacuum fails to vacuum
things in time, we can bloat clog instead of performing an emergency
shutdown?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > On 6/6/17 08:29, Bruce Momjian wrote:
> > > On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
> > >> Tom's point is, I think, that we'll want to stay pg_upgrade
> > >> compatible. So when we see a pg10 tuple and want to add a new page
> > >> with a new page header that has an epoch, but the whole page is full
> > >> so there isn't 32 bits left to move tuples "down" the page, what do we
> > >> do?
> > >
> > > I guess I am missing something.  If you see an old page version number,
> > > you know none of the tuples are from running transactions so you can
> > > just freeze them all, after consulting the pg_clog.  What am I missing?
> > > If the page is full, why are you trying to add to the page?
> >
> > The problem is if you want to delete from such a page.  Then you need to
> > update the tuple's xmax and stick the new xid epoch somewhere.
> >
> > We had an unconference session at PGCon about this.  These issues were
> > all discussed and some ideas were thrown around.  We can expect a patch
> > to appear soon, I think.
>
> Right.  I'm now working on splitting my large patch for 64-bit xids into patchset.
> I'm planning to post patchset in the beginning of next week.

Work on this patch took longer than I expected.  It is still not in very good shape, but I decided to publish it anyway in order not to stall progress in this area.
I also tried to split this patch into several.  But I actually managed to separate only a few small pieces, while most of the changes remain in a single big diff.
Long story short, the patchset is attached.

0001-64bit-guc-relopt-1.patch
This patch implements 64 bit GUCs and relation options which are used in further patches.

0002-heap-page-special-1.patch
Putting xid and multixact bases into PageHeaderData would take an extra 16 bytes on index pages too.  That would be a waste of space for indexes.  This is why I decided to put the bases into the special area of heap pages.
This patch adds a special area for heap pages containing the prune xid and a magic number.  The magic number is different for regular heap pages and sequence pages.

0003-64bit-xid-1.patch
It's the major patch.  It redefines TransactionId as a 64-bit integer and defines a 32-bit ShortTransactionID, which is used for t_xmin and t_xmax.  Transaction id comparison becomes straight instead of circular.  Base values for xids and multixact ids are stored in the heap page special area.  SLRUs also become 64-bit and non-circular.  To be able to calculate xmin/xmax without accessing the heap page, base values are copied into HeapTuple.  Correspondingly, HeapTupleHeader(Get|Set)(Xmin|Xmax) become just HeapTuple(Get|Set)(Xmin|Xmax), which require a HeapTuple, not just a HeapTupleHeader.  heap_page_prepare_for_xid() is used to ensure that a given xid fits a particular page's base.  If it doesn't fit, then the base of the page is shifted, which could require a single-page freeze.  The WAL format is changed in order to prevent unaligned access to TransactionId.  *_age GUCs and relation options are changed to 64-bit.  Forced "autovacuum to prevent wraparound" is removed, but there is still a freeze to truncate SLRUs.

0004-base-values-for-testing-1.patch
This patch is used for testing that calculations using 64-bit bases and short 32-bit xid values are correct.  It provides initdb options for initial xid, multixact id and multixact offset values.  Regression tests initialize cluster with large (more than 2^32) values.

There are a lot of open items, but I would like to point out some of them:
 * WAL becomes significantly larger due to storing 8-byte xids instead of 4-byte xids.  Probably, the base approach needs to be used in WAL too.
 * As discussed at the developer unconference, we need to write a special background worker which would ensure that each heap page has room for the bases.  This background worker should finish its work before the database can be pg_upgraded.  Alternatively, we could find a way to store the bases in the existing page header.
 * BTPageOpaqueData contains a TransactionId in the special area.  BTPageOpaqueData should be changed to some pg_upgradable format.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment
re: "The problem is if you want to delete from such a page.  Then you need to
update the tuple's xmax and stick the new xid epoch somewhere."

When the xid's on a single page span a range of more than 2^32, as could
occur in the scenario above, then a single xid base value won't suffice.  Do
we have a proposed solution for this problem?

If not, then allow me to put out a 'straw man' proposal: perhaps we could
mark such a row with a 'tombstone' that points off to some other page in yet
another page format that contains full 64-bit xids.  Rows in this 64-bit xid
format page would all be deleted rows, and would be vacuumed away, along
with the tombstone row, when there are no more transactions that can see it. 
Under the assumption that deletion of such very old rows is rare, this may
have very little impact on performance.  One negative is that rarely
executed code can be a maintainability problem, but we can probably cope
with that.

Feel free to knock down this 'straw man' and propose something better!






On 6 July 2017 at 15:29, Jim Finnerty <jfinnert@amazon.com> wrote:
>
> Feel free to knock down this 'straw man' and propose something better!

I think the pattern in this design that we don't want is that it
imposes extra complexity on every user of every page even when the
page doesn't have the problem and even when the problem isn't anywhere
in the database. Even years from now when this problem is long gone
you'll have code paths for dealing with this special page format that
are rarely executed and never tested that will have to be maintained
blind.

Ideally, a solution to this problem would impose a cost only on the
weird pages, and only temporarily, and would leave the database in a
"consistent" state that doesn't require any special processing when
reading the data.

The "natural" solution is what was discussed for incompatible page
format changes in the past where there's a point release of one
Postgres version that tries to ensure there's enough space on the page
for the next version and keeps track of whether there are any
problematic pages. Then you would be blocked from upgrading until you
had ensured all pages had space (presumably by running some special
"vacuum upgrade" or something like that).

Incidentally it's somewhat intriguing to think about what would happen
if we *always* did such a tombstone for deletes. Or perhaps only when
it's a full_page_write. Since the whole page is going into the log and
that tuple will never be modified again you could imagine just
replacing the tuple with the LSN of the deletion and letting anyone
who really needs it fetch it from the xlog. That would be a completely
different model from the way Postgres works though. More like a
log-structured storage system.
-- 
greg



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Bruce Momjian
Date:
On Thu, Jul  6, 2017 at 07:29:07AM -0700, Jim Finnerty wrote:
> re: "The problem is if you want to delete from such a page.  Then you need to
> update the tuple's xmax and stick the new xid epoch somewhere."

I am coming to this very late, but wouldn't such a row be marked using
our frozen-committed fixed xid so it doesn't matter what the xid epoch is?
I realize with 64-bit xids we don't need to freeze tuples, but we could
still use a frozen-committed fixed xid, see:
#define FrozenTransactionId         ((TransactionId) 2)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Ildar Musin
Date:
Hi Alexander,

On 22.06.2017 18:36, Alexander Korotkov wrote:
> On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:

> 0002-heap-page-special-1.patch
> Putting xid and multixact bases into PageHeaderData would take extra 16
> bytes on index pages too.  That would be waste of space for indexes.
> This is why I decided to put bases into special area of heap pages.
> This patch adds special area for heap pages contaning prune xid and
> magic number.  Magic number is different for regular heap page and
> sequence page.

We've discussed it earlier, but it is worth mentioning here too. You have
pd_prune_xid of type TransactionId which is treated elsewhere as
ShortTransactionId (see HeapPageGetPruneXid() and HeapPageSetPruneXid())
and hence introduces 4 redundant bytes. Could you please fix it?

-- 
Ildar Musin
i.musin@postgrespro.ru



Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
Hi, Ildar!

On Tue, Sep 5, 2017 at 12:55 PM, Ildar Musin <i.musin@postgrespro.ru> wrote:
On 22.06.2017 18:36, Alexander Korotkov wrote:
On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:

0002-heap-page-special-1.patch
Putting xid and multixact bases into PageHeaderData would take extra 16
bytes on index pages too.  That would be waste of space for indexes.
This is why I decided to put bases into special area of heap pages.
This patch adds special area for heap pages contaning prune xid and
magic number.  Magic number is different for regular heap page and
sequence page.

We've discussed it earlier but it worth mentioning here too. You have pd_prune_xid of type TransactionId which is treated elsewhere as ShortTransactionId (see HeapPageGetPruneXid() and HeapPageSetPruneXid()) and hence introduces redundant 4 bytes. Could you please fix it?

Thank you for pointing this out.
An updated patchset is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 
Attachment

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
On Thu, Sep 14, 2017 at 3:48 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Sep 5, 2017 at 12:55 PM, Ildar Musin <i.musin@postgrespro.ru> wrote:
On 22.06.2017 18:36, Alexander Korotkov wrote:
On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:

0002-heap-page-special-1.patch
Putting xid and multixact bases into PageHeaderData would take extra 16
bytes on index pages too.  That would be waste of space for indexes.
This is why I decided to put bases into special area of heap pages.
This patch adds special area for heap pages contaning prune xid and
magic number.  Magic number is different for regular heap page and
sequence page.

We've discussed it earlier but it worth mentioning here too. You have pd_prune_xid of type TransactionId which is treated elsewhere as ShortTransactionId (see HeapPageGetPruneXid() and HeapPageSetPruneXid()) and hence introduces redundant 4 bytes. Could you please fix it?

Thank you for pointing.
Updated patchset is attached.

I'm sorry.  I messed up with git; please find the patchset attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachment

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Amit Kapila
Date:
On Thu, Jun 22, 2017 at 9:06 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>>
>> On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>
>>> On 6/6/17 08:29, Bruce Momjian wrote:
>>> > On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
>>> >> Tom's point is, I think, that we'll want to stay pg_upgrade
>>> >> compatible. So when we see a pg10 tuple and want to add a new page
>>> >> with a new page header that has an epoch, but the whole page is full
>>> >> so there isn't 32 bits left to move tuples "down" the page, what do we
>>> >> do?
>>> >
>>> > I guess I am missing something.  If you see an old page version number,
>>> > you know none of the tuples are from running transactions so you can
>>> > just freeze them all, after consulting the pg_clog.  What am I missing?
>>> > If the page is full, why are you trying to add to the page?
>>>
>>> The problem is if you want to delete from such a page.  Then you need to
>>> update the tuple's xmax and stick the new xid epoch somewhere.
>>>
>>> We had an unconference session at PGCon about this.  These issues were
>>> all discussed and some ideas were thrown around.  We can expect a patch
>>> to appear soon, I think.
>>
>>
>> Right.  I'm now working on splitting my large patch for 64-bit xids into
>> patchset.
>> I'm planning to post patchset in the beginning of next week.
>
>
> Work on this patch took longer than I expected.  It is still in not so good
> shape, but I decided to publish it anyway in order to not stop progress in
> this area.
> I also tried to split this patch into several.  But actually I manage to
> separate few small pieces, while most of changes are remaining in the single
> big diff.
> Long story short, patchset is attached.
>
> 0001-64bit-guc-relopt-1.patch
> This patch implements 64 bit GUCs and relation options which are used in
> further patches.
>
> 0002-heap-page-special-1.patch
> Putting xid and multixact bases into PageHeaderData would take extra 16
> bytes on index pages too.  That would be waste of space for indexes.  This
> is why I decided to put bases into special area of heap pages.
> This patch adds special area for heap pages contaning prune xid and magic
> number.  Magic number is different for regular heap page and sequence page.
>

  uint16 pd_pagesize_version;
- TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
  ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
  } PageHeaderData;
 

Why have you moved pd_prune_xid from page header?

> 0003-64bit-xid-1.patch
> It's the major patch.  It redefines TransactionID ad 64-bit integer and
> defines 32-bit ShortTransactionID which is used for t_xmin and t_xmax.
> Transaction id comparison becomes straight instead of circular. Base values
> for xids and multixact ids are stored in heap page special.  SLRUs also
> became 64-bit and non-circular.   To be able to calculate xmin/xmax without
> accessing heap page, base values are copied into HeapTuple.  Correspondingly
> HeapTupleHeader(Get|Set)(Xmin|Xmax) becomes just
> HeapTuple(Get|Set)(Xmin|Xmax) whose require HeapTuple not just
> HeapTupleHeader.  heap_page_prepare_for_xid() is used to ensure that given
> xid fits particular page base.  If it doesn't fit then base of page is
> shifted, that could require single-page freeze.  Format for wal is changed
> in order to prevent unaligned access to TransactionId.  *_age GUCs and
> relation options are changed to 64-bit.  Forced "autovacuum to prevent
> wraparound" is removed, but there is still freeze to truncate SLRUs.
>

It seems there is no README or other detailed explanation of how all
this works, for example how the value of pd_xid_base is maintained.  I
don't think there are enough comments in the patch to explain things.
I think it will be easier to understand and review the patch if you
provide some more details, either in email or in the patch.

> 0004-base-values-for-testing-1.patch
> This patch is used for testing that calculations using 64-bit bases and
> short 32-bit xid values are correct.  It provides initdb options for initial
> xid, multixact id and multixact offset values.  Regression tests initialize
> cluster with large (more than 2^32) values.
>
> There are a lot of open items, but I would like to notice some of them:
>  * WAL becomes significantly larger due to storage 8 byte xids instead of 4
> byte xids.  Probably, its needed to use base approach in WAL too.
>

Yeah and I think it can impact performance as well.  By any chance
have you run pgbench read-write to see the performance impact of this
patch?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
Dear Amit,

Thank you for the attention to this patch.

On Thu, Nov 23, 2017 at 4:39 PM, Amit Kapila wrote:
> On Thu, Jun 22, 2017 at 9:06 PM, Alexander Korotkov wrote:
> > Work on this patch took longer than I expected.  It is still in not so good
> > shape, but I decided to publish it anyway in order to not stop progress in
> > this area.
> > I also tried to split this patch into several.  But actually I manage to
> > separate few small pieces, while most of changes are remaining in the single
> > big diff.
> > Long story short, patchset is attached.
> >
> > 0001-64bit-guc-relopt-1.patch
> > This patch implements 64 bit GUCs and relation options which are used in
> > further patches.
> >
> > 0002-heap-page-special-1.patch
> > Putting xid and multixact bases into PageHeaderData would take extra 16
> > bytes on index pages too.  That would be waste of space for indexes.  This
> > is why I decided to put bases into special area of heap pages.
> > This patch adds special area for heap pages contaning prune xid and magic
> > number.  Magic number is different for regular heap page and sequence page.
> >
>
>   uint16 pd_pagesize_version;
> - TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
>   ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
>   } PageHeaderData;
>
> Why have you moved pd_prune_xid from page header?

pd_prune_xid makes sense only for heap pages.  Once we introduce a special area for heap pages, we can move pd_prune_xid there and save some bytes in index pages.  However, this is an optimization not directly related to 64-bit xids.  The idea is that if we are changing the page format anyway, why not apply this optimization as well?  But if there are any doubts about this, it can be removed with no problem.

> > 0003-64bit-xid-1.patch
> > It's the major patch.  It redefines TransactionID ad 64-bit integer and
> > defines 32-bit ShortTransactionID which is used for t_xmin and t_xmax.
> > Transaction id comparison becomes straight instead of circular. Base values
> > for xids and multixact ids are stored in heap page special.  SLRUs also
> > became 64-bit and non-circular.  To be able to calculate xmin/xmax without
> > accessing heap page, base values are copied into HeapTuple.  Correspondingly
> > HeapTupleHeader(Get|Set)(Xmin|Xmax) becomes just
> > HeapTuple(Get|Set)(Xmin|Xmax) whose require HeapTuple not just
> > HeapTupleHeader.  heap_page_prepare_for_xid() is used to ensure that given
> > xid fits particular page base.  If it doesn't fit then base of page is
> > shifted, that could require single-page freeze.  Format for wal is changed
> > in order to prevent unaligned access to TransactionId.  *_age GUCs and
> > relation options are changed to 64-bit.  Forced "autovacuum to prevent
> > wraparound" is removed, but there is still freeze to truncate SLRUs.
> >
>
> It seems there is no README or some detailed explanation of how all
> this works like how the value of pd_xid_base is maintained.  I don't
> think there are enough comments in the patch to explain the things.  I
> think it will be easier to understand and review the patch if you
> provide some more details either in email or in the patch.

OK.  I'm going to write a README and include it in the patch.

> > 0004-base-values-for-testing-1.patch
> > This patch is used for testing that calculations using 64-bit bases and
> > short 32-bit xid values are correct.  It provides initdb options for initial
> > xid, multixact id and multixact offset values.  Regression tests initialize
> > cluster with large (more than 2^32) values.
> >
> > There are a lot of open items, but I would like to notice some of them:
> >  * WAL becomes significantly larger due to storage 8 byte xids instead of 4
> > byte xids.  Probably, its needed to use base approach in WAL too.
> >
>
> Yeah and I think it can impact performance as well.  By any chance
> have you run pgbench read-write to see the performance impact of this
> patch?

Sure, I'll run some benchmarks on both 32-bit and 64-bit machines.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Amit Kapila
Date:
On Fri, Nov 24, 2017 at 4:03 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>> > 0002-heap-page-special-1.patch
>> > Putting xid and multixact bases into PageHeaderData would take extra 16
>> > bytes on index pages too.  That would be waste of space for indexes.
>> > This
>> > is why I decided to put bases into special area of heap pages.
>> > This patch adds special area for heap pages contaning prune xid and
>> > magic
>> > number.  Magic number is different for regular heap page and sequence
>> > page.
>> >
>>
>> uint16 pd_pagesize_version;
>> - TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
>>   ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
>>   } PageHeaderData;
>>
>> Why have you moved pd_prune_xid from page header?
>
>
> pg_prune_xid makes sense only for heap pages.  Once we introduce special
> area for heap pages, we can move pg_prune_xid there and save some bytes in
> index pages.  However, this is an optimization not directly related to
> 64-bit xids.  Idea is that if we anyway change page format, why don't apply
> this optimization as well?
>

Sure, but I think this patch could have been proposed on top of your
main patch, not the other way around.  Another similar thing I have
noticed is below:

*************** typedef struct CheckPoint
*** 39,45 ****
  TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new
                              * timeline (equals ThisTimeLineID otherwise) */
  bool fullPageWrites; /* current full_page_writes */
- uint32 nextXidEpoch; /* higher-order bits of nextXid */
  TransactionId nextXid; /* next free XID */
  Oid nextOid; /* next free OID */
  MultiXactId nextMulti; /* next free MultiXactId */
 


I think if we have 64-bit transaction ids, then we might not need the
epoch at all, but I think this can also be a separate change from the
main patch.  The point is that the main patch is already big enough
that we should not try to squeeze other related changes into it.

>  But if we have any doubts in this, it can be
> removed with no problem.
>
>>
>> > 0003-64bit-xid-1.patch
..
>> >
>>
>> It seems there is no README or some detailed explanation of how all
>> this works like how the value of pd_xid_base is maintained.  I don't
>> think there are enough comments in the patch to explain the things.  I
>> think it will be easier to understand and review the patch if you
>> provide some more details either in email or in the patch.
>
>
> OK.  I'm going to write README and include it into the patch.
>

Thanks.  A few assorted comments on the main patch:

1.
+ /*
+  * Ensure that given xid fits base of given page.
+  */
+ bool
+ heap_page_prepare_for_xid(Relation relation, Buffer buffer,
+   TransactionId xid, bool multi)

I couldn't find any use of the return value of this function.
Basically, if the specified xid can update the tuple, it returns
false; otherwise, it updates the tuples on the page and the base xid of
the page such that the new xid can update the tuple.  So either you
should make this function return void, or split it into two parts such
that the first function checks whether the new xid can update the
tuple and, if so, proceeds with updating the tuple, and otherwise
updates the base xid in the page and xmin/xmax in the tuples.
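
The split suggested above could look roughly like the following standalone sketch.  It uses a toy in-memory page rather than real PostgreSQL structures: ToyPage and all toy_* names are hypothetical, every stored xmin is assumed to be a normal (non-special) xid, and the mutating step simply applies the smallest shift that makes the new xid fit.  If that shift would push an existing xmin out of range, the assertions fire; a real implementation would presumably freeze such tuples first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_NORMAL_XID UINT64_C(3)          /* FirstNormalTransactionId */
#define MAX_SHORT_XID    UINT64_C(0xFFFFFFFF) /* assumed MaxShortTransactionId */
#define TOY_MAX_TUPLES   4

typedef struct ToyPage
{
    uint64_t base;                  /* plays the role of pd_xid_base */
    uint32_t xmin[TOY_MAX_TUPLES];  /* short xids, stored relative to base */
    size_t   ntuples;
} ToyPage;

/* Part 1: pure check with no side effects. */
static bool
toy_page_xid_fits(const ToyPage *page, uint64_t xid)
{
    return xid >= page->base + FIRST_NORMAL_XID &&
           xid <= page->base + MAX_SHORT_XID;
}

/* Part 2: mutation.  Moves the base by delta and keeps full xids unchanged. */
static void
toy_page_shift_base(ToyPage *page, int64_t delta)
{
    for (size_t i = 0; i < page->ntuples; i++)
    {
        int64_t shifted = (int64_t) page->xmin[i] - delta;

        assert(shifted >= (int64_t) FIRST_NORMAL_XID);
        assert(shifted <= (int64_t) MAX_SHORT_XID);
        page->xmin[i] = (uint32_t) shifted;
    }
    page->base = (uint64_t) ((int64_t) page->base + delta);
}

/* Caller-facing wrapper: returns void because there is nothing to report. */
static void
toy_page_prepare_for_xid(ToyPage *page, uint64_t xid)
{
    int64_t delta;

    if (toy_page_xid_fits(page, xid))
        return;

    /* Smallest shift that makes xid representable on this page. */
    if (xid > page->base + MAX_SHORT_XID)
        delta = (int64_t) (xid - (page->base + MAX_SHORT_XID));
    else
        delta = -(int64_t) ((page->base + FIRST_NORMAL_XID) - xid);

    toy_page_shift_base(page, delta);
}
```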

2.
heap_page_prepare_for_xid()
{
..
+ /* Find minimum and maximum xids in the page */
+ found = heap_page_xid_min_max(page, multi, &min, &max);
+
+ /* No items on the page? */
+ if (!found)
+ {
+     int64 delta;
+
+     if (!multi)
+         delta = (xid - FirstNormalTransactionId) - pageSpecial->pd_xid_base;
+     else
+         delta = (xid - FirstNormalTransactionId) - pageSpecial->pd_multi_base;
+
+     heap_page_shift_base(RelationNeedsWAL(relation) ? buffer : InvalidBuffer,
+                          page, multi, delta);
+     MarkBufferDirty(buffer);
+     return false;
+ }
..
}

When there are no items on the page, what is the need to traverse all
the items again (via heap_page_shift_base)?  Ideally, you can set the
base directly to (xid - FirstNormalTransactionId) in this case.
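
As a small illustration of this point, the delta computed in the quoted code always lands the base on (xid - FirstNormalTransactionId) when the page is empty, so the result does not depend on the old base at all.  The sketch below is standalone C with hypothetical function names, and it only covers the arithmetic; whatever WAL logging heap_page_shift_base() does would still be needed for a direct assignment.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_NORMAL_XID INT64_C(3) /* FirstNormalTransactionId */

/* What the quoted code computes: a delta relative to the old base. */
static int64_t
empty_page_delta(int64_t old_base, int64_t xid)
{
    return (xid - FIRST_NORMAL_XID) - old_base;
}

/* The shortcut: on an empty page the new base can be computed directly. */
static int64_t
empty_page_new_base(int64_t xid)
{
    assert(xid >= FIRST_NORMAL_XID);
    return xid - FIRST_NORMAL_XID;
}
```

For any old base, old_base + empty_page_delta(old_base, xid) equals empty_page_new_base(xid).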

3.
heap_page_prepare_for_xid()
{
..
+ /* Can we just shift base on the page */
+ if (xid < base + FirstNormalTransactionId)
+ {
+     int64 freeDelta = MaxShortTransactionId - max,
+           requiredDelta = (base + FirstNormalTransactionId) - xid;
+
+     if (requiredDelta <= freeDelta)
+     {
+         heap_page_shift_base(RelationNeedsWAL(relation) ? buffer : InvalidBuffer,
+                              page, multi, - (freeDelta + requiredDelta) / 2);
+         MarkBufferDirty(buffer);
+         return true;
+     }
+ }
+ else
+ {
+     int64 freeDelta = min - FirstNormalTransactionId,
+           requiredDelta = xid - (base + MaxShortTransactionId);
+
+     if (requiredDelta <= freeDelta)
+     {
+         heap_page_shift_base(RelationNeedsWAL(relation) ? buffer : InvalidBuffer,
+                              page, multi, (freeDelta + requiredDelta) / 2);
+         MarkBufferDirty(buffer);
+         return true;
+     }
+ }
..
}

I think the above code doesn't follow the guidelines for writing WAL.
Currently, it (a) modifies the page, (b) writes WAL, and (c) marks the
buffer dirty.  You need to perform step (c) before step (b).  Also, all
these steps need to be performed in a critical section.

Also, some comments explaining the computation of delta in the above
context would make things easier to understand.
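
One reading of the delta computation in the quoted code, written out as a standalone C function with explanatory comments.  This is an interpretation of the excerpt, not text from the patch: choose_base_shift() is a hypothetical name, min_short and max_short are assumed to be the smallest and largest normal short xids on the page, the value of MAX_SHORT_XID is an assumption, and all values are assumed to fit in a signed 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIRST_NORMAL_XID INT64_C(3)          /* FirstNormalTransactionId */
#define MAX_SHORT_XID    INT64_C(0xFFFFFFFF) /* assumed MaxShortTransactionId */

/*
 * Decide how far the page base must move so that xid becomes representable.
 * Returns true and stores the signed shift in *delta if a shift alone is
 * enough; returns false if the xids on the page are spread too widely, in
 * which case the caller would have to freeze tuples first.
 */
static bool
choose_base_shift(int64_t base, int64_t min_short, int64_t max_short,
                  int64_t xid, int64_t *delta)
{
    assert(min_short <= max_short);

    if (xid < base + FIRST_NORMAL_XID)
    {
        /*
         * xid is too old: the base must move down.  Moving it down by d
         * raises every short xid by d, so d may not exceed the headroom
         * above the largest short xid (free), and must be at least the
         * distance from xid to the lowest representable value (required).
         */
        int64_t free_delta = MAX_SHORT_XID - max_short;
        int64_t required_delta = (base + FIRST_NORMAL_XID) - xid;

        if (required_delta > free_delta)
            return false;
        /* Take the midpoint so that slack remains on both sides. */
        *delta = -((free_delta + required_delta) / 2);
        return true;
    }

    if (xid > base + MAX_SHORT_XID)
    {
        /* xid is too new: mirror image, the base moves up. */
        int64_t free_delta = min_short - FIRST_NORMAL_XID;
        int64_t required_delta = xid - (base + MAX_SHORT_XID);

        if (required_delta > free_delta)
            return false;
        *delta = (free_delta + required_delta) / 2;
        return true;
    }

    *delta = 0;     /* xid already fits, nothing to do */
    return true;
}
```

Because required_delta <= free_delta whenever a shift is chosen, the midpoint is never smaller than required_delta and never larger than free_delta, so the new xid fits and no existing short xid leaves the valid range.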

4. I think rewrite_page_prepare_for_xid() has enough common
functionality with heap_page_prepare_for_xid that you can extract
common parts into a separate function.

5.
+ /* FIXME */
+ tuple->t_data->t_choice.t_heap.t_xmin =
+     NormalTransactionIdToShort(HeapPageGetSpecial(pageHeader)->pd_xid_base, xid);
+

There are a lot of FIXMEs like the above which don't even indicate what
exactly you want to fix.  I think it is okay to have FIXMEs which you
want to fix later in a patch of this size, but at the very least they
should be documented; otherwise, I am afraid that you yourself might
forget what to fix.

6.
+ /*
+  * Shift xid base in the page.  WAL-logged if buffer is specified.
+  */
+ static void
+ heap_page_shift_base(Buffer buffer, Page page, bool multi, int64 delta)
+ {
+     HeapPageSpecial pageSpecial = HeapPageGetSpecial(page);
+     OffsetNumber offnum,
+                  maxoff;
+
+     /* Iterate over page items */
+     maxoff = PageGetMaxOffsetNumber(page);
+     for (offnum = FirstOffsetNumber;
+          offnum <= maxoff;
+          offnum = OffsetNumberNext(offnum))
+     {
+         ItemId itemid;
+         HeapTupleHeader htup;
+
+         itemid = PageGetItemId(page, offnum);
+
+         if (!ItemIdIsNormal(itemid))
+             continue;
+
+         htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+         /* Apply xid shift to heap tuple */
+         if (!multi)
+         {
+             if (!HeapTupleHeaderXminFrozen(htup) &&
+                 TransactionIdIsNormal(htup->t_choice.t_heap.t_xmin))
+             {
+                 Assert(htup->t_choice.t_heap.t_xmin - delta >= FirstNormalTransactionId);
+                 Assert(htup->t_choice.t_heap.t_xmin - delta <= MaxShortTransactionId);
+                 htup->t_choice.t_heap.t_xmin -= delta;
+             }

How is it ensured that the xids (xmin/xmax on the tuple) you are
modifying are not in progress?  Or is there some theory that it is okay
to modify in-progress xids?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Robert Haas
Date:
On Fri, Nov 24, 2017 at 5:33 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> pg_prune_xid makes sense only for heap pages.  Once we introduce special
> area for heap pages, we can move pg_prune_xid there and save some bytes in
> index pages.  However, this is an optimization not directly related to
> 64-bit xids.  Idea is that if we anyway change page format, why don't apply
> this optimization as well?  But if we have any doubts in this, it can be
> removed with no problem.

My first reaction is that changing the page format seems like a
non-starter, because it would break pg_upgrade.  If we get the heap
storage API working, then we could have a heap AM that works as it
does today and a newheap AM with such changes, but I have a bit of a
hard time imagining a patch that causes a hard compatibility break
ever being accepted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Peter Geoghegan
Date:
On Mon, Nov 27, 2017 at 11:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 24, 2017 at 5:33 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> pg_prune_xid makes sense only for heap pages.  Once we introduce special
>> area for heap pages, we can move pg_prune_xid there and save some bytes in
>> index pages.  However, this is an optimization not directly related to
>> 64-bit xids.  Idea is that if we anyway change page format, why don't apply
>> this optimization as well?  But if we have any doubts in this, it can be
>> removed with no problem.
>
> My first reaction is that changing the page format seems like a
> non-starter, because it would break pg_upgrade.  If we get the heap
> storage API working, then we could have a heap AM that works as it
> does today and a newheap AM with such changes, but I have a bit of a
> hard time imagining a patch that causes a hard compatibility break
> ever being accepted.

I actually think that we could use that field in indexes for storing
an epoch. This could be used to avoid having to worry about
anti-wraparound VACUUM for deleted index pages that contain a cached
XID -- the one we compare to RecentGlobalXmin as part of the recycling
interlock. (I suggested this to Sawada-san at one point, in the
context of avoiding vacuuming indexes on large append-mostly tables.)

In any case, we'd hardly go to all that effort to save just 4 bytes in
the page header.

-- 
Peter Geoghegan


Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Alexander Korotkov
Date:
On Mon, Nov 27, 2017 at 10:56 PM, Robert Haas wrote:
> On Fri, Nov 24, 2017 at 5:33 AM, Alexander Korotkov wrote:
> > pg_prune_xid makes sense only for heap pages.  Once we introduce special
> > area for heap pages, we can move pg_prune_xid there and save some bytes in
> > index pages.  However, this is an optimization not directly related to
> > 64-bit xids.  Idea is that if we anyway change page format, why don't apply
> > this optimization as well?  But if we have any doubts in this, it can be
> > removed with no problem.
>
> My first reaction is that changing the page format seems like a
> non-starter, because it would break pg_upgrade.  If we get the heap
> storage API working, then we could have a heap AM that works as it
> does today and a newheap AM with such changes, but I have a bit of a
> hard time imagining a patch that causes a hard compatibility break
> ever being accepted.

Thank you for raising this question.  There was a discussion about 64-bit xids during PGCon 2017.  A couple of ways to support pg_upgrade were discussed.

1) We have a page layout version in the page (the current one is 4).  So, we can define a new page layout version 5.  Pages with the new layout version would contain 64-bit base values for xid and multixact.  The question is how to deal with pages of layout version 4.  If such a page has enough free space to fit the extra 16 bytes, then it could be upgraded on the fly.  If it doesn't contain enough space for that, then things become more complicated: we can't upgrade it to the new format, but we still need to fit a new xmax value there in case a tuple is updated or deleted.  pg_upgrade requires a server restart.  Thus, once we set hint bits, the pre-pg_upgrade xmin is not really meaningful: the corresponding xid is visible to every post-pg_upgrade snapshot.  So, the idea is to use both the xmin and xmax tuple fields on such an unupgradable page to store a 64-bit xmax.  This idea was proposed by me, but was criticized by some session attendees (sorry, I don't recall who they were) for its complexity and suspected overhead.

2) An alternative idea was to use unused bits in the page header.  If we look for unused bits in pd_flags (3 bits of 16 are used), pd_pagesize_version (we can leave 1 bit of 16 to distinguish between the old and new formats) and pd_special (we can leave 1 bit to distinguish sequence pages), we can scrape together 43 bits.  That would be more than enough for a single base value, because we definitely don't need all of the lower 32 bits of the base value (21 bits is more than enough).  But I'm not sure about two base values: if we leave 2 bits for the lower part of each base value, then that leaves us 19 bits for the high part of each base value.  This solution would give us 2^51 maximum values for xids and multixacts.  I'm not sure if that's enough to treat these counters as infinite.  AFAIK, there are products on the market which have 48-bit transaction identifiers and don't care about wraparound or something...

A new heap AM for 64-bit xids is an interesting idea too.  I would even say that the pluggable storage API being discussed now is excessive for this particular purpose (but it can still fit!), because in most aspects a heap with 64-bit xids is exactly the same as the current heap (in contrast to a heap with an undo log, for example).  The best-fit API for a heap with 64-bit xid support would be a pluggable heap page format.  But I don't think it deserves a separate API.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
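
The bit budget in option 2 above can be checked with a few lines of arithmetic.  The following standalone C sketch only restates the numbers given in the message (three 16-bit header fields, with 3, 1 and 1 bits kept for their existing purposes); the enum and function names are hypothetical, and the even split of spare bits between two base values is an assumption about how the 19-bit figure was derived.

```c
#include <assert.h>
#include <stdint.h>

enum
{
    FIELD_BITS = 16,              /* pd_flags, pd_pagesize_version, pd_special */
    PD_FLAGS_USED = 3,            /* flag bits already in use */
    PD_PAGESIZE_VERSION_KEPT = 1, /* kept to tell old and new formats apart */
    PD_SPECIAL_KEPT = 1           /* kept to tell sequence pages apart */
};

/* Bits that could be reclaimed from the three header fields. */
static int
spare_header_bits(void)
{
    return (FIELD_BITS - PD_FLAGS_USED) +
           (FIELD_BITS - PD_PAGESIZE_VERSION_KEPT) +
           (FIELD_BITS - PD_SPECIAL_KEPT);
}

/* Bits available to each base when nbases bases share the spare bits. */
static int
bits_per_base(int spare, int nbases)
{
    assert(nbases > 0);
    return spare / nbases;
}

/*
 * Total xid width: the 32-bit short xid plus the bits of the base that lie
 * above bit 32.  low_bits of each base overlap the short xid's range.
 */
static int
max_xid_bits(int base_bits, int low_bits)
{
    assert(base_bits >= low_bits);
    return 32 + (base_bits - low_bits);
}
```

With two base values this gives 43 spare bits, 21 bits per base, 19 high bits after reserving 2 low bits, and therefore 51-bit xids, matching the 2^51 figure in the message.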

Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Masahiko Sawada
Date:
On Tue, Nov 28, 2017 at 4:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 24, 2017 at 5:33 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> pg_prune_xid makes sense only for heap pages.  Once we introduce special
>> area for heap pages, we can move pg_prune_xid there and save some bytes in
>> index pages.  However, this is an optimization not directly related to
>> 64-bit xids.  Idea is that if we anyway change page format, why don't apply
>> this optimization as well?  But if we have any doubts in this, it can be
>> removed with no problem.
>
> My first reaction is that changing the page format seems like a
> non-starter, because it would break pg_upgrade.

IIUC, the xid-lsn ranges patch [1] doesn't require page format
conversion. Is there any reason we cannot go that way? FWIW, I've
rebased the patch to current HEAD.

[1] https://www.postgresql.org/message-id/5242F8BF.6010807%40vmware.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?

From
Michael Paquier
Date:
On Tue, Nov 28, 2017 at 6:41 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Mon, Nov 27, 2017 at 10:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Fri, Nov 24, 2017 at 5:33 AM, Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>> > pg_prune_xid makes sense only for heap pages.  Once we introduce special
>> > area for heap pages, we can move pg_prune_xid there and save some bytes
>> > in
>> > index pages.  However, this is an optimization not directly related to
>> > 64-bit xids.  Idea is that if we anyway change page format, why don't
>> > apply
>> > this optimization as well?  But if we have any doubts in this, it can be
>> > removed with no problem.
>>
>> My first reaction is that changing the page format seems like a
>> non-starter, because it would break pg_upgrade.  If we get the heap
>> storage API working, then we could have a heap AM that works as it
>> does today and a newheap AM with such changes, but I have a bit of a
>> hard time imagining a patch that causes a hard compatibility break
>> ever being accepted.

Yeah.. I can't imagine that either.

> Thank you for raising this question.  There was a discussion about 64-bit
> xids during PGCon 2017.  Couple ways to provide pg_upgrade were discussed.
>
> 1) We've page layout version in the page (current is number 4).  So, we can
> define new page layout version 5.  Pages with new layout version would
> contain 64-bit base values for xid and multixact.  The question is how to
> deal with page of layout version 4.  If this page have enough of free space
> to fit extra 16 bytes, then it could be upgraded on the fly.  If it doesn't
> contains enough of space for than then things becomes more complicated: we
> can't upgrade it to new format, but we still need to fit new xmax value
> there in the case tuple being updated or deleted.  pg_upgrade requires
> server restart.  Thus, once we set hint bits, pre-pg_upgrade xmin is not
> really meaningful – corresponding xid is visible for every post-pg_upgrade
> snapshot.  So, idea is to use both xmin and xmax tuple fields on such
> unupgradable page to store 64-bit xmax.  This idea was proposed by me, but
> was criticized by some session attendees (sorry, but I don't recall who were
> them) for its complexity and suspected overhead.
>
> 2) Alternative idea was to use unused bits in page header.  Naturally, if we
> would look for unused bits in pd_flags (3 bits of 16 is used),
> pd_pagesize_version (we can left 1 bit of 16 to distinguish between old and
> new format) and pd_special (we can leave 1 bit to distinguish sequence
> pages), we can scrape together 43 bits.  That would be far enough for single
> base value, because we definitely don't need all lower 32-bits of base value
> (21 bits is more than enough).  But I'm not sure about two base values: if
> we would live 2 bits for lower part of base value, than it leaves us 19 bits
> for high part of base value.  This solution would give us 2^51 maximum
> values for xids and multixacts.  I'm not sure if it's enough to assume these
> counters infinite.  AFAIK, there are products on the market whose have
> 48-bit transaction identifiers and don't care about wraparound or
> something...
>
> New heap AM for 64-bit xids is an interesting idea too.  I would even say
> that pluggable storage API being discussed now is excessive for this
> particular purpose (but still can fit!), because in most of aspects heap
> with 64-bit xids is absolutely same as current heap (in contrast to heap
> with undo log, for example).  Best fit API for heap with 64-bit xid support
> would be pluggable heap page format.  But I don't think it deserves separate
> API though.

I am moving that entry to next CF as discussion still goes on.
--
Michael


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Ryan Murphy
Date:
Since the Patch Tester (http://commitfest.cputube.org/) says this Patch will not apply correctly, I am tempted to
change the status to Waiting on Author.

However, I'm new to the CommitFest process, so I'm leaving the status as-is for now and asking Stephen Frost whether he
agrees.

I haven't tried to apply the patch myself yet, but happy to do so if e.g. we think the Patch Tester can't be taken at
face value, or if I need to find specific feedback about why the patch didn't apply to help the Author.

Best regards,
Ryan

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Ryan Murphy
Date:
Thanks for this contribution!
I think it's a hard but important problem to upgrade these xids.

Unfortunately, I've confirmed that this patch 0001-64bit-guc-relopt-3.patch doesn't apply correctly on my computer.

Here's what I did:

I did a "git pull" to the current HEAD, which is 6271fceb8a4f07dafe9d67dcf7e849b319bb2647

Then I attempted to apply the patch, here's what I saw:

$ git apply patches/0001-64bit-guc-relopt-3.patch 
error: src/backend/access/common/reloptions.c: already exists in working directory
error: src/backend/utils/misc/guc.c: already exists in working directory
error: src/include/access/reloptions.h: already exists in working directory
error: src/include/utils/guc.h: already exists in working directory
error: src/include/utils/guc_tables.h: already exists in working directory

Alexander, what is the process you're using to create the patch?  I've heard someone (maybe Tom Lane?) say that he
sometimes uses "patch" directly instead of "git" to create the patch, with better results.  I forget the exact command.

For now I'm setting this to Waiting on Author, until we have a patch that applies via "git apply".

Thanks!
Ryan

The new status of this patch is: Waiting on Author

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Tom Lane
Date:
Ryan Murphy <ryanfmurphy@gmail.com> writes:
> Alexander, what is the process you're using to create the patch?  I've heard someone (maybe Tom Lane?) say that he
> sometimes uses "patch" directly instead of "git" to create the patch, with better results.  I forget the exact command.

Nah, you've got that the other way 'round.  "patch" is not for creating
patches, it's for applying them.  I've found, and some other people seem
to agree, that "patch" is more robust at applying patches than "git apply"
is.  You might try this for a patch created with "git diff":

    patch -p1 <patchfile

Be sure to cd to the top of the source tree first.  Also, you can do

    patch --dry-run -p1 <patchfile

if you just want to see whether it will complain without messing up
your tree.

(I gather from the messages it prints that the Patch Tester uses
"patch" not "git apply", so probably this patch would fail anyway.)

            regards, tom lane


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Alexander Korotkov
Date:
Hi, Ryan!

On Sat, Jan 6, 2018 at 10:10 PM, Ryan Murphy <ryanfmurphy@gmail.com> wrote:
> Thanks for this contribution!
> I think it's a hard but important problem to upgrade these xids.
>
> Unfortunately, I've confirmed that this patch 0001-64bit-guc-relopt-3.patch doesn't apply correctly on my computer.
>
> Here's what I did:
>
> I did a "git pull" to the current HEAD, which is 6271fceb8a4f07dafe9d67dcf7e849b319bb2647
>
> Then I attempted to apply the patch, here's what I saw:
>
> $ git apply patches/0001-64bit-guc-relopt-3.patch
> error: src/backend/access/common/reloptions.c: already exists in working directory
> error: src/backend/utils/misc/guc.c: already exists in working directory
> error: src/include/access/reloptions.h: already exists in working directory
> error: src/include/utils/guc.h: already exists in working directory
> error: src/include/utils/guc_tables.h: already exists in working directory
>
> Alexander, what is the process you're using to create the patch?  I've heard someone (maybe Tom Lane?) say that he sometimes uses "patch" directly instead of "git" to create the patch, with better results.  I forget the exact command.

I've created the patches as context diffs, as described in the PostgreSQL wiki.
I've already noticed that this causes trouble for some community members who use 'git apply'.  I've also noticed that the majority of patches nowadays are sent in unified format.  So, I decided to switch to unified format too.  I'm working on rebasing the patchset, which takes some time...  The next revision will be sent in unified format.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Alexander Korotkov
Date:
On Tue, Jan 9, 2018 at 12:41 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Sat, Jan 6, 2018 at 10:10 PM, Ryan Murphy <ryanfmurphy@gmail.com> wrote:
>> Thanks for this contribution!
>> I think it's a hard but important problem to upgrade these xids.
>>
>> Unfortunately, I've confirmed that this patch 0001-64bit-guc-relopt-3.patch doesn't apply correctly on my computer.
>>
>> Here's what I did:
>>
>> I did a "git pull" to the current HEAD, which is 6271fceb8a4f07dafe9d67dcf7e849b319bb2647
>>
>> Then I attempted to apply the patch, here's what I saw:
>>
>> $ git apply patches/0001-64bit-guc-relopt-3.patch
>> error: src/backend/access/common/reloptions.c: already exists in working directory
>> error: src/backend/utils/misc/guc.c: already exists in working directory
>> error: src/include/access/reloptions.h: already exists in working directory
>> error: src/include/utils/guc.h: already exists in working directory
>> error: src/include/utils/guc_tables.h: already exists in working directory
>>
>> Alexander, what is the process you're using to create the patch?  I've heard someone (maybe Tom Lane?) say that he sometimes uses "patch" directly instead of "git" to create the patch, with better results.  I forget the exact command.
>
> I've created the patches as context diffs, as described in the PostgreSQL wiki.
> I've already noticed that this causes trouble for some community members who use 'git apply'.  I've also noticed that the majority of patches nowadays are sent in unified format.  So, I decided to switch to unified format too.  I'm working on rebasing the patchset, which takes some time...  The next revision will be sent in unified format.

Please find the rebased patchset attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Alexander Korotkov
Date:
On Tue, Jan 9, 2018 at 10:51 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Tue, Jan 9, 2018 at 12:41 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>> On Sat, Jan 6, 2018 at 10:10 PM, Ryan Murphy <ryanfmurphy@gmail.com> wrote:
>>> Thanks for this contribution!
>>> I think it's a hard but important problem to upgrade these xids.
>>>
>>> Unfortunately, I've confirmed that this patch 0001-64bit-guc-relopt-3.patch doesn't apply correctly on my computer.
>>>
>>> Here's what I did:
>>>
>>> I did a "git pull" to the current HEAD, which is 6271fceb8a4f07dafe9d67dcf7e849b319bb2647
>>>
>>> Then I attempted to apply the patch, here's what I saw:
>>>
>>> $ git apply patches/0001-64bit-guc-relopt-3.patch
>>> error: src/backend/access/common/reloptions.c: already exists in working directory
>>> error: src/backend/utils/misc/guc.c: already exists in working directory
>>> error: src/include/access/reloptions.h: already exists in working directory
>>> error: src/include/utils/guc.h: already exists in working directory
>>> error: src/include/utils/guc_tables.h: already exists in working directory
>>>
>>> Alexander, what is the process you're using to create the patch?  I've heard someone (maybe Tom Lane?) say that he sometimes uses "patch" directly instead of "git" to create the patch, with better results.  I forget the exact command.
>>
>> I've created the patches as context diffs, as described in the PostgreSQL wiki.
>> I've already noticed that this causes trouble for some community members who use 'git apply'.  I've also noticed that the majority of patches nowadays are sent in unified format.  So, I decided to switch to unified format too.  I'm working on rebasing the patchset, which takes some time...  The next revision will be sent in unified format.
>
> Please find the rebased patchset attached.

As I see from cputube, the patchset fails to compile again.  Please find a revised version attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Andres Freund
Date:
Hi,

On 2018-01-11 01:02:52 +0300, Alexander Korotkov wrote:
> As I see from cputube, the patchset fails to compile again.  Please find
> a revised version attached.

It'd be good if you could maintain the patches as commits with some
description of why you're doing these changes.  It's a bit hard to
figure that out sometimes and having to dig through a thread with quite
some discussion certainly makes it less likely to be looked at.


- Andres


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Alexander Korotkov
Date:
Hi!

On Fri, Mar 2, 2018 at 1:41 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-01-11 01:02:52 +0300, Alexander Korotkov wrote:
>> As I see from cputube, the patchset fails to compile again.  Please find
>> a revised version attached.
>
> It'd be good if you could maintain the patches as commits with some
> description of why you're doing these changes.  It's a bit hard to
> figure that out sometimes and having to dig through a thread with quite
> some discussion certainly makes it less likely to be looked at.

Thank you for pointing that out.  In future I'll maintain the patches with their descriptions.
BTW, during the FOSDEM developer meeting we decided that I should extract the
64-bit in-memory representation of xids and resubmit it, while the 64-bit
on-disk representation should become a pluggable table access method.
I didn't manage to do this before the current commitfest.  Also, the last commitfest
is already too late for such big changes.  So, I'm marking this RWF.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Andres Freund
Date:
On 2018-03-02 01:48:03 +0300, Alexander Korotkov wrote:
> Also, the last commitfest is already too late for such big changes.
> So, I'm marking this RWF.

Agreed.  Perhaps extract the 64bit GUC patch and track that separately?
Seems like something we should just do...

Greetings,

Andres Freund


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Alexander Korotkov
Date:
On Fri, Mar 2, 2018 at 1:51 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-03-02 01:48:03 +0300, Alexander Korotkov wrote:
>> Also, the last commitfest is already too late for such big changes.
>> So, I'm marking this RWF.
>
> Agreed.  Perhaps extract the 64bit GUC patch and track that separately?
> Seems like something we should just do...

Sounds reasonable.  But I didn't notice if there are other users for 64bit GUCs
besides 64bit xids?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Andres Freund
Date:
On 2018-03-02 01:56:00 +0300, Alexander Korotkov wrote:
> On Fri, Mar 2, 2018 at 1:51 AM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2018-03-02 01:48:03 +0300, Alexander Korotkov wrote:
> > > Also, the last commitfest is already too late for such big changes.
> > > So, I'm marking this RWF.
> >
> > Agreed.  Perhaps extract the 64bit GUC patch and track that separately?
> > Seems like something we should just do...
> >
> 
> Sounds reasonable.  But I didn't notice if there are other users for 64bit
> GUCs besides 64bit xids?

I think there were a couple past occasions where we could've used that,
don't quite recall the details. We're at least not that far away from
the point where various size limits are actually limited by int32
range. And timeouts of ~25 days are long but not entirely unreasonable.

Greetings,

Andres Freund
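
The "~25 days" figure follows from simple arithmetic, assuming a timeout GUC that is stored as a signed 32-bit count of milliseconds. The sketch below only checks that arithmetic; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MS_PER_DAY  (INT64_C(24) * 60 * 60 * 1000)

/* Largest whole number of days a millisecond timeout can express. */
int64_t
max_whole_days(int64_t max_ms)
{
    assert(max_ms >= 0);
    return max_ms / MS_PER_DAY;
}
```

With INT32_MAX milliseconds this gives 24 whole days (about 24.86 days), while a 64-bit millisecond counter would cover hundreds of millions of years.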


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Chris Travers
Date:


On Fri, Mar 2, 2018 at 12:07 AM Andres Freund <andres@anarazel.de> wrote:
> On 2018-03-02 01:56:00 +0300, Alexander Korotkov wrote:
>> On Fri, Mar 2, 2018 at 1:51 AM, Andres Freund <andres@anarazel.de> wrote:
>>
>>> On 2018-03-02 01:48:03 +0300, Alexander Korotkov wrote:
>>>> Also, the last commitfest is already too late for such big changes.
>>>> So, I'm marking this RWF.
>>>
>>> Agreed.  Perhaps extract the 64bit GUC patch and track that separately?
>>> Seems like something we should just do...
>>>
>>
>> Sounds reasonable.  But I didn't notice if there are other users for 64bit
>> GUCs besides 64bit xids?
>
> I think there were a couple past occasions where we could've used that,
> don't quite recall the details. We're at least not that far away from
> the point where various size limits are actually limited by int32
> range. And timeouts of ~25 days are long but not entirely unreasonable.

As a note here, I have worked on projects where there could be 2-week-long idle-in-transaction states (no joke, we tuned things to only use virtual xids for most of that time), and I could imagine wanting to set idle-in-transaction timeouts to more than a month.  I would certainly favor the idea of 64-bit GUC variables as a general rule.

> Greetings,
>
> Andres Freund



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Chris Travers
Date:


On Thu, Mar 1, 2018 at 11:48 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> Hi!
>
> On Fri, Mar 2, 2018 at 1:41 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-01-11 01:02:52 +0300, Alexander Korotkov wrote:
>>> As I see from cputube, the patchset fails to compile again.  Please find
>>> a revised version attached.
>>
>> It'd be good if you could maintain the patches as commits with some
>> description of why you're doing these changes.  It's a bit hard to
>> figure that out sometimes and having to dig through a thread with quite
>> some discussion certainly makes it less likely to be looked at.
>
> Thank you for pointing that out.  In future I'll maintain the patches with their descriptions.
> BTW, during the FOSDEM developer meeting we decided that I should extract the
> 64-bit in-memory representation of xids and resubmit it, while the 64-bit
> on-disk representation should become a pluggable table access method.
> I didn't manage to do this before the current commitfest.  Also, the last commitfest
> is already too late for such big changes.  So, I'm marking this RWF.

I am wondering about this point.  If there are different access methods, what does that mean for transactions crossing the 32-bit mark?  Is it possible that supporting 32-bit and 64-bit representations together could cause visibility problems and bugs?  Or is the idea that pages would be upgraded on write?

> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Andres Freund
Date:
Hi,

On 2019-02-13 12:16:33 +0100, Chris Travers wrote:
> As a note here, I have worked on projects where there could be 2-week-long
> idle-in-transaction states (no joke, we tuned things to only use virtual
> xids for most of that time), and I could imagine wanting to set
> idle-in-transaction timeouts to more than a month.  I would certainly
> favor the idea of 64-bit GUC variables as a general rule.

How about proposing a patch for it in a new thread?

Greetings,

Andres Freund


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Jim Finnerty
Date:
What is the current status of the 64-bit XID topic?  Was an updated patch
ever posted to a different thread as suggested by Andres, or is v5 the
latest patch available?  Is anyone developing a heap64 storage plugin?

I'm in the process of applying the v5 patch set just to understand what was
done, but the patches didn't all apply cleanly and so I've been doing a lot
of manual cleanup.  It's a big patch set.  It's obvious that a huge amount
of work and thought has gone into this.




-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Jim Finnerty
Date:
Looking at the patch set attached by Alexander Korotkov, a large number of
changes were required to update the code to use XID_FMT (INT64_FORMAT)
instead of %u in print format strings.  I think these are infrastructural
changes that would be required in any 64-bit implementation, and a separate
low-risk commit could be created to just make these changes.  This patch
would touch 35 files and is already large enough to separate out by itself. 
Does anyone have any objections to such a patch?

One of the other changes proposed by the patch that Alexander posted in this
thread was to refactor the special area for heaps and move pd_prune_xid from
PageHeaderData to HeapPageSpecialData; however, this conflicts with a later
commit, 52ac6cd2d0c, by Andrey Borodin and Alexander Korotkov, to fix a
locking problem with gin indexes:

"Due to binary compatibility we can't change GinPageOpaqueData to store
corresponding transaction id. Instead we reuse page header pd_prune_xid
field, which is unused in index pages."

In other words, prior to the time the 64bit XID patch was written the
pd_prune_xid was in the page header but was unused for index pages, but as
of commit 52ac6cd2d0c, pd_prune_xid became used by gin indexes, so moving it
out of the page header to the heap special data area doesn't work for gin
any more.

If GinPageOpaqueData can't be modified for the reason stated above, then a
different solution would be required for the 64bit XID patch.  pd_prune_xid
is still used in the 64bit XID patch, so I don't think the refactoring saves
any space on a heap page, though it may save 32 bits per page on non-gin
index pages.  
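
The "32 bits per page" estimate can be checked against a simplified copy of the fixed part of the current page header. This is a sketch only: pd_lsn is flattened into two 32-bit halves, the line pointer array is left out, and the second struct is a hypothetical header with pd_prune_xid moved elsewhere, not the layout used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified fixed part of the current page header (layout version 4). */
typedef struct PageHeaderSketch
{
    uint32_t      pd_lsn_hi;            /* pd_lsn, kept as two 32-bit halves */
    uint32_t      pd_lsn_lo;
    uint16_t      pd_checksum;
    uint16_t      pd_flags;
    uint16_t      pd_lower;
    uint16_t      pd_upper;
    uint16_t      pd_special;
    uint16_t      pd_pagesize_version;
    TransactionId pd_prune_xid;
} PageHeaderSketch;

/* Hypothetical header with pd_prune_xid moved to a heap-only special area. */
typedef struct PageHeaderNoPruneXid
{
    uint32_t      pd_lsn_hi;
    uint32_t      pd_lsn_lo;
    uint16_t      pd_checksum;
    uint16_t      pd_flags;
    uint16_t      pd_lower;
    uint16_t      pd_upper;
    uint16_t      pd_special;
    uint16_t      pd_pagesize_version;
} PageHeaderNoPruneXid;

/* Bytes saved in the header of every page that no longer carries pd_prune_xid. */
size_t
prune_xid_bytes_saved(void)
{
    assert(offsetof(PageHeaderSketch, pd_prune_xid) == 20);
    return sizeof(PageHeaderSketch) - sizeof(PageHeaderNoPruneXid);
}
```

On common ABIs the two structs are 24 and 20 bytes, so the saving is 4 bytes per page. Heap pages would spend those bytes again in the special area, which matches the observation above that only non-gin index pages would benefit.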

What would the downside(s) be, if any, of leaving pd_prune_xid in the header
in the 64bit patch?




-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL



Re: Challenges preventing us moving to 64 bit transaction id (XID)?

From
Jim Finnerty
Date:
I'll be uploading three patches (in a subsequent reply to this thread, so I
can do it via email):

- 64-bit integer GUCs
- Using ClogPageNumber and CLOG_PAGENO_FMT in places where clog pages are
currently declared as int or printed as a %d
- Using XID_FMT in places where xid's are printed with format %u, and
XID_BITS for one or two cases when we need to vary the code based on 32
or 64 bit xids.

These changes don't affect field sizes at all yet.  They compile, build, and
run check-world cleanly.  

64-bit GUCs were used in the original patch set to handle several
autovacuum_* gucs that needed to become 64-bit.  This seems like a useful
thing to commit independent of what we do with xids.

The ClogPageNumber patch gives a name to the kind of page index that is
returned by TransactionIdToPage. The intent is to improve code clarity for
now.  In the original patch it was modified from int to uint64.
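
To see why the page index type matters, here is a small sketch of the xid-to-page mapping. It assumes the default 8 kB block size and 2 status bits per transaction, and it takes ClogPageNumber to be 64 bits wide as in the original patch set; the function names are illustrative rather than the ones used in the patches.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ               8192   /* default block size, assumed here */
#define CLOG_BITS_PER_XACT   2
#define CLOG_XACTS_PER_BYTE  4
#define CLOG_XACTS_PER_PAGE  (BLCKSZ * CLOG_XACTS_PER_BYTE)

typedef uint64_t ClogPageNumber;    /* width assumed, as in the original patch set */

/* Map an xid to the clog page that holds its commit status bits. */
ClogPageNumber
xid_to_clog_page(uint64_t xid)
{
    /* sanity check: the per-xact status bits fill each byte exactly */
    assert(CLOG_BITS_PER_XACT * CLOG_XACTS_PER_BYTE == 8);
    return xid / CLOG_XACTS_PER_PAGE;
}

/* Does a page number still fit in a signed 32-bit int? */
int
clog_page_fits_int(ClogPageNumber pageno)
{
    return pageno <= (uint64_t) INT32_MAX;
}
```

With 32-bit xids the largest page number is 131071, which fits in an int comfortably; with 64-bit xids it can reach 2^49 - 1, which does not.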

In the original patch set xids were formatted using XID_FMT, and XID_FMT was
defined as INT64_FORMAT.  This patch changes all the same places to XID_FMT
but keeps the current format string as %u, so it just cleans things up and
makes the code less sensitive to the assumption that xids are 32 bits.
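
A minimal stand-alone sketch of the XID_FMT idea follows. It does not use the PostgreSQL headers: the 64-bit branch uses PRIu64 as a stand-in for the INT64_FORMAT definition mentioned above, and the helper function and message text are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#ifndef XID_BITS
#define XID_BITS 32                 /* current on-disk width */
#endif

#if XID_BITS == 64
typedef uint64_t TransactionId;
#define XID_FMT "%" PRIu64          /* stand-in for INT64_FORMAT */
#else
typedef uint32_t TransactionId;
#define XID_FMT "%u"                /* same output as today */
#endif

/* Format a message without hard-coding the xid width at the call site. */
int
format_xid_message(char *buf, size_t buflen, TransactionId xid)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "transaction " XID_FMT, xid);
}
```

Call sites written this way keep compiling and printing correctly if the typedef and the macro are later switched to 64 bits together.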

The big patch(s) after these first three patches reconstructs the logic from
the original patch set manually as well as I was able to manage.  Those
changes have lots of TBDs and FIXMEs and don't necessarily compile, but
should at least apply cleanly for a while.  There are also some residual
changes from the original patch set that I was unable to apply even manually
that will be factored out to a separate pseudo-patch.

A few observations about the big patch to come:

There appears to be widespread agreement that 64-bit xids should be
implemented by the table access method infrastructure, and that we need to
keep going in the direction of using FullTransactionId for the in-memory
representation.  There is also widespread distaste for dealing with
multixacts as part of that.  The original patch set came out a little bit
before table AMs were implemented, so redesigning it to support table AMs is
tbd.  I'm not clear yet about how the CLOG was generalized for 64-bit xid's
except that ClogPageNumbers were based on a 64-bit xid and that the
autovacuum limit was increased to int64 and allowed to have a larger range. 
In an excellent presentation that you can find from the FullTransactionId
wiki page, Thomas Munro suggested that the UNDO infrastructure might be used
as the mechanism to ensure that the effects of very old aborted transactions
in the CLOG would eventually be cleaned up. 

The big patch(s) will be provided just so we don't entirely lose all the
work that went into creating the original patch set. 

patches to follow in subsequent replies



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL



First 3 patches derived from the original 64-bit xid patch set by Alexander Korotkov

On 1/22/21, 5:42 PM, "Jim Finnerty" <jfinnert@amazon.com> wrote:

    I'll be uploading three patches (in a subsequent reply to this thread, so I
    can do it via email):

    - 64-bit integer GUCs
    - Using ClogPageNumber and CLOG_PAGENO_FMT in places where clog pages are
    currently declared as int or printed as a %d
    - Using XID_FMT in places where xid's are printed with format %u, and
    XID_BITS for one or two cases when we need to vary the code based on 32
    or 64 bit xids.

    These changes don't affect field sizes at all yet.  They compile, build, and
    run check-world cleanly.  

    64-bit GUCs were used in the original patch set to handle several
    autovacuum_* gucs that needed to become 64-bit.  This seems like a useful
    thing to commit independent of what we do with xids.

    The ClogPageNumber patch gives a name to the kind of page index that is
    returned by TransactionIdToPage. The intent is to improve code clarity for
    now.  In the original patch it was modified from int to uint64.

    In the original patch set xids were formatted using XID_FMT, and XID_FMT was
    defined as INT64_FORMAT.  This patch changes all the same places to XID_FMT
    but keeps the current format string as %u, so it just cleans things up and
    makes the code less sensitive to the assumption that xids are 32 bits.

...

    -----
    Jim Finnerty, AWS, Amazon Aurora PostgreSQL




Attachment
On 1/22/21 6:46 PM, Finnerty, Jim wrote:
> First 3 patches derived from the original 64-bit xid patch set by Alexander Korotkov

The patches no longer apply 
(http://cfbot.cputube.org/patch_32_2960.log), so marked Waiting on Author.

I also removed the PG14 target since this is a fresh patch set after a 
long hiatus with no new review.

Regards,
-- 
-David
david@pgmasters.net



On Fri, Mar 26, 2021 at 2:57 AM David Steele <david@pgmasters.net> wrote:
> On 1/22/21 6:46 PM, Finnerty, Jim wrote:
> > First 3 patches derived from the original 64-bit xid patch set by Alexander Korotkov
>
> The patches no longer apply
> (http://cfbot.cputube.org/patch_32_2960.log), so marked Waiting on Author.
>
> I also removed the PG14 target since this is a fresh patch set after a
> long hiatus with no new review.

Hi Jim,

I just wanted to say that I'm definitely interested in progress in
this area, and I'm sure many others are too.  Let's talk again about
incremental steps in the PG15 cycle.  The reason for lack of responses
on this thread is most likely due to being at the business end of the
PG14 cycle.



Hi Jim,

On 3/26/21 12:01 AM, Thomas Munro wrote:
> On Fri, Mar 26, 2021 at 2:57 AM David Steele <david@pgmasters.net> wrote:
>> On 1/22/21 6:46 PM, Finnerty, Jim wrote:
>>> First 3 patches derived from the original 64-bit xid patch set by Alexander Korotkov
>>
>> The patches no longer apply
>> (http://cfbot.cputube.org/patch_32_2960.log), so marked Waiting on Author.
>>
>> I also removed the PG14 target since this is a fresh patch set after a
>> long hiatus with no new review.
> 
> I just wanted to say that I'm definitely interested in progress in
> this area, and I'm sure many others are too.  Let's talk again about
> incremental steps in the PG15 cycle.  The reason for lack of responses
> on this thread is most likely due to being at the business end of the
> PG14 cycle.

Indeed. I certainly didn't mean to imply that this work is not valuable, 
just that it is too late to consider it for PG14.

I'm going to move this to the 2021-07 CF and leave it in the Waiting for 
Author state. It would probably be best to wait until just before the CF 
to rebase since anything you do now will likely be broken by the flurry 
of commits that will happen in the next two weeks before feature freeze.

Regards,
-- 
-David
david@pgmasters.net



On Fri, Mar 26, 2021 at 09:54:21AM -0400, David Steele wrote:
> I'm going to move this to the 2021-07 CF and leave it in the Waiting for
> Author state. It would probably be best to wait until just before the CF to
> rebase since anything you do now will likely be broken by the flurry of
> commits that will happen in the next two weeks before feature freeze.

And a couple of months later, the development cycle of 15 has begun.
While re-reading the thread, I got the impression that there is no
reason to not move on with at least the 64-bit GUC part, and that it
could be useful as a piece to move forward with the rest.  Am I
getting the wrong impression?

0001 still applies and compiles, but we don't test this API set at
all.  This could be done by moving one of the existing GUCs to this
new category, but the same can also be achieved with
DefineCustomInt64Variable().  One simple idea would be to change
one of the GUCs of worker_spi to int64, like worker_spi.naptime with
some of the hooks set for coverage, in combination with some new SHOW
commands in the existing regression tests.

Thoughts?
--
Michael

Attachment
On Tue, Sep 7, 2021 at 12:20 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 26, 2021 at 09:54:21AM -0400, David Steele wrote:
> > I'm going to move this to the 2021-07 CF and leave it in the Waiting for
> > Author state. It would probably be best to wait until just before the CF to
> > rebase since anything you do now will likely be broken by the flurry of
> > commits that will happen in the next two weeks before feature freeze.
>
> And a couple of months later, the development cycle of 15 has begun.
> While re-reading the thread, I got the impression that there is no
> reason to not move on with at least the 64-bit GUC part, and that it
> could be useful as a piece to move forward with the rest.  Am I
> getting the wrong impression?

That's also my understanding, so +1 to apply that soon

> 0001 still applies and compiles, but we don't test this API set at
> all.  This could be done by moving one of the existing GUCs to this
> new category, but the same can also be achieved with
> DefineCustomInt64Variable().  One simple idea would be to change
> one of the GUCs of worker_spi to int64, like worker_spi.naptime with
> some of the hooks set for coverage, in combination with some new SHOW
> commands in the existing regression tests.

+1



On Tue, Sep 07, 2021 at 02:38:13PM +0800, Julien Rouhaud wrote:
> On Tue, Sep 7, 2021 at 12:20 PM Michael Paquier <michael@paquier.xyz> wrote:
>> And a couple of months later, the development cycle of 15 has begun.
>> While re-reading the thread, I got the impression that there is no
>> reason to not move on with at least the 64-bit GUC part, and that it
>> could be useful as a piece to move forward with the rest.  Am I
>> getting the wrong impression?
>
> That's also my understanding, so +1 to apply that soon

I have been looking at this part, and there are a couple of things
that the patch is doing wrong, as well as other cases that we had
better support for consistency with the 32-bit case:
- No handling of a decimal point '.' or an exponent 'e' or 'E' after
the initial parsing phase.  I think that we should allow those cases
and then go through strtod() or just strtold().  For example, 1.1 or
just 1e1 are supported values in the 32-bit case.
- No support for units: the code fails if there is any trailing
character.  This should be flexible enough to allow any number of
spaces between the value and its units.
- No support for octal and hex values.
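
To make the three points above concrete, here is a rough standalone
sketch of the parsing behavior I have in mind.  It is not code from the
patch: sketch_parse_int64() and its signature are made up for
illustration, and it uses strtod() rather than strtold(), so values
beyond 2^53 that go through the floating-point path lose precision.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not code from the patch: parse an int64 value,
 * accepting octal/hex input, fractions, exponents and trailing units.
 * On success, *unit points at whatever follows the number once
 * whitespace is skipped (an empty string if nothing follows).
 */
static bool
sketch_parse_int64(const char *value, int64_t *result, const char **unit)
{
    char       *endptr;
    long long   ival;

    assert(value != NULL && result != NULL && unit != NULL);

    /* Base 0 accepts decimal, octal (leading 0) and hex (leading 0x). */
    errno = 0;
    ival = strtoll(value, &endptr, 0);

    if (*endptr == '.' || *endptr == 'e' || *endptr == 'E')
    {
        /* Fraction or exponent: reparse as floating point, then round. */
        double      dval;

        errno = 0;
        dval = strtod(value, &endptr);
        if (endptr == value || errno == ERANGE)
            return false;
        /* Round half away from zero without needing libm. */
        dval = (dval >= 0) ? dval + 0.5 : dval - 0.5;
        /* Written so that NaN also fails the range check. */
        if (!(dval >= -9223372036854775808.0 && dval < 9223372036854775808.0))
            return false;
        ival = (long long) dval;
    }
    else if (endptr == value || errno == ERANGE)
        return false;

    /* Allow any number of spaces between the value and its units. */
    while (isspace((unsigned char) *endptr))
        endptr++;

    *result = (int64_t) ival;
    *unit = endptr;
    return true;
}
```

With this, "1e1" yields 10, "0x10" yields 16, and "64   kB" yields 64
with the unit pointer left at "kB" for a separate unit-conversion step.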

+   for (i = 0; int64RelOpts[i].gen.name; i++)
+   {
+       Assert(DoLockModesConflict(int64RelOpts[i].gen.lockmode,
+                                  int64RelOpts[i].gen.lockmode));
+       j++;
+   }
+   for (i = 0; int64RelOpts[i].gen.name; i++)
+   {
+       Assert(DoLockModesConflict(int64RelOpts[i].gen.lockmode,
+                                  int64RelOpts[i].gen.lockmode));
+       j++;
+   }
This loop is repeated twice; it is required only once.

+#ifdef _MSC_VER                    /* MSVC only */
+   val = _strtoi64(value, &endptr, 10);
+#elif defined(HAVE_STRTOLL) && SIZEOF_LONG < 8
+   val = strtoll(value, &endptr, 10);
+#else
+   val = strtol(value, &endptr, 10);
+#endif
This is a copy-paste of pg_strtouint64() in numutils.c.  We could just
add a similar routine there, as this code path cannot generate elog()s
and similar reports by itself.  This code had better use 0 for the
base number, to be able to handle the hex and octal cases.
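
As a rough illustration of such a routine (a sketch only: the name
sketch_strtoint64() and its signature are hypothetical and not part of
numutils.c), a thin wrapper that reports failure through its return
value instead of raising an error could look like this, using plain
strtoll() and leaving out the per-platform #ifdef branches:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not from the patch: convert a string to a signed
 * 64-bit integer.  Base 0 lets strtoll() accept decimal, octal ("010")
 * and hex ("0x10") input.  Returns false if nothing was parsed or the
 * value is out of range; never reports an error by itself.
 */
static bool
sketch_strtoint64(const char *str, char **endptr, int64_t *result)
{
    long long   val;

    assert(str != NULL && endptr != NULL && result != NULL);

    errno = 0;
    val = strtoll(str, endptr, 0);
    if (*endptr == str || errno == ERANGE)
        return false;

    *result = (int64_t) val;
    return true;
}
```

The caller stays in charge of inspecting *endptr and of producing any
error message.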

On top of the tests needed for custom GUCs, this needs tests for
the new int64 reloption.  I would suggest adding something in
dummy_index_am, where we test all the reloption APIs.
--
Michael

On Wed, Sep 08, 2021 at 03:08:28PM +0900, Michael Paquier wrote:
> On top of the tests needed for custom GUCs, this needs tests for
> the new int64 reloption.  I would suggest adding something in
> dummy_index_am, where we test all the reloption APIs.

My review here was three weeks ago, and there have been no replies from
the author, so I am marking this patch set as Returned with Feedback
(RwF).
--
Michael
