Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Msg-id 9fa7bef7-6a08-dd6e-b324-a69ccb30eb60@iki.fi
In response to Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)  (Maxim Orlov <orlovmg@gmail.com>)
List pgsql-hackers
On 01/03/2023 12:21, Aleksander Alekseev wrote:
> Hi,
> 
>> I'm surprised that these patches extend the page numbering to 64 bits,
>> but never actually use the high bits. The XID "epoch" is not used, and
>> pg_xact still wraps around and the segment names are still reused. I
>> thought we could stop doing that.
> 
> To clarify, the idea is to let CLOG grow indefinitely and simply store
> FullTransactionId -> TransactionStatus (two bits). Correct?

Correct.
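
For illustration, a minimal sketch of how a 64-bit XID could map to a page, byte and bit position when each transaction takes two status bits and the page number is never wrapped. The names (SKETCH_BLCKSZ, XactSlot, xact_slot_for) and the 8192-byte page size are assumptions made for this example, not code from the patch.

```c
#include <stdint.h>

/* Assumed layout: two status bits per transaction, so four per byte.
 * SKETCH_BLCKSZ is a stand-in for the build-time page size. */
#define SKETCH_BLCKSZ    8192
#define BITS_PER_XACT    2
#define XACTS_PER_BYTE   4
#define XACTS_PER_PAGE   (SKETCH_BLCKSZ * XACTS_PER_BYTE)
#define XACT_BITMASK     ((1 << BITS_PER_XACT) - 1)

/* Where the status bits of one transaction live (hypothetical type). */
typedef struct XactSlot
{
    uint64_t pageno;    /* 64-bit page number, never wraps around */
    uint32_t byteno;    /* byte within the page */
    uint32_t bitshift;  /* shift of the two status bits within that byte */
} XactSlot;

/* Map a 64-bit XID directly to its slot, with no modulo on the page number. */
static XactSlot
xact_slot_for(uint64_t full_xid)
{
    XactSlot s;
    uint64_t idx = full_xid % XACTS_PER_PAGE;

    s.pageno = full_xid / XACTS_PER_PAGE;
    s.byteno = (uint32_t) (idx / XACTS_PER_BYTE);
    s.bitshift = (uint32_t) ((idx % XACTS_PER_BYTE) * BITS_PER_XACT);
    return s;
}

/* Extract the two status bits from the byte that holds them. */
static unsigned
xact_status_from_byte(uint8_t byte, uint32_t bitshift)
{
    return (byte >> bitshift) & XACT_BITMASK;
}
```

With this mapping the first XID of epoch 1 lands on a page that has never been used before, instead of reusing page 0.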

> I didn't investigate this in much detail but it may affect quite some
> amount of code since TransactionIdDidCommit() and
> TransactionIdDidAbort() currently both deal with TransactionId, not
> FullTransactionId. IMO, this would be a nice change however, assuming
> we are ready for it.

Yep, it's a lot of code churn.

> In the previous version of the patch there was an attempt to derive
> FullTransactionId from TransactionId but it was wrong for the reasons
> named above in the thread. Thus it was removed and the patch
> simplified.

Yeah, it's tricky to get it right. Clearly we need to do it at some 
point though.

All in all, this is a big effort. I spent some more time reviewing this 
in the last few days, and thought a lot about what the path forward here 
could be. And I haven't looked at the actual 64-bit XIDs patch set yet, 
just this patch to use 64-bit addressing in SLRUs.

This patch is the first step, but we have a bit of a chicken and egg 
problem, because this patch on its own isn't very interesting, but on 
the other hand, we need it to work on the follow up items. Here's how I 
see the development path for this (and again, this is just for the 
64-bit SLRUs work, not the bigger 64-bit-XIDs-in-heapam effort):

1. Use 64 bit page numbers in SLRUs (this patch)

I would like to make one change here: I'd prefer to keep the old 4-digit 
segment names, until we actually start to use the wider address space. 
Let's allow each SLRU to specify how many digits to use in the 
filenames, so that we convert one SLRU at a time.

If we do that, and don't change any of the existing SLRUs to actually 
use the wider space of page and segment numbers yet, this patch becomes 
just refactoring with no on-disk format changes. No pg_upgrade needed.
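
As a rough sketch of the per-SLRU digit count, the segment file name could be built as below. SlruNaming, segname_digits and slru_segment_path are hypothetical names chosen for this example; the actual patch may structure this differently.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-SLRU naming settings. */
typedef struct SlruNaming
{
    const char *dir;            /* directory holding the segment files */
    int         segname_digits; /* hex digits in each file name, e.g. 4 today */
} SlruNaming;

/*
 * Format the path of segment 'segno' into 'buf', zero-padded to the number
 * of hex digits this SLRU asks for. Returns what snprintf() returns.
 */
static int
slru_segment_path(const SlruNaming *slru, uint64_t segno,
                  char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%s/%0*" PRIX64,
                    slru->dir, slru->segname_digits, segno);
}
```

An SLRU that keeps segname_digits at 4 produces exactly the names it produces today, so nothing changes on disk until a given SLRU is switched to a wider value.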

The next patches will start to make use of the wider address space, one 
SLRU at a time.

2. Use the larger segment file names in async.c, to lift the current 8 
GB limit on the size of the pending notification queue.

No one actually minds the limit, it's quite generous as it is. But there 
is some code and complexity in async.c to avoid the wraparound that 
could be made simpler if we used longer SLRU segment names and avoided 
the wraparound altogether.

I wonder if we should actually add an artificial limit, as a GUC. If 
there are gigabytes of notifications queued up, something's probably 
wrong with the system, and you're not going to be happy if we just 
remove the limit so it can grow to terabytes until you run out of disk 
space.

3. Extend pg_multixact so that pg_multixact/members is addressed by 
64-bit offsets.

Currently, multi-XIDs can wrap around, requiring anti-wraparound 
freezing, but independently of that, the pg_multixact/members SLRU can 
also wrap around. We track both, and trigger anti-wraparound if either 
SLRU is about to wrap around. If we redefine MultiXactOffset as a 64-bit 
integer, we can avoid the pg_multixact/members wraparound altogether. A 
downside is that pg_multixact/offsets will take twice as much space, but 
I think that's a good tradeoff. Or perhaps we can play tricks like storing 
a single 64-bit offset on each pg_multixact/offsets page, and a 32-bit 
offset from that for each multixact, to avoid making it so much larger.

This would reduce the need to do anti-wraparound VACUUMs on systems that 
use multixacts heavily. Needs pg_upgrade support.
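
A minimal sketch of the "one 64-bit base per page plus 32-bit deltas" trick mentioned above. The page layout, ENTRIES_PER_PAGE and the function names are assumptions for illustration only; the entry count is chosen so that the struct fills an 8192-byte page.

```c
#include <stdint.h>

/* Hypothetical: 8 bytes of base plus 2046 four-byte deltas = 8192 bytes. */
#define ENTRIES_PER_PAGE 2046

typedef struct OffsetsPage
{
    uint64_t base;                     /* one full 64-bit members offset */
    uint32_t delta[ENTRIES_PER_PAGE];  /* per-entry distance from base */
} OffsetsPage;

/* Reconstruct the full 64-bit members offset for one entry. */
static uint64_t
offsets_page_get(const OffsetsPage *page, uint32_t entry)
{
    return page->base + page->delta[entry];
}

/*
 * Store a 64-bit offset as a delta from the page base. Returns 0 if the
 * offset cannot be expressed as a 32-bit distance from base, 1 on success.
 */
static int
offsets_page_set(OffsetsPage *page, uint32_t entry, uint64_t offset)
{
    if (offset < page->base || offset - page->base > UINT32_MAX)
        return 0;
    page->delta[entry] = (uint32_t) (offset - page->base);
    return 1;
}
```

The failure case in offsets_page_set() is the part a real design would have to answer: what happens when the entries on one page span more than 2^32 members.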

4. Extend pg_subtrans to 64 bits.

This isn't all that interesting because the active region of pg_subtrans 
cannot be wider than 32 bits anyway, because you'll still reach the 
general 32-bit XID wraparound. But it might be less confusing in some 
places.

I actually started to write a patch to do this, to see how complicated 
it is. It quickly proliferates into expanding other XIDs to 64 bits, 
like TransactionXmin, frozenXid calculation in vacuum.c, known-assigned 
XID tracking in procarray.c, etc. It's going to be necessary to convert 
32-bit XIDs to FullTransactionIds at some boundaries, and I'm not sure 
where exactly that should happen. It's easier to do the conversions 
close to subtrans.c, but then I'm not sure how much it gets us in terms 
of reducing confusion. It's easy to get confused with the epochs during 
conversions, as you noted. On the other hand, if we change much more of 
the backend to use FullTransactionIds, the patch becomes much more invasive.
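
To make the epoch pitfall concrete, here is a sketch of one way to widen a 32-bit XID at such a boundary: interpret it relative to a 64-bit reference XID that is known to be within 2^31 of it. Plain integer types stand in for TransactionId and FullTransactionId, and full_xid_relative_to is a name made up for this example.

```c
#include <stdint.h>

/*
 * Widen 'xid' to 64 bits by assuming it lies within 2^31 transactions of
 * the 64-bit reference 'rel'. If that assumption does not hold, the result
 * silently lands in the wrong epoch.
 */
static uint64_t
full_xid_relative_to(uint64_t rel, uint32_t xid)
{
    uint32_t rel32 = (uint32_t) rel;  /* low 32 bits of the reference */
    uint32_t diff = xid - rel32;      /* distance modulo 2^32 */

    if (diff < UINT32_C(0x80000000))
        return rel + diff;            /* xid is at or ahead of rel */

    /* xid is behind rel; this may step back into the previous epoch */
    return rel - (uint64_t) (uint32_t) (0u - diff);
}
```

The result is only as good as the choice of reference, which is why it matters where in the backend the conversion happens and which reference is available there.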

Nice thing with pg_subtrans, though, is that it doesn't require 
pg_upgrade support.

5. Extend pg_xact to 64 bits.

Similar to pg_subtrans, really, but needs pg_upgrade support.

6. (a bonus thing that I noticed while thinking of pg_xact.) Extend 
twophase.c to use FullTransactionIds.

Currently, the twophase state files in pg_twophase are named according 
to the 32-bit XID of the transaction. Let's switch to FullTransactionId 
there.



As we start to refactor these things, I also think it would be good to 
have more explicit tracking of the valid range of SLRU pages in each 
SLRU. Take pg_subtrans for example: it's not very clear what pages have 
been initialized, especially during different stages of startup. It 
would be good to have clear start and end page numbers, and throw an 
error if you try to look up anything outside those bounds. Same for all 
other SLRUs.
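
A sketch of what such explicit bounds might look like. SlruPageRange and slru_page_in_range are hypothetical names, and a real implementation would presumably raise an error where this example just returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the range of pages that are currently valid in one SLRU. */
typedef struct SlruPageRange
{
    uint64_t start_page;  /* first valid page */
    uint64_t end_page;    /* one past the last valid page */
} SlruPageRange;

/* True if 'pageno' may be looked up; the caller should error out otherwise. */
static bool
slru_page_in_range(const SlruPageRange *range, uint64_t pageno)
{
    return pageno >= range->start_page && pageno < range->end_page;
}
```

With 64-bit page numbers that never wrap, the check is a plain comparison; no modulo arithmetic is needed to decide whether a page precedes another.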

I propose that we try to finish 1 and 2 for v16. And maybe 6. I think 
that's doable. It doesn't have any great user-visible benefits yet, but 
we need to start somewhere.

- Heikki



