Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15) - Mailing list pgsql-hackers

From Maxim Orlov
Subject Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Msg-id CACG=ezb0_zsf1Ek=RxzpL+J1vs-eYSHat45GMya+LGXpPbr_Ow@mail.gmail.com
In response to Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, 17 Jan 2023 at 16:33, Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi hackers,

Maxim, perhaps you could share with us what your reasoning was here?

I'm really sorry for the late response, but better late than never. Yes, we cannot access shared memory without a lock.
In this particular case, we use XidGenLock. That is why we use the lock argument: to take the lock if it was not taken previously.
Actually, we could place an assertion here instead.
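A minimal, single-threaded sketch of the two options, for illustration only. The boolean flag stands in for XidGenLock (an LWLock in PostgreSQL), and `shared_next_full_xid`, `read_next_xid()` and `read_next_xid_locked()` are hypothetical names, not the functions in the patch. The first variant takes the lock itself only when its `lock` argument says so; the second is the assertion-only alternative, where the caller must already hold the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy, single-threaded stand-ins: a flag playing the role of XidGenLock and
 * a counter playing the role of the shared value it protects (hypothetical).
 */
static bool     xid_gen_lock_held = false;
static uint64_t shared_next_full_xid = 1000;

static void xid_gen_lock_acquire(void)
{
    assert(!xid_gen_lock_held);
    xid_gen_lock_held = true;
}

static void xid_gen_lock_release(void)
{
    assert(xid_gen_lock_held);
    xid_gen_lock_held = false;
}

/* Variant 1: take the lock ourselves only if the caller asks us to. */
static uint64_t read_next_xid(bool lock)
{
    uint64_t value;

    if (lock)
        xid_gen_lock_acquire();
    assert(xid_gen_lock_held);  /* never read shared state unlocked */
    value = shared_next_full_xid;
    if (lock)
        xid_gen_lock_release();
    return value;
}

/* Variant 2: the caller must already hold the lock; we only check it. */
static uint64_t read_next_xid_locked(void)
{
    assert(xid_gen_lock_held);
    return shared_next_full_xid;
}
```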

As for the XID comparison: we do not compare XIDs here; we are checking for wraparound, so AFAICS this code is correct.



On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

1. Use 64-bit page numbers in SLRUs (this patch)

2. Use the larger segment file names in async.c, to lift the current 8
GB limit on the max number of pending notifications.

3. Extend pg_multixact so that pg_multixact/members is addressed by
64-bit offsets.

4. Extend pg_subtrans to 64-bits.

5. Extend pg_xact to 64-bits.


6. (a bonus thing that I noticed while thinking of pg_xact.) Extend
pg_twophase.c, to use FullTransactionIds.

Currently, the twophase state files in pg_twophase are named according
to the 32-bit XID of the transaction. Let's switch to FullTransactionId
there.

...

I propose that we try to finish 1 and 2 for v16. And maybe 6. I think
that's doable. It doesn't have any great user-visible benefits yet, but
we need to start somewhere.

- Heikki

Yes, this is a great idea! My only concern is that we're going in circles here. You see, patch 1 is what was proposed
at the beginning of this thread. Anyway, I will be happy if we are able to push this topic forward.

As for making pg_multixact 64-bit, I spent the last couple of days trying to implement proper pg_upgrade support for pg_multixact and pg_xact
with wraparound, and I've realized that for pg_multixact it is not a simple task compared to pg_xact. The problem is that we do not have an epoch for
multixacts, so we have no way to "overcome" wraparound. The solution may be to add some kind of epoch for multixacts, or to
make them 64-bit in the "main" 64-bit XID patch; but from the perspective of this thread, in my view, this should be last in line.

In pg_xact we do not have such a problem: we do have an epoch for transactions, so the conversion should be pretty obvious:
0000 -> 000000000000
0001 -> 000000000001
...
0FFE -> 000000000FFE
0FFF -> 000000000FFF
0000 -> 000000001000
0001 -> 000000001001
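For reference, a sketch of how such names can be derived from (epoch, xid). It assumes the default BLCKSZ of 8192 (32768 transactions per CLOG page), 32 pages per segment, and 12 hex digits for the long names as in the list above; the function names are hypothetical. Under these assumptions one epoch spans 2^32 / 2^20 = 4096 (0x1000) segments, so the first segment of epoch 1 is 0x1000.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed defaults: BLCKSZ = 8192, 2 status bits (4 xacts) per byte. */
#define CLOG_XACTS_PER_PAGE    (8192 * 4)
#define SLRU_PAGES_PER_SEGMENT 32

/* Today's 4-hex-digit segment name for a 32-bit xid. */
static void short_segment_name(uint32_t xid, char *buf, size_t len)
{
    uint32_t pageno = xid / CLOG_XACTS_PER_PAGE;

    snprintf(buf, len, "%04X", (unsigned) (pageno / SLRU_PAGES_PER_SEGMENT));
}

/* 12-hex-digit segment name for the same xid once its epoch is known. */
static void long_segment_name(uint32_t epoch, uint32_t xid, char *buf, size_t len)
{
    uint64_t full_xid = ((uint64_t) epoch << 32) | xid;
    uint64_t pageno = full_xid / CLOG_XACTS_PER_PAGE;

    snprintf(buf, len, "%012llX",
             (unsigned long long) (pageno / SLRU_PAGES_PER_SEGMENT));
}
```

In other words, the long segment number is epoch * 0x1000 + old segment number, provided pg_upgrade can tell which epoch each existing segment belongs to.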

So, in my view, the plan should be:
1. Use internal 64-bit page numbers in SLRUs without changing segment naming.
2. Use the larger segment file names in async.c, to lift the current 8 GB limit on the max number of pending notifications.
3. Extend pg_xact to 64-bits.
4. Extend pg_subtrans to 64-bits.
5. Extend pg_multixact so that pg_multixact/members is addressed by 64-bit offsets.
6. Extend pg_twophase.c, to use FullTransactionIds. (a bonus thing)

Thoughts?

--
Best regards,
Maxim Orlov.

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 01/03/2023 12:21, Aleksander Alekseev wrote:
> Hi,
>
>> I'm surprised that these patches extend the page numbering to 64 bits,
>> but never actually use the high bits. The XID "epoch" is not used, and
>> pg_xact still wraps around and the segment names are still reused. I
>> thought we could stop doing that.
>
> To clarify, the idea is to let CLOG grow indefinitely and simply store
> FullTransactionId -> TransactionStatus (two bits). Correct?

Correct.
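A minimal sketch of that addressing, assuming the page number is 64-bit and never wraps. The constants mirror clog.c's defaults (BLCKSZ 8192, two status bits, four transactions per byte); `FullXid` and the helper names are hypothetical. Status values would be PostgreSQL's TRANSACTION_STATUS_* codes (0 in progress, 1 committed, 2 aborted, 3 sub-committed).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CLOG_XACT_BITMASK   ((1 << CLOG_BITS_PER_XACT) - 1)

typedef uint64_t FullXid;   /* stand-in for FullTransactionId */

/* With 64-bit page numbers the page index just keeps growing: no wraparound. */
static uint64_t fxid_to_page(FullXid fxid)
{
    return fxid / CLOG_XACTS_PER_PAGE;
}

/* Byte within the page that holds fxid's status. */
static uint32_t fxid_to_byte(FullXid fxid)
{
    return (uint32_t) ((fxid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE);
}

/* Bit offset of fxid's two status bits within that byte. */
static int fxid_to_bshift(FullXid fxid)
{
    return (int) (fxid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
}

/* Store a two-bit status for fxid within its (already located) page. */
static void set_status(uint8_t *page, FullXid fxid, int status)
{
    uint8_t *byte = page + fxid_to_byte(fxid);
    int      shift = fxid_to_bshift(fxid);

    *byte = (uint8_t) ((*byte & ~(CLOG_XACT_BITMASK << shift)) |
                       ((status & CLOG_XACT_BITMASK) << shift));
}

/* Read back the two-bit status for fxid. */
static int get_status(const uint8_t *page, FullXid fxid)
{
    return (page[fxid_to_byte(fxid)] >> fxid_to_bshift(fxid)) & CLOG_XACT_BITMASK;
}
```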

> I didn't investigate this in much detail, but it may affect quite a lot
> of code, since TransactionIdDidCommit() and
> TransactionIdDidAbort() currently both deal with TransactionId, not
> FullTransactionId. IMO, this would be a nice change, however, assuming
> we are ready for it.

Yep, it's a lot of code churn.

> In the previous version of the patch there was an attempt to derive
> FullTransactionId from TransactionId but it was wrong for the reasons
> named above in the thread. Thus it was removed and the patch
> simplified.

Yeah, it's tricky to get it right. Clearly we need to do it at some
point though.

All in all, this is a big effort. I spent some more time reviewing this
in the last few days, and thought a lot about what the path forward here
could be. And I haven't looked at the actual 64-bit XIDs patch set yet,
just this patch to use 64-bit addressing in SLRUs.

This patch is the first step, but we have a bit of a chicken and egg
problem, because this patch on its own isn't very interesting, but on
the other hand, we need it to work on the follow up items. Here's how I
see the development path for this (and again, this is just for the
64-bit SLRUs work, not the bigger 64-bit-XIDs-in-heapam effort):

1. Use 64-bit page numbers in SLRUs (this patch)

I would like to make one change here: I'd prefer to keep the old 4-digit
segment names, until we actually start to use the wider address space.
Let's allow each SLRU to specify how many digits to use in the
filenames, so that we convert one SLRU at a time.

If we do that, and don't change any of the existing SLRUs to actually
use the wider space of page and segment numbers yet, this patch becomes
just refactoring with no on-disk format changes. No pg_upgrade needed.

The next patches will start to make use of the wider address space, one
SLRU at a time.
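A sketch of what a per-SLRU digit count could look like. `SlruNaming` and `slru_file_name()` are hypothetical, and the 15-digit width for pg_notify is an arbitrary example, not a decided format. With 4 digits and segment numbers below 0x10000, the output matches today's names exactly, which is what makes this step a pure refactoring.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-SLRU naming setting. */
typedef struct SlruNaming
{
    const char *dir;          /* e.g. "pg_xact" */
    int         name_digits;  /* 4 keeps today's on-disk names */
} SlruNaming;

/*
 * Build "<dir>/<segno in hex>", zero-padded to the SLRU's own width.
 * The segment number is 64-bit in memory regardless of the width.
 */
static int slru_file_name(const SlruNaming *s, int64_t segno,
                          char *buf, size_t len)
{
    return snprintf(buf, len, "%s/%0*llX", s->dir, s->name_digits,
                    (unsigned long long) segno);
}
```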

2. Use the larger segment file names in async.c, to lift the current 8
GB limit on the max number of pending notifications.

No one actually minds the limit; it's quite generous as it is. But there
is some code and complexity in async.c to avoid the wraparound that
could be made simpler if we used longer SLRU segment names and avoided
the wraparound altogether.
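To illustrate the kind of logic wraparound forces, here is a generic sketch (not a copy of async.c) of deciding whether one page precedes another in a wrapping page-number space, next to the trivial comparison that suffices once 64-bit page numbers are never reused. `MAX_PAGE` is a hypothetical size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of a wrapping page-number space (pages 0..MAX_PAGE). */
#define MAX_PAGE 0x7FFF

/*
 * Wrapping space: page numbers are reused, so "p precedes q" must be decided
 * by the shortest signed distance around the circle.
 */
static bool page_precedes_circular(int p, int q)
{
    int diff = p - q;

    if (diff >= (MAX_PAGE + 1) / 2)
        diff -= MAX_PAGE + 1;
    else if (diff < -((MAX_PAGE + 1) / 2))
        diff += MAX_PAGE + 1;
    return diff < 0;
}

/* 64-bit page numbers that are never reused: a plain comparison is enough. */
static bool page_precedes_linear(int64_t p, int64_t q)
{
    return p < q;
}
```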

I wonder if we should actually add an artificial limit, as a GUC. If
there are gigabytes of notifications queued up, something's probably
wrong with the system, and you're not going to be happy if we just
remove the limit so it can grow to terabytes until you run out of disk
space.
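If such a GUC were added, the check itself becomes trivial with non-wrapping page numbers. A sketch, where `notify_queue_max_pages` is a hypothetical setting; its default of 1048576 pages of 8 kB is 8 GB, matching the current limit mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GUC: cap on the notification queue size, in pages. */
static int64_t notify_queue_max_pages = 1048576;

/* With non-wrapping 64-bit page numbers, queue size is a plain subtraction. */
static bool notify_queue_has_room(int64_t head_page, int64_t tail_page)
{
    return head_page - tail_page < notify_queue_max_pages;
}
```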

3. Extend pg_multixact so that pg_multixact/members is addressed by
64-bit offsets.

Currently, multi-XIDs can wrap around, requiring anti-wraparound
freezing, but independently of that, the pg_multixact/members SLRU can
also wrap around. We track both, and trigger anti-wraparound if either
SLRU is about to wrap around. If we redefine MultiXactOffset as a 64-bit
integer, we can avoid the pg_multixact/members wraparound altogether. A
downside is that pg_multixact/offsets will take twice as much space, but
I think that's a good tradeoff. Or perhaps we can play tricks like storing
a single 64-bit offset on each pg_multixact/offsets page, and a 32-bit
offset from that for each XID, to avoid making it so much larger.
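A sketch of the "one 64-bit base per page plus 32-bit deltas" idea. The `OffsetsPage` layout and helper names are hypothetical, and BLCKSZ is assumed to be 8192. Such a page holds 2046 entries, versus 2048 with today's 32-bit MultiXactOffset and 1024 with plain 64-bit offsets.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192

/* Hypothetical layout: one 64-bit base offset, then 32-bit deltas. */
#define OFFSETS_PER_PAGE ((BLCKSZ - sizeof(uint64_t)) / sizeof(uint32_t))

typedef struct OffsetsPage
{
    uint64_t base;                      /* 64-bit members offset for this page */
    uint32_t delta[OFFSETS_PER_PAGE];   /* per-multixact offset relative to base */
} OffsetsPage;

_Static_assert(sizeof(OffsetsPage) == BLCKSZ, "layout must fill one page");

static uint64_t get_offset(const OffsetsPage *pg, uint32_t idx)
{
    return pg->base + pg->delta[idx];
}

/* Returns 0 on success, -1 if offset cannot be expressed relative to base. */
static int set_offset(OffsetsPage *pg, uint32_t idx, uint64_t offset)
{
    if (offset < pg->base || offset - pg->base > UINT32_MAX)
        return -1;
    pg->delta[idx] = (uint32_t) (offset - pg->base);
    return 0;
}
```

The sketch simply assumes that the offsets covered by one page stay within 2^32 of that page's base; a real implementation would need a policy for the -1 case.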

This would reduce the need to do anti-wraparound VACUUMs on systems that
use multixacts heavily. Needs pg_upgrade support.

4. Extend pg_subtrans to 64-bits.

This isn't all that interesting because the active region of pg_subtrans
cannot be wider than 32 bits anyway, because you'll still reach the
general 32-bit XID wraparound. But it might be less confusing in some
places.

I actually started to write a patch to do this, to see how complicated
it is. It quickly proliferates into expanding other XIDs to 64 bits,
like TransactionXmin, frozenXid calculation in vacuum.c, known-assigned
XID tracking in procarray.c, etc. It's going to be necessary to convert
32-bit XIDs to FullTransactionIds at some boundaries, and I'm not sure
where exactly that should happen. It's easier to do the conversions
close to subtrans.c, but then I'm not sure how much it gets us in terms
of reducing confusion. It's easy to get confused with the epochs during
conversions, as you noted. On the other hand, if we change much more of
the backend to use FullTransactionIds, the patch becomes much more invasive.
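One common way to do such a conversion at a boundary is to pick the 64-bit value nearest to a known reference point, such as the next full XID. A minimal sketch, with `widen_xid()` and `FullXid` as hypothetical names; it is only correct if the true full XID lies within 2^31 of the reference, and choosing that reference is exactly where the epoch confusion mentioned above creeps in.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t FullXid;   /* stand-in for FullTransactionId */

/*
 * Widen a 32-bit xid to 64 bits relative to a known full xid `rel`.
 * Returns the full xid nearest to rel whose low 32 bits equal xid; correct
 * only if the true value lies within 2^31 of rel. Relies on the usual
 * two's-complement conversion to int32_t, as PostgreSQL itself does.
 */
static FullXid widen_xid(FullXid rel, TransactionId xid)
{
    int32_t diff = (int32_t) (xid - (TransactionId) rel);

    return rel + (int64_t) diff;
}
```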

Nice thing with pg_subtrans, though, is that it doesn't require
pg_upgrade support.

5. Extend pg_xact to 64-bits.

Similar to pg_subtrans, really, but needs pg_upgrade support.

6. (a bonus thing that I noticed while thinking of pg_xact.) Extend
pg_twophase.c, to use FullTransactionIds.

Currently, the twophase state files in pg_twophase are named according
to the 32-bit XID of the transaction. Let's switch to FullTransactionId
there.
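A sketch of the difference in file naming. It assumes the current names are the 32-bit XID in 8 hex digits; the 16-digit FullTransactionId form and both function names are hypothetical. The point is that XID 0x1234 in epoch 0 and in epoch 1 map to the same 32-bit name but to distinct full-XID names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TWOPHASE_DIR "pg_twophase"

/* Name keyed by the 32-bit xid: the same name recurs in every epoch. */
static void twophase_path_xid(char *buf, size_t len, uint32_t xid)
{
    snprintf(buf, len, TWOPHASE_DIR "/%08X", (unsigned) xid);
}

/* Hypothetical name keyed by the full xid: epoch and xid both encoded. */
static void twophase_path_fxid(char *buf, size_t len, uint64_t fxid)
{
    snprintf(buf, len, TWOPHASE_DIR "/%016llX", (unsigned long long) fxid);
}
```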



As we start to refactor these things, I also think it would be good to
have more explicit tracking of the valid range of SLRU pages in each
SLRU. Take pg_subtrans for example: it's not very clear what pages have
been initialized, especially during different stages of startup. It
would be good to have clear start and end page numbers, and throw an
error if you try to look up anything outside those bounds. Same for all
other SLRUs.
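A sketch of per-SLRU bounds checking, assuming 64-bit page numbers that never wrap, so the check reduces to two comparisons. `SlruBounds` and `slru_check_page()` are hypothetical; real PostgreSQL code would raise an error with ereport() rather than print and return -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: each SLRU tracks the range of pages known to be initialized. */
typedef struct SlruBounds
{
    const char *name;
    int64_t     start_page;   /* oldest valid page, inclusive */
    int64_t     end_page;     /* newest valid page, inclusive */
} SlruBounds;

/* Returns 0 if the page may be looked up, -1 (after reporting) otherwise. */
static int slru_check_page(const SlruBounds *b, int64_t pageno)
{
    if (pageno < b->start_page || pageno > b->end_page)
    {
        fprintf(stderr, "%s: page %lld outside valid range [%lld, %lld]\n",
                b->name, (long long) pageno,
                (long long) b->start_page, (long long) b->end_page);
        return -1;
    }
    return 0;
}
```

Truncation would then simply advance `start_page`, and extension would advance `end_page`.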

I propose that we try to finish 1 and 2 for v16. And maybe 6. I think
that's doable. It doesn't have any great user-visible benefits yet, but
we need to start somewhere.

- Heikki



