Thread: POC: make mxidoff 64 bits
Hi!
I've been trying to introduce 64-bit transaction identifiers to Postgres for quite a while [0]. This, of course, implies an enormous amount of change across various modules. As a consequence, the patch set has become too big to be committed “at once”.
The obvious solution was to try to split the patch set into smaller pieces. But this brings a new challenge: not every one of these parts makes Postgres better on its own. Actually, even switching to a FullTransactionId in PGPROC has the disadvantage of increasing WAL size [1].
In fact, I believe we're dealing with a chicken-and-egg problem here: the full patch set cannot be committed because it is too big to handle, and its parts cannot be committed separately because they only make sense together and do not improve Postgres on their own.
But it's not that bad. Since commit 4ed8f0913bfdb5f, added in [3], we are able to use 64 bits for indexing SLRUs.
PROPOSAL
Make multixact offsets 64 bit.
RATIONALE
It is not very difficult to overflow a 32-bit mxidoff, since member space is consumed for every unique combination of transactions locking a tuple, including their XIDs and the respective flags. And when a transaction is added to a specific multixact, the whole member set is rewritten at a new offset. In other words, it is possible to overflow the multixact offsets without wrapping around the multixact IDs themselves and/or ordinary transaction IDs. I believe there was some discussion about this on the hackers mailing list, but I could not find the links now.
PFA the patch. This is a WIP version; the upgrade machinery will be added later.
As always, any opinions on the subject are very welcome!
--
Best regards,
Maxim Orlov.
> On 23 Apr 2024, at 11:23, Maxim Orlov <orlovmg@gmail.com> wrote:
>
> Make multixact offsets 64 bit.

- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),

Personally, I'd be happy with this! We had some incidents where the only mitigation was tweaking vacuum settings.

BTW, as a side note... I see lots of casts to (unsigned long long); can't we just cast to MultiXactOffset?

Best regards, Andrey Borodin.
Hi Maxim Orlov,
Thank you so much for your tireless work on this. Increasing the WAL size by a few bytes should have very little impact with today's disk performance. (Logical replication also increases WAL volume considerably, and logical replication is a milestone feature that the community keeps improving.) I believe removing the troubled PostgreSQL transaction ID wraparound would be a milestone as well; adding a few bytes is worth it!
Best regards
On Tue, 23 Apr 2024 at 17:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 23/04/2024 11:23, Maxim Orlov wrote:
> PROPOSAL
> Make multixact offsets 64 bit.
+1, this is a good next step and useful regardless of 64-bit XIDs.
> @@ -156,7 +148,7 @@
> ((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
>
> /* page in which a member is to be found */
> -#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
> +#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset) MULTIXACT_MEMBERS_PER_PAGE)
> #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
>
> /* Location (byte offset within page) of flag word for a given member */
This is really a bug fix. It didn't matter when TransactionId and
MultiXactOffset were both typedefs of uint32, but it was always wrong.
The argument name 'xid' is also misleading.
I think there are some more like that, MXOffsetToFlagsBitShift for example.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Tue, 23 Apr 2024 at 12:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
This is really a bug fix. It didn't matter when TransactionId and
MultiXactOffset were both typedefs of uint32, but it was always wrong.
The argument name 'xid' is also misleading.
I think there are some more like that, MXOffsetToFlagsBitShift for example.
Yeah, I always thought so too. I believe this is just copy-paste. Do you mean it is worth creating a separate CF entry for these fixes?
On Tue, 23 Apr 2024 at 16:03, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
BTW, as a side note... I see lots of casts to (unsigned long long); can't we just cast to MultiXactOffset?
Actually, the first versions of the 64-bit XID patch set had such casts to the types TransactionId, MultiXactId and so on. But after some discussion, we switched to the unsigned long long cast. Unfortunately, I could not find an exact link to that discussion. On the other hand, such casting is already used throughout the code. So, for the sake of consistency, I would like to stay with these casts.
On Tue, 23 Apr 2024 at 16:03, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Thank you so much for your tireless work on this. Increasing the WAL size by a few bytes should have very little impact with today's disk performance. I believe removing the troubled PostgreSQL transaction ID wraparound would be a milestone as well; adding a few bytes is worth it!
I agree 100%. Maybe I should return to this approach and find some benefits to having FXIDs in WAL.
On Tue, Sep 3, 2024 at 4:30 PM Maxim Orlov <orlovmg@gmail.com> wrote:
> Here is rebase. Apparently I'll have to do it often, since the CATALOG_VERSION_NO changed in the patch.

I don't think you need to maintain CATALOG_VERSION_NO change in your patch for the exact reason you have mentioned: patch will get conflict each time CATALOG_VERSION_NO is advanced. It's responsibility of committer to advance CATALOG_VERSION_NO when needed.

------
Regards,
Alexander Korotkov
Supabase
On Tue, 3 Sept 2024 at 16:32, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I don't think you need to maintain CATALOG_VERSION_NO change in your
patch for the exact reason you have mentioned: patch will get conflict
each time CATALOG_VERSION_NO is advanced. It's responsibility of
committer to advance CATALOG_VERSION_NO when needed.
OK, I got it. My intention here was to make the patch easier to test: anyone who wants to have a look at the patch won't need to make changes in the code. In the next iteration, I'll remove the CATALOG_VERSION_NO change.
Best regards,
Maxim Orlov.
Hi, Maxim!
Previously we accessed offsets in shared MultiXactState without locks, as a 32-bit read is always atomic. But I'm not sure that still holds when the offset becomes 64-bit.
E.g. GetNewMultiXactId():
nextOffset = MultiXactState->nextOffset;
is outside lock.
There might be other places we do the same as well.
Regards,
Pavel Borisov
Supabase
On Thu, 12 Sept 2024 at 16:09, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
Hi, Maxim!
Previously we accessed offsets in shared MultiXactState without locks as 32-bit read is always atomic. But I'm not sure it's so when offset become 64-bit.
E.g. GetNewMultiXactId():
nextOffset = MultiXactState->nextOffset;
is outside lock.
There might be other places we do the same as well.
I think replacing the plain assignments with pg_atomic_read_u64/pg_atomic_write_u64 would be sufficient.
(The same I think is needed for the patchset [1])
Regards,
Pavel Borisov
On 2024-Sep-12, Pavel Borisov wrote:
> Hi, Maxim!
>
> Previously we accessed offsets in shared MultiXactState without locks as
> 32-bit read is always atomic. But I'm not sure it's so when offset become
> 64-bit.
> E.g. GetNewMultiXactId():
>
> nextOffset = MultiXactState->nextOffset;
> is outside lock.

Good thought. But fortunately I think it's not a problem. The read you point at happens with MultiXactGenLock held in shared mode -- and that works OK, as the assignment (in line 1263 at the bottom of the same routine) is done with the exclusive lock held.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 07/09/2024 07:36, Maxim Orlov wrote:
> Here is v3.

MultiXactMemberFreezeThreshold looks quite bogus now. Now that MaxMultiXactOffset == 2^64-1, you cannot get anywhere near the MULTIXACT_MEMBER_SAFE_THRESHOLD and MULTIXACT_MEMBER_DANGER_THRESHOLD values anymore. Can we just get rid of MultiXactMemberFreezeThreshold? I guess it would still be useful to trigger autovacuum if the multixact members area grows large, though, to release the disk space, even if you can't run out of members as such anymore. What should the logic for that look like?

I'd love to see some tests for the pg_upgrade code. Something like a little perl script to generate test clusters with different wraparound scenarios etc. using the old version, and a TAP test to run pg_upgrade on them and verify that queries on the upgraded cluster work correctly. We don't have tests like that in the repository today, and I don't know if we'd want to commit these permanently either, but it would be highly useful now as a one-off thing, to show that the code works.

On upgrade, are there really no changes required to pg_multixact/members? I imagined that the segment files would need to be renamed around wraparound, so that if you previously had files like this:

pg_multixact/members/FFFE
pg_multixact/members/FFFF
pg_multixact/members/0000
pg_multixact/members/0001

after upgrade you would need to have:

pg_multixact/members/00000000FFFE
pg_multixact/members/00000000FFFF
pg_multixact/members/000000010000
pg_multixact/members/000000010001

Thanks for working on this!

--
Heikki Linnakangas
Neon (https://neon.tech)
Hi Maxim Orlov,
> After a bit of thought, I've realized that to be conservative here is the way to go.
> We can reuse a maximum of existing logic. I mean, we can remove offset wraparound "error logic" and reuse "warning logic". But set the threshold for "warning logic" to a much higher value. For now, I choose 2^32-1. In other words, legit logic, in my view, here would be to trigger autovacuum if the number of offsets (i.e. difference nextOffset - oldestOffset) exceeds 2^32-1. PFA patch set.
Good point; couldn't agree with you more. 64-bit XIDs are the solution to the wraparound problem, so the previous error log is no longer meaningful. But we might want to refine the warning output a little (for example, reporting the underlying reasons why the age has been increasing), even though we no longer have to worry about XID wraparound.
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release the disk space if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
Can we refine this comment a bit? For example:
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release the disk space and reduce table bloat
+ * if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
Thanks
Maxim Orlov <orlovmg@gmail.com> 于2024年10月23日周三 23:55写道:
After a bit of thought, I've realized that to be conservative here is the way to go.
We can reuse a maximum of the existing logic. I mean, we can remove the offset wraparound "error logic" and reuse the "warning logic", but set the threshold for the "warning logic" to a much higher value. For now, I chose 2^32-1. In other words, the legitimate logic here, in my view, would be to trigger autovacuum if the number of offsets (i.e. the difference nextOffset - oldestOffset) exceeds 2^32-1. PFA patch set.
--
Best regards,
Maxim Orlov.
On 13/11/2024 17:44, Maxim Orlov wrote:
> On Tue, 12 Nov 2024 at 02:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On a different note, I'm surprised you're rewriting member segments from
> scratch, parsing all the individual member groups and writing them out
> again. There's no change to the members file format, except for the
> numbering of the files, so you could just copy the files under the new
> names without paying attention to the contents. It's not wrong to parse
> them in detail, but I'd assume that it would be simpler not to.
>
> Yes, at the beginning I also thought that it would be possible to get by
> with simple copying. But in case of wraparound, we must "bypass" the invalid
> zero offset value. See, old 32-bit offsets wrapped at 2^32, thus 0
> values appear in multixact.c, so they must be handled. Bypassed, in fact.
> When we switch to 64-bit offsets, we have two options:
> 1) bypass every ((uint32) offset == 0) value in multixact.c;
> 2) convert members and bypass the invalid value once.
>
> The first option seems too weird to me. So, we have to repack members
> and bypass the invalid value.

Hmm, so if I understand correctly, this is related to how we determine the length of the members array, by looking at the next multixid's offset. This is explained in GetMultiXactIdMembers:

> /*
>  * Find out the offset at which we need to start reading MultiXactMembers
>  * and the number of members in the multixact. We determine the latter as
>  * the difference between this multixact's starting offset and the next
>  * one's. However, there are some corner cases to worry about:
>  *
>  * 1. This multixact may be the latest one created, in which case there is
>  * no next one to look at. In this case the nextOffset value we just
>  * saved is the correct endpoint.
>  *
>  * 2. The next multixact may still be in process of being filled in: that
>  * is, another process may have done GetNewMultiXactId but not yet written
>  * the offset entry for that ID. In that scenario, it is guaranteed that
>  * the offset entry for that multixact exists (because GetNewMultiXactId
>  * won't release MultiXactGenLock until it does) but contains zero
>  * (because we are careful to pre-zero offset pages). Because
>  * GetNewMultiXactId will never return zero as the starting offset for a
>  * multixact, when we read zero as the next multixact's offset, we know we
>  * have this case. We handle this by sleeping on the condition variable
>  * we have just for this; the process in charge will signal the CV as soon
>  * as it has finished writing the multixact offset.
>  *
>  * 3. Because GetNewMultiXactId increments offset zero to offset one to
>  * handle case #2, there is an ambiguity near the point of offset
>  * wraparound. If we see next multixact's offset is one, is that our
>  * multixact's actual endpoint, or did it end at zero with a subsequent
>  * increment? We handle this using the knowledge that if the zero'th
>  * member slot wasn't filled, it'll contain zero, and zero isn't a valid
>  * transaction ID so it can't be a multixact member. Therefore, if we
>  * read a zero from the members array, just ignore it.
>  *
>  * This is all pretty messy, but the mess occurs only in infrequent corner
>  * cases, so it seems better than holding the MultiXactGenLock for a long
>  * time on every multixact creation.
>  */

With 64-bit offsets, can we assume that it never wraps around? We often treat 2^64 as "large enough that we'll never run out", e.g. LSNs are also assumed to never wrap around. I think that would be a safe assumption here too. If we accept that, we don't need to worry about case 3 anymore.

But if we upgrade wrapped-around members files by just renaming them, there could still be a members array where we had skipped offset 0, and reading that after the upgrade might get confused. We could continue to ignore a 0 XID in the members array like the comment says; I think that would be enough. But yeah, maybe it's better to bite the bullet in pg_upgrade and squeeze those out. Does your upgrade test suite include case 3, where the next multixact's offset is 1?

Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other comments and checks that talk about binary-upgraded values too that we can hopefully clean up now.

If we are to parse the member segments in detail in upgrade anyway, I'd be tempted to make some further changes / optimizations:

- You could leave out all locking XID members in upgrade, because they're not relevant after upgrade any more (all the XIDs will be committed or aborted and have released the locks; we require prepared transactions to be completed before upgrading too). It'd be enough to include actual UPDATE/DELETE XIDs.

- The way we determine the length of the members array by looking at the next multixid's offset is a bit complicated. We could have one extra flag per XID in the members to indicate "this is the last member of this multixid". That could either replace the current mechanism of looking at the next offset, or be just an additional cross-check.

- Do we still like the "group" representation, with 4 bytes of flags followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per XID unaligned.

- A more radical idea: There can be only one updating XID in one multixid. We could store that directly in the offsets SLRU, and keep only the locking XIDs in members. That way, the members SLRU would become less critical; it could be safely reset on crash for example (except for prepared transactions, which could still be holding locks, but it'd still be less serious). Separating correctness-critical data from more ephemeral state is generally a good idea.

I'm not insisting on any of these changes, just some things that might be worth considering if we're rewriting the SLRUs on upgrade anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)
On Fri, 15 Nov 2024 at 14:06, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hmm, so if I understand correctly, this is related to how we determine
the length of the members array, by looking at the next multixid's
offset. This is explained in GetMultiXactIdMembers:
Correct.
If we accept that, we don't need to worry about case 3 anymore. But if
we upgrade wrapped-around members files by just renaming them, there
could still be a members array where we had skipped offset 0, and
reading that after the upgrade might get confused. We could continue to
ignore a 0 XID in the members array like the comment says; I think that
would be enough. But yeah, maybe it's better to bite the bullet in
pg_upgrade and squeeze those out.
Correct. I couldn't explain this better. I'm more for squeezing those out. Otherwise, we end up adding yet another hack in multixact, while one of the benefits of switching to 64 bits should be making the XID logic more straightforward. After all, the mxact juggling in pg_upgrade is a one-time inconvenience.
Does your upgrade test suite include case 3, where the next multixact's
offset is 1?
Not exactly.
simple
Latest checkpoint's NextMultiXactId: 119441
Latest checkpoint's NextMultiOffset: 5927049
offset-wrap
Latest checkpoint's NextMultiXactId: 119441
Latest checkpoint's NextMultiOffset: 5591183
multi-wrap
Latest checkpoint's NextMultiXactId: 82006
Latest checkpoint's NextMultiOffset: 7408811
offset-multi-wrap
Latest checkpoint's NextMultiXactId: 52146
Latest checkpoint's NextMultiOffset: 5591183
Do you want a test case where NextMultiOffset is 1?
Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other
comments and checks that talk about binary-upgraded values too that we
can hopefully clean up now.
Yes, technically we can. But this is somewhat unrelated to the offsets and would make the patch set significantly more complicated, thus harder to review and less likely to be committed. Again, I'm not opposing the idea; I'm just not sure it is worth doing right now.
If we are to parse the member segments in detail in upgrade anyway, I'd
be tempted to make some further changes / optimizations:
- You could leave out all locking XID members in upgrade, because
they're not relevant after upgrade any more (all the XIDs will be
committed or aborted and have released the locks; we require prepared
transactions to be completed before upgrading too). It'd be enough to
include actual UPDATE/DELETE XIDs.
- The way we determine the length of the members array by looking at the
next multixid's offset is a bit complicated. We could have one extra
flag per XID in the members to indicate "this is the last member of this
multixid". That could either replace the current mechanism of looking
at the next offset, or be just an additional cross-check.
- Do we still like the "group" representation, with 4 bytes of flags
followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per
XID unaligned.
Not really. But I would leave that for the next iteration: switching multixact IDs to 64 bit. I already have some drafts for this. In any case, we'll have to make adjustments in pg_upgrade again. My goal is to move towards 64-bit XIDs in small steps, and I plan changes to the "group" representation in combination with switching multixact IDs to 64 bit. This seems a bit more appropriate in my view.
As for your optimization suggestions, I like them. I'm not against them, but I'm afraid of disrupting the clarity of the logic, especially since the algorithm is not the simplest.
--
Best regards,
Maxim Orlov.
On 27/12/2024 19:09, Maxim Orlov wrote:
> On Wed, 18 Dec 2024 at 13:21, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Does the pg_upgrade code work though, if you have that buggy situation
> where oldestOffsetKnown == false ? ...
>
> > if (!TransactionIdIsValid(*xactptr))
> > {
> >     /* Corner case 3: we must be looking at unused slot zero */
> >     Assert(offset == 0);
> >     continue;
> > }
>
> After upgrade, this corner case 3 would *not* happen on offset == 0. So
> looks like we're still missing test coverage for this upgrade corner
> case.
>
> Am I understanding correctly that you want to have a test corresponding
> to the buggy 9.3 and 9.4 era versions?

No, those were two different things. I think there might be two things wrong here:

1. I suspect pg_upgrade might not correctly handle the situation where oldestOffsetKnown == false, and
2. The above assertion in "corner case 3" would not hold. It seems that we don't have a test case for it, or it would've hit the assertion.

Now that I think about it, yes, a test case for 1 would be good too. But I was talking about 2.

> Do you think we could imitate this scenario on a current master branch
> like that:
> 1) generate a couple of offsets segments for the first table;
> 2) generate more segments for a second table;
> 3) drop first table;
> 4) stop pg cluster;
> 5) remove pg_multixact/offsets/0000
> 6) upgrade?

I don't remember off the top of my head. It might be best to just refuse the upgrade if oldestOffsetKnown == false. It's a very ancient corner case. It seems reasonable to require you to upgrade to a newer minor version and run VACUUM before upgrading. IIRC that sets oldestOffsetKnown.

--
Heikki Linnakangas
Neon (https://neon.tech)
On Thu, 2 Jan 2025 at 01:12, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
It might be best to just refuse the upgrade if oldestOffsetKnown==false.
It's a very ancient corner case. It seems reasonable to require you to
upgrade to a newer minor version and run VACUUM before upgrading. IIRC
that sets oldestOffsetKnown.
I agree. After all, we already have a ready-made solution in the form of VACUUM, don't we?
If I understand all this multixact_old.c machinery correctly, in the case of oldestOffsetKnown == false we should either fail with "could not open file", or the offset will be 0 in GetOldMultiXactIdSingleMember. So, I suppose we can put an analogue of the SimpleLruDoesPhysicalPageExist call at the beginning of GetOldMultiXactIdSingleMember, and if either SimpleLruDoesPhysicalPageExist returns false or the corresponding offset is 0, we bail out with "oldest offset does not exist, consider running VACUUM before pg_upgrade" or something similar. Please correct me if I'm wrong.
Best regards,
Maxim Orlov.