Re: POC: make mxidoff 64 bits - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: POC: make mxidoff 64 bits
Date
Msg-id 3624730d-6dae-42bf-9458-76c4c965fb27@iki.fi
Whole thread Raw
In response to Re: POC: make mxidoff 64 bits  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: POC: make mxidoff 64 bits
List pgsql-hackers
On 08/12/2025 17:43, Ashutosh Bapat wrote:
> On Sat, Dec 6, 2025 at 5:06 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 05/12/2025 15:42, Ashutosh Bapat wrote:
>>> And it makes the test faster by about a second or two on my laptop.
>>> Something on those lines or other is required to reduce the output
>>> from query_safe().
>>
>> Nice! That log bloat was the reason I bundled together the "COMMIT;
>> BEGIN; SELECT ...;" steps into one statement in the loop. Your solution
>> addresses it more directly.
> 
> Now we can call query_safe() separately on each of those. That will be
> more readable and marginally less code.

Done.

>> I'm not entirely happy with the "Old" prefix here, because as you
>> pointed out, we might end up needing "older" or "oldold" in the future.
>> I couldn't come up with anything better though. "PreV19MultiXactOffset"
>> is quite a mouthful.
> 
> How about MultiXactOffset32?

Ooh, I like that. It doesn't sound as nice for the other "old" prefixed 
things though. So I changed OldMultiXactOffset to MultiXactOffset32, but 
kept OldMultiXactReader, GetOldMultiXactIdSingleMember() et al. We can 
live with that for now, and rename in the future if we'd need "oldold".

Committed with that and some other minor cleanups. Thanks everyone! This 
patch has been brewing for a while :-).


There are some noncritical followups that I'd like to address, now that 
we know that in v19 the pg_multixact files will be rewritten. That gives 
us an opportunity to clean up some backwards-compatibility stuff. The 
committed patch already cleaned up a bunch, but there's some more we 
could do:

1. Currently, at multixid wraparound, MultiXactState->nextMXact goes to 
0, which is invalid. All the readers must be prepared for that, and skip 
over the 0. That's error-prone, we've already missed that a few times. 
Let's change things so that the code that *writes* 
MultiXactState->nextMXact skips over the zero already.

2. We currently don't persist 'oldestOffset' in the control file the 
same way as 'oldestMultiXactId'. Instead, we look up the offset of the 
oldestMultiXactId at startup, and keep that value in memory. Originally 
that was because we missed the need for that and had to add the offset 
wraparound protections in a minor release without changing the control 
file format. But we could easily do it now.

With 64-bit offsets, it's actually less critical to persist the 
oldestOffset. Previously, if we failed to look up the oldest offset 
because the oldest multixid was invalid, it could lead to serious 
trouble if the offsets then wrapped around and old offsets were 
overwritten, but that won't happen anymore. Nevertheless, it leads to 
unnecessarily aggressive vacuuming and some messages in the log.

At first I thought that the failure to look up the oldest offset should 
no longer happen, because we don't need to support reading old 9.3 era 
SLRUs anymore that were created before we added the offset wraparound 
protection. But it's not so: it's still possible to have multixids with 
invalid offsets in the 'offsets' SLRU on a crash. Such multixids won't 
be referenced from anywhere in the heap, but I think they could later 
become the oldest multixid, and we would fail to look up its offset. 
Persisting the oldest offset doesn't fully fix that problem, because 
advancing the oldest offset is done by looking up the oldest multixid's 
offset anyway.

3. I think we should turn some of the assertions in 
GetMultiXactIdMembers() into ereports(ERROR) calls. I included those 
changes in my patch version 29 [1], as a separate patch, but I didn't 
commit that yet.

4. Compressing the offsets, per discussion. It doesn't really seem worth 
to me and I don't intend to work on it, but if someone wants to do it, 
now would be the time, so that we don't need to have upgrade code to 
deal with yet another format.

- Heikki




pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Next
From: Chao Li
Date:
Subject: Re: Fix LOCK_TIMEOUT handling in slotsync worker