Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers

From Pavel Borisov
Subject Re: Add 64-bit XIDs into PostgreSQL 15
Date
Msg-id CALT9ZEEsj54k9+xmcSAYLe7YsEHYbzFUuDwpVzkjmR6CCPHpww@mail.gmail.com
In response to Re: Add 64-bit XIDs into PostgreSQL 15  (Andres Freund <andres@anarazel.de>)
Responses Re: Add 64-bit XIDs into PostgreSQL 15  (Aleksander Alekseev <aleksander@timescale.com>)
List pgsql-hackers
Hi, Andres!

I've revised the README a little to address your corrections and questions. Thank you very much for them!
The patchset with the revised README is attached as v8 (the code is unchanged from v7).
 
> +The downside of this is that we can not use tuple's XMIN and XMAX right away.
> +We often need to re-read t_xmin and t_xmax - which could actually be pointers
> +into a page in shared buffers and therefore they could be updated by any other
> +backend.

> Ugh, that's not great.

Agreed. This part is one of the candidates for revision per the proposals above [1], i.e.:
"2A. Probably refactor it to store precalculated XMIN/XMAX in memory
tuple representation instead of t_xid_base/t_multi_base". 

We are working on this change.
 
> What happens if the first access happens on a replica?

> What is the approach for dealing with multixact files? They have xids
> embedded?  And currently the SLRUs will break if you just let the offsets SLRU
> grow without bounds.
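One way to see the SLRU concern is plain arithmetic. This is a back-of-the-envelope sketch, not necessarily the whole of the problem. It assumes PostgreSQL 14 constants: 8 kB pages, 32 pages per SLRU segment, four-hex-digit segment file names, and int page numbers. It also assumes, hypothetically, 8-byte offsets once MultiXactIds are 64-bit. The helper names are made up. With 32-bit MultiXactIds the offsets SLRU spans exactly segments 0000..FFFF, and its page numbers fit in an int. With unbounded 64-bit ids, neither holds once the counter is large enough:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PostgreSQL 14 defaults: 8 kB pages, 32 pages per SLRU segment. */
#define SKETCH_BLCKSZ          8192
#define SLRU_PAGES_PER_SEGMENT 32

/* Page of the offsets SLRU holding the entry for a given MultiXactId,
 * where entry_size is the size in bytes of one stored offset. */
static uint64_t
sketch_offsets_pageno(uint64_t multi, unsigned entry_size)
{
    assert(entry_size > 0 && entry_size <= SKETCH_BLCKSZ);
    return multi / (SKETCH_BLCKSZ / entry_size);
}

/* Segment file number containing that page. */
static uint64_t
sketch_offsets_segment(uint64_t multi, unsigned entry_size)
{
    return sketch_offsets_pageno(multi, entry_size) / SLRU_PAGES_PER_SEGMENT;
}

/* The PostgreSQL 14 SLRU code passes page numbers around as int. */
static bool
sketch_pageno_fits_int(uint64_t pageno)
{
    return pageno <= (uint64_t) INT_MAX;
}
```

For example, the last 32-bit MultiXactId lands in segment 0xFFFF, while a 64-bit id of 2^43 with 8-byte entries would need page 2^33, which no longer fits in an int.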

> Wait. So you just modify the page without WAL logging or marking it dirty on a
> standby? I fail to see how that can be correct.
>
> Imagine the cluster is promoted, the page is dirtied, and we write it
> out. You'll have written out a completely changed page, without any WAL
> logging. There's plenty other scenarios.
I think you've found a real bug here. Thanks! There are a couple of ways
it could be fixed:

1. If we enforce a checkpoint at replica promotion, then (with full_page_writes on) the first modification of each page afterward will be WAL-logged as a full-page image.

2. Alternatively, we could use a BufferDesc bit to mark the page as converted to 64-bit XIDs but not yet written to disk, for example one of the four bits of BUF_USAGECOUNT.
Since BM_MAX_USAGE_COUNT = 5, three bits are enough for the usage count itself. This changes the in-memory page representation but does not require WAL logging, which is impossible on a replica.
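The two options can be sketched in a few lines of C. This is a sketch under stated assumptions, not the patch's code.

For option 1, the check mirrors the `page_lsn <= RedoRecPtr` test that decides on a full-page image when a change is WAL-logged. Once a checkpoint moves the redo pointer past the page LSN, the next logged change to that page carries the whole (already converted) page.

For option 2, the state-word layout is assumed from PostgreSQL 14's buf_internals.h, with the usage count in bits 18-21. `SKETCH_USAGECOUNT_MASK`, `BM_XID64_CONVERTED` and the helper functions are hypothetical. Real code would update the atomic buffer state under the buffer header lock rather than a plain `uint32_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Option 1: a full-page image is needed when full_page_writes is on and the
 * page has not been WAL-logged since the last checkpoint's redo pointer. */
static bool
needs_full_page_image(XLogRecPtr page_lsn, XLogRecPtr redo_ptr, bool full_page_writes)
{
    return full_page_writes && page_lsn <= redo_ptr;
}

/* Option 2: usage-count field of the buffer state word, as assumed from
 * PostgreSQL 14's buf_internals.h (bits 18-21). */
#define BUF_USAGECOUNT_SHIFT 18
#define BUF_USAGECOUNT_MASK  0x003C0000U
#define BM_MAX_USAGE_COUNT   5

/* Hypothetical: keep the usage count in bits 18-20, use bit 21 as a flag. */
#define SKETCH_USAGECOUNT_MASK 0x001C0000U
#define BM_XID64_CONVERTED     0x00200000U

_Static_assert((SKETCH_USAGECOUNT_MASK | BM_XID64_CONVERTED) == BUF_USAGECOUNT_MASK,
               "the new flag must come out of the old usage-count field");
_Static_assert(BM_MAX_USAGE_COUNT <= (SKETCH_USAGECOUNT_MASK >> BUF_USAGECOUNT_SHIFT),
               "three bits must still hold the maximum usage count");

static uint32_t
get_usage_count(uint32_t state)
{
    return (state & SKETCH_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

static uint32_t
set_usage_count(uint32_t state, uint32_t usage)
{
    assert(usage <= BM_MAX_USAGE_COUNT);
    return (state & ~SKETCH_USAGECOUNT_MASK) | (usage << BUF_USAGECOUNT_SHIFT);
}

/* Mark the in-memory page as converted to 64-bit XIDs but not yet written. */
static uint32_t
mark_xid64_converted(uint32_t state)
{
    return state | BM_XID64_CONVERTED;
}

static bool
is_xid64_converted(uint32_t state)
{
    return (state & BM_XID64_CONVERTED) != 0;
}
```

Setting the flag leaves the usage count intact, because the two no longer share any bits.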

What do you think about it? 

