Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers

From Chris Travers
Subject Re: Add 64-bit XIDs into PostgreSQL 15
Date
Msg-id CAEq-hvsZmA1fBgHscsWgbtz4Gop_TTVh73eQt9zh8ZiJoZuphQ@mail.gmail.com
In response to Re: Add 64-bit XIDs into PostgreSQL 15  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: Add 64-bit XIDs into PostgreSQL 15
Re: Add 64-bit XIDs into PostgreSQL 15
List pgsql-hackers
Hi;

Trying to discuss where we are talking past each other...

On Fri, Nov 25, 2022 at 9:38 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi hackers,

> I'm wondering whether the safest way to handle this is by creating a
> new TAM called "heap64", so that all storage changes happen there.

> Many current users see stability as one of the greatest strengths of
> Postgres, so while I very much support this move, I wonder if this
> gives us a way to have both stability and innovation at the same time?

That would be nice.

However from what I see TransactionId is a type used globally in
PostgreSQL. It is part of structures used by the TAM interface, used in
WAL records, etc. So we will have to learn these components to work
with 64-bit XIDs anyway and then start thinking about cases like: when
a user runs a transaction affecting two tables, a heap32 one and
heap64 one and we will have to figure out which tuples are visible and
which are not. This perhaps is doable but the maintenance burden for
the project will be too high IMO.
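
To illustrate why the XID width leaks into visibility checks, here is a minimal sketch of the circular comparison used for 32-bit XIDs next to a plain 64-bit comparison. It follows the idea behind TransactionIdPrecedes but is not the PostgreSQL source: the function names are hypothetical, and the special XIDs below 3 are ignored as a simplifying assumption.

```python
MASK32 = 0xFFFFFFFF


def xid32_precedes(a: int, b: int) -> bool:
    """True if 32-bit XID a is logically older than b.

    The comparison is done modulo 2**32, which is equivalent to
    checking (int32)(a - b) < 0 in C. Special XIDs are ignored here.
    """
    diff = (a - b) & MASK32
    return diff >= 0x80000000


def xid64_precedes(a: int, b: int) -> bool:
    """With 64-bit XIDs there is no wraparound, so a plain compare works."""
    return a < b


# An XID assigned shortly before the 32-bit counter wrapped, and one
# assigned shortly after it wrapped (values chosen for illustration).
before_wrap = 4_294_967_000
after_wrap = 103

print(xid32_precedes(before_wrap, after_wrap))            # True
print(before_wrap < after_wrap)                           # False
print(xid64_precedes(before_wrap, 2**32 + after_wrap))    # True
```

A naive numeric comparison gives the wrong answer for the 32-bit pair, which is why any code that touches both widths would need to know which rule applies to each value.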

It seems to me that the best option we can offer for the users looking
for stability is to use the latest PostgreSQL version with 32-bit
XIDs. Assuming these users care that much about this particular design
choice of course.

I didn't see any changes to pg_upgrade to make this change possible on upgrade.  Is that also outside the scope of your patch set?  If so, how is continuity across upgrades supposed to be ensured?

Also related to that, I think you would have to have a check on streaming replication that both instances use the same xid format (so that you don't accidentally upgrade this somehow), since this is set per db cluster, right?

> The whole project seems to just ignore basic, pertinent questions.
> Questions like: why are we falling behind like this in the first
> place? And: If we don't catch up soon, why should we be able to catch
> up later on? Falling behind on freezing is still a huge problem with
> 64-bit XIDs.

Is the example I provided above wrong?

"""
Consider the case when you run a slow OLAP query that takes 12h to
complete and 100K TPS of fast OLTP-type queries on the same system.
The fast queries will consume all 32-bit XIDs in less than 12 hours,
while the OLAP query started 12 hours ago didn't finish yet and thus
its tuples can't be frozen.
"""

If it is, please let me know. I would very much like to know if my
understanding here is flawed.

So, you have described a scenario we cannot support today (because xids would be exhausted within 5.5 hours at that transactional rate).  Additionally, as PostgreSQL becomes more capable, this sort of scale will increasingly be within reach, and that is an important point in favor of this effort.
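
The 5.5 hour figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes about 2 billion usable XIDs (roughly half of the 32-bit space) and one XID consumed per transaction; both are simplifying assumptions, and the function name is hypothetical.

```python
def hours_until_exhaustion(xid_budget: int, tps: int) -> float:
    """Hours until xid_budget XIDs are consumed at tps transactions/second.

    Assumes every transaction consumes exactly one XID.
    """
    return xid_budget / tps / 3600


TPS = 100_000

# Rounding the usable window to 2 billion XIDs:
print(round(hours_until_exhaustion(2_000_000_000, TPS), 2))  # 5.56

# Using exactly half of the 32-bit space (2**31):
print(round(hours_until_exhaustion(2**31, TPS), 2))          # 5.97
```

Either way the budget runs out well before a 12 hour query finishes, which is the point of the scenario.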

That said, there is another set of xid wraparound cases, far more numerous today, that I think would be hurt if this patchset were accepted into Postgres without the mitigating measures you consider out of bounds: cases like Mailchimp, Adjust, and the like.  This is why I keep stressing this. I don't think waving away concerns about use cases outside the one you are focusing on is helpful, particularly when they come from those of us who have faced xid wraparounds in these cases in the past.  In these cases, database teams usually face an operational emergency while tools like vacuum, pg_repack, etc. are severely degraded because freezing has fallen so far behind.  The deeper the hole, the harder it is to dig out of.

Every large-scale high-throughput database I have ever worked on had long-running query alerts precisely because of the impact on vacuum and the downstream performance impacts.  I would love to get to a point where this wasn't necessary, and maybe in a few specific workloads we might be there very soon.  The effort you are engaging in here is an important part of the path to get there, but let's not forget the people who today are facing xid wraparounds due to vacuum problems and what this set of changes will mean for them.
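
As a rough illustration of the kind of long-running query alert mentioned above, here is a minimal sketch. It assumes rows have already been fetched from the pg_stat_activity view (pid, xact_start, query); the one hour threshold, the function name, and the sample rows are hypothetical, and fetching the rows is left to whatever driver is in use.

```python
from datetime import datetime, timedelta, timezone

# Query one might run to fetch candidate rows. pg_stat_activity and these
# columns exist in PostgreSQL; how the query is executed is not shown here.
LONG_XACT_SQL = """
SELECT pid, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
"""


def long_running(rows, now, threshold=timedelta(hours=1)):
    """Return (pid, age) for transactions open longer than threshold.

    rows is an iterable of (pid, xact_start, query) tuples.
    The one hour default is an arbitrary, hypothetical threshold.
    """
    return [
        (pid, now - xact_start)
        for pid, xact_start, _query in rows
        if now - xact_start > threshold
    ]


# Made-up sample rows standing in for a query result.
now = datetime(2022, 11, 25, 12, 0, tzinfo=timezone.utc)
rows = [
    (101, now - timedelta(hours=12), "SELECT ... slow OLAP query"),
    (102, now - timedelta(seconds=2), "UPDATE ... fast OLTP query"),
]
for pid, age in long_running(rows, now):
    print(f"pid {pid} has held a transaction open for {age}")
```

Only the 12 hour transaction is reported; in practice the threshold would be tuned to how quickly the workload consumes XIDs.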

--
Best regards,
Aleksander Alekseev

