Re: Proposal: Global Index for PostgreSQL - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Proposal: Global Index for PostgreSQL
Date
Msg-id CAFiTN-s4vpW5o3EHoBzoffDEQa=jR48NAfdVyqQxjrvrpYviuw@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: Global Index for PostgreSQL  (Nikita Malakhov <hukutoc@gmail.com>)
Responses Re: Proposal: Global Index for PostgreSQL
List pgsql-hackers
On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi Dilip!

Thanks Nikita for your response and reading my proposal.

> Global Indexes is a very interesting functionality that has both significant advantages
> and drawbacks, and the community seems not ready to accept it without very strong
> motivation.

I understand that this is a hard problem and needs changes in many
critical modules. I don't think there should be a problem with the
motivation of this work, but I believe the main issue lies in the
project's complexity.

> There was a more recent approach to Global index problem [1], please check it out.

I've reviewed the proposal, and I understand it aims to address
uniqueness on partitioned tables for non-partition key columns.
However, I'm concerned about the basic design principles' scalability.
I believe it won't scale effectively beyond a relatively small number
of partitions, and this limitation will be quite surprising to users.
Specifically, checking uniqueness during inserts/updates across all
indexes on each partition (since there's no global index) will become
a significant bottleneck.

> I've read you proposal and have several questions:

> 1) New catalog table with global index partitions would immediately affect interaction
> with user tables with global indexes because of corresponding locks that should be
> taken for [in]validation and attach/detach operations, this should be investigated;

Yeah that's right, but logically the attach/detach are DDL and are not
most frequent operations, so are you worried about performance due to
locking?

> 2) Changing relation OIDs (by, say, vacuum full) would immediately result in index
> inconsistency, what do you suppose to do with internal processes that could change
> relation OIDs? Also this question

I want to clarify that we don't store relation OIDs directly in the
global index. Instead, the global index holds partition IDs, and the
mapping from partition ID to relation OID is managed in a new catalog
table, pg_index_partition. It's important to note that VACUUM FULL
operations only alter the relfilenumber (the disk file OID), not the
relation OID itself, which remains constant for the lifetime of the
relation.

What does change during a VACUUM FULL are the TIDs (tuple IDs).
Because of this, all indexes, including global indexes, are reindexed.
As I mentioned in my proposal, there's a significant opportunity for
optimization here. Reindexing large global indexes is a costly
operation, and I've proposed some ideas to improve this process.

> 3) Would single sort space be enough for a more typical case when we have
> hundreds of partitions with hundreds of millions records in each? It is a normal
> production case for partitioned tables.

In general, users ideally wouldn't use a global index everywhere. It
really comes down to their specific use case – they should only opt
for a global index when they can't effectively partition their data
without one. The idea is that the amount of data in the sort space
should essentially be the same as if the table wasn't partitioned at
all. That's a good point for consideration. I agree that global
indexes shouldn't be a default choice for every use case. They're most
beneficial when a user's data access patterns inherently prevent
effective partitioning without them. In such scenarios, the amount of
data in the sort space would ideally remain comparable to an
unpartitioned table.

> 4) Update-heavy partitioned tables that should run vacuum frequently. Significant
> vacuum slowdown would result in going beyond SLAs without corresponding
> significant improvements.
>
You've got it. I'm on board with prioritizing a VACUUM optimization
solution for partitioned tables with global indexes. My initial
proposal touched on a proof-of-concept experiment, which indicated no
significant performance hit with global index after the optimization.
I'll share the detailed VACUUM optimization proposal in this thread
within the next couple of days.

--
Regards,
Dilip Kumar
Google



pgsql-hackers by date:

Previous
From: "Chiranmoy.Bhattacharya@fujitsu.com"
Date:
Subject: Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Next
From: vignesh C
Date:
Subject: Re: Enhance pg_createsubscriber to create required standby.