Re: Proposal: Global Index for PostgreSQL - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Proposal: Global Index for PostgreSQL |
Date | |
Msg-id | CAFiTN-s4vpW5o3EHoBzoffDEQa=jR48NAfdVyqQxjrvrpYviuw@mail.gmail.com Whole thread Raw |
In response to | Re: Proposal: Global Index for PostgreSQL (Nikita Malakhov <hukutoc@gmail.com>) |
Responses |
Re: Proposal: Global Index for PostgreSQL
|
List | pgsql-hackers |
On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote: > > Hi Dilip! Thanks Nikita for your response and reading my proposal. > Global Indexes is a very interesting functionality that has both significant advantages > and drawbacks, and the community seems not ready to accept it without very strong > motivation. I understand that this is a hard problem and needs changes in many critical modules. I don't think there should be a problem with the motivation of this work, but I believe the main issue lies in the project's complexity. > There was a more recent approach to Global index problem [1], please check it out. I've reviewed the proposal, and I understand it aims to address uniqueness on partitioned tables for non-partition key columns. However, I'm concerned about the basic design principles' scalability. I believe it won't scale effectively beyond a relatively small number of partitions, and this limitation will be quite surprising to users. Specifically, checking uniqueness during inserts/updates across all indexes on each partition (since there's no global index) will become a significant bottleneck. > I've read you proposal and have several questions: > 1) New catalog table with global index partitions would immediately affect interaction > with user tables with global indexes because of corresponding locks that should be > taken for [in]validation and attach/detach operations, this should be investigated; Yeah that's right, but logically the attach/detach are DDL and are not most frequent operations, so are you worried about performance due to locking? > 2) Changing relation OIDs (by, say, vacuum full) would immediately result in index > inconsistency, what do you suppose to do with internal processes that could change > relation OIDs? Also this question I want to clarify that we don't store relation OIDs directly in the global index. Instead, the global index holds partition IDs, and the mapping from partition ID to relation OID is managed in a new catalog table, pg_index_partition. It's important to note that VACUUM FULL operations only alter the relfilenumber (the disk file OID), not the relation OID itself, which remains constant for the lifetime of the relation. What does change during a VACUUM FULL are the TIDs (tuple IDs). Because of this, all indexes, including global indexes, are reindexed. As I mentioned in my proposal, there's a significant opportunity for optimization here. Reindexing large global indexes is a costly operation, and I've proposed some ideas to improve this process. > 3) Would single sort space be enough for a more typical case when we have > hundreds of partitions with hundreds of millions records in each? It is a normal > production case for partitioned tables. In general, users ideally wouldn't use a global index everywhere. It really comes down to their specific use case – they should only opt for a global index when they can't effectively partition their data without one. The idea is that the amount of data in the sort space should essentially be the same as if the table wasn't partitioned at all. That's a good point for consideration. I agree that global indexes shouldn't be a default choice for every use case. They're most beneficial when a user's data access patterns inherently prevent effective partitioning without them. In such scenarios, the amount of data in the sort space would ideally remain comparable to an unpartitioned table. > 4) Update-heavy partitioned tables that should run vacuum frequently. Significant > vacuum slowdown would result in going beyond SLAs without corresponding > significant improvements. > You've got it. I'm on board with prioritizing a VACUUM optimization solution for partitioned tables with global indexes. My initial proposal touched on a proof-of-concept experiment, which indicated no significant performance hit with global index after the optimization. I'll share the detailed VACUUM optimization proposal in this thread within the next couple of days. -- Regards, Dilip Kumar Google
pgsql-hackers by date: