Re: 回复:how to create index concurrently on partitioned table - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: 回复:how to create index concurrently on partitioned table |
Date | |
Msg-id | 20201006024255.GJ1464@paquier.xyz Whole thread Raw |
In response to | Re: 回复:how to create index concurrently on partitioned table (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: 回复:how to create index concurrently on partitioned table
|
List | pgsql-hackers |
On Mon, Oct 05, 2020 at 03:08:32PM -0500, Justin Pryzby wrote: > It means that we might do N catalog updates for a partition heirarchy that's N > levels deep. Normally, N=2, and we'd clear indisclustered for the index as > well as its parent. This is not essential, though. Hmm. I got to think more about this one, and being able to ensure a consistent state of indisclustered for partitioned tables and all their partitions across multiple transactions at all times would be a nice property, as we could easily track down if an operation has failed in-flight. The problem here is that we are limited by indisclustered being a boolean, so we may set indisclustered one way or another on some partitions, or on some parent partitioned tables, but we are not sure if what's down is actually clustered for the leaves of a partitioned index, or not. Or maybe we even have an inconsistent state, so this does not provide a good user experience. The consistency of a partition tree is a key thing, and we have such guarantees with REINDEX (with or without concurrently), so what I'd like to think that what makes sense for indisclustered on a partitioned index is to have the following set of guarantees, and relying on indisvalid as being true iff an index partition tree is complete on a given table partition tree is really important: - If indisclustered is set to true for a partitioned index, then we have the guarantee that all its partition and partitioned indexes have indisclustered set to true, across all the layers down to the leaves. - If indisclustered is set to false for a partitioned index, then we have the guarantee that all its partition and partitioned indexes have indisclustered set to false. If we happen to attach a new partition to a partitioned table of such a tree, I guess that it is then logic to have indisclustered set to the same value as the partitioned index when the partition inherits an index. So, it seems to me that with the current catalogs we are limited by indisclustered as being a boolean to maintain some kind of consistency across transactions of CLUSTER, as we would use one transaction per leaf for the work. In order to make things better here, we could switch indisclustered to a char, able to use three states: - 'e' or enabled, equivalent to indisclustered == true. There should be only one index on a table with 'e' set at a given time. - 'd' or disabled, equivalent to indisclustered == false. - a new third state, for an equivalent of work-in-progress, let's call it 'w', or whatever. Then, when we begin a CLUSTER on a partitioned table, I would imagine the following: - Switch all the indexes selected to 'w' in a first transaction, and do not reset the state of other indexes if there is one 'e'. For CLUSTER without USING, we switch the existing 'e' to 'w' if there is such an index. If there are no indexes select-able, issue an error. If we find an index with 'w', we have a candidate from a previous failed command, so this gets used. I don't think that this design requires a new DDL either as we could reset all 'w' and 'e' to 'd' if using ALTER TABLE SET WITHOUT CLUSTER on a partitioned table. For CLUSTER with USING, the partitioned index selected and its leaves are switched to 'w', similarly. - Then do the work for all the partitions, one partition per transaction, keeping 'w'. - In a last transaction, switch all the partitions from 'w' to 'e', resetting any existing 'e'. ALTER TABLE CLUSTER ON should also be a supported operation, where 'e' gets switched for all the partitions in a single transaction. Of course, this should not work for an invalid index. Even with such a design, I got to wonder if there could be cases where a user does *not* want to cluster the leaves of a partition tree with the same index. If that were to happen, the partition tree may need a redesign so making things work so as we force partitions to inherit what's wanted for the partitioned table makes the most sense to me. By the way, I mentioned that previously, but this thread got used for REINDEX that is finished, and now we have a discussion going on with CLUSTER. There is also stuff related to CIC and DIC. There is also only one CF entry for all that. I really think that this work had better be split into separate threads, with separate CF entries. Do you mind if I change the contents of the CF app so as the existing item is renamed for REINDEX, that got committed? Then we could create a new entry for CIC/DIC (it may make sense to split these two as well, but I am not if there are any overlaps between the APIs of the two), and a new entry for CLUSTER, depending on the state of the work. The original subject of the thread is also unrelated, this makes the review process unnecessarily complicated, and that's really confusing. Each discussion should go into its own, independent, thread. -- Michael
Attachment
pgsql-hackers by date: