Re: CLUSTER on partitioned index - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: CLUSTER on partitioned index |
Date | |
Msg-id | CAEze2WgtmBT5pS3agVWK8F6LJ_a-eVa60+4Kk0KFdwXcch_m7Q@mail.gmail.com Whole thread Raw |
In response to | CLUSTER on partitioned index (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote: > > I have to wonder if there really *is* a use case for CLUSTER in the > > first place on regular tables, let alone on partitioned tables, which > > are likely to be large and thus take a lot of time. > > The cluster now is done one partition at a time, so it might take a long time, > but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and > (since v14) REINDEX. Note: The following review is based on the assumption that this v11 revision was meant to contain only one patch. I put this up as a note, because it seemed quite limited when compared to earlier versions of the patchset. I noticed that you store the result of find_all_inheritors(..., NoLock) in get_tables_to_cluster_partitioned, without taking care of potential concurrent partition hierarchy changes that the comment on find_all_inheritors warns against or documenting why it is safe, which sounds dangerous in case someone wants to adapt the code. One problem I can think of is that only storing reloid and indoid is not necessarily safe, as they might be reused by drop+create table running parallel to the CLUSTER command. Apart from that, I think it would be useful (though not strictly necessary for this patch) if you could adapt the current CLUSTER progress reporting to report the progress for the whole partition hierarchy, instead of a new progress report for each relation, as was the case in earlier versions of the patchset. The v11 patch seems quite incomplete when compared to previous discussions, or at the very least is very limited (no ALTER TABLE clustering commands for partitioned tables, but `CLUSTER ptable USING pindex` is supported). If v11 is the new proposed direction for ptable clustering, could you also document these limitations in the cluster.sgml and alter_table.sgml docs? > [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ] > diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out > ... > +ALTER TABLE clstrpart SET WITHOUT CLUSTER; > +ERROR: cannot mark index clustered in partitioned table This error message does not seem to match my expectation as a user: I am not trying to mark an index as clustered, and for a normal table "SET WITHOUT CLUSTER" does not fail for unclustered tables. I think that behaviour of normal unclustered tables should be shared here as well. At the very least, the error message should be changed. > ALTER TABLE clstrpart CLUSTER ON clstrpart_idx; > ERROR: cannot mark index clustered in partitioned table A "HINT: use the CLUSTER command to cluster partitioned tables" (or equivalent) should be added if we decide to keep the clustering APIs of ALTER TABLE disabled for partitioned tables, as CLUSTER is now implemented for partitioned tables. > -DROP TABLE clstrpart; I believe that this cleanup should not be fully removed, but moved to before '-- Test CLUSTER with external tuplesorting', as the table is not used after that line. Kind regards, Matthias van de Meent
pgsql-hackers by date: