Re: CLUSTER on partitioned index - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: CLUSTER on partitioned index
Date
Msg-id CAEze2WgtmBT5pS3agVWK8F6LJ_a-eVa60+4Kk0KFdwXcch_m7Q@mail.gmail.com
Whole thread Raw
In response to CLUSTER on partitioned index  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:
> > I have to wonder if there really *is* a use case for CLUSTER in the
> > first place on regular tables, let alone on partitioned tables, which
> > are likely to be large and thus take a lot of time.
>
> The cluster now is done one partition at a time, so it might take a long time,
> but doesn't lock the whole partition heirarchy.  Same as VACUUM (since v10) and
> (since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

> [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]

> diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
> ...
> +ALTER TABLE clstrpart SET WITHOUT CLUSTER;
> +ERROR:  cannot mark index clustered in partitioned table

This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

>  ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
>  ERROR:  cannot mark index clustered in partitioned table

A "HINT: use the CLUSTER command to cluster partitioned tables" (or
equivalent) should be added if we decide to keep the clustering APIs
of ALTER TABLE disabled for partitioned tables, as CLUSTER is now
implemented for partitioned tables.

> -DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

Kind regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Arne Roland
Date:
Subject: missing indexes in indexlist with partitioned tables
Next
From: Jeremy Schneider
Date:
Subject: Re: relation OID in ReorderBufferToastReplace error message