CLUSTER on partitioned index - Mailing list pgsql-hackers

From Justin Pryzby
Subject CLUSTER on partitioned index
Date
Msg-id 20201028003312.GU9241@telsasoft.com
Whole thread Raw
List pgsql-hackers
Forking this thread, since the existing CFs have been closed.
https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd

On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote:
> > Honestly, I think you're over-thinking and over-engineering indisclustered.
> > 
> > If "clusteredness" was something we offered to maintain across DML, I think
> > that might be important to provide stronger guarantees.  As it is now, I don't
> > think this patch is worth changing the catalog definition.
> 
> Well, this use case is new because we are discussing the relationship
> of indisclustered across multiple transactions for multiple indexes,
> so I'd rather have this discussion than not, and I have learnt
> the hard way with REINDEX that we should care a lot about the
> consistency of partition trees at any step of the operation.

indisclustered is only used as a default for "CLUSTER" (without USING).  The
worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters
a table on the "old" clustered index (that it was already clustered on), which
is what would've happened before running some command which was interrupted.

> Let's
> imagine a simple example here, take this partition tree: p (parent),
> and two partitions p1 and p2.  p has two partitioned indexes i and j,
> indexes also present in p1 and p2 as i1, i2, j1 and j2.  Let's assume
> that the user has done a CLUSTER on p USING i that completes, meaning
> that i, i1 and i2 have indisclustered set.  Now let's assume that the
> user does a CLUSTER on p USING j this time, and that this command
> fails while processing p2, meaning that indisclustered is set for j1,
> i2, and perhaps i or j depending on what the patch does.

I think the state of "indisclustered" at that point is not critical.
The command failed, and the user can re-run it, or ALTER..SET CLUSTER.
Actually, I think the only inconsistent state is if two indexes are both marked
indisclustered.

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve
indisclustered, but I can't say that's an issue.

-- 
Justin

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Patch to fix FK-related selectivity estimates with constants
Next
From: Justin Pryzby
Date:
Subject: DROP INDEX CONCURRENTLY on partitioned index