Re: 回复:how to create index concurrently on partitioned table - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: 回复:how to create index concurrently on partitioned table
Date
Msg-id 20201006024255.GJ1464@paquier.xyz
Whole thread Raw
In response to Re: 回复:how to create index concurrently on partitioned table  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: 回复:how to create index concurrently on partitioned table
List pgsql-hackers
On Mon, Oct 05, 2020 at 03:08:32PM -0500, Justin Pryzby wrote:
> It means that we might do N catalog updates for a partition heirarchy that's N
> levels deep.  Normally, N=2, and we'd clear indisclustered for the index as
> well as its parent.  This is not essential, though.

Hmm.  I got to think more about this one, and being able to ensure a
consistent state of indisclustered for partitioned tables and all
their partitions across multiple transactions at all times would be a
nice property, as we could easily track down if an operation has
failed in-flight.  The problem here is that we are limited by
indisclustered being a boolean, so we may set indisclustered one way
or another on some partitions, or on some parent partitioned tables,
but we are not sure if what's down is actually clustered for the
leaves of a partitioned index, or not.  Or maybe we even have an
inconsistent state, so this does not provide a good user experience.
The consistency of a partition tree is a key thing, and we have such
guarantees with REINDEX (with or without concurrently), so what I'd
like to think that what makes sense for indisclustered on a
partitioned index is to have the following set of guarantees, and
relying on indisvalid as being true iff an index partition tree is
complete on a given table partition tree is really important:
- If indisclustered is set to true for a partitioned index, then we
have the guarantee that all its partition and partitioned indexes have
indisclustered set to true, across all the layers down to the leaves.
- If indisclustered is set to false for a partitioned index, then we
have the guarantee that all its partition and partitioned indexes have
indisclustered set to false.

If we happen to attach a new partition to a partitioned table of such
a tree, I guess that it is then logic to have indisclustered set to
the same value as the partitioned index when the partition inherits an
index.  So, it seems to me that with the current catalogs we are
limited by indisclustered as being a boolean to maintain some kind of
consistency across transactions of CLUSTER, as we would use one
transaction per leaf for the work.  In order to make things better
here, we could switch indisclustered to a char, able to use three
states:
- 'e' or enabled, equivalent to indisclustered == true.  There should
be only one index on a table with 'e' set at a given time.
- 'd' or disabled, equivalent to indisclustered == false.
- a new third state, for an equivalent of work-in-progress, let's call
it 'w', or whatever.

Then, when we begin a CLUSTER on a partitioned table, I would imagine
the following:
- Switch all the indexes selected to 'w' in a first transaction, and
do not reset the state of other indexes if there is one 'e'.  For
CLUSTER without USING, we switch the existing 'e' to 'w' if there is
such an index.  If there are no indexes select-able, issue an error.
If we find an index with 'w', we have a candidate from a previous
failed command, so this gets used.  I don't think that this design
requires a new DDL either as we could reset all 'w' and 'e' to 'd' if
using ALTER TABLE SET WITHOUT CLUSTER on a partitioned table.  For
CLUSTER with USING, the partitioned index selected and its leaves are
switched to 'w', similarly.
- Then do the work for all the partitions, one partition per
transaction, keeping 'w'.
- In a last transaction, switch all the partitions from 'w' to 'e',
resetting any existing 'e'.

ALTER TABLE CLUSTER ON should also be a supported operation, where 'e'
gets switched for all the partitions in a single transaction.  Of
course, this should not work for an invalid index.  Even with such a
design, I got to wonder if there could be cases where a user does
*not* want to cluster the leaves of a partition tree with the same
index.  If that were to happen, the partition tree may need a redesign
so making things work so as we force partitions to inherit what's
wanted for the partitioned table makes the most sense to me.

By the way, I mentioned that previously, but this thread got used for
REINDEX that is finished, and now we have a discussion going on with
CLUSTER.  There is also stuff related to CIC and DIC.  There is also
only one CF entry for all that.  I really think that this work had
better be split into separate threads, with separate CF entries. Do
you mind if I change the contents of the CF app so as the existing
item is renamed for REINDEX, that got committed?  Then we could create
a new entry for CIC/DIC (it may make sense to split these two as
well, but I am not if there are any overlaps between the APIs of the
two), and a new entry for CLUSTER, depending on the state of the work.
The original subject of the thread is also unrelated, this makes the
review process unnecessarily complicated, and that's really
confusing.  Each discussion should go into its own, independent,
thread.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: enable_incremental_sort changes query behavior
Next
From: Justin Pryzby
Date:
Subject: Re: 回复:how to create index concurrently on partitioned table