Re: 回复:how to create index concurrently on partitioned table - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: 回复:how to create index concurrently on partitioned table
Date
Msg-id 20200901055158.GH3511@paquier.xyz
Whole thread Raw
In response to Re: 回复:how to create index concurrently on partitioned table  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: 回复:how to create index concurrently on partitioned table  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Sun, Aug 09, 2020 at 06:44:23PM -0500, Justin Pryzby wrote:
> Thanks for looking.

The REINDEX patch is progressing its way, so I have looked a bit at
the part for CIC.

Visibly, the case of multiple partition layers is not handled
correctly.  Here is a sequence that gets broken:
CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
CREATE TABLE child_0_10 PARTITION OF parent_tab
     FOR VALUES FROM (0) TO (10);
CREATE TABLE child_10_20 PARTITION OF parent_tab
     FOR VALUES FROM (10) TO (20);
CREATE TABLE child_20_30 PARTITION OF parent_tab
     FOR VALUES FROM (20) TO (30);
INSERT INTO parent_tab VALUES (generate_series(0,29));
CREATE TABLE child_30_40 PARTITION OF parent_tab
     FOR VALUES FROM (30) TO (40)
     PARTITION BY RANGE(id);
CREATE TABLE child_30_35 PARTITION OF child_30_40
     FOR VALUES FROM (30) TO (35);
CREATE TABLE child_35_40 PARTITION OF child_30_40
     FOR VALUES FROM (35) TO (40);
INSERT INTO parent_tab VALUES (generate_series(30,39));
CREATE INDEX CONCURRENTLY parent_index ON parent_tab (id);

This fails as follows:
ERROR: XX000: unrecognized node type: 2139062143
LOCATION:  copyObjectImpl, copyfuncs.c:5718

And the problem visibly comes from some handling with the second level
of partitions, child_30_40 in my example above.  Even with that, this
outlines a rather severe problem in the patch: index_set_state_flags()
flips indisvalid on/off using a non-transactional update (see the use
of heap_inplace_update), meaning that if we fail in the middle of the
operation we may finish with a partition index tree where some of the
indexes are valid and some of them are invalid.  In order to make this
logic consistent, I am afraid that we will need to do two things:
- Change index_set_state_flags() so as it uses a transactional
update.  That's something I would like to change for other reasons,
like making sure that the REPLICA IDENTITY of a parent relation is
correctly reset when dropping a replica index.
- Make all the indexes of the partition tree valid in the *same*
sub-transaction.
You can note that this case is different than a concurrent REINDEX,
because in this case we just do an in-place change between the old and
new index, meaning that even if there is a failure happening while
processing, we may have some invalid indexes, but there are still
valid indexes attached to the partition tree, at any time.

+     MemoryContext   oldcontext = MemoryContextSwitchTo(ind_context);
      PartitionDesc partdesc = RelationGetPartitionDesc(rel);
      int         nparts = partdesc->nparts;
+     char        *relname = pstrdup(RelationGetRelationName(rel));
Er, no.  We should not play with the relation cache calls in a private
memory context.  I think that this needs much more thoughts.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Remove line length restriction in passwordFromFile()
Next
From: Michael Paquier
Date:
Subject: Re: Use T_IntList for uint32