reindex partitioned indexes: refactor ReindexRelationConcurrently ?

From: Justin Pryzby
Date: Mon, 02 Nov 2020 01:00:06 -0600
This is a new thread about possible refactoring and commit a6642b3ae
("Add support for partitioned tables and indexes in REINDEX")

Currently, ReindexPartitions() calls ReindexMultipleInternal(), which loops
over ReindexRelationConcurrently(), similar to what's done by
ReindexMultipleTables().

Contrast that with ReindexTable(), which calls ReindexRelationConcurrently()
with the table's OID, which then handles all indexes at once:

|postgres=# REINDEX TABLE CONCURRENTLY onek;
|DEBUG:  00000: building index "onek_unique1_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_unique1_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: building index "onek_unique2_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_unique2_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: building index "onek_hundred_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_hundred_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: building index "onek_stringu1_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_stringu1_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|REINDEX

Should we refactor ReindexRelationConcurrently() into two functions?  One to
build a list of indexes on a relation, and one to concurrently reindex a
list of indexes.  Then, ReindexPartitions() would make a list of the leaf indexes
of a partitioned index, or the leaf indexes of the partitioned indexes of a
partitioned table, and reindex those indexes all at once.  For CIC, we could call
that from DefineIndex().

The reason is that a concurrent reindex must wait for long-running transactions,
and if we call it in a loop, then we wait for long-running transactions N times.
I can imagine scenarios where it's easy for a DBA to schedule maintenance to
do the reindex concurrently and restart processes to allow the reindex to
proceed.  But it might be infeasible to restart processes every 5 minutes for
3 hours to allow the reindex to proceed on each partition.
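
To make that concrete, the sessions each such wait has to outlast are the ones
holding an old snapshot; a DBA planning such a maintenance window might check
them with something like this (an illustrative query, not part of the patch):

  -- Sessions with an open snapshot that a concurrent (re)index build must
  -- wait out.  With the current per-partition loop, this wait is repeated
  -- for every partition instead of once for the whole table.
  SELECT pid, xact_start, state, left(query, 60) AS query
  FROM pg_stat_activity
  WHERE backend_xmin IS NOT NULL
    AND pid <> pg_backend_pid()
  ORDER BY xact_start;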

ReindexMultipleTables() avoids processing everything in one concurrent pass in
order to avoid deadlocks, which makes great sense for REINDEX
SCHEMA/DATABASE/SYSTEM.  But I wonder whether that reasoning really applies to
partitioned tables.

We currently have this:

ReindexIndex()
    => ReindexPartitions
    => ReindexRelationConcurrently
    => reindex_index
ReindexTable()
    => ReindexPartitions
    => ReindexRelationConcurrently
    => reindex_relation
ReindexPartitions()
    => ReindexMultipleInternal()
        => ReindexRelationConcurrently()
        => reindex_relation()
        => reindex_index()

And I'm proposing to consider this:

ReindexIndex()
    => ReindexPartitions
    => ReindexIndexesConcurrently
    => reindex_index
ReindexTable()
    => ReindexPartitions
    => ReindexRelationConcurrently - this would be just a wrapper to collect the list of index OIDs
    => reindex_relation
ReindexPartitions - this exists mainly to make a list of all leaf indexes on all children of a partitioned table
    => ReindexIndexesConcurrently - this processes all indexes at once
    => ReindexMultipleInternal - this loops around everything
        => ReindexRelationConcurrently()
        => reindex_index()
        => reindex_relation()
ReindexRelationConcurrently
    => ReindexIndexesConcurrently - this is the worker function factored out of ReindexRelationConcurrently

I'm not sure if I'm missing any opportunities to simplify...

So then a partitioned table is processed similarly to REINDEX TABLE (rather than like REINDEX DATABASE).

|postgres=# REINDEX TABLE CONCURRENTLY hash_parted;
|DEBUG:  building index "hpart0_a_idx_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_idx_ccnew" can safely use deduplication
|DEBUG:  validate_index found 2489 heap tuples, 2489 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2527 heap tuples, 2527 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2530 heap tuples, 2530 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2453 heap tuples, 2453 index tuples; inserted 0 missing tuples
|REINDEX

And, if there are multiple indexes:

|postgres=# REINDEX TABLE CONCURRENTLY hash_parted;
|DEBUG:  building index "hpart0_a_idx_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart0_a_idx1_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx1_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx1_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx1_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx1_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx1_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx1_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_idx1_ccnew" can safely use deduplication
|DEBUG:  validate_index found 2489 heap tuples, 2489 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2489 heap tuples, 2489 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2527 heap tuples, 2527 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2527 heap tuples, 2527 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2530 heap tuples, 2530 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2530 heap tuples, 2530 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2453 heap tuples, 2453 index tuples; inserted 0 missing tuples
|DEBUG:  validate_index found 2453 heap tuples, 2453 index tuples; inserted 0 missing tuples
|REINDEX

I think the usual scenario is to have 100-1000 partitions, and 1-10 indexes per
partition.  It seems to me that at least all partitions of a given index should
be processed simultaneously.
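
A scaled-down setup along those lines is easy to put together for testing
(table and index names here are made up for illustration):

  CREATE TABLE t (id int, val text) PARTITION BY HASH (id);
  DO $$
  BEGIN
      FOR i IN 0..7 LOOP
          EXECUTE format('CREATE TABLE t_%s PARTITION OF t
                          FOR VALUES WITH (MODULUS 8, REMAINDER %s)', i, i);
      END LOOP;
  END $$;
  CREATE INDEX ON t (id);
  CREATE INDEX ON t (val);

  -- With partitioned-REINDEX support this recurses to all 16 leaf indexes;
  -- the question here is whether they are rebuilt in one concurrent pass
  -- (waiting out old transactions once per phase) or one relation at a time.
  REINDEX TABLE CONCURRENTLY t;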

Also, it occurred to me that CIC for regular tables could be simplified to do the
same thing that I'm doing for CIC on a partitioned table: create an INVALID
catalog entry, and then reindex it concurrently.  This seems to work, with the
only difference I see being that, on failure, REINDEX leaves behind ccnew indexes.
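
As far as SQL-level behaviour goes, that sequence is already visible today: a
cancelled CREATE INDEX CONCURRENTLY leaves an INVALID entry behind, and one
documented recovery path is to rebuild it with REINDEX INDEX CONCURRENTLY.
For example (throwaway names):

  CREATE TABLE u (val text);
  CREATE INDEX CONCURRENTLY u_val_idx ON u (val);   -- suppose this is cancelled
  -- \d u would then show u_val_idx as INVALID; it can be completed with:
  REINDEX INDEX CONCURRENTLY u_val_idx;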

Attached is what I'm thinking of.


Re: reindex partitioned indexes: refactor ReindexRelationConcurrently ?

From: Michael Paquier
Date:
On Mon, Nov 02, 2020 at 01:00:06AM -0600, Justin Pryzby wrote:
> The reason is that a concurrent reindex must wait for long-running transactions,
> and if we call it in a loop, then we wait for long-running transactions N times.
> I can imagine scenarios where it's easy for a DBA to schedule maintenance to
> do the reindex concurrently and restart processes to allow the reindex to
> proceed.  But it might be infeasible to restart processes every 5 minutes for
> 3 hours to allow the reindex to proceed on each partition.
>
> ReindexMultipleTables() avoids processing everything in one concurrent pass in
> order to avoid deadlocks, which makes great sense for REINDEX
> SCHEMA/DATABASE/SYSTEM.  But I wonder whether that reasoning really applies to
> partitioned tables.
>
> I think the usual scenario is to have 100-1000 partitions, and 1-10 indexes per
> partition.  It seems to me that at least all partitions of a given index should
> be processed simultaneously.

ReindexPartitions(), as currently shaped, has the advantage of minimizing
the number of ccnew and ccold indexes to handle in parallel.
With your suggestion, there could potentially be hundreds of
built-but-still-invalid indexes or invalid-but-not-dropped indexes,
depending on the phase at which the whole REINDEX operation fails, if it
fails of course.  So I would say no to your proposal; I would prefer
keeping the approach where the remnants of a failed operation are kept
to a bare minimum (aka one index for REINDEX INDEX, and one set of
indexes on a single relation for REINDEX TABLE).
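
For reference, any such leftovers are easy to spot afterwards with a catalog
query along these lines (illustrative only, not something from the patch):

  -- Invalid indexes, e.g. *_ccnew/*_ccold remnants of a failed concurrent
  -- REINDEX, or the remains of a cancelled CREATE INDEX CONCURRENTLY.
  SELECT c.relname
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  WHERE NOT i.indisvalid;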
--
Michael
