Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired - Mailing list pgsql-hackers

From Haibo Yan
Subject Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired
Date
Msg-id CABXr29Epj=23eYAWg_mTdiXYQQzJSVN22uk6tK8ei0ts5wXvSQ@mail.gmail.com
Whole thread
In response to Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired  (Haibo Yan <tristan.yim@gmail.com>)
Responses Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired
List pgsql-hackers
On Fri, Apr 10, 2026 at 5:55 PM Haibo Yan <tristan.yim@gmail.com> wrote:
On Wed, Apr 8, 2026 at 11:17 PM Mohamed ALi <moali.pg@gmail.com> wrote:
Hi hackers,

A partitioned (parent) index in PostgreSQL can become permanently
stuck with `indisvalid = false` even after all of its child partition
indexes have been repaired and are valid. There is no built-in
mechanism to re-validate the parent index after a child is fixed via
`REINDEX`. This affects all currently supported PostgreSQL versions
(13 through 18)
The root cause is that `validatePartitionedIndex()` — the only
function that can mark a partitioned index as valid is never called
after `REINDEX` operations, and is skipped when re-running `ALTER
INDEX ATTACH PARTITION` on an already-attached index.

How the Bug Manifests

Typical Scenario :
1. A partitioned table has multiple partitions.
2. The user creates indexes on partitions concurrently. One fails (due
to deadlock, cancellation, timeout, etc.), leaving an invalid
partition index.
3. A parent index is created (or the invalid index is attached to an
existing parent). The parent is correctly marked `indisvalid = false`
because at least one child is invalid.
4. The user fixes the broken child index with `REINDEX INDEX CONCURRENTLY`.
5. The child index becomes valid (`indisvalid = true`).
6. The parent index remains `indisvalid = false` permanently. No SQL
command can fix it.

Reproduction steps:

```sql
-- ============================================================
-- SETUP: Partitioned table with two partitions and sample data
-- ============================================================
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
    id serial,
    order_date date NOT NULL,
    amount numeric
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
INSERT INTO orders (order_date, amount)
SELECT d, random() * 1000
FROM generate_series('2023-01-01'::date, '2023-12-31'::date, '1 day') d;
INSERT INTO orders (order_date, amount)
SELECT d, random() * 1000
FROM generate_series('2024-01-01'::date, '2024-12-31'::date, '1 day') d;
-- ============================================================
-- STEP 1: Create parent index with ONLY (starts as invalid)
-- ============================================================
CREATE INDEX orders_amount_idx ON ONLY orders (amount);
-- Verify: parent index is invalid (no children attached yet)
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%idx%'
ORDER BY c.relname;
-- Expected:
--  orders_amount_idx | f
-- ============================================================
-- STEP 2: Create valid index on first partition
-- ============================================================
CREATE INDEX CONCURRENTLY orders_2023_amount_idx ON orders_2023 (amount);
-- ============================================================
-- STEP 3: Create an INVALID index on second partition
-- ============================================================
-- In a separate session, hold a lock:
BEGIN; LOCK TABLE orders_2024 IN SHARE MODE;
-- Then in the main session:
SET statement_timeout = '1ms';
CREATE INDEX CONCURRENTLY orders_2024_amount_idx ON orders_2024 (amount);
RESET statement_timeout;
-- it will fail/timeout, leaving an invalid index.
-- Verify state:
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%idx%'
ORDER BY c.relname;
-- Expected:
--  orders_2023_amount_idx | t   (valid)
--  orders_2024_amount_idx | f   (invalid)
--  orders_amount_idx      | f   (invalid, created with ONLY)
-- ============================================================
-- STEP 4: Attach both partition indexes to the parent
-- ============================================================
-- Attach the invalid one first
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx;
-- Succeeds. Parent stays invalid (correct — child is invalid).
-- Attach the valid one
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2023_amount_idx;
-- Succeeds. Parent still invalid (correct — one child still invalid).
-- Verify attachment and validity:
SELECT c.relname, i.indisvalid,
       pg_get_indexdef(i.indexrelid) AS indexdef
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- Expected:
--  orders_2023_amount_idx | t
--  orders_2024_amount_idx | f
--  orders_amount_idx      | f
-- ============================================================
-- STEP 5: Fix the invalid child index via REINDEX
-- ============================================================
REINDEX INDEX CONCURRENTLY orders_2024_amount_idx;
-- Verify: child is now valid
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- ACTUAL (buggy) result:
--  orders_2023_amount_idx | t   (valid)
--  orders_2024_amount_idx | t   (valid — fixed by REINDEX)
--  orders_amount_idx      | f   (STILL INVALID — this is the bug!)
--
-- EXPECTED result (if bug were fixed):
--  orders_2023_amount_idx | t
--  orders_2024_amount_idx | t
--  orders_amount_idx      | t   (should be valid now)
-- ============================================================
-- STEP 6: Demonstrate that re-running ATTACH does not help
-- ============================================================
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx;
-- Returns "ALTER INDEX" (succeeds silently, does nothing)
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- Parent is STILL invalid. The "silently do nothing" path
-- skips validatePartitionedIndex() entirely.
-- ============================================================
-- CLEANUP
-- ============================================================
DROP TABLE orders;
```


Root Cause Analysis:

Where `validatePartitionedIndex()` Is Called

The function is called in exactly these code paths:
1. During `ALTER INDEX ... ATTACH PARTITION` — inside
`ATExecAttachPartitionIdx()`
2. During `ALTER TABLE ... ATTACH PARTITION` — via
`AttachPartitionEnsureIndexes()`
3. During `CREATE INDEX` on partitioned tables — via `DefineIndex()`
It is NOT called:
- After `REINDEX` of a partitioned index
- During any maintenance operation
- As any periodic validation check

Bug 1: REINDEX Does Not Validate Parent


When `reindex_index()` in `src/backend/catalog/index.c` marks a
partition index as valid (setting `indisvalid = true`), it does not
check whether the parent partitioned index should also become valid.
The function simply updates the child's `pg_index` entry and returns.

Bug 2: Re-running ATTACH Skips Validation


In `ATExecAttachPartitionIdx()` (tablecmds.c, around line 21923 in PG
16 / line ~22900 in HEAD):
https://github.com/postgres/postgres/blob/master/src/backend/commands/tablecmds.c#L21923

```c
/* Silently do nothing if already in the right state */
currParent = partIdx->rd_rel->relispartition ?
    get_partition_parent(partIdxId, false) : InvalidOid;
if (currParent != RelationGetRelid(parentIdx))
{
    // ... all validation checks and attachment logic ...
    validatePartitionedIndex(parentIdx, parentTbl);  // ONLY called here
}
// If already attached, entire block is skipped — no validation!
```

When the child is already attached (`currParent == parentIdx`), the
condition is false, the entire if-block is skipped, and
`validatePartitionedIndex()` is never called. The comment "Silently do
nothing if already in the right state" is misleading  "already
attached" does not mean "parent validity is correct."

Proposed Fixes:

Fix 1 : Always Validate Parent Index in ALTER INDEX ATTACH PARTITION

Patch File : 0001-Always-validate-parent-index-in-ALTER-INDEX-ATTACH.patch

Move the validatePartitionedIndex() call outside the if-block so it runs
unconditionally — both when a new attachment is made and when the partition is
already attached. This provides a user-accessible recovery path: after fixing a
child index with REINDEX, re-running ALTER INDEX ATTACH PARTITION triggers
parent validation.

When the partition is already attached, a NOTICE is emitted:

NOTICE:  partition index "child_idx" is already attached to
"parent_idx", validating parent index


This follows PostgreSQL's existing convention of using NOTICE for
informational messages about no-op or reduced-scope operations (e.g.,
DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS). It tells the user:

1- Nothing went wrong
2- The index was already attached (so they know the state)
3-  Validation still happened (so they know the fix path works)


Fix 2: Validate Parent Partitioned Index After REINDEX of Child

Patch File : 0001-Validate-parent-partitioned-index-after-REINDEX.patch

Same underlying bug but this patch addresses it from the
REINDEX side. When a partition index is repaired via REINDEX or
REINDEX CONCURRENTLY, the parent partitioned index remains permanently
stuck with indisvalid = false even though all children are now valid.

This is because validatePartitionedIndex() — the only function that can
mark a partitioned index as valid is never called from any REINDEX code
path.


validatePartitionedIndex() is only called during:

1- ALTER INDEX ... ATTACH PARTITION (tablecmds.c)
2- ALTER TABLE ... ATTACH PARTITION (tablecmds.c)
3- CREATE INDEX on partitioned tables (indexcmds.c)

It is NOT called after:

1- REINDEX INDEX (regular) — handled by reindex_index() in index.c
2- REINDEX INDEX CONCURRENTLY — handled by ReindexRelationConcurrently()

in indexcmds.c, which uses index_concurrently_swap() in index.c

Three changes are made:

1. Make validatePartitionedIndex() public
The function was static in tablecmds.c. It is now exported via
tablecmds.h so it can be called from index.c and indexcmds.c.

Files changed:

src/backend/commands/tablecmds.c — remove static, update comment
src/include/commands/tablecmds.h — add extern declaration

2. Call from reindex_index() (regular REINDEX)
After reindex_index() marks a partition index as valid (indisvalid = true),
check if the index is a partition (iRel->rd_rel->relispartition) and if so,
look up the parent and call validatePartitionedIndex().

A CommandCounterIncrement() is required before the call so that the child's
updated indisvalid is visible to the syscache lookup that
validatePartitionedIndex() performs internally.

File changed: src/backend/catalog/index.c

3. Call from ReindexRelationConcurrently() (REINDEX CONCURRENTLY)
REINDEX CONCURRENTLY uses a completely different code path: it creates a new
index, builds it concurrently, then swaps it with the old one via
index_concurrently_swap(). The new index inherits the old index's partition
status during the swap.

After the swap and the existing CommandCounterIncrement() (which makes the
swap visible), check if the new index is a partition and call
validatePartitionedIndex() on the parent.

File changed: src/backend/commands/indexcmds.c

Multi-level Hierarchy Support
validatePartitionedIndex() already handles multi-level partition hierarchies.
When it marks a mid-level parent valid, it checks if that parent is itself a
partition and recursively validates the grandparent. No additional recursion
logic is needed in the REINDEX patches.


Thanks,
Mohamed Ali
Senior DBE
AWS RDS

Hi, Mohamed

Thanks for working on this. I went through the problem statement and the two proposed fixes. I agree that the underlying issue looks real: after repairing an invalid child index with REINDEX, the parent partitioned index can remain stuck in an invalid state because validatePartitionedIndex() is not reached from the relevant REINDEX paths. The analysis around ATExecAttachPartitionIdx() also looks correct: when the child index is already attached, the current code takes the no-op path and skips the validation call entirely.

Overall, I think this is worth fixing, but I do not view the two patches equally.

I think patch 2 is the real fix. The state transition that matters here is that a child index goes from invalid to valid, and the natural place to trigger parent revalidation is where that transition actually happens, namely in REINDEX. By contrast, patch 1 feels more like a secondary hardening/workaround path: it makes ALTER INDEX ... ATTACH PARTITION retry parent validation even in the already-attached case, but that is not really where the underlying state change happens.

My main comments are below.

1. Patch 2 should be treated as the primary fix

This seems like the correct place to repair the catalog state. If REINDEX repairs a partition index that was previously invalid, and that index is attached to a partitioned parent index, then rechecking the parent with validatePartitionedIndex() is a reasonable and direct solution.

I also think it is good that the patch covers both the regular REINDEX path and the REINDEX CONCURRENTLY path. Those are distinct enough that both need explicit attention, and the extra CommandCounterIncrement() before revalidation also seems necessary.

So at a high level, this patch makes sense to me.

2. I am less convinced by patch 1 in its current form

The main issue here is not correctness, but design and placement.

Once the child is already attached, ALTER INDEX ... ATTACH PARTITION is conceptually supposed to be a no-op. With this patch, it becomes a generic “retry parent validation” hook. That means users can run an attach command that does not change attachment state at all, yet still triggers a full parent validation attempt.

That is especially questionable if there are still other invalid child indexes elsewhere under the same parent. In that case, each already-attached ATTACH command will do the validation work again, but it is already known in advance that the parent still cannot become valid. So this is not “reinvalidating” anything, but it is repeated checking with no state change, which feels misplaced on a no-op path.

Because of that, I do not think patch 1 should be the main bug fix. At most, I would see it as an optional hardening patch.

3. The NOTICE added by patch 1 does not seem like a good fit

The existing code explicitly says “Silently do nothing if already in the right state”. Changing that into a NOTICE every time we hit the already-attached case seems noisy to me.

If the community decides that the validation call should stay in this path, I would still suggest dropping the NOTICE. The command is syntactically an ATTACH command, not a repair command, and emitting a message for an otherwise no-op case does not feel very PostgreSQL-like.

4. Please double-check coverage of all REINDEX entry points

I agree with the direction in patch 2, but I think reviewers will want confidence that all relevant REINDEX flows are covered consistently.

For example, it would be good to confirm that the fix behaves correctly not just for REINDEX INDEX, but also for the broader forms that eventually reach the same logic, such as REINDEX TABLE, and that there is no alternate path where a repaired child index can still leave the parent stale.

5. Multi-level partition trees need explicit testing

One especially important point here is recursion. If a repaired child index causes its immediate parent partitioned index to become valid, and that parent is itself a child of another partitioned index, we need to be sure that validity propagates all the way up as intended.

The current reasoning suggests that validatePartitionedIndex() already handles that, but this is important enough that it should be demonstrated with a regression test, not just assumed.

Minor comments:

  • In patch 1, I would avoid turning the already-attached path into a behavioral special case unless there is a strong reason to keep it. It would be cleaner if the fix relied primarily on the actual state-changing paths.

  • If patch 1 remains, I would remove the NOTICE.

  • The commit message for patch 2 should clearly explain why REINDEX is the right place to do this, rather than making ATTACH PARTITION the repair mechanism.

  • It may also help to mention explicitly whether the extra validation call is only intended for indexes that were previously invalid and have just been repaired, since that is an important part of why the patch is reasonably narrow.

I do not think the current patches are complete without regression coverage. At minimum, I think the following tests should be added:

  1. Regular REINDEX case

    Create a partitioned table with a parent index left invalid initially, then attach child indexes such that one child index is invalid. Repair that child with plain REINDEX INDEX, and verify that the child becomes valid and the parent also becomes valid.

  2. REINDEX CONCURRENTLY case

    Same setup, but use REINDEX INDEX CONCURRENTLY. This should be tested separately because the code path is different.

  3. Multi-level partition hierarchy

    Use at least a three-level hierarchy and verify that repairing a leaf-level invalid child index can cause validity to propagate upward through intermediate partitioned indexes.

  4. Negative case

    Repair one child index while another child index under the same parent remains invalid, and verify that the parent remains invalid.

  5. Already-attached ATTACH PARTITION case

    Only if patch 1 is kept: exercise ALTER INDEX ... ATTACH PARTITION on a child index that is already attached, and verify both behaviors:

  • parent becomes valid if all children are now valid

  • parent stays invalid if some other child is still invalid

My overall view is:

  • The bug itself looks real.
  • The diagnosis looks sound.
  • Patch 2 is the right direction and should be the primary fix.
  • Patch 1 is much less compelling as written, and I would not take it as the main solution.
  • The series needs regression tests before it is ready.

Regards,

Haibo 

Hi, Mohamed
I took a look at this patch and added some regression coverage for it.
While doing that, I found that the current concurrent-path fix is not quite complete. The new tests exposed crashes in existing REINDEX CONCURRENTLY regression coverage, and the root cause appears to be that in the concurrent path, validatePartitionedIndex() can eventually reach catalog update code at a point where there is no active/registered snapshot. That leads to the HaveRegisteredOrActiveSnapshot() assertion failure.
So the overall bug diagnosis still looks correct to me, and the plain REINDEX direction also looks right, but the REINDEX CONCURRENTLY path needs an additional fix around the call site/context.
Based on that, I prepared a v2 on top of your patch. It includes:
  • a fix for the concurrent-path snapshot issue
  • narrower handling for the concurrent path so it only does parent revalidation for the actual repair case
  • regression tests covering:
  • plain REINDEX INDEX
  • a negative case where another invalid sibling keeps the parent invalid
  • multi-level partition hierarchies
  • REINDEX TABLE
If my understanding above looks right to you, I think this v2 is a better base for further review.
Thanks,
Haibo
Attachment

pgsql-hackers by date:

Previous
From: Tender Wang
Date:
Subject: Re: pg17: XX000: no relation entry for relid 0
Next
From: Isaac Morland
Date:
Subject: Possible mismatch between behaviour and documentation in CREATE FUNCTION