Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943 - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Date
Msg-id 202405201616.y4ht2qe5ihoy@alvherre.pgsql
In response to Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
List pgsql-bugs
On 2024-May-13, Alvaro Herrera wrote:

> Evidently there's something I'm missing in how this plancache
> invalidation works.

What I was missing is that I had forgotten how
PartitionDirectory is actually used.  In the original code I wrote for
DETACH CONCURRENTLY for 2ndQPostgres a few years ago, I set things up so
that both planner and executor used the same PartitionDesc for a
partitioned table.  But when this was reworked for real Postgres by
Robert to implement concurrent ATTACH, he instead introduced
PartitionDirectory which keeps a PartitionDesc unchanged -- but planner
and executor use different PartitionDirectories.  I probably reworked
some of DETACH CONCURRENTLY to cope with that, but evidently must have
missed this scenario.

So the problem is that we read the partition descriptor during
re-planning with N partitions, and then if we receive the corresponding
invalidation message (at executor time) exactly before
CreatePartitionPruneState() creates its PartitionDirectory, then we read
N-1 partitions, and everything explodes.  (If the invalidation is
received at some other time, then the plan is remade before
ExecInitAppend by plancache, and everything works correctly.)

So one fix for this problem seems to be to patch
CreatePartitionPruneState() to accept the possibility that one partition
has been detached concurrently, and construct the planmap/partmap arrays
with a marking that the partition in question has been pruned away.


With that fix in place, I was now getting errors from
RelationBuildPartitionDesc() that some partition in the partdesc had
NULL relpartbound.  If I understand correctly, the problem here is that
we have a snapshot that still sees that partition as being in the
hierarchy, but in reality its relpartbound has been nullified by the
concurrent detach.  I can fix this symptom by doing
AcceptInvalidationMessages() and starting back at the top, where the list
of partitions is obtained by find_inheritance_children_extended().

With these two things in place, the pgbench scripts run without errors.
That's in patch 0001, which deserves a commit message that distills the
above.


I'm not convinced that this fix is correct or complete, but hey, it's
progress.  Here's some things I'm worried about:

1. This code assumes that at most one partition can be in this "detach
concurrently" mode at a time.  This sounds correct, because this operation needs
share update exclusive lock on the table, and we only allow one
partition to be in pending-detach state.  However, what if we have a
very slow process doing the query-side mode, and two processes manage to
detach two different tables in between?  I think this isn't possible
because of the wait-for-snapshots phase, but I need to convince myself
that this is the right explanation.

2. The new code in CreatePartitionPruneState assumes that if nparts
decreases, then it must be a detach, and if nparts increases, it must be
an attach.  Can these two things happen together in a way that we see
that the number of partitions remains the same, so we don't actually try
to construct planmap/partmap arrays by matching their OIDs?  I think the
only way to handle a possible problem here would be to verify the OIDs
every time we construct a partition descriptor.  I assume (without
checking) this would come with a performance cost, not sure.

3. The new stanza in CreatePartitionPruneState() should have more
sanity-checks that the two arrays we scan really match.

4. I made RelationBuildPartitionDesc retry only once.  I think this
should be enough, because a single AcceptInvalidationMessages should
process everything that we need to read to bring us up to current
reality, considering that DETACH CONCURRENTLY would need to wait for our
snapshot.

I would like to spend some more time on this, but I'm afraid I'll have
to put it off for a little while.

PS -- yes, the self-contradictory comment in CreatePartitionPruneState()
is bogus.  Patch 0002 fixes that.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)

