Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943

From: Alvaro Herrera
Subject: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Msg-id: 202405201616.y4ht2qe5ihoy@alvherre.pgsql
In response to: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943 (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
List: pgsql-bugs
On 2024-May-13, Alvaro Herrera wrote:

> Evidently there's something I'm missing in how this plancache
> invalidation works.

Actually, what I was missing is that I had forgotten how PartitionDirectory is actually used. In the original code I wrote for DETACH CONCURRENTLY for 2ndQPostgres a few years ago, I set things up so that both planner and executor used the same PartitionDesc for a partitioned table. But when this was reworked for real Postgres by Robert to implement concurrent ATTACH, he instead introduced PartitionDirectory, which keeps a PartitionDesc unchanged -- but planner and executor use different PartitionDirectories. I probably reworked some of DETACH CONCURRENTLY to cope with that, but evidently I must have missed this scenario.

So the problem is that we read the partition descriptor during re-planning with N partitions, and then, if we receive the corresponding invalidation message (at executor time) exactly before CreatePartitionPruneState() creates its PartitionDirectory, we read N-1 partitions, and everything explodes. (If the invalidation is received at any other time, the plan is remade by plancache before ExecInitAppend, and everything works correctly.)

So one fix for this problem seems to be to patch CreatePartitionPruneState() to accept the possibility that one partition has been detached concurrently, and to construct the planmap/partmap arrays with a marking that the partition in question has been pruned away; see the first sketch below.

With that fix in place, I was then getting errors from RelationBuildPartitionDesc() complaining that some partition in the partdesc had a NULL relpartbound. If I understand correctly, the problem here is that we have a snapshot that still sees that partition as being in the hierarchy, but in reality its relpartbound has been nullified by the concurrent detach. I can fix this symptom by doing AcceptInvalidationMessages() and starting back at the top, where the list of partitions is obtained by find_inheritance_children_extended(); see the second sketch below.

With these two things in place, the pgbench scripts run without errors. That's in patch 0001, which deserves a commit message that distills the above. I'm not convinced that this fix is correct or complete, but hey, it's progress.
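To make the first idea concrete: here is a minimal standalone model -- not the actual patch, all names hypothetical -- of building a plan-to-partdesc map by walking the plan-time and current partition OID arrays in parallel, marking a concurrently detached partition as pruned (-1) instead of asserting:

    /*
     * Minimal standalone model of the mapping idea sketched above.
     * Hypothetical names and structure, not actual PostgreSQL code.
     * Both arrays are assumed to list partition OIDs in the same
     * (bound) order; a concurrent DETACH removes at most one element
     * from the executor-time array.
     */
    #include <stdio.h>

    typedef unsigned int Oid;

    /*
     * plan_oids: partition OIDs the plan was built with (nplan entries)
     * desc_oids: partition OIDs in the current partdesc (ndesc entries)
     * planmap:   output, one entry per plan partition: the matching
     *            index into desc_oids, or -1 if concurrently detached
     * Returns the number of plan partitions that could not be matched.
     */
    static int
    build_planmap(const Oid *plan_oids, int nplan,
                  const Oid *desc_oids, int ndesc,
                  int *planmap)
    {
        int unmatched = 0;
        int di = 0;

        for (int pi = 0; pi < nplan; pi++)
        {
            if (di < ndesc && plan_oids[pi] == desc_oids[di])
                planmap[pi] = di++;     /* still present: map it */
            else
            {
                planmap[pi] = -1;       /* detached: treat as pruned */
                unmatched++;
            }
        }
        return unmatched;
    }

    int
    main(void)
    {
        /* Plan-time view had four partitions; OID 5004 is now gone. */
        Oid plan_oids[] = {5001, 5002, 5004, 5007};
        Oid desc_oids[] = {5001, 5002, 5007};
        int planmap[4];

        int missing = build_planmap(plan_oids, 4, desc_oids, 3, planmap);

        for (int i = 0; i < 4; i++)
            printf("plan partition %u -> %d\n", plan_oids[i], planmap[i]);
        printf("%d partition(s) treated as pruned\n", missing);
        return 0;
    }

The real code of course also has to deal with subpartitioning and with concurrent ATTACH; the model only shows the detach case.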
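And a similarly hedged standalone sketch of the retry-once shape for RelationBuildPartitionDesc(): the stand-in functions below are invented scaffolding; only the pattern -- detect the NULL relpartbound, process invalidations, restart from the top, give up after one retry -- mirrors what the text describes:

    /*
     * Standalone model of the retry-once pattern described above.
     * Everything here is hypothetical scaffolding, not PostgreSQL code.
     */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for AcceptInvalidationMessages(): refreshes our view. */
    static bool view_is_stale = true;
    static void accept_invalidation_messages(void) { view_is_stale = false; }

    /*
     * Stand-in for reading the partitions' relpartbound values; returns
     * false if a stale view sees a partition whose relpartbound is NULL
     * (i.e. one that is mid-detach).
     */
    static bool
    read_partition_bounds(void)
    {
        return !view_is_stale;
    }

    static void
    build_partition_desc(void)
    {
        bool retried = false;

    retry:
        if (!read_partition_bounds())
        {
            /*
             * NULL relpartbound: the partition is being concurrently
             * detached.  One round of invalidation processing must bring
             * us up to date, because DETACH CONCURRENTLY waits out our
             * snapshot; if it doesn't, something is broken, so error out
             * rather than loop forever.
             */
            if (retried)
            {
                fprintf(stderr, "partition bounds still inconsistent after retry\n");
                exit(1);
            }
            retried = true;
            accept_invalidation_messages();
            goto retry;
        }
        printf("partition descriptor built\n");
    }

    int
    main(void)
    {
        build_partition_desc();
        return 0;
    }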
Here are some things I'm worried about:

1. This code assumes that only a single partition can be in this "detach concurrently" mode at a time. This sounds correct, because the operation needs share update exclusive lock on the table, and we only allow one partition to be in pending-detach state. However, what if we have a very slow process on the query side, and two processes manage to detach two different tables in between? I think this isn't possible because of the wait-for-snapshots phase, but I need to convince myself that this is the right explanation.

2. The new code in CreatePartitionPruneState() assumes that if nparts decreases, it must be a detach, and if nparts increases, it must be an attach. Can these two things happen together in a way that the number of partitions appears to remain the same, so that we don't actually try to construct the planmap/partmap arrays by matching their OIDs? I think the only way to handle a possible problem here would be to verify the OIDs every time we construct a partition descriptor. I assume (without checking) that this would come with a performance cost; not sure.

3. The new stanza in CreatePartitionPruneState() should have more sanity checks that the two arrays we scan really match.

4. I made RelationBuildPartitionDesc() retry only once. I think this should be enough, because a single AcceptInvalidationMessages() should process everything we need to read to bring us up to current reality, considering that DETACH CONCURRENTLY would need to wait for our snapshot.

I would like to spend some more time on this, but I'm afraid I'll have to put it off for a little while.

PS -- yes, the self-contradictory comment in CreatePartitionPruneState() is bogus. Patch 0002 fixes that.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                               (Brian Kernighan)