Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943 - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Date
Msg-id 202406051539.h3o6qgkri5ij@alvherre.pgsql
In response to Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943  (Tender Wang <tndrwang@gmail.com>)
List pgsql-bugs
On 2024-May-22, Tender Wang wrote:

> I have tested this patch locally and did not encounter any failures.
> I will take time to look the patch in detail and consider the issues you
> mentioned.

Thank you.  In the meantime,

> Alvaro Herrera <alvherre@alvh.no-ip.org> wrote on Tue, May 21, 2024 at 00:16:

> > 2. The new code in CreatePartitionPruneState assumes that if nparts
> > decreases, then it must be a detach, and if nparts increases, it must be
> > an attach.  Can these two things happen together in a way that we see
> > that the number of partitions remains the same, so we don't actually try
> > to construct planmap/partmap arrays by matching their OIDs?  I think the
> > only way to handle a possible problem here would be to verify the OIDs
> > every time we construct a partition descriptor.  I assume (without
> > checking) this would come with a performance cost, not sure.

I modified the scripts slightly so that two partitions would be
detached, and lo and behold -- the case where we have one new partition
appearing and one partition disappearing concurrently can indeed happen.
So both nparts values are identical, but the OID arrays don't
match.  I attach the scripts I used to test.

I think in order to fix this we would have to compare the OID arrays
each time through CreatePartitionPruneState, so that we can mark as
"pruned" (value -1) any partition that's not on either of the partdescs.
Having to compare the whole arrays each and every time might not be
great, but I don't see any backpatchable alternative at the moment.
Going forward, we could avoid the hit by having something like a
generation counter for the partitioned table (which is incremented for
each attach and detach), but of course that's not backpatchable.
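
To make the idea concrete, here is a rough standalone sketch of the kind of
OID comparison described above.  It is not the code from the patch: the
function remap_by_oid(), its parameters, and the Oid typedef are all
hypothetical stand-ins, and it uses a simple nested loop rather than
anything tuned for performance.  It only illustrates that equal nparts does
not imply equal OID arrays, and that unmatched partitions end up as -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;	/* stand-in for PostgreSQL's Oid type */

/*
 * Hypothetical helper, for illustration only.
 *
 * plan_oids/plan_map describe the partitions as seen at plan time;
 * exec_oids is the partition descriptor as seen at executor startup.
 * For each executor-time partition, copy the plan-time map entry if
 * the same OID existed at plan time, otherwise store -1 ("pruned").
 *
 * Returns the number of executor-time partitions with no plan-time match.
 */
static int
remap_by_oid(const Oid *plan_oids, const int *plan_map, int plan_nparts,
			 const Oid *exec_oids, int exec_nparts, int *out_map)
{
	int			unmatched = 0;

	assert(out_map != NULL || exec_nparts == 0);

	for (int i = 0; i < exec_nparts; i++)
	{
		bool		found = false;

		out_map[i] = -1;		/* assume unknown to the planner */
		for (int j = 0; j < plan_nparts; j++)
		{
			if (plan_oids[j] == exec_oids[i])
			{
				out_map[i] = plan_map[j];
				found = true;
				break;
			}
		}
		if (!found)
			unmatched++;
	}
	return unmatched;
}
```

For example, with plan-time OIDs {100, 101, 102} and executor-time OIDs
{100, 102, 103} (101 detached and 103 attached concurrently), both arrays
have three entries, yet the third executor-time partition has no match and
gets -1.  Comparing only the counts would miss this.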


PS: the pg_advisory_unlock() calls are necessary because otherwise the
session that first succeeds in the try_lock function retains the lock for
the whole duration of the pgbench script, so the other sessions always
skip the "\if :gotlock" block.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

