BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943 - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Msg-id 18377-e0324601cfebdfe5@postgresql.org
Responses Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943  (David Rowley <dgrowleyml@gmail.com>)
Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18377
Logged by:          yajun Hu
Email address:      1026592243@qq.com
PostgreSQL version: 14.11
Operating system:   CentOS7 with kernel version 5.10
Description:

I have reproduced this problem in REL_14_11 and the latest master branch
(393b5599e5177e456cdce500039813629d370b38).
The steps to reproduce are as follows.
1. ./configure  --enable-debug --enable-depend --enable-cassert CFLAGS=-O0
2. make -j; make install -j; initdb -D ./primary; pg_ctl -D ./primary -l
logfile start
3. alter system set plan_cache_mode to 'force_generic_plan' ; select
pg_reload_conf();
4. create table p( a int,b int) partition by range(a);create table p1
partition of p for values from (0) to (1);create table p2 partition of p for
values from (1) to (2);
5. Run pgbench with the following two scripts (1.sql and 2.sql):
1.sql
SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
\if :gotlock
        alter table p detach partition p1 concurrently ;
        alter table p attach partition p1 for values from (0) to (1);
\endif
2.sql
\set id random(0,1)
select * from p where a = :id
pgbench --no-vacuum --client=5 --transactions=1000000 -f 1.sql -f 2.sql -h
127.0.0.1 -M prepared -p 5432

In my tests a core dump appears within 10 seconds, with the following
stack:
(gdb) bt
#0  0x00007effe61aa277 in raise () from /lib64/libc.so.6
#1  0x00007effe61ab968 in abort () from /lib64/libc.so.6
#2  0x0000000000b8748d in ExceptionalCondition (conditionName=0xd25358
"partdesc->nparts >= pinfo->nparts", fileName=0xd24cfc "execPartition.c",
lineNumber=1943) at assert.c:66
#3  0x0000000000748bf1 in CreatePartitionPruneState (planstate=0x1898ad0,
pruneinfo=0x1884188) at execPartition.c:1943
#4  0x00000000007488cb in ExecInitPartitionPruning (planstate=0x1898ad0,
n_total_subplans=2, pruneinfo=0x1884188,
initially_valid_subplans=0x7ffdca29f7d0) at execPartition.c:1803
#5  0x000000000076171d in ExecInitAppend (node=0x17cb5e0, estate=0x1898870,
eflags=32) at nodeAppend.c:146
#6  0x00000000007499af in ExecInitNode (node=0x17cb5e0, estate=0x1898870,
eflags=32) at execProcnode.c:182
#7  0x000000000073f514 in InitPlan (queryDesc=0x1880f58, eflags=32) at
execMain.c:969
#8  0x000000000073e3ea in standard_ExecutorStart (queryDesc=0x1880f58,
eflags=32) at execMain.c:267
#9  0x000000000073e15f in ExecutorStart (queryDesc=0x1880f58, eflags=0) at
execMain.c:146
#10 0x00000000009cab67 in PortalStart (portal=0x181b000, params=0x1880ec8,
eflags=0, snapshot=0x0) at pquery.c:517
#11 0x00000000009c5e49 in exec_bind_message (input_message=0x7ffdca29fd20)
at postgres.c:2028
#12 0x00000000009c940e in PostgresMain (dbname=0x17d8e88 "postgres",
username=0x17d8e68 "postgres") at postgres.c:4723
#13 0x00000000008f8024 in BackendRun (port=0x17cd640) at postmaster.c:4477
#14 0x00000000008f77c1 in BackendStartup (port=0x17cd640) at
postmaster.c:4153
#15 0x00000000008f4256 in ServerLoop () at postmaster.c:1771
#16 0x00000000008f3c40 in PostmasterMain (argc=3, argv=0x179b680) at
postmaster.c:1470
#17 0x00000000007be309 in main (argc=3, argv=0x179b680) at main.c:198
(gdb) f 3
#3  0x0000000000748bf1 in CreatePartitionPruneState (planstate=0x1898ad0,
pruneinfo=0x1884188) at execPartition.c:1943
1943                            Assert(partdesc->nparts >= pinfo->nparts);
(gdb) p partdesc->nparts
$1 = 1
(gdb) p pinfo->nparts
$2 = 2

I tried to analyze this problem and found the following:
1. In a release build (without --enable-cassert), the ERROR "could not
match partition child tables to plan elements" is thrown instead.
2. When the Assert fails, the cached plan contains 2 partitions, but the
detach concurrently has already been committed at that point, so only 1
partition is visible during execution. This violates the assumption behind
the Assert that execution sees at least as many partitions as the plan.
3. Perhaps detach concurrently should have a more appropriate waiting
mechanism, or planning should exclude partitions in this scenario.
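
As a rough illustration of points 1 and 2, the sketch below models only the
count comparison from the failing Assert. It is a toy model, not the code in
execPartition.c: the Toy* types, the enum, and toy_check_partitions() are
hypothetical names made up for this example, and the mapping to the release
mode ERROR is taken from the observation in point 1 rather than from the
real matching logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT PostgreSQL's real structs. */
typedef struct ToyPartitionDesc
{
    int nparts;     /* partitions visible to the executor */
} ToyPartitionDesc;

typedef struct ToyPruneInfo
{
    int nparts;     /* partitions recorded in the cached plan */
} ToyPruneInfo;

typedef enum ToyCheckResult
{
    TOY_COUNT_OK,        /* partdesc->nparts >= pinfo->nparts holds */
    TOY_ASSERT_FAILURE,  /* assert-enabled build: backend aborts */
    TOY_MATCH_ERROR      /* release build: ERROR reported instead */
} ToyCheckResult;

/*
 * Model the outcome of the count comparison only. assert_enabled mimics
 * a build configured with --enable-cassert.
 */
static ToyCheckResult
toy_check_partitions(const ToyPartitionDesc *partdesc,
                     const ToyPruneInfo *pinfo,
                     bool assert_enabled)
{
    /* Sanity check on the arguments of this toy function itself. */
    assert(partdesc != NULL && pinfo != NULL);

    if (partdesc->nparts >= pinfo->nparts)
        return TOY_COUNT_OK;

    return assert_enabled ? TOY_ASSERT_FAILURE : TOY_MATCH_ERROR;
}
```

With the values printed by gdb above (partdesc->nparts = 1, pinfo->nparts =
2), this model reports TOY_ASSERT_FAILURE for an assert-enabled build and
TOY_MATCH_ERROR for a release build. Once p1 is attached again and both
counts are 2, the comparison holds.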

