Thread: BUG #18559: Crash after detaching a partition concurrently from another session
From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      18559
Logged by:          Kuntal Ghosh
Email address:      kuntalghosh.2007@gmail.com
PostgreSQL version: 17beta2
Operating system:   AL2
Description:

I've encountered the following crash when dropping a partition after detaching it concurrently.

#0  0x0000000000900e5f in heap_getattr (tup=0x0, attnum=33, tupleDesc=0x7f40db0a5458, isnull=0x7ffcb110197e) at ../../../src/include/access/htup_details.h:801
801             if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
(gdb) bt
#0  0x0000000000900e5f in heap_getattr (tup=0x0, attnum=33, tupleDesc=0x7f40db0a5458, isnull=0x7ffcb110197e) at ../../../src/include/access/htup_details.h:801
#1  0x000000000090123b in RelationBuildPartitionDesc (rel=0x7f40db0b68e8, omit_detached=true) at partdesc.c:237
#2  0x0000000000900fe0 in RelationGetPartitionDesc (rel=0x7f40db0b68e8, omit_detached=true) at partdesc.c:109
#3  0x0000000000901889 in PartitionDirectoryLookup (pdir=0x24287e8, rel=0x7f40db0b68e8) at partdesc.c:457
#4  0x00000000008e77c3 in set_relation_partition_info (root=0x241c308, rel=0x241d518, relation=0x7f40db0b68e8) at plancat.c:2367
#5  0x00000000008e48c6 in get_relation_info (root=0x241c308, relationObjectId=16388, inhparent=true, rel=0x241d518) at plancat.c:554
#6  0x00000000008eb8b7 in build_simple_rel (root=0x241c308, relid=1, parent=0x0) at relnode.c:340
#7  0x000000000089f007 in add_base_rels_to_query (root=0x241c308, jtnode=0x241be90) at initsplan.c:165
#8  0x000000000089f04e in add_base_rels_to_query (root=0x241c308, jtnode=0x241c238) at initsplan.c:173
#9  0x00000000008a5363 in query_planner (root=0x241c308, qp_callback=0x8aba74 <standard_qp_callback>, qp_extra=0x7ffcb1101da0) at planmain.c:170
#10 0x00000000008a7d88 in grouping_planner (root=0x241c308, tuple_fraction=0, setops=0x0) at planner.c:1520
#11 0x00000000008a74b0 in subquery_planner (glob=0x241b988, parse=0x241d1f8, parent_root=0x0, hasRecursion=false, tuple_fraction=0, setops=0x0) at planner.c:1089
#12 0x00000000008a5ae7 in standard_planner (parse=0x241d1f8, query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048, boundParams=0x0) at planner.c:415
#13 0x00000000008a587e in planner (parse=0x241d1f8, query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048, boundParams=0x0) at planner.c:282
#14 0x00000000009e7dbc in pg_plan_query (querytree=0x241d1f8, query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048, boundParams=0x0) at postgres.c:904
#15 0x00000000009e7eed in pg_plan_queries (querytrees=0x241c2b8, query_string=0x23732d8 "prepare p1 as select * from p;", cursorOptions=2048, boundParams=0x0) at postgres.c:996
#16 0x0000000000b9e50f in BuildCachedPlan (plansource=0x2374270, qlist=0x241c2b8, boundParams=0x0, queryEnv=0x0) at plancache.c:962
#17 0x0000000000b9eaeb in GetCachedPlan (plansource=0x2374270, boundParams=0x0, owner=0x0, queryEnv=0x0) at plancache.c:1199
#18 0x00000000006cfd2c in ExecuteQuery (pstate=0x2372ed8, stmt=0x2349130, intoClause=0x0, params=0x0, dest=0x2372e48, qc=0x7ffcb1102630) at prepare.c:193
#19 0x00000000009f0c2c in standard_ProcessUtility (pstmt=0x23491e0, queryString=0x2348720 "execute p1;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2372e48, qc=0x7ffcb1102630) at utility.c:750
#20 0x00000000009f061b in ProcessUtility (pstmt=0x23491e0, queryString=0x2348720 "execute p1;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2372e48, qc=0x7ffcb1102630) at utility.c:523
#21 0x00000000009ef237 in PortalRunUtility (portal=0x23c8100, pstmt=0x23491e0, isTopLevel=true, setHoldSnapshot=true, dest=0x2372e48, qc=0x7ffcb1102630) at pquery.c:1158
#22 0x00000000009eefa0 in FillPortalStore (portal=0x23c8100, isTopLevel=true) at pquery.c:1031
#23 0x00000000009ee90e in PortalRun (portal=0x23c8100, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x23495a0, altdest=0x23495a0, qc=0x7ffcb1102800) at pquery.c:763
#24 0x00000000009e83e6 in exec_simple_query (query_string=0x2348720 "execute p1;") at postgres.c:1274
#25 0x00000000009ecb0b in PostgresMain (dbname=0x2381fa8 "postgres", username=0x2381f88 "kuntalgh") at postgres.c:4696
#26 0x00000000009e4c0a in BackendMain (startup_data=0x7ffcb1102b0c "", startup_data_len=4) at backend_startup.c:107
#27 0x0000000000910ea3 in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7ffcb1102b0c "", startup_data_len=4, client_sock=0x7ffcb1102b30) at launch_backend.c:274
#28 0x0000000000916661 in BackendStartup (client_sock=0x7ffcb1102b30) at postmaster.c:3495
#29 0x0000000000913d7c in ServerLoop () at postmaster.c:1662
#30 0x0000000000913736 in PostmasterMain (argc=3, argv=0x2342ea0) at postmaster.c:1360
#31 0x00000000007d2e9f in main (argc=3, argv=0x2342ea0) at main.c:197

I've reproduced the issue by following [1] with minor modifications.

1. ./configure --enable-debug --enable-depend --enable-cassert CFLAGS=-O0
2. make -j; make install -j; initdb -D ./primary; pg_ctl -D ../primary -l logfile start
3. alter system set plan_cache_mode to 'force_generic_plan' ; select pg_reload_conf();
4. create table p( a int,b int) partition by range(a);
   create table p1 partition of p for values from (0) to (1);
   create table p2 partition of p for values from (1) to (2);

Now, we need to use GDB to reproduce the crash.

Session 1:
1. Attach GDB and put a breakpoint at ATExecDetachPartition.

Session 2:
1. SQL: prepare p1 as select * from p;
2. Attach GDB and put breakpoints at ProcessUtility() and find_inheritance_children_extended().

Session 1:
1. SQL: alter table p detach partition p2 concurrently;
2. The session will be stalled at ATExecDetachPartition. Continue stepping with "next" until CommitTransactionCommand().

Session 2:
1. SQL: execute p1;
2. The session will be stalled at ProcessUtility(). Before that, it takes the snapshot.

Session 1:
1. Continue until DetachPartitionFinalize.

Session 2:
1. Continue until find_inheritance_children_extended(). It'll find two partitions, as transaction 1 isn't yet committed. Complete the execution in that function.

Session 1:
1. Run to completion.
2. SQL: drop table p2;

Session 2:
1. It will crash, as it assumes an entry in pg_class for the dropped relation.

The code in RelationBuildPartitionDesc (frame #1 above) assumes that a pg_class entry for the detached partition will always be available, which is wrong.

Thanks,
Kuntal

[1] https://www.postgresql.org/message-id/CAHewXNkaKgVmT%2BOkVA9UHrEYm%2Bb8J6o_8%2B-84Qey6V5tM-%2Bz9A%40mail.gmail.com
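
For illustration only, the failure mode in frame #0 (a row lookup that returns nothing, followed by an unconditional dereference) can be modelled in a few lines of standalone C. This is a sketch under stated assumptions, not PostgreSQL code and not a proposed patch: CatalogRow, lookup_row and get_partbound are hypothetical names invented for this example, and the OIDs used with them are made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a catalog row; this is NOT PostgreSQL's HeapTuple. */
typedef struct CatalogRow
{
    unsigned int oid;        /* made-up relation OID */
    const char  *partbound;  /* made-up partition bound text */
} CatalogRow;

/*
 * Hypothetical lookup: returns NULL when no row with the given OID exists,
 * which models the situation after the detached partition has been dropped.
 */
const CatalogRow *
lookup_row(const CatalogRow *rows, size_t nrows, unsigned int oid)
{
    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].oid == oid)
            return &rows[i];
    }
    return NULL;
}

/*
 * Guarded read: report a missing row to the caller instead of
 * dereferencing a NULL pointer. Returns true and sets *bound on success.
 */
bool
get_partbound(const CatalogRow *rows, size_t nrows, unsigned int oid,
              const char **bound)
{
    const CatalogRow *row = lookup_row(rows, nrows, oid);

    if (row == NULL)
        return false;        /* row vanished; caller decides what to do */

    *bound = row->partbound;
    return true;
}
```

In this model the caller gets false for a vanished row and can skip the entry or retry. What the server should actually do in that case is a separate question that this sketch does not answer.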