BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning |
Date | |
Msg-id | 18484-3e357ade5fe50e61@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18484 Logged by: Michael Banck Email address: michael.banck@netapp.com PostgreSQL version: 16.3 Operating system: Debian bookworm Description: A customer reported to us that a query/prepared statement on their database aborted with ERROR: out of memory DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more bytes. The underlying database schema was migrated from Oracle ("no problems there") and the base table is two-level partitioned, one range partitioning per day (currently 135), and then each of those is list-partitioned to around 50 sub-partitions. Each partition has around 20 indexes. This leads to around 20k tables and 400k indexes in the schema. The query consists on a simple select on the table with a lot of OR conditions like ((((A_FOO = $1) OR (B_FOO = $2)) AND ((START_TIME <= $3) AND (END_TIME > $4))) OR (((A_FOO = $5) OR (B_FOO = $6)) AND ((START_TIME <= $7) AND (END_TIME > $8))) OR (((A_FOO = $9) OR (B_FOO = $10)) AND ((START_TIME <= $11) AND (END_TIME > $12))) OR (((A_FOO = $13) OR (B_FOO = $14)) AND ((START_TIME <= $15) AND (END_TIME > $16)) Overall there are around 100 parameters. The explain plan shows around 15k Parallel Seq Scans on the various paritions and is around a millions lines long (in order to get the explain, max_connections and max_locks_per_transaction need to be bumped up considerably from the defaults). They shared the schema with us and we were able to reproduce the error on the empty schema. If we set max_parallel_workers_per_gather=0, the prepared statement executes fine. Also, testing this on 17beta1 did not throw the error. The backtrace is: #0 errfinish (filename=0x55feb1d28100 "./build/../src/common/stringinfo.c", lineno=303, funcname=0x55feb1d28170 <__func__.0> "enlargeStringInfo") at ./build/../src/backend/utils/error/elog.c:480 #1 0x000055feb16c7292 in enlargeStringInfo (str=str@entry=0x7fff222fe4b0, needed=32) at ./build/../src/common/stringinfo.c:303 #2 0x000055feb1b9c666 in appendStringInfo (str=str@entry=0x7fff222fe4b0, fmt=fmt@entry=0x55feb1c9e3fa " :varlevelsup %u") at ./build/../src/common/stringinfo.c:110 #3 0x000055feb18e1810 in _outVar (node=0x560304a6c420, str=0x7fff222fe4b0) at ./build/../src/backend/nodes/outfuncs.funcs.c:110 #4 outNode (str=0x7fff222fe4b0, obj=0x560304a6c420) at ./build/../src/backend/nodes/outfuncs.switch.c:31 #5 0x000055feb18e4626 in _outList (str=0x7fff222fe4b0, node=0x560304a6c3d8) at ./build/../src/backend/nodes/outfuncs.c:294 #6 0x000055feb18e0b4a in outNode (str=<optimized out>, obj=<optimized out>) at ./build/../src/backend/nodes/outfuncs.c:843 #7 0x000055feb18e528c in _outOpExpr (str=str@entry=0x7fff222fe4b0, node=node@entry=0x560304a6c390) at ./build/../src/backend/nodes/outfuncs.funcs.c:239 #8 0x000055feb18e187d in outNode (str=0x7fff222fe4b0, obj=0x560304a6c390) at ./build/../src/backend/nodes/outfuncs.switch.c:58 #9 0x000055feb18e4626 in _outList (str=0x7fff222fe4b0, node=0x560304a6c348) at ./build/../src/backend/nodes/outfuncs.c:294 #10 0x000055feb18e0b4a in outNode (str=<optimized out>, obj=<optimized out>) at ./build/../src/backend/nodes/outfuncs.c:843 #11 0x000055feb18f6a81 in _outSeqScan (str=str@entry=0x7fff222fe4b0, node=node@entry=0x560304a6a2e8) at ./build/../src/backend/nodes/outfuncs.funcs.c:4610 #12 0x000055feb18e119b in outNode (str=0x7fff222fe4b0, obj=0x560304a6a2e8) at ./build/../src/backend/nodes/outfuncs.switch.c:961 #13 0x000055feb18e4626 in _outList (str=0x7fff222fe4b0, node=0x5602d5976a10) at ./build/../src/backend/nodes/outfuncs.c:294 #14 0x000055feb18e0b4a in outNode (str=<optimized out>, obj=<optimized out>) at ./build/../src/backend/nodes/outfuncs.c:843 #15 0x000055feb18f5dc5 in _outAppend (str=0x7fff222fe4b0, node=0x5602d50a0598) at ./build/../src/backend/nodes/outfuncs.funcs.c:4485 #16 0x000055feb18e11eb in outNode (str=0x7fff222fe4b0, obj=0x5602d50a0598) at ./build/../src/backend/nodes/outfuncs.switch.c:946 #17 0x000055feb18f50fd in _outPlannedStmt (str=str@entry=0x7fff222fe4b0, node=node@entry=0x56030f2d1da8) at ./build/../src/backend/nodes/outfuncs.funcs.c:4360 #18 0x000055feb18e122b in outNode (str=0x7fff222fe4b0, obj=0x56030f2d1da8) at ./build/../src/backend/nodes/outfuncs.switch.c:934 #19 0x000055feb18fc9e4 in nodeToString (obj=obj@entry=0x56030f2d1da8) at ./build/../src/backend/nodes/outfuncs.c:889 #20 0x000055feb18741f7 in ExecSerializePlan (estate=0x5602749b9010, plan=<optimized out>) at ./build/../src/backend/executor/execParallel.c:218 #21 ExecInitParallelPlan (planstate=0x5602749b9620, estate=0x5602749b9010, sendParams=0x0, nworkers=2, tuples_needed=-1) at ./build/../src/backend/executor/execParallel.c:633 #22 0x000055feb188b051 in ExecGather (pstate=0x5602749b9448) at ./build/../src/backend/executor/nodeGather.c:171 #23 0x000055feb18a1866 in ExecProcNode (node=0x5602749b9448) at ./build/../src/include/executor/executor.h:273 #24 ExecSort (pstate=0x5602749b9240) at ./build/../src/backend/executor/nodeSort.c:149 #25 0x000055feb1870c05 in ExecProcNode (node=0x5602749b9240) at ./build/../src/include/executor/executor.h:273 #26 ExecutePlan (execute_once=<optimized out>, dest=0x55feb2b3f6d8, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x5602749b9240, estate=0x5602749b9010) at ./build/../src/backend/executor/execMain.c:1670 #27 standard_ExecutorRun (queryDesc=0x55feb2b4ddd0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:365 #28 0x000055feb1a18c9e in PortalRunSelect (portal=portal@entry=0x55feb2bdcf88, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x55feb2b3f6d8) at ./build/../src/backend/tcop/pquery.c:924 #29 0x000055feb1a1a130 in PortalRun (portal=portal@entry=0x55feb2bdcf88, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=false, run_once=run_once@entry=true, dest=dest@entry=0x55feb2b3f6d8, altdest=altdest@entry=0x55feb2b3f6d8, qc=<optimized out>) at ./build/../src/backend/tcop/pquery.c:768 #30 0x000055feb1825abc in ExecuteQuery (pstate=<optimized out>, stmt=<optimized out>, intoClause=<optimized out>, params=<optimized out>, dest=<optimized out>, qc=<optimized out>) at ./build/../src/backend/commands/prepare.c:258 #31 0x000055feb1a1b748 in standard_ProcessUtility (pstmt=0x55feb2afc048, queryString=0x55feb2afab30 "execute t (\n'262010000000001', '262010000000001', '20100621131707', '20090420140642',\n'262010000000002','262010000000002', '20090420140642', '20081209084155', \n'262010000000003', '262010000000003', "..., readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=<optimized out>, qc=<optimized out>) at ./build/../src/backend/tcop/utility.c:757 #32 0x000055feb1a19bb1 in PortalRunUtility (portal=portal@entry=0x55feb2bdce80, pstmt=0x55feb2afc048, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x55feb2b3f6d8, qc=qc@entry=0x7fff222fe9d0) at ./build/../src/backend/tcop/pquery.c:1158 #33 0x000055feb1a19f77 in FillPortalStore (portal=portal@entry=0x55feb2bdce80, isTopLevel=isTopLevel@entry=true) at ./build/../src/backend/tcop/pquery.c:1031 #34 0x000055feb1a1a2bf in PortalRun (portal=portal@entry=0x55feb2bdce80, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55feb2afc150, altdest=altdest@entry=0x55feb2afc150, qc=<optimized out>) at ./build/../src/backend/tcop/pquery.c:763 #35 0x000055feb1a164c5 in exec_simple_query (query_string=<optimized out>) at ./build/../src/backend/tcop/postgres.c:1272 #36 0x000055feb1a17209 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4654 #37 0x000055feb198f33c in BackendRun (port=0x55feb2b4e7f0) at ./build/../src/backend/postmaster/postmaster.c:4464 #38 BackendStartup (port=0x55feb2b4e7f0) at ./build/../src/backend/postmaster/postmaster.c:4192 #39 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1782 #40 0x000055feb1990328 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x55feb2ab7ef0) at ./build/../src/backend/postmaster/postmaster.c:1466 #41 0x000055feb16c763d in main (argc=5, argv=0x55feb2ab7ef0) at ./build/../src/backend/main/main.c:198 So I guess the problem is somewhere here: ExecInitParallelPlan - src/backend/executor/execParallel.c:633 /* Fix up and serialize plan to be sent to workers. */ pstmt_data = ExecSerializePlan(planstate->plan, estate); ExecSerializePlan - src/backend/executor/execParallel.c:218 /* Return serialized copy of our dummy PlannedStmt. */ return nodeToString(pstmt); I agree that the overall schema architecture should be rethought and much fewer partitions be used (and they are looking into this), but as the error does only happen with a parallel plan, I am reporting it here in case there is an easy fix for this, or at least maybe we can abort earlier with a better error message?
pgsql-bugs by date: