Thread: BUG #18722: Processing arrays with plpgsql raises errors
The following bug has been logged on the website: Bug reference: 18722 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 17.2 Operating system: Ubuntu 22.04 Description: The following script: CREATE FUNCTION make_ia() RETURNS int[] LANGUAGE plpgsql AS 'declare x int[]; begin x := array[0]; return x; end'; CREATE FUNCTION ia_eq(int[], int[]) RETURNS boolean LANGUAGE plpgsql AS 'begin return array_eq($1, $2); end'; CREATE OPERATOR = (procedure = ia_eq, leftarg = int[], rightarg = int[]); SELECT NULLIF(make_ia(), array[1]::int[]); fails with: ERROR: cache lookup failed for type 2139062143 Also, SELECT NULLIF(make_ia(), array[1]::int[]) = NULL; fails with: ERROR: invalid memory alloc request size 18446744073642179576 The backtrace of the latter error is: ... #6 0x000055b75ff0a96e in MemoryContextSizeFailure (context=0x55b760787d70, size=18446744073642179576, flags=0) at mcxt.c:1170 #7 0x000055b75fefb4ec in MemoryContextCheckSize (context=0x55b760787d70, size=18446744073642179576, flags=0) at ../../../../src/include/utils/memutils_internal.h:172 #8 0x000055b75fefc087 in AllocSetAllocLarge (context=0x55b760787d70, size=18446744073642179576, flags=0) at aset.c:705 #9 0x000055b75fefc6c0 in AllocSetAlloc (context=0x55b760787d70, size=18446744073642179576, flags=0) at aset.c:986 #10 0x000055b75ff0aa40 in MemoryContextAlloc (context=0x55b760787d70, size=18446744073642179576) at mcxt.c:1200 #11 0x000055b75fd04a64 in copy_byval_expanded_array (eah=0x55b760787e70, oldeah=0x55b760787e70) at array_expanded.c:197 #12 0x000055b75fd047d7 in expand_array (arraydatum=94246085885576, parentcontext=0x55b7607b0cd0, metacache=0x7ffd548c2cc0) at array_expanded.c:106 #13 0x00007f0094e5888f in plpgsql_exec_function (func=0x55b7606f2fb0, fcinfo=0x55b7607992a8, simple_eval_estate=0x0, simple_eval_resowner=0x0, procedure_resowner=0x0, atomic=true) at pl_exec.c:564 #14 0x00007f0094e75728 in plpgsql_call_handler (fcinfo=0x55b7607992a8) at pl_handler.c:276 #15 0x000055b75f9dcd47 in ExecInterpExpr (state=0x55b760799150, econtext=0x55b760798ef8, isnull=0x7ffd548c32df) at execExprInterp.c:746 ... (gdb) f 12 #12 0x000055b75fd047d7 in expand_array (arraydatum=94246085885576, parentcontext=0x55b7607b0cd0, metacache=0x7ffd548c2cc0) at array_expanded.c:106 106 copy_byval_expanded_array(eah, oldeah); (gdb) p/x *oldeah $1 = {hdr = {vl_len_ = 0xffffffff, eoh_methods = 0x55b7602d8f80, eoh_context = 0x55b760787d70, eoh_rw_ptr = {0x1, 0x3, 0x70, 0x7e, 0x78, 0x60, 0xb7, 0x55, 0x0, 0x0}, eoh_ro_ptr = {0x1, 0x2, 0x70, 0x7e, 0x78, 0x60, 0xb7, 0x55, 0x0, 0x0}}, ea_magic = 0x29170a59, ndims = 0x7f7f7f7f, dims = 0x7f7f7f7f7f7f7f7f, lbound = 0x7f7f7f7f7f7f7f7f, element_type = 0x7f7f7f7f, typlen = 0x7f7f, typbyval = 0x7f, typalign = 0x7f, dvalues = 0x7f7f7f7f7f7f7f7f, dnulls = 0x7f7f7f7f7f7f7f7f, dvalueslen = 0x7f7f7f7f, nelems = 0x7f7f7f7f, flat_size = 0x7f7f7f7f7f7f7f7f, fvalue = 0x7f7f7f7f7f7f7f7f, fstartptr = 0x7f7f7f7f7f7f7f7f, fendptr = 0x7f7f7f7f7f7f7f7f} I discovered this issue with SQLsmith. Reproduced starting from 1dc5ebc90.
PG Bug reporting form <noreply@postgresql.org> writes: > The following script: > CREATE FUNCTION make_ia() RETURNS int[] LANGUAGE plpgsql AS > 'declare x int[]; begin x := array[0]; return x; end'; > CREATE FUNCTION ia_eq(int[], int[]) RETURNS boolean LANGUAGE plpgsql AS > 'begin return array_eq($1, $2); end'; > CREATE OPERATOR = (procedure = ia_eq, leftarg = int[], rightarg = int[]); > SELECT NULLIF(make_ia(), array[1]::int[]); > fails with: > ERROR: cache lookup failed for type 2139062143 Nice catch! What is happening here is that make_ia returns a read/write pointer to an expanded array object. The EEOP_NULLIF code passes that pointer straight on to the equality function. Which in this case is a plpgsql function that will suppose it can take ownership of the expanded object, resulting in said object being freed before return. (Neither function has done anything wrong.) The problem is that EEOP_NULLIF then returns the original Datum pointer, which is now pointing at garbage. The different failures you get depending on what is done next with the Datum are not too surprising. What EEOP_NULLIF needs to do is pass a read-only pointer to the equality function, so that the object is not modified and remains available to return if we want to do so. Attached is a quick WIP patch to handle that. It is missing a test case, but the real omission is that llvm_compile_expr()'s EEOP_NULLIF handling also needs to be fixed, and I'm pretty unsure how to do that. I'm wondering now if any of our other conditional expressions have similar bugs ... regards, tom lane diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 8f7a534005..69d36f70b3 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -1191,6 +1191,14 @@ ExecInitExprRec(Expr *node, ExprState *state, op->args, op->opfuncid, op->inputcollid, state); + /* + * If first argument is of varlena type, we'll need to ensure + * that the value passed to the comparison function is a + * read-only pointer. + */ + scratch.d.func.make_ro = + (get_typlen(exprType((Node *) linitial(op->args))) == -1); + /* * Change opcode of call instruction to EEOP_NULLIF. * diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 30c5a19aad..42da962178 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1308,12 +1308,24 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) * The arguments are already evaluated into fcinfo->args. */ FunctionCallInfo fcinfo = op->d.func.fcinfo_data; + Datum save_arg0 = fcinfo->args[0].value; /* if either argument is NULL they can't be equal */ if (!fcinfo->args[0].isnull && !fcinfo->args[1].isnull) { Datum result; + /* + * If first argument is of varlena type, it might be an + * expanded datum. We need to ensure that the value passed to + * the comparison function is a read-only pointer. However, + * if we end by returning the first argument, that will be the + * original read-write pointer if it was read-write. + */ + if (op->d.func.make_ro) + fcinfo->args[0].value = + MakeExpandedObjectReadOnlyInternal(save_arg0); + fcinfo->isnull = false; result = op->d.func.fn_addr(fcinfo); @@ -1328,7 +1340,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) } /* Arguments aren't equal, so return the first one */ - *op->resvalue = fcinfo->args[0].value; + *op->resvalue = save_arg0; *op->resnull = fcinfo->args[0].isnull; EEO_NEXT(); diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index cd97dfa062..56fb0d0adb 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -365,6 +365,7 @@ typedef struct ExprEvalStep /* faster to access without additional indirection: */ PGFunction fn_addr; /* actual call address */ int nargs; /* number of arguments */ + bool make_ro; /* make arg0 R/O (used only for NULLIF) */ } func; /* for EEOP_BOOL_*_STEP */
I wrote: > What EEOP_NULLIF needs to do is pass a read-only pointer to the > equality function, so that the object is not modified and remains > available to return if we want to do so. > Attached is a quick WIP patch to handle that. It is missing a test > case, but the real omission is that llvm_compile_expr()'s EEOP_NULLIF > handling also needs to be fixed, and I'm pretty unsure how to do that. Here's a fleshed-out patch with a test case and JIT support. This is about the first time I've messed with LLVM, so I wouldn't mind some review of what I did in llvmjit_expr.c. In particular, do I correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces a reference to a copy of the initial value of args[0].value? It seems to work in testing --- the original R/W pointer is returned out of the expression --- but if we returned the R/O pointer instead it'd be mighty difficult to detect the resulting inefficiency. > I'm wondering now if any of our other conditional expressions have > similar bugs ... I was amused to discover that case.sql already had infrastructure suitable for testing this, thanks to bug #14472. regards, tom lane diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 8f7a534005..69d36f70b3 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -1191,6 +1191,14 @@ ExecInitExprRec(Expr *node, ExprState *state, op->args, op->opfuncid, op->inputcollid, state); + /* + * If first argument is of varlena type, we'll need to ensure + * that the value passed to the comparison function is a + * read-only pointer. + */ + scratch.d.func.make_ro = + (get_typlen(exprType((Node *) linitial(op->args))) == -1); + /* * Change opcode of call instruction to EEOP_NULLIF. * diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 30c5a19aad..42da962178 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1308,12 +1308,24 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) * The arguments are already evaluated into fcinfo->args. */ FunctionCallInfo fcinfo = op->d.func.fcinfo_data; + Datum save_arg0 = fcinfo->args[0].value; /* if either argument is NULL they can't be equal */ if (!fcinfo->args[0].isnull && !fcinfo->args[1].isnull) { Datum result; + /* + * If first argument is of varlena type, it might be an + * expanded datum. We need to ensure that the value passed to + * the comparison function is a read-only pointer. However, + * if we end by returning the first argument, that will be the + * original read-write pointer if it was read-write. + */ + if (op->d.func.make_ro) + fcinfo->args[0].value = + MakeExpandedObjectReadOnlyInternal(save_arg0); + fcinfo->isnull = false; result = op->d.func.fn_addr(fcinfo); @@ -1328,7 +1340,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) } /* Arguments aren't equal, so return the first one */ - *op->resvalue = fcinfo->args[0].value; + *op->resvalue = save_arg0; *op->resnull = fcinfo->args[0].isnull; EEO_NEXT(); diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index 9cec17544b..c533f55254 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -1559,6 +1559,9 @@ llvm_compile_expr(ExprState *state) v_fcinfo = l_ptr_const(fcinfo, l_ptr(StructFunctionCallInfoData)); + /* save original arg[0] */ + v_arg0 = l_funcvalue(b, v_fcinfo, 0); + /* if either argument is NULL they can't be equal */ v_argnull0 = l_funcnull(b, v_fcinfo, 0); v_argnull1 = l_funcnull(b, v_fcinfo, 1); @@ -1575,7 +1578,6 @@ llvm_compile_expr(ExprState *state) /* one (or both) of the arguments are null, return arg[0] */ LLVMPositionBuilderAtEnd(b, b_hasnull); - v_arg0 = l_funcvalue(b, v_fcinfo, 0); LLVMBuildStore(b, v_argnull0, v_resnullp); LLVMBuildStore(b, v_arg0, v_resvaluep); LLVMBuildBr(b, opblocks[opno + 1]); @@ -1583,12 +1585,35 @@ llvm_compile_expr(ExprState *state) /* build block to invoke function and check result */ LLVMPositionBuilderAtEnd(b, b_nonull); + /* + * If first argument is of varlena type, it might be an + * expanded datum. We need to ensure that the value + * passed to the comparison function is a read-only + * pointer. However, if we end by returning the first + * argument, that will be the original read-write pointer + * if it was read-write. + */ + if (op->d.func.make_ro) + { + LLVMValueRef v_params[1]; + LLVMValueRef v_arg0_ro; + + v_params[0] = v_arg0; + v_arg0_ro = + l_call(b, + llvm_pg_var_func_type("MakeExpandedObjectReadOnlyInternal"), + llvm_pg_func(mod, "MakeExpandedObjectReadOnlyInternal"), + v_params, lengthof(v_params), ""); + LLVMBuildStore(b, v_arg0_ro, + l_funcvaluep(b, v_fcinfo, 0)); + } + v_retval = BuildV1Call(context, b, mod, fcinfo, &v_fcinfo_isnull); /* - * If result not null, and arguments are equal return null - * (same result as if there'd been NULLs, hence reuse - * b_hasnull). + * If result not null and arguments are equal return null, + * else return arg[0] (same result as if there'd been + * NULLs, hence reuse b_hasnull). */ v_argsequal = LLVMBuildAnd(b, LLVMBuildICmp(b, LLVMIntEQ, diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index cd97dfa062..56fb0d0adb 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -365,6 +365,7 @@ typedef struct ExprEvalStep /* faster to access without additional indirection: */ PGFunction fn_addr; /* actual call address */ int nargs; /* number of arguments */ + bool make_ro; /* make arg0 R/O (used only for NULLIF) */ } func; /* for EEOP_BOOL_*_STEP */ diff --git a/src/test/regress/expected/case.out b/src/test/regress/expected/case.out index f5136c17ab..efee7fc431 100644 --- a/src/test/regress/expected/case.out +++ b/src/test/regress/expected/case.out @@ -397,6 +397,14 @@ SELECT CASE make_ad(1,2) right (1 row) +-- While we're here, also test handling of a NULLIF arg that is a read/write +-- object (bug #18722) +SELECT NULLIF(make_ad(1,2), array[2,3]::arrdomain); + nullif +-------- + {1,2} +(1 row) + ROLLBACK; -- Test interaction of CASE with ArrayCoerceExpr (bug #15471) BEGIN; diff --git a/src/test/regress/sql/case.sql b/src/test/regress/sql/case.sql index 83fe43be6b..388d4c6f52 100644 --- a/src/test/regress/sql/case.sql +++ b/src/test/regress/sql/case.sql @@ -242,6 +242,11 @@ SELECT CASE make_ad(1,2) WHEN array[1,2]::arrdomain THEN 'right' END; +-- While we're here, also test handling of a NULLIF arg that is a read/write +-- object (bug #18722) + +SELECT NULLIF(make_ad(1,2), array[2,3]::arrdomain); + ROLLBACK; -- Test interaction of CASE with ArrayCoerceExpr (bug #15471)
On Sun, 24 Nov 2024 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Here's a fleshed-out patch with a test case and JIT support. This > is about the first time I've messed with LLVM, so I wouldn't mind > some review of what I did in llvmjit_expr.c. In particular, do I > correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces > a reference to a copy of the initial value of args[0].value? > I don't know about that, but I wonder if this bug could be fixed by having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it wouldn't be necessary to make any changes to the expression evaluation code. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On Sun, 24 Nov 2024 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Here's a fleshed-out patch with a test case and JIT support. This >> is about the first time I've messed with LLVM, so I wouldn't mind >> some review of what I did in llvmjit_expr.c. In particular, do I >> correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces >> a reference to a copy of the initial value of args[0].value? > I don't know about that, but I wonder if this bug could be fixed by > having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it > wouldn't be necessary to make any changes to the expression evaluation > code. That would entirely destroy one of the primary performance benefits of the expanded-object infrastructure. The idea is that if you have fconsumer(fproducer(...), ...) and fproducer returns a read-write pointer to an object it's built, then fconsumer should be able to take ownership of the object and use it as a local variable (possibly modifying it) without incurring any object-copying overhead. This works in any context where an intermediate expression value has a single consumer, which is most. If there are multiple consumers then we need to insert MAKE_READONLY steps for all (or all but one) of them. I overlooked EEOP_NULLIF as such a case, but I don't think there are so many more cases as to justify throwing away the concept altogether. regards, tom lane
On Mon, 25 Nov 2024 at 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > I don't know about that, but I wonder if this bug could be fixed by > > having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it > > wouldn't be necessary to make any changes to the expression evaluation > > code. > > That would entirely destroy one of the primary performance benefits of > the expanded-object infrastructure. The idea is that if you have > fconsumer(fproducer(...), ...) > and fproducer returns a read-write pointer to an object it's built, > then fconsumer should be able to take ownership of the object and > use it as a local variable (possibly modifying it) without incurring > any object-copying overhead. > > This works in any context where an intermediate expression value > has a single consumer, which is most. If there are multiple > consumers then we need to insert MAKE_READONLY steps for all > (or all but one) of them. I overlooked EEOP_NULLIF as such > a case, but I don't think there are so many more cases as to > justify throwing away the concept altogether. > I didn't mean do it in all cases, I just meant the NullIfExpr case identified here. My point was that instead of modifying the evaluation code for EEOP_NULLIF to make it call MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF step. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > I didn't mean do it in all cases, I just meant the NullIfExpr case > identified here. My point was that instead of modifying the evaluation > code for EEOP_NULLIF to make it call > MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a > EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF > step. But then the NULLIF step would only have access to the R/O pointer, no? We do want to pass on a R/W pointer to the output, if we got one, to handle cases like fconsumer(NULLIF(fproducer(...), ...), ...) Admittedly that's a pretty edgy edge-case, but still we're leaving money on the table if we don't do it. So I think we have to deal with the issue within NULLIF. regards, tom lane
On Mon, 25 Nov 2024 at 19:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > I didn't mean do it in all cases, I just meant the NullIfExpr case > > identified here. My point was that instead of modifying the evaluation > > code for EEOP_NULLIF to make it call > > MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a > > EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF > > step. > > But then the NULLIF step would only have access to the R/O pointer, > no? We do want to pass on a R/W pointer to the output, if we got > one, to handle cases like > fconsumer(NULLIF(fproducer(...), ...), ...) > Admittedly that's a pretty edgy edge-case, but still we're leaving > money on the table if we don't do it. So I think we have to deal > with the issue within NULLIF. > OK, that makes sense. Regards, Dean