Thread: BUG #18722: Processing arrays with plpgsql raises errors

BUG #18722: Processing arrays with plpgsql raises errors

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18722
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 17.2
Operating system:   Ubuntu 22.04
Description:

The following script:
CREATE FUNCTION make_ia() RETURNS int[] LANGUAGE plpgsql AS
  'declare x int[]; begin x := array[0]; return x; end';
CREATE FUNCTION ia_eq(int[], int[]) RETURNS boolean LANGUAGE plpgsql AS
'begin return array_eq($1, $2); end'; 
CREATE OPERATOR = (procedure = ia_eq, leftarg = int[], rightarg = int[]);
SELECT NULLIF(make_ia(), array[1]::int[]);

fails with:
ERROR:  cache lookup failed for type 2139062143

Also,
SELECT NULLIF(make_ia(), array[1]::int[]) = NULL;

fails with:
ERROR:  invalid memory alloc request size 18446744073642179576

The backtrace of the latter error is:
...
#6  0x000055b75ff0a96e in MemoryContextSizeFailure (context=0x55b760787d70,
size=18446744073642179576, flags=0) at mcxt.c:1170
#7  0x000055b75fefb4ec in MemoryContextCheckSize (context=0x55b760787d70,
size=18446744073642179576, flags=0) at
../../../../src/include/utils/memutils_internal.h:172
#8  0x000055b75fefc087 in AllocSetAllocLarge (context=0x55b760787d70,
size=18446744073642179576, flags=0) at aset.c:705
#9  0x000055b75fefc6c0 in AllocSetAlloc (context=0x55b760787d70,
size=18446744073642179576, flags=0) at aset.c:986
#10 0x000055b75ff0aa40 in MemoryContextAlloc (context=0x55b760787d70,
size=18446744073642179576) at mcxt.c:1200
#11 0x000055b75fd04a64 in copy_byval_expanded_array (eah=0x55b760787e70,
oldeah=0x55b760787e70) at array_expanded.c:197
#12 0x000055b75fd047d7 in expand_array (arraydatum=94246085885576,
parentcontext=0x55b7607b0cd0, metacache=0x7ffd548c2cc0) at
array_expanded.c:106
#13 0x00007f0094e5888f in plpgsql_exec_function (func=0x55b7606f2fb0,
fcinfo=0x55b7607992a8, simple_eval_estate=0x0, simple_eval_resowner=0x0,
procedure_resowner=0x0, atomic=true) at pl_exec.c:564
#14 0x00007f0094e75728 in plpgsql_call_handler (fcinfo=0x55b7607992a8) at
pl_handler.c:276
#15 0x000055b75f9dcd47 in ExecInterpExpr (state=0x55b760799150,
econtext=0x55b760798ef8, isnull=0x7ffd548c32df) at execExprInterp.c:746
...

(gdb) f 12
#12 0x000055b75fd047d7 in expand_array (arraydatum=94246085885576,
parentcontext=0x55b7607b0cd0, metacache=0x7ffd548c2cc0) at
array_expanded.c:106
106                             copy_byval_expanded_array(eah, oldeah);
(gdb) p/x *oldeah
$1 = {hdr = {vl_len_ = 0xffffffff, eoh_methods = 0x55b7602d8f80, eoh_context
= 0x55b760787d70, eoh_rw_ptr = {0x1, 0x3, 0x70, 0x7e, 0x78, 0x60, 0xb7,
0x55, 0x0, 0x0}, eoh_ro_ptr = {0x1, 0x2, 0x70, 0x7e, 0x78, 0x60, 0xb7, 0x55,
0x0, 
      0x0}}, ea_magic = 0x29170a59, ndims = 0x7f7f7f7f, dims =
0x7f7f7f7f7f7f7f7f, lbound = 0x7f7f7f7f7f7f7f7f, element_type = 0x7f7f7f7f,
typlen = 0x7f7f, typbyval = 0x7f, typalign = 0x7f, dvalues =
0x7f7f7f7f7f7f7f7f, 
  dnulls = 0x7f7f7f7f7f7f7f7f, dvalueslen = 0x7f7f7f7f, nelems = 0x7f7f7f7f,
flat_size = 0x7f7f7f7f7f7f7f7f, fvalue = 0x7f7f7f7f7f7f7f7f, fstartptr =
0x7f7f7f7f7f7f7f7f, fendptr = 0x7f7f7f7f7f7f7f7f}

I discovered this issue with SQLsmith.
Reproduced starting from 1dc5ebc90.


Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following script:
> CREATE FUNCTION make_ia() RETURNS int[] LANGUAGE plpgsql AS
>   'declare x int[]; begin x := array[0]; return x; end';
> CREATE FUNCTION ia_eq(int[], int[]) RETURNS boolean LANGUAGE plpgsql AS
> 'begin return array_eq($1, $2); end'; 
> CREATE OPERATOR = (procedure = ia_eq, leftarg = int[], rightarg = int[]);
> SELECT NULLIF(make_ia(), array[1]::int[]);

> fails with:
> ERROR:  cache lookup failed for type 2139062143

Nice catch!  What is happening here is that make_ia returns a
read/write pointer to an expanded array object.  The EEOP_NULLIF
code passes that pointer straight on to the equality function.
Which in this case is a plpgsql function that will suppose it
can take ownership of the expanded object, resulting in said
object being freed before return.  (Neither function has done
anything wrong.)  The problem is that EEOP_NULLIF then returns
the original Datum pointer, which is now pointing at garbage.
The different failures you get depending on what is done next
with the Datum are not too surprising.

What EEOP_NULLIF needs to do is pass a read-only pointer to the
equality function, so that the object is not modified and remains
available to return if we want to do so.

Attached is a quick WIP patch to handle that.  It is missing a test
case, but the real omission is that llvm_compile_expr()'s EEOP_NULLIF
handling also needs to be fixed, and I'm pretty unsure how to do that.

I'm wondering now if any of our other conditional expressions have
similar bugs ...

            regards, tom lane

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 8f7a534005..69d36f70b3 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1191,6 +1191,14 @@ ExecInitExprRec(Expr *node, ExprState *state,
                              op->args, op->opfuncid, op->inputcollid,
                              state);

+                /*
+                 * If first argument is of varlena type, we'll need to ensure
+                 * that the value passed to the comparison function is a
+                 * read-only pointer.
+                 */
+                scratch.d.func.make_ro =
+                    (get_typlen(exprType((Node *) linitial(op->args))) == -1);
+
                 /*
                  * Change opcode of call instruction to EEOP_NULLIF.
                  *
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 30c5a19aad..42da962178 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -1308,12 +1308,24 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
              * The arguments are already evaluated into fcinfo->args.
              */
             FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
+            Datum        save_arg0 = fcinfo->args[0].value;

             /* if either argument is NULL they can't be equal */
             if (!fcinfo->args[0].isnull && !fcinfo->args[1].isnull)
             {
                 Datum        result;

+                /*
+                 * If first argument is of varlena type, it might be an
+                 * expanded datum.  We need to ensure that the value passed to
+                 * the comparison function is a read-only pointer.  However,
+                 * if we end by returning the first argument, that will be the
+                 * original read-write pointer if it was read-write.
+                 */
+                if (op->d.func.make_ro)
+                    fcinfo->args[0].value =
+                        MakeExpandedObjectReadOnlyInternal(save_arg0);
+
                 fcinfo->isnull = false;
                 result = op->d.func.fn_addr(fcinfo);

@@ -1328,7 +1340,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
             }

             /* Arguments aren't equal, so return the first one */
-            *op->resvalue = fcinfo->args[0].value;
+            *op->resvalue = save_arg0;
             *op->resnull = fcinfo->args[0].isnull;

             EEO_NEXT();
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index cd97dfa062..56fb0d0adb 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -365,6 +365,7 @@ typedef struct ExprEvalStep
             /* faster to access without additional indirection: */
             PGFunction    fn_addr;    /* actual call address */
             int            nargs;    /* number of arguments */
+            bool        make_ro;    /* make arg0 R/O (used only for NULLIF) */
         }            func;

         /* for EEOP_BOOL_*_STEP */

Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Tom Lane
Date:
I wrote:
> What EEOP_NULLIF needs to do is pass a read-only pointer to the
> equality function, so that the object is not modified and remains
> available to return if we want to do so.

> Attached is a quick WIP patch to handle that.  It is missing a test
> case, but the real omission is that llvm_compile_expr()'s EEOP_NULLIF
> handling also needs to be fixed, and I'm pretty unsure how to do that.

Here's a fleshed-out patch with a test case and JIT support.  This
is about the first time I've messed with LLVM, so I wouldn't mind
some review of what I did in llvmjit_expr.c.  In particular, do I
correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces
a reference to a copy of the initial value of args[0].value?
It seems to work in testing --- the original R/W pointer is returned
out of the expression --- but if we returned the R/O pointer instead
it'd be mighty difficult to detect the resulting inefficiency.

> I'm wondering now if any of our other conditional expressions have
> similar bugs ...

I was amused to discover that case.sql already had infrastructure
suitable for testing this, thanks to bug #14472.

            regards, tom lane

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 8f7a534005..69d36f70b3 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1191,6 +1191,14 @@ ExecInitExprRec(Expr *node, ExprState *state,
                              op->args, op->opfuncid, op->inputcollid,
                              state);

+                /*
+                 * If first argument is of varlena type, we'll need to ensure
+                 * that the value passed to the comparison function is a
+                 * read-only pointer.
+                 */
+                scratch.d.func.make_ro =
+                    (get_typlen(exprType((Node *) linitial(op->args))) == -1);
+
                 /*
                  * Change opcode of call instruction to EEOP_NULLIF.
                  *
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 30c5a19aad..42da962178 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -1308,12 +1308,24 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
              * The arguments are already evaluated into fcinfo->args.
              */
             FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
+            Datum        save_arg0 = fcinfo->args[0].value;

             /* if either argument is NULL they can't be equal */
             if (!fcinfo->args[0].isnull && !fcinfo->args[1].isnull)
             {
                 Datum        result;

+                /*
+                 * If first argument is of varlena type, it might be an
+                 * expanded datum.  We need to ensure that the value passed to
+                 * the comparison function is a read-only pointer.  However,
+                 * if we end by returning the first argument, that will be the
+                 * original read-write pointer if it was read-write.
+                 */
+                if (op->d.func.make_ro)
+                    fcinfo->args[0].value =
+                        MakeExpandedObjectReadOnlyInternal(save_arg0);
+
                 fcinfo->isnull = false;
                 result = op->d.func.fn_addr(fcinfo);

@@ -1328,7 +1340,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
             }

             /* Arguments aren't equal, so return the first one */
-            *op->resvalue = fcinfo->args[0].value;
+            *op->resvalue = save_arg0;
             *op->resnull = fcinfo->args[0].isnull;

             EEO_NEXT();
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 9cec17544b..c533f55254 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1559,6 +1559,9 @@ llvm_compile_expr(ExprState *state)

                     v_fcinfo = l_ptr_const(fcinfo, l_ptr(StructFunctionCallInfoData));

+                    /* save original arg[0] */
+                    v_arg0 = l_funcvalue(b, v_fcinfo, 0);
+
                     /* if either argument is NULL they can't be equal */
                     v_argnull0 = l_funcnull(b, v_fcinfo, 0);
                     v_argnull1 = l_funcnull(b, v_fcinfo, 1);
@@ -1575,7 +1578,6 @@ llvm_compile_expr(ExprState *state)

                     /* one (or both) of the arguments are null, return arg[0] */
                     LLVMPositionBuilderAtEnd(b, b_hasnull);
-                    v_arg0 = l_funcvalue(b, v_fcinfo, 0);
                     LLVMBuildStore(b, v_argnull0, v_resnullp);
                     LLVMBuildStore(b, v_arg0, v_resvaluep);
                     LLVMBuildBr(b, opblocks[opno + 1]);
@@ -1583,12 +1585,35 @@ llvm_compile_expr(ExprState *state)
                     /* build block to invoke function and check result */
                     LLVMPositionBuilderAtEnd(b, b_nonull);

+                    /*
+                     * If first argument is of varlena type, it might be an
+                     * expanded datum.  We need to ensure that the value
+                     * passed to the comparison function is a read-only
+                     * pointer.  However, if we end by returning the first
+                     * argument, that will be the original read-write pointer
+                     * if it was read-write.
+                     */
+                    if (op->d.func.make_ro)
+                    {
+                        LLVMValueRef v_params[1];
+                        LLVMValueRef v_arg0_ro;
+
+                        v_params[0] = v_arg0;
+                        v_arg0_ro =
+                            l_call(b,
+                                   llvm_pg_var_func_type("MakeExpandedObjectReadOnlyInternal"),
+                                   llvm_pg_func(mod, "MakeExpandedObjectReadOnlyInternal"),
+                                   v_params, lengthof(v_params), "");
+                        LLVMBuildStore(b, v_arg0_ro,
+                                       l_funcvaluep(b, v_fcinfo, 0));
+                    }
+
                     v_retval = BuildV1Call(context, b, mod, fcinfo, &v_fcinfo_isnull);

                     /*
-                     * If result not null, and arguments are equal return null
-                     * (same result as if there'd been NULLs, hence reuse
-                     * b_hasnull).
+                     * If result not null and arguments are equal return null,
+                     * else return arg[0] (same result as if there'd been
+                     * NULLs, hence reuse b_hasnull).
                      */
                     v_argsequal = LLVMBuildAnd(b,
                                                LLVMBuildICmp(b, LLVMIntEQ,
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index cd97dfa062..56fb0d0adb 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -365,6 +365,7 @@ typedef struct ExprEvalStep
             /* faster to access without additional indirection: */
             PGFunction    fn_addr;    /* actual call address */
             int            nargs;    /* number of arguments */
+            bool        make_ro;    /* make arg0 R/O (used only for NULLIF) */
         }            func;

         /* for EEOP_BOOL_*_STEP */
diff --git a/src/test/regress/expected/case.out b/src/test/regress/expected/case.out
index f5136c17ab..efee7fc431 100644
--- a/src/test/regress/expected/case.out
+++ b/src/test/regress/expected/case.out
@@ -397,6 +397,14 @@ SELECT CASE make_ad(1,2)
  right
 (1 row)

+-- While we're here, also test handling of a NULLIF arg that is a read/write
+-- object (bug #18722)
+SELECT NULLIF(make_ad(1,2), array[2,3]::arrdomain);
+ nullif
+--------
+ {1,2}
+(1 row)
+
 ROLLBACK;
 -- Test interaction of CASE with ArrayCoerceExpr (bug #15471)
 BEGIN;
diff --git a/src/test/regress/sql/case.sql b/src/test/regress/sql/case.sql
index 83fe43be6b..388d4c6f52 100644
--- a/src/test/regress/sql/case.sql
+++ b/src/test/regress/sql/case.sql
@@ -242,6 +242,11 @@ SELECT CASE make_ad(1,2)
   WHEN array[1,2]::arrdomain THEN 'right'
   END;

+-- While we're here, also test handling of a NULLIF arg that is a read/write
+-- object (bug #18722)
+
+SELECT NULLIF(make_ad(1,2), array[2,3]::arrdomain);
+
 ROLLBACK;

 -- Test interaction of CASE with ArrayCoerceExpr (bug #15471)

Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Dean Rasheed
Date:
On Sun, 24 Nov 2024 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Here's a fleshed-out patch with a test case and JIT support.  This
> is about the first time I've messed with LLVM, so I wouldn't mind
> some review of what I did in llvmjit_expr.c.  In particular, do I
> correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces
> a reference to a copy of the initial value of args[0].value?
>

I don't know about that, but I wonder if this bug could be fixed by
having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it
wouldn't be necessary to make any changes to the expression evaluation
code.

Regards,
Dean



Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Sun, 24 Nov 2024 at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a fleshed-out patch with a test case and JIT support.  This
>> is about the first time I've messed with LLVM, so I wouldn't mind
>> some review of what I did in llvmjit_expr.c.  In particular, do I
>> correctly understand that "l_funcvalue(b, v_fcinfo, 0)" produces
>> a reference to a copy of the initial value of args[0].value?

> I don't know about that, but I wonder if this bug could be fixed by
> having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it
> wouldn't be necessary to make any changes to the expression evaluation
> code.

That would entirely destroy one of the primary performance benefits of
the expanded-object infrastructure.  The idea is that if you have
    fconsumer(fproducer(...), ...)
and fproducer returns a read-write pointer to an object it's built,
then fconsumer should be able to take ownership of the object and
use it as a local variable (possibly modifying it) without incurring
any object-copying overhead.

This works in any context where an intermediate expression value
has a single consumer, which is most.  If there are multiple
consumers then we need to insert MAKE_READONLY steps for all
(or all but one) of them.  I overlooked EEOP_NULLIF as such
a case, but I don't think there are so many more cases as to
justify throwing away the concept altogether.

            regards, tom lane



Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Dean Rasheed
Date:
On Mon, 25 Nov 2024 at 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > I don't know about that, but I wonder if this bug could be fixed by
> > having ExecInitExprRec() insert a EEOP_MAKE_READONLY step. Then it
> > wouldn't be necessary to make any changes to the expression evaluation
> > code.
>
> That would entirely destroy one of the primary performance benefits of
> the expanded-object infrastructure.  The idea is that if you have
>     fconsumer(fproducer(...), ...)
> and fproducer returns a read-write pointer to an object it's built,
> then fconsumer should be able to take ownership of the object and
> use it as a local variable (possibly modifying it) without incurring
> any object-copying overhead.
>
> This works in any context where an intermediate expression value
> has a single consumer, which is most.  If there are multiple
> consumers then we need to insert MAKE_READONLY steps for all
> (or all but one) of them.  I overlooked EEOP_NULLIF as such
> a case, but I don't think there are so many more cases as to
> justify throwing away the concept altogether.
>

I didn't mean do it in all cases, I just meant the NullIfExpr case
identified here. My point was that instead of modifying the evaluation
code for EEOP_NULLIF to make it call
MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a
EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF
step.

Regards,
Dean



Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I didn't mean do it in all cases, I just meant the NullIfExpr case
> identified here. My point was that instead of modifying the evaluation
> code for EEOP_NULLIF to make it call
> MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a
> EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF
> step.

But then the NULLIF step would only have access to the R/O pointer,
no?  We do want to pass on a R/W pointer to the output, if we got
one, to handle cases like
    fconsumer(NULLIF(fproducer(...), ...), ...)
Admittedly that's a pretty edgy edge-case, but still we're leaving
money on the table if we don't do it.  So I think we have to deal
with the issue within NULLIF.

            regards, tom lane



Re: BUG #18722: Processing arrays with plpgsql raises errors

From
Dean Rasheed
Date:
On Mon, 25 Nov 2024 at 19:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > I didn't mean do it in all cases, I just meant the NullIfExpr case
> > identified here. My point was that instead of modifying the evaluation
> > code for EEOP_NULLIF to make it call
> > MakeExpandedObjectReadOnlyInternal(), it would be easier to insert a
> > EEOP_MAKE_READONLY step for the first argument of the EEOP_NULLIF
> > step.
>
> But then the NULLIF step would only have access to the R/O pointer,
> no?  We do want to pass on a R/W pointer to the output, if we got
> one, to handle cases like
>         fconsumer(NULLIF(fproducer(...), ...), ...)
> Admittedly that's a pretty edgy edge-case, but still we're leaving
> money on the table if we don't do it.  So I think we have to deal
> with the issue within NULLIF.
>

OK, that makes sense.

Regards,
Dean