Thread: BUG #18070: Assertion failed when processing error from plpy's iterator

BUG #18070: Assertion failed when processing error from plpy's iterator

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18070
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16beta3
Operating system:   Ubuntu 22.04
Description:

The following script:
CREATE EXTENSION plpython3u;

CREATE FUNCTION pyfunc() RETURNS SETOF text AS $$
  plan = plpy.prepare("SELECT * FROM pyfunc()", ["text"])
  for row in plpy.cursor(plan, ["w"]):
    yield row[0]
$$ LANGUAGE plpython3u;

SELECT pyfunc();

triggers the assertion failure:
TRAP: failed Assert("detail == NULL"), File: "plpy_elog.c", Line: 107, PID:
784729

Core was generated by `postgres: law regression [local] SELECT
                        '.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/784729' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140508500382656)
    at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140508500382656)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140508500382656) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140508500382656, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x00007fcaae523476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007fcaae5097f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055e42de8b7ab in ExceptionalCondition (conditionName=0x7fcaa360d2fc
"detail == NULL", 
    fileName=0x7fcaa360d2f0 "plpy_elog.c", lineNumber=107) at assert.c:66
#6  0x00007fcaa35fd766 in PLy_elog_impl (elevel=21, 
    fmt=0x7fcaa360d830 "error fetching next item from iterator") at
plpy_elog.c:107
#7  0x00007fcaa35ff443 in PLy_exec_function (fcinfo=0x55e432b63818,
proc=0x55e42f100b18)
    at plpy_exec.c:168
#8  0x00007fcaa36025df in plpython3_call_handler (fcinfo=0x55e432b63818) at
plpy_main.c:245
#9  0x000055e42d9f81e5 in ExecMakeTableFunctionResult
(setexpr=0x55e432b5fd60, econtext=0x55e432b5fc30, 
    argContext=0x55e432b63700, expectedDesc=0x55e432b5fe20,
randomAccess=true) at execSRF.c:235
#10 0x000055e42da1506e in FunctionNext (node=0x55e432b5fa20) at
nodeFunctionscan.c:95
#11 0x000055e42d9f9cc6 in ExecScanFetch (node=0x55e432b5fa20,
accessMtd=0x55e42da14fb8 <FunctionNext>, 
    recheckMtd=0x55e42da153bf <FunctionRecheck>) at execScan.c:132
#12 0x000055e42d9f9d3f in ExecScan (node=0x55e432b5fa20,
accessMtd=0x55e42da14fb8 <FunctionNext>, 
    recheckMtd=0x55e42da153bf <FunctionRecheck>) at execScan.c:181
#13 0x000055e42da15418 in ExecFunctionScan (pstate=0x55e432b5fa20) at
nodeFunctionscan.c:270
#14 0x000055e42d9f5689 in ExecProcNodeFirst (node=0x55e432b5fa20) at
execProcnode.c:464
#15 0x000055e42d9e8716 in ExecProcNode (node=0x55e432b5fa20)
    at ../../../src/include/executor/executor.h:273
#16 0x000055e42d9eb6ce in ExecutePlan (estate=0x55e432b5f7f8,
planstate=0x55e432b5fa20, 
    use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true,
numberTuples=1, 
    direction=ForwardScanDirection, dest=0x55e42e2190c0 <spi_printtupDR>,
execute_once=false)
    at execMain.c:1670
...

(gdb) frame 6
#6  0x00007fcaa35fd766 in PLy_elog_impl (elevel=21, 
    fmt=0x7fcaa360d830 "error fetching next item from iterator") at
plpy_elog.c:107
107                     Assert(detail == NULL);
(gdb) p xmsg
$1 = 0x55e432b68300 "spiexceptions.StatementTooComplex: error fetching next
item from iterator"
(gdb) p detail
$2 = 0x7fcaa2d50a70 "spiexceptions.StatementTooComplex: stack depth limit
exceeded"

Without asserts enabled, I get:
ERROR:  error fetching next item from iterator
DETAIL:  spiexceptions.StatementTooComplex: error fetching next item from
iterator
HINT:  Increase the configuration parameter "max_stack_depth" (currently
2048kB), after ensuring the platform's stack depth limit is adequate.
CONTEXT:  Traceback (most recent call last):
PL/Python function "pyfunc"

It looks like the following coding in PLy_elog_impl():
        /* Since we have a format string, we cannot have a SPI detail. */
        Assert(detail == NULL);

        /* If there's an exception message, it goes in the detail. */
        if (xmsg)
            detail = xmsg;

doesn't expect to process a call PLy_elog(LEVEL, "message") when some (SPI)
error occurred before the call.

Reproduced on REL_11_STABLE (with plpython2u) .. master.

Apparently, this coding exists since 2bd78eb8d.


Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Alexander Lakhin
Date:
25.08.2023 17:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      18070
> ...
> It looks like the following coding in PLy_elog_impl():
>         /* Since we have a format string, we cannot have a SPI detail. */
>         Assert(detail == NULL);
>
>         /* If there's an exception message, it goes in the detail. */
>         if (xmsg)
>             detail = xmsg;
>
> doesn't expect to process a call PLy_elog(LEVEL, "message") when some (SPI)
> error occurred before the call.
>

Please look at the patch attached, which removes the Assert and changes
the detail message e.g. , for the query:
CREATE EXTENSION plpython3u;
CREATE TYPE tst AS (t text);
CREATE FUNCTION pyfunc() RETURNS SETOF text AS $$
   plan = plpy.prepare("SELECT * FROM tst", ["text"])
   for row in plpy.cursor(plan, ["w"]):
     yield row[0]
$$ LANGUAGE plpython3u;

SELECT pyfunc();

From:
ERROR:  error fetching next item from iterator
LINE 1: SELECT * FROM tst
                       ^
DETAIL:  spiexceptions.WrongObjectType: cannot open relation "tst"
QUERY:  SELECT * FROM tst
CONTEXT:  Traceback (most recent call last):
PL/Python function "pyfunc"

To:
ERROR:  error fetching next item from iterator
LINE 1: SELECT * FROM tst
                       ^
DETAIL:  This operation is not supported for composite types.
QUERY:  SELECT * FROM tst
CONTEXT:  Traceback (most recent call last):
PL/Python function "pyfunc"

Best regards,
Alexander
Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Fri, Sep 15, 2023 at 08:00:00PM +0300, Alexander Lakhin wrote:
> -        /* Since we have a format string, we cannot have a SPI detail. */
> -        Assert(detail == NULL);
> -
> -        /* If there's an exception message, it goes in the detail. */
> -        if (xmsg)
> +        /*
> +         * When we have no detail from the error data, an exception message,
> +         * if not empty, goes in the detail.
> +         */
> +        if (detail == NULL && xmsg != NULL)
>              detail = xmsg;

Yeah, I think that you're right here to enforce the use of a DETAIL
message if it exists instead of the xmsg.  At least that feels more
natural because we would get the same error when attempting to select
from a composite type.

How about adding a regression test?  Say in plpython_error.sql where
we already check a bunch of error patterns?
--
Michael

Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Sep 15, 2023 at 08:00:00PM +0300, Alexander Lakhin wrote:
>> +        /*
>> +         * When we have no detail from the error data, an exception message,
>> +         * if not empty, goes in the detail.
>> +         */
>> +        if (detail == NULL && xmsg != NULL)
>>         detail = xmsg;

> Yeah, I think that you're right here to enforce the use of a DETAIL
> message if it exists instead of the xmsg.

I dunno about that.  We're trying to shove three pounds of stuff into
a two-pound sack, which isn't going to work perfectly in any case,
but I'm not sure this is the right priority order for the available
strings.

In the original example with an infinite-recursion error, what we
end up with at the point of the Assert is

(gdb) i locals
save_errno = 2
xmsg = 0x666a8d8 "spiexceptions.StatementTooComplex: error fetching next item from iterator"
tbmsg = 0x666ace8 "Traceback (most recent call last):"
tb_depth = 1
emsg = {data = 0x666b0f8 "error fetching next item from iterator", len = 38,
  maxlen = 1024, cursor = 0}
exc = 0x18fd448
val = 0x7f1fefe39e88
tb = 0x7f1fef858608
primary = 0x666b0f8 "error fetching next item from iterator"
sqlerrcode = 16777477
detail = 0x7f1fefe21920 "spiexceptions.StatementTooComplex: stack depth limit exceeded"
hint = 0x7f1ffd5c6f78 "Increase the configuration parameter \"max_stack_depth\" (currently 2048kB), after ensuring the
platform'sstack depth limit is adequate." 
query = 0x0
position = 0
schema_name = 0x0
table_name = 0x0
column_name = 0x0
datatype_name = 0x0

Clearly, "detail" offers the key insight here and we don't want to
lose it; but xmsg seems rather duplicative of the fmt string
(which is what is reflected into emsg and primary here).

On the other hand, in Alexander's newer test case we find

(gdb) i locals
save_errno = 2
xmsg = 0x191ef50 "spiexceptions.WrongObjectType: cannot open relation \"tst\""
tbmsg = 0x191f360 "Traceback (most recent call last):"
tb_depth = 1
emsg = {data = 0x191f770 "error fetching next item from iterator", len = 38,
  maxlen = 1024, cursor = 0}
exc = 0x18f02f8
val = 0x7f1fefda7e88
tb = 0x7f1fefcfd3c8
primary = 0x191f770 "error fetching next item from iterator"
sqlerrcode = 151027844
detail = 0x7f1fefe1fa20 "This operation is not supported for composite types."
hint = 0x0
query = 0x7f1fefcfc3c0 "SELECT * FROM tst"
position = 15
schema_name = 0x0
table_name = 0x0
column_name = 0x0
datatype_name = 0x0

Here, xmsg seems strictly more useful than fmt, but the patch as
proposed would discard xmsg.

I wonder if we shouldn't simply say the rule is that xmsg, if set,
overrides fmt.  This'd likely have some impact on existing test
cases, but perhaps the net result would be no worse?

            regards, tom lane



Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Alexander Lakhin
Date:
Hello,

16.09.2023 17:34, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Sep 15, 2023 at 08:00:00PM +0300, Alexander Lakhin wrote:
+        /*
+         * When we have no detail from the error data, an exception message,
+         * if not empty, goes in the detail.
+         */
+        if (detail == NULL && xmsg != NULL)		detail = xmsg;
Yeah, I think that you're right here to enforce the use of a DETAIL
message if it exists instead of the xmsg.
I dunno about that.  We're trying to shove three pounds of stuff into
a two-pound sack, which isn't going to work perfectly in any case,
but I'm not sure this is the right priority order for the available
strings.

Thanks for looking at this!

In the original example with an infinite-recursion error, what we
end up with at the point of the Assert is

(gdb) i locals
save_errno = 2
xmsg = 0x666a8d8 "spiexceptions.StatementTooComplex: error fetching next item from iterator"
tbmsg = 0x666ace8 "Traceback (most recent call last):"
tb_depth = 1
emsg = {data = 0x666b0f8 "error fetching next item from iterator", len = 38,   maxlen = 1024, cursor = 0}
exc = 0x18fd448
val = 0x7f1fefe39e88
tb = 0x7f1fef858608
primary = 0x666b0f8 "error fetching next item from iterator"
sqlerrcode = 16777477
detail = 0x7f1fefe21920 "spiexceptions.StatementTooComplex: stack depth limit exceeded"
hint = 0x7f1ffd5c6f78 "Increase the configuration parameter \"max_stack_depth\" (currently 2048kB), after ensuring the platform's stack depth limit is adequate."
query = 0x0
position = 0
schema_name = 0x0
table_name = 0x0
column_name = 0x0
datatype_name = 0x0

Clearly, "detail" offers the key insight here and we don't want to
lose it; but xmsg seems rather duplicative of the fmt string
(which is what is reflected into emsg and primary here).

On the other hand, in Alexander's newer test case we find

(gdb) i locals
save_errno = 2
xmsg = 0x191ef50 "spiexceptions.WrongObjectType: cannot open relation \"tst\""
tbmsg = 0x191f360 "Traceback (most recent call last):"
tb_depth = 1
emsg = {data = 0x191f770 "error fetching next item from iterator", len = 38,   maxlen = 1024, cursor = 0}
exc = 0x18f02f8
val = 0x7f1fefda7e88
tb = 0x7f1fefcfd3c8
primary = 0x191f770 "error fetching next item from iterator"
sqlerrcode = 151027844
detail = 0x7f1fefe1fa20 "This operation is not supported for composite types."
hint = 0x0
query = 0x7f1fefcfc3c0 "SELECT * FROM tst"
position = 15
schema_name = 0x0
table_name = 0x0
column_name = 0x0
datatype_name = 0x0

Here, xmsg seems strictly more useful than fmt, but the patch as
proposed would discard xmsg.

I wonder if we shouldn't simply say the rule is that xmsg, if set,
overrides fmt.  This'd likely have some impact on existing test
cases, but perhaps the net result would be no worse?

When looking for a representative query for a regression test, I've
also discovered that the result is not always appealing.

Yes, for the case "SELECT * FROM some_type" you mentioned, with the patched
version we get "detail" from the underlying exception:
DETAIL:  This operation is not supported for composite types.

But without the patch we get "message":
DETAIL:  spiexceptions.WrongObjectType: cannot open relation "tst"

In this case the difference is minor, but I would prefer to get a message
with the relation name.
I've tested another case, with to_date():
                ereturn(escontext, -1,
                                (errcode(ERRCODE_INVALID_DATETIME_FORMAT),
                                 errmsg("invalid value \"%s\" for \"%s\"",
                                                copy, node->key->name),
                                 errdetail("Value must be an integer.")));

With the patched version, the error produced by:
CREATE FUNCTION python_error_detail() RETURNS SETOF text AS $$
  plan = plpy.prepare("SELECT to_date('DD', 'DD') d")
  for row in plpy.cursor(plan):
    yield row['d']
$$ LANGUAGE plpython3u;

SELECT python_error_detail();

looks as follows:
 ERROR:  error fetching next item from iterator
 DETAIL:  Value must be an integer.
 CONTEXT:  Traceback (most recent call last):
 PL/Python function "python_error_detail"

I can't say that the detail info "Value must be an integer" is useful here.

So if we omit the exception message, while keeping the detail, we loose a
higher importance level of information in the stack (or a sack):
1) plpy_elog message
2) underlying exception message
3) underlying exception detail
4) underlying exception hint

I also thought about omitting the underlying exception detail, thus just
doing mapping:
"plpy_elog message" -> message,
"underlying exception message" -> detail,
"underlying exception hint" -> hint,
but it seems that presenting a hint without an intermediate level of
information (detail) could look weird in some cases too.

So my next idea was to concatenate "underlying exception message" +
"underlying exception detail" and to put the result in the detail.

It seems that it works pretty well, e.g. for the case with to_date(), we get:
SELECT python_error_detail();
ERROR:  error fetching next item from iterator
DETAIL:  spiexceptions.InvalidDatetimeFormat: invalid value "DD" for "DD"
Value must be an integer.
CONTEXT:  Traceback (most recent call last):
PL/Python function "python_error_detail"

(I've confirmed that all these strings are localized correctly.)

With the stack overflow case, the function PLy_elog() called recursively,
so the output looks like:
SELECT pyfunc();
ERROR:  error fetching next item from iterator
DETAIL:  spiexceptions.StatementTooComplex: error fetching next item from iterator
spiexceptions.StatementTooComplex: error fetching next item from iterator

...
spiexceptions.StatementTooComplex: error fetching next item from iterator
spiexceptions.StatementTooComplex: error fetching next item from iterator
spiexceptions.StatementTooComplex: stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
CONTEXT:  Traceback (most recent call last):
PL/Python function "pyfunc"

This lengthy output is similar to what we get with a recursion in plperl,
so I don't consider this as an issue, but may be we could just check
the length of an error message before concatenation...

Please look at the v2 patch, which contains an addition to regression tests for the master.
If you agree with this approach, I'll make patches for back branches as well...

Best regards,
Alexander
Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Sat, Sep 16, 2023 at 08:00:00PM +0300, Alexander Lakhin wrote:
> This lengthy output is similar to what we get with a recursion in plperl,
> so I don't consider this as an issue, but may be we could just check
> the length of an error message before concatenation...
>
> Please look at the v2 patch, which contains an addition to regression tests for the master.
> If you agree with this approach, I'll make patches for back branches as well...

+DETAIL:  spiexceptions.InvalidDatetimeFormat: invalid value "DD" for "DD"
+Value must be an integer.

That's the kind of things we've been doing for years in libpq.  So
even if there are no proper sentence terminations, that may be enough
for the user.

+        if (detail == NULL)
             detail = xmsg;
+        else if (xmsg != NULL) {
+            initStringInfo(&detailstr);
+            appendStringInfoString(&detailstr, xmsg);
+            appendStringInfoChar(&detailstr, '\n');
+            appendStringInfoString(&detailstr, detail);
+            detail = detailstr.data;
+        }

Perhaps it would be slightly cleaner to use a StringInfoData all the
time and rely on the presence of fmt to free the detail string?

Btw, backpatching a change in the error infrastructure does not seem a
good idea to me, as there could be applications that depend on the
existing behaviors?  It looks like we'd better just remove the
assertion in back-branches.
--
Michael

Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Btw, backpatching a change in the error infrastructure does not seem a
> good idea to me, as there could be applications that depend on the
> existing behaviors?  It looks like we'd better just remove the
> assertion in back-branches.

+1.  Not very sure what we want to change this to in HEAD, but
I wouldn't backpatch a behavioral change here.

            regards, tom lane



Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Sat, Sep 16, 2023 at 08:02:03PM -0400, Tom Lane wrote:
> +1.  Not very sure what we want to change this to in HEAD, but
> I wouldn't backpatch a behavioral change here.

Please find attached a proposal of patch for a backpatch, that applies
on HEAD.  The backpatch needs a few tweaks for alternate outputs, of
course, depending on the version of python.

Any thoughts?
--
Michael

Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Please find attached a proposal of patch for a backpatch, that applies
> on HEAD.  The backpatch needs a few tweaks for alternate outputs, of
> course, depending on the version of python.

Yeah, the alternate output files will be a bit of a pain, but
should be straightforward as long as you're paying attention.

My only suggestion is that

+  plan = plpy.prepare("SELECT to_date('DD', 'DD') d")
+...
+DETAIL:  spiexceptions.InvalidDatetimeFormat: invalid value "DD" for "DD"

seems unnecessarily ambiguous, and so confusing as to whether the error
message is correct.  Let's make the input string be something other
than 'DD' here.

            regards, tom lane



Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Sat, Sep 16, 2023 at 10:03:10PM -0400, Tom Lane wrote:
> seems unnecessarily ambiguous, and so confusing as to whether the error
> message is correct.  Let's make the input string be something other
> than 'DD' here.

Yep.  Good point to use a different string for the input.
--
Michael

Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Alexander Lakhin
Date:
17.09.2023 05:06, Michael Paquier wrote:
> On Sat, Sep 16, 2023 at 10:03:10PM -0400, Tom Lane wrote:
>> seems unnecessarily ambiguous, and so confusing as to whether the error
>> message is correct.  Let's make the input string be something other
>> than 'DD' here.
> Yep.  Good point to use a different string for the input.

Thanks for paying attention to it!

I agree with all the changes proposed.

I'm going to be off today, so I could prepare all the patches tomorrow,
or if you finish it, I would be very grateful.

Best regards,
Alexander



Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Sun, Sep 17, 2023 at 07:00:00AM +0300, Alexander Lakhin wrote:
> I'm going to be off today, so I could prepare all the patches tomorrow,
> or if you finish it, I would be very grateful.

Thanks.  It's a long weekend here as well, so I'll be able to do
anything serious only in 48 hours or so (aka being able to check on
the buildfarm).  If you have the room to send patches for the
back-branches until then, that would be super helpful, of course!
--
Michael

Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Alexander Lakhin
Date:
Hi,

17.09.2023 07:25, Michael Paquier wrote:
> Thanks.  It's a long weekend here as well, so I'll be able to do
> anything serious only in 48 hours or so (aka being able to check on
> the buildfarm).  If you have the room to send patches for the
> back-branches until then, that would be super helpful, of course!

Please look at the patches for backpatching (I think they meet our
agreements) and also the new main patch (which perhaps a step closer to
expectations).

Best regards,
Alexander
Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Mon, Sep 18, 2023 at 12:00:00PM +0300, Alexander Lakhin wrote:
> Please look at the patches for backpatching (I think they meet our
> agreements) and also the new main patch (which perhaps a step closer to
> expectations).

Thanks for the patch set.  I'll look at the backpatch materials
tomorrow in more details!
--
Michael

Attachment

Re: BUG #18070: Assertion failed when processing error from plpy's iterator

From
Michael Paquier
Date:
On Mon, Sep 18, 2023 at 07:13:50PM +0900, Michael Paquier wrote:
> Thanks for the patch set.  I'll look at the backpatch materials
> tomorrow in more details!

Applied and backpatched.

I was wondering for a sec what's the business of the diffs between
plpython_error_5.out and plpython_error.out on 11 and 12, though.  We
make sure that _5.out uses plpython3u as language in the output
generated, but e748e902de has also proved that the buildfarm did not
flinch when the test broke the assumption behind the test diffs.  With
plpython_error_0.out gone in 13~ and out in one year, I am not
planning to stick my nose into this one and spend more time on it.
Just noted in passing..

The test added by this thread does not matter much, I guess, as the
error string is generated by plpython itself.
--
Michael

Attachment