Thread: PLTCL return_null crash...

PLTCL return_null crash...

From
"Ian Harding"
Date:
This is so odd.  I have used return_null with no problems, but now it crashes stuff all over the place.

Yes, I am the guy who hacked his pltcl.c, but only on one machine.  This seems to crash on all 3 of them.

PostgreSQL 7.2.1 on i386--netbsdelf, compiled by GCC egcs-1.1.2

bash-2.05$ createlang 'pltcl' test;
bash-2.05$ psql test
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# create function crash () returns int as '
test'# return_null
test'# ' language 'pltcl';
CREATE
test=# select crash();
ERROR:  AllocSetFree: cannot find block containing chunk 0xbfbfcc48
test=#

Well, crash may be too harsh a term in this simple example; in others, however, it not only brings down my database
(without a core file?), it kills the webserver!

I can accept the notion that I have probably caused this, but I don't know how!  These computers are geographically
separated, and the one above is stone stock, no changes at all to anything (except NetBSD package stuff...)

Does the above exhibit any similar weirdness on anyone else's 7.2.1?

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
iharding@tpchd.org

WWSD - What Would Scooby Doo?


Re: PLTCL return_null crash...

From
Tom Lane
Date:
"Ian Harding" <ianh@tpchd.org> writes:
> test=# create function crash () returns int as '
> test'# return_null
> test'# ' language 'pltcl';
> CREATE
> test=# select crash();
> ERROR:  AllocSetFree: cannot find block containing chunk 0xbfbfcc48
> test=#

Hmm.  WorksForMe (on both 7.2.3 and CVS tip) ...

regression=# select crash();
 crash
-------

(1 row)

regression=# select crash() is null;
 ?column?
----------
 t
(1 row)

regression=#


A stack backtrace from the elog() call might prove enlightening.

            regards, tom lane

Re: PLTCL return_null crash...

From
"Nigel J. Andrews"
Date:
On Mon, 7 Oct 2002, Ian Harding wrote:
> [deleted]
>
> test=# create function crash () returns int as '
> test'# return_null
> test'# ' language 'pltcl';
> CREATE
> test=# select crash();
> ERROR:  AllocSetFree: cannot find block containing chunk 0xbfbfcc48
> test=#
>
> [deleted]
>
> Does the above exhibit any similar weirdness on anyone else's 7.2.1?
>

Crashes for me too. Not under 7.3 though so something changed somewhere,
somehow :)


--
Nigel J. Andrews


Re: PLTCL return_null crash...

From
"Nigel J. Andrews"
Date:
On Mon, 7 Oct 2002, Tom Lane wrote:

> "Ian Harding" <ianh@tpchd.org> writes:
> > test=# create function crash () returns int as '
> > test'# return_null
> > test'# ' language 'pltcl';
> > CREATE
> > test=# select crash();
> > ERROR:  AllocSetFree: cannot find block containing chunk 0xbfbfcc48
> > test=#
>
> Hmm.  WorksForMe (on both 7.2.3 and CVS tip) ...
>
> regression=# select crash();
>  crash
> -------
>
> (1 row)
>
> regression=# select crash() is null;
>  ?column?
> ----------
>  t
> (1 row)
>
> regression=#
>
>
> A stack backtrace from the elog() call might prove enlightening.

Here's one from my system.

Program received signal SIGSEGV, Segmentation fault.
0x8156df7 in pfree ()
(gdb) bt
#0  0x8156df7 in pfree ()
#1  0x4001611c in ?? () from /usr/local/stow/pgsql-7.2.1/lib/pltcl.so
#2  0x40015c37 in ?? () from /usr/local/stow/pgsql-7.2.1/lib/pltcl.so
#3  0x80c4d1d in ExecMakeFunctionResult ()
#4  0x80c4dda in ExecEvalFunc ()
#5  0x80c5310 in ExecEvalExpr ()
#6  0x80c55e9 in ExecTargetList ()
#7  0x80c587b in ExecProject ()
#8  0x80cb073 in ExecResult ()
#9  0x80c3d79 in ExecProcNode ()
#10 0x80c2d5e in ExecutePlan ()
#11 0x80c23f7 in ExecutorRun ()
#12 0x810f935 in ProcessQuery ()
#13 0x810e1e0 in pg_exec_query_string ()
#14 0x810f1be in PostgresMain ()
#15 0x80f631e in DoBackend ()
#16 0x80f5c6f in BackendStartup ()
#17 0x80f4e8c in ServerLoop ()
#18 0x80f4a0b in PostmasterMain ()
#19 0x80d44a5 in main ()
#20 0x400e6a42 in __libc_start_main () from /lib/libc.so.6
(gdb)

So we can see it's in pltcl but without debugging turned on it's a little
difficult to tell where.

Presumably the fault was removed between 1.48 and 1.49 of src/pl/tcl/pltcl.c


--
Nigel J. Andrews


Re: PLTCL return_null crash...

From
Tom Lane
Date:
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> Presumably the fault was removed between 1.48 and 1.49 of src/pl/tcl/pltcl.c

But 1.49 is in 7.2.1, which you said you're using?

            regards, tom lane

Re: PLTCL return_null crash...

From
Joe Conway
Date:
Tom Lane wrote:
> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
>
>>Presumably the fault was removed between 1.48 and 1.49 of src/pl/tcl/pltcl.c
>
>
> But 1.49 is in 7.2.1, which you said you're using?
>

It crashes for me under 7.2.2 and 7.2.3 (but not in 7.3b2). The odd thing is,
even though I compiled with --enable-debug, pltcl.so still seems to lack debug symbols:

#0  0x08166774 in pfree (pointer=0x8397450) at mcxt.c:448
#1  0x40028033 in pltcl_func_handler () from /usr/lib/pgsql/pltcl.so
#2  0x40027b8b in pltcl_call_handler () from /usr/lib/pgsql/pltcl.so
#3  0x080c96e0 in ExecMakeFunctionResult (fcache=0x8384728, arguments=0x0,
econtext=0x8384470, isNull=0xbfffebaf "",
     isDone=0xbfffebb0) at execQual.c:825

I tried putting a break in pltcl_func_handler, but here's what I get:

Breakpoint 1, 0x40027bea in pltcl_func_handler () from /usr/lib/pgsql/pltcl.so
(gdb) step
Single stepping until exit from function pltcl_func_handler,
which has no line number information.

Any idea why I can't step through this? In any case, the problem seems to be
in this section of code:

<snip>
if (SPI_finish() != SPI_OK_FINISH)
   elog(ERROR, "pltcl: SPI_finish() failed");

UTF_BEGIN;
if (fcinfo->isnull)
   retval = (Datum) 0;
else
   retval = FunctionCall3(&prodesc->result_in_func,
                          PointerGetDatum(UTF_U2E(interp->result)),
                          ObjectIdGetDatum(prodesc->result_in_elem),
                          Int32GetDatum(-1));
UTF_END;
</snip>

where:

#define UTF_BEGIN  do { \
   unsigned char *_pltcl_utf_src; \
   unsigned char *_pltcl_utf_dst

#define UTF_END    if (_pltcl_utf_src!=_pltcl_utf_dst) \
   pfree(_pltcl_utf_dst); } while (0)

I was able to step into, and out of, SPI_finish(). The pfree(_pltcl_utf_dst)
seems to be where it is failing.

Joe


Re: PLTCL return_null crash...

From
"Nigel J. Andrews"
Date:
On Mon, 7 Oct 2002, Tom Lane wrote:

> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > Presumably the fault was removed between 1.48 and 1.49 of src/pl/tcl/pltcl.c
>
> But 1.49 is in 7.2.1, which you said you're using?

OK, I misunderstood the labeling.


--
Nigel J. Andrews


Re: PLTCL return_null crash...

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Any idea why I can't step through this? In any case, the problem seems to be
> in this section of code:

> <snip>
> if (SPI_finish() != SPI_OK_FINISH)
>    elog(ERROR, "pltcl: SPI_finish() failed");

> UTF_BEGIN;
> if (fcinfo->isnull)
>    retval = (Datum) 0;
> else
>    retval = FunctionCall3(&prodesc->result_in_func,
>                           PointerGetDatum(UTF_U2E(interp->result)),
>                           ObjectIdGetDatum(prodesc->result_in_elem),
>                           Int32GetDatum(-1));
> UTF_END;
> </snip>

Oh, but of course: if you are returning NULL then this sequence fails
because it pfrees an uninitialized pointer.  It's fixed in CVS tip,
where the sequence reads like

    if (SPI_finish() != SPI_OK_FINISH)
        elog(ERROR, "pltcl: SPI_finish() failed");

    if (fcinfo->isnull)
        retval = (Datum) 0;
    else
    {
        UTF_BEGIN;
        retval = FunctionCall3(&prodesc->result_in_func,
                               PointerGetDatum(UTF_U2E(interp->result)),
                               ObjectIdGetDatum(prodesc->result_in_elem),
                               Int32GetDatum(-1));
        UTF_END;
    }

The reason I failed to duplicate it here was I didn't compile with
--enable-multibyte.  The bug is definitely still there in 7.2.3 if
you use multibyte.
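
For anyone who wants to see the shape of the fix in isolation, here is a
minimal standalone sketch outside PostgreSQL. Everything prefixed sketch_ or
SKETCH_ is a hypothetical stand-in: plain malloc/free instead of palloc/pfree,
a copying function instead of the real encoding conversion, and the
SKETCH_UTF_U2E definition is only an assumption about how the conversion macro
assigns both pointers. The point is just that the BEGIN/END pair wraps the one
branch that actually performs a conversion, so both pointers are always
assigned before they are compared and freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch; not the real PostgreSQL functions. */
static int free_calls = 0;      /* counts frees so the behavior is observable */

static void sketch_pfree(void *p)
{
    free_calls++;
    free(p);
}

/* Stand-in for an encoding conversion: always returns a fresh copy. */
static unsigned char *sketch_u2e(const unsigned char *src)
{
    size_t len = strlen((const char *) src) + 1;
    unsigned char *dst = malloc(len);

    if (dst == NULL)
        abort();
    memcpy(dst, src, len);
    return dst;
}

/* Same shape as the macros quoted in the thread: the pointers are declared
 * but NOT initialized, so END is only safe after U2E has run. */
#define SKETCH_UTF_BEGIN  do { \
    const unsigned char *_sketch_utf_src; \
    unsigned char *_sketch_utf_dst

#define SKETCH_UTF_END    if (_sketch_utf_src != _sketch_utf_dst) \
    sketch_pfree(_sketch_utf_dst); } while (0)

/* Assumed definition: sets both pointers as a side effect of converting. */
#define SKETCH_UTF_U2E(x) \
    (_sketch_utf_dst = sketch_u2e(_sketch_utf_src = (x)))

/* Returns the length of the converted result, or -1 for a NULL result. */
static int convert_result(int isnull, const char *result)
{
    int retval;

    if (isnull)
        retval = -1;            /* no conversion, so nothing to free */
    else
    {
        assert(result != NULL);
        /* BEGIN/END wrap only the branch that calls U2E. */
        SKETCH_UTF_BEGIN;
        retval = (int) strlen((const char *)
                     SKETCH_UTF_U2E((const unsigned char *) result));
        SKETCH_UTF_END;
    }
    return retval;
}
```

Calling convert_result(1, NULL) takes the NULL branch and never touches the
scratch pointers; calling convert_result(0, "42") converts, then frees the
converted copy exactly once.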

            regards, tom lane