Thread: codlin_month is up and complain - PL/Python crash

codlin_month is up and complain - PL/Python crash

From
Zdenek Kotala
Date:
I revived codlin_moth and it fails during the PL/Python test:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=codlin_moth&dt=2010-02-16%2015:09:05


TRAP: BadArgument("!(((context) != 0 && (((((Node*)((context)))->type) == T_AllocSetContext))))", File: "mcxt.c", Line: 641)

 feaf5005 _lwp_kill (1, 6, 80459c8, fea9bbde) + 15
 fea9bbea raise    (6, 0, 8045a18, fea725aa) + 22
 fea725ca abort    (8046670, 8361f80, 8045a48, 8719ccf, 89021f0, 89021e4) + f2
 086d07c0 ExceptionalCondition (89021f0, 89021e4, 89021dc, 281) + 58
 08719ccf MemoryContextSwitchTo (89264ac, 0, 0, 8045a7c) + 47
 fec21990 PLy_spi_execute (0, 8b141cc, 80460f8, fe84abde) + 750
 fe84ad6e PyCFunction_Call (8b0ff6c, 8b141cc, 0, fe8a8d92) + 19e
 fe8a91a0 call_function (80461bc, 1, 610f2d31, fe8a3206) + 41c
 fe8a6221 PyEval_EvalFrameEx (8b5798c, 0, 8b0cbdc, 0) + 3029
 fe8a9310 fast_function (8b05144, 80462fc, 0, 0, 0, fe91c63c) + 108
 fe8a8e72 call_function (80462fc, 0, 80462d8, fe8a3206) + ee
 fe8a6221 PyEval_EvalFrameEx (8b576a4, 0, 8b0cbdc, 8b0cbdc) + 3029
 fe8a7cd0 PyEval_EvalCodeEx (8ab4770, 8b0cbdc, 8b0cbdc, 0, 0, 0) + 91c
 fe8a3102 PyEval_EvalCode (8ab4770, 8b0cbdc, 8b0cbdc, fec17831) + 32
 fec1799c PLy_function_handler (8046980, 8b5d508, 8046880, fec1480f) + 17c
 fec14b92 plpython_call_handler (8046980, 8046bb0, 8046be8, 8323774) + 3aa
 08324393 ExecEvalFunc (8a033b0, 8a0329c, 8a0390c, 8a039b8) + e33
 0832b1bc ExecProject (8a03920, 8046c6c, 2, 8977abc) + 834
 08348785 ExecResult (8a03210, 8a03184, 0, 1) + 9d
 0831f66f ExecProcNode (8a03210, 1, 8a037ec, 8731314) + 227
 0831a186 ExecutorRun (8a02d7c, 1, 0, 8719ad4) + 2de
 084d7778 PortalRun (898effc, 7fffffff, 1, 8977b38, 8977b38) + 450
 084ceae9 exec_simple_query (8976984, 0, 80473b8, 84d5185) + ba9
 084d51a2 PostgresMain (2, 8973b4c, 897398c, 893d00c, 893d008, 130d7661) + 7fa
 0844aded BackendRun (898c3d0) + 1cd
 084440f3 ServerLoop (1, 89561d4, 3, fea7bb7e, 5c54, feb83cd8) + 973
 08443004 PostmasterMain (3) + 119c
 0837db12 main     (3, 8047b14, 8047b24, 80fa21f) + 1ea
 080fa27d _start   (3, 8047be8, 8047fb0, 8047fb0, 0, 8047c35) + 7d

The problem seems to be caused by aggressive compiler optimization. I
lowered the optimization level and now it works fine. Interestingly, the
MemoryContext corruption appears only with PL/Python.
Zdenek


Re: codlin_month is up and complain - PL/Python crash

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I revived codlin_moth and it fails during the PL/Python test:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=codlin_moth&dt=2010-02-16%2015:09:05

All of the MemoryContextSwitchTo calls in plpython seem to be in
patterns like this:

    MemoryContext oldcontext;

    oldcontext = CurrentMemoryContext;
    PG_TRY();
    {
        ... do something ...
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);

Since oldcontext is only set in the one place, it really shouldn't
require "volatile" decoration, but maybe it does.  Can you do some
testing to see if that would fix it?

(Of course, really plpython's bogus approach to error handling ought
to get thrown out and rewritten from scratch, but that's not happening
right now.)
        regards, tom lane


Re: codlin_month is up and complain - PL/Python crash

From
Peter Eisentraut
Date:
On Wed, 2010-02-17 at 11:05 -0500, Tom Lane wrote:
> All of the MemoryContextSwitchTo calls in plpython seem to be in
> patterns like this:
> 
>     MemoryContext oldcontext;
> 
>     oldcontext = CurrentMemoryContext;
>     PG_TRY();
>     {
>         ... do something ...
>     }
>     PG_CATCH();
>     {
>         MemoryContextSwitchTo(oldcontext);
> 
> Since oldcontext is only set in the one place, it really shouldn't
> require "volatile" decoration, but maybe it does.

It is my understanding that local automatic variables may be clobbered
by [sig]longjmp unless they are marked volatile.  The PG_CATCH branch is
reached by means of a [sig]longjmp.  So that would mean that any
variable that you want to use both before the TRY and inside the CATCH
has to be volatile.
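
The rule under discussion can be tried in isolation. What follows is a
minimal sketch in plain C, not PostgreSQL code; the names env, fail and
counter_after_longjmp are made up for illustration. As far as the C
standard is concerned, a non-volatile automatic variable is left with an
indeterminate value after longjmp only if it was modified between the
setjmp call and the longjmp. Declaring the variable volatile makes its
value well defined in that case.

```c
#include <setjmp.h>

/* Hypothetical jump buffer standing in for an exception stack. */
static jmp_buf env;

/* Simulates an error being raised somewhere below the setjmp caller. */
static void fail(void)
{
    longjmp(env, 1);
}

/* Returns the value of a local variable as seen after the longjmp. */
int counter_after_longjmp(void)
{
    /*
     * volatile forces every access to go through memory, so the store
     * below is still visible once control comes back via longjmp.
     */
    volatile int counter = 0;

    if (setjmp(env) == 0)
    {
        counter = 42;   /* modified between setjmp and longjmp */
        fail();         /* does not return; control resumes at setjmp */
    }

    /* Reached only through the longjmp path. */
    return counter;
}
```

Calling counter_after_longjmp() returns 42. If the volatile qualifier is
removed, the standard no longer says which value comes back, and the
result may depend on the compiler and optimization level.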




Re: codlin_month is up and complain - PL/Python crash

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Wed, 2010-02-17 at 11:05 -0500, Tom Lane wrote:
>> Since oldcontext is only set in the one place, it really shouldn't
>> require "volatile" decoration, but maybe it does.

> It is my understanding that local automatic variables may be clobbered
> by [sig]longjmp unless they are marked volatile.  The PG_CATCH branch is
> reached by means of a [sig]longjmp.  So that would mean that any
> variable that you want to use both before the TRY and inside the CATCH
> has to be volatile.

If the rule were quite that strict then we'd need many more "volatile"
markers than we have.  I believe the actual implementation issue is that
longjmp restores the register contents to what they were at the time of
the setjmp call, and thus a variable allocated in a register would get
restored to the value it had at entry to PG_TRY whereas a variable
allocated on the stack would still have an up-to-date value.  Now the
picture isn't quite that simple since a sufficiently smart compiler
might move the variable's value around within the routine.  But the
behavior gcc appears to exhibit is that it won't warn about variables
that are only assigned once before the PG_TRY is entered, and that seems
reasonable to me since such a variable ought to have the correct value
either way.

It might be interesting to modify these bits of code so that the
oldcontext variables are assigned only at declaration:

    MemoryContext oldcontext = CurrentMemoryContext;
    ...
    PG_TRY();

and see if that makes the issue go away.
        regards, tom lane


Re: codlin_month is up and complain - PL/Python crash

From
Peter Eisentraut
Date:
On Wed, 2010-02-17 at 11:26 -0500, Tom Lane wrote:
> But the behavior gcc appears to exhibit is that it won't warn about
> variables that are only assigned once before the PG_TRY is entered,
> and that seems reasonable to me since such a variable ought to have
> the correct value either way. 

FWIW, this is a Sun Studio build that is complaining here.



Re: codlin_month is up and complain - PL/Python crash

From
Zdenek Kotala
Date:
On 17.02.10 18:39, Peter Eisentraut wrote:
> On Wed, 2010-02-17 at 11:26 -0500, Tom Lane wrote:
>> But the behavior gcc appears to exhibit is that it won't warn about
>> variables that are only assigned once before the PG_TRY is entered,
>> and that seems reasonable to me since such a variable ought to have
>> the correct value either way.
>
> FWIW, this is a Sun Studio build that is complaining here.
>

Yes, it is SS12. I added the volatile keyword and the problem disappeared.
The difference in the generated code is as follows:


<     PLy_spi_execute+0x742:  83 ec 0c           subl   $0xc,%esp
<     PLy_spi_execute+0x745:  ff b5 b8 f9 ff ff  pushl  0xfffff9b8(%ebp)
<     PLy_spi_execute+0x74b:  e8 fc ff ff ff     call   MemoryContextSwitch

>     PLy_spi_execute+0x742:  8b 85 cc f9 ff ff  movl   0xfffff9cc(%ebp),%eax
>     PLy_spi_execute+0x748:  83 ec 0c           subl   $0xc,%esp
>     PLy_spi_execute+0x74b:  50                 pushl  %eax
>     PLy_spi_execute+0x74c:  e8 fc ff ff ff     call   MemoryContextSwitch
 

It is worth mentioning that Sun Studio inlines PLy_spi_execute_query into
PLy_spi_execute(), because it has only one caller.
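
For readers who want to see the shape of the change outside the PL/Python
sources, here is a hedged, self-contained sketch. It is not the patch that
was applied: DemoContext, demo_switch_to, demo_raise_error and
context_restored_after_error are hypothetical stand-ins, and plain
setjmp/longjmp are used in place of PG_TRY/PG_CATCH. The point it
illustrates is that when the context type is a pointer typedef, writing
volatile in front of it qualifies the pointer variable itself, which is
the object that must survive the jump.

```c
#include <setjmp.h>

/* Hypothetical stand-ins; these are not the PostgreSQL definitions. */
typedef struct DemoContextData { const char *name; } DemoContextData;
typedef DemoContextData *DemoContext;

static DemoContextData top_context = { "top" };
static DemoContextData work_context = { "work" };
static DemoContext current_context = &top_context;
static jmp_buf demo_exception_stack;

/* Makes ctx current and returns the previously current context. */
static DemoContext demo_switch_to(DemoContext ctx)
{
    DemoContext old = current_context;

    current_context = ctx;
    return old;
}

/* Simulates an error raised inside the "try" branch. */
static void demo_raise_error(void)
{
    longjmp(demo_exception_stack, 1);
}

/* Returns 1 if the original context is current again after recovery. */
int context_restored_after_error(void)
{
    /* Because DemoContext is a pointer typedef, this is a volatile pointer. */
    volatile DemoContext oldcontext = current_context;

    if (setjmp(demo_exception_stack) == 0)
    {
        /* "try" branch: switch away, then fail */
        demo_switch_to(&work_context);
        demo_raise_error();
    }
    else
    {
        /* "catch" branch: oldcontext is read back from memory */
        demo_switch_to(oldcontext);
    }

    return current_context == &top_context;
}
```

In this sketch context_restored_after_error() returns 1, since the saved
pointer is reloaded in the "catch" branch and the original context is
made current again.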

Zdenek


Re: codlin_month is up and complain - PL/Python crash

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> On 17.02.10 18:39, Peter Eisentraut wrote:
>> FWIW, this is a Sun Studio build that is complaining here.

> Yes, it is SS12. I added the volatile keyword and the problem disappeared.

OK, I've applied that change in CVS.  Please change codlin_moth back to
the higher optimization setting.
        regards, tom lane