Thread: codlin_moth is up and complaining - PL/Python crash
I revived codlin_moth and it fails during the PL/Python test:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=codlin_moth&dt=2010-02-16%2015:09:05

TRAP: BadArgument("!(((context) != 0 && (((((Node*)((context)))->type) == T_AllocSetContext))))", File: "mcxt.c", Line: 641)
feaf5005 _lwp_kill (1, 6, 80459c8, fea9bbde) + 15
fea9bbea raise (6, 0, 8045a18, fea725aa) + 22
fea725ca abort (8046670, 8361f80, 8045a48, 8719ccf, 89021f0, 89021e4) + f2
086d07c0 ExceptionalCondition (89021f0, 89021e4, 89021dc, 281) + 58
08719ccf MemoryContextSwitchTo (89264ac, 0, 0, 8045a7c) + 47
fec21990 PLy_spi_execute (0, 8b141cc, 80460f8, fe84abde) + 750
fe84ad6e PyCFunction_Call (8b0ff6c, 8b141cc, 0, fe8a8d92) + 19e
fe8a91a0 call_function (80461bc, 1, 610f2d31, fe8a3206) + 41c
fe8a6221 PyEval_EvalFrameEx (8b5798c, 0, 8b0cbdc, 0) + 3029
fe8a9310 fast_function (8b05144, 80462fc, 0, 0, 0, fe91c63c) + 108
fe8a8e72 call_function (80462fc, 0, 80462d8, fe8a3206) + ee
fe8a6221 PyEval_EvalFrameEx (8b576a4, 0, 8b0cbdc, 8b0cbdc) + 3029
fe8a7cd0 PyEval_EvalCodeEx (8ab4770, 8b0cbdc, 8b0cbdc, 0, 0, 0) + 91c
fe8a3102 PyEval_EvalCode (8ab4770, 8b0cbdc, 8b0cbdc, fec17831) + 32
fec1799c PLy_function_handler (8046980, 8b5d508, 8046880, fec1480f) + 17c
fec14b92 plpython_call_handler (8046980, 8046bb0, 8046be8, 8323774) + 3aa
08324393 ExecEvalFunc (8a033b0, 8a0329c, 8a0390c, 8a039b8) + e33
0832b1bc ExecProject (8a03920, 8046c6c, 2, 8977abc) + 834
08348785 ExecResult (8a03210, 8a03184, 0, 1) + 9d
0831f66f ExecProcNode (8a03210, 1, 8a037ec, 8731314) + 227
0831a186 ExecutorRun (8a02d7c, 1, 0, 8719ad4) + 2de
084d7778 PortalRun (898effc, 7fffffff, 1, 8977b38, 8977b38) + 450
084ceae9 exec_simple_query (8976984, 0, 80473b8, 84d5185) + ba9
084d51a2 PostgresMain (2, 8973b4c, 897398c, 893d00c, 893d008, 130d7661) + 7fa
0844aded BackendRun (898c3d0) + 1cd
084440f3 ServerLoop (1, 89561d4, 3, fea7bb7e, 5c54, feb83cd8) + 973
08443004 PostmasterMain (3) + 119c
0837db12 main (3, 8047b14, 8047b24, 80fa21f) + 1ea
080fa27d _start (3, 8047be8, 8047fb0, 8047fb0, 0, 8047c35) + 7d

It seems that the problem is caused by aggressive compiler optimization. I lowered the optimization level and now it works fine. Interestingly, the MemoryContext corruption only appears with PL/Python.

Zdenek
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I revived codlin_moth and it fails during the PL/Python test:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=codlin_moth&dt=2010-02-16%2015:09:05

All of the MemoryContextSwitchTo calls in plpython seem to be in patterns like this:

    MemoryContext oldcontext;

    oldcontext = CurrentMemoryContext;
    PG_TRY();
    {
        ... do something ...
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);

Since oldcontext is only set in the one place, it really shouldn't require "volatile" decoration, but maybe it does. Can you do some testing to see if that would fix it?

(Of course, plpython's bogus approach to error handling really ought to be thrown out and rewritten from scratch, but that's not happening right now.)

regards, tom lane
On Wed, 2010-02-17 at 11:05 -0500, Tom Lane wrote:
> All of the MemoryContextSwitchTo calls in plpython seem to be in
> patterns like this:
>
>     MemoryContext oldcontext;
>
>     oldcontext = CurrentMemoryContext;
>     PG_TRY();
>     {
>         ... do something ...
>     }
>     PG_CATCH();
>     {
>         MemoryContextSwitchTo(oldcontext);
>
> Since oldcontext is only set in the one place, it really shouldn't
> require "volatile" decoration, but maybe it does.

It is my understanding that local automatic variables may be clobbered by [sig]longjmp unless they are marked volatile. The PG_CATCH branch is reached by means of a [sig]longjmp. So that would mean that any variable that you want to use both before the TRY and inside the CATCH has to be volatile.
Peter Eisentraut <peter_e@gmx.net> writes:
> On Wed, 2010-02-17 at 11:05 -0500, Tom Lane wrote:
>> Since oldcontext is only set in the one place, it really shouldn't
>> require "volatile" decoration, but maybe it does.
>
> It is my understanding that local automatic variables may be clobbered
> by [sig]longjmp unless they are marked volatile. The PG_CATCH branch is
> reached by means of a [sig]longjmp. So that would mean that any
> variable that you want to use both before the TRY and inside the CATCH
> has to be volatile.

If the rule were quite that strict then we'd need many more "volatile" markers than we have. I believe the actual implementation issue is that longjmp restores the register contents to what they were at the time of the setjmp call, and thus a variable allocated in a register would get restored to the value it had at entry to PG_TRY, whereas a variable allocated on the stack would still have an up-to-date value.

Now the picture isn't quite that simple, since a sufficiently smart compiler might move the variable's value around within the routine. But the behavior gcc appears to exhibit is that it won't warn about variables that are only assigned once before the PG_TRY is entered, and that seems reasonable to me since such a variable ought to have the correct value either way.

It might be interesting to modify these bits of code so that the oldcontext variables are assigned only at declaration:

    MemoryContext oldcontext = CurrentMemoryContext;
    ...
    PG_TRY();

and see if that makes the issue go away.

regards, tom lane
On Wed, 2010-02-17 at 11:26 -0500, Tom Lane wrote:
> But the behavior gcc appears to exhibit is that it won't warn about
> variables that are only assigned once before the PG_TRY is entered,
> and that seems reasonable to me since such a variable ought to have
> the correct value either way.

FWIW, this is a Sun Studio build that is complaining here.
On 17.02.10 18:39, Peter Eisentraut wrote:
> On Wed, 2010-02-17 at 11:26 -0500, Tom Lane wrote:
>> But the behavior gcc appears to exhibit is that it won't warn about
>> variables that are only assigned once before the PG_TRY is entered,
>> and that seems reasonable to me since such a variable ought to have
>> the correct value either way.
>
> FWIW, this is a Sun Studio build that is complaining here.

Yes, it is SS12. I added the volatile keyword and the problem disappeared. The code difference is the following:

    < PLy_spi_execute+0x742: 83 ec 0c           subl   $0xc,%esp
    < PLy_spi_execute+0x745: ff b5 b8 f9 ff ff  pushl  0xfffff9b8(%ebp)
    < PLy_spi_execute+0x74b: e8 fc ff ff ff     call   MemoryContextSwitch
    > PLy_spi_execute+0x742: 8b 85 cc f9 ff ff  movl   0xfffff9cc(%ebp),%eax
    > PLy_spi_execute+0x748: 83 ec 0c           subl   $0xc,%esp
    > PLy_spi_execute+0x74b: 50                 pushl  %eax
    > PLy_spi_execute+0x74c: e8 fc ff ff ff     call   MemoryContextSwitch

It is worth mentioning that SS inlines PLy_spi_execute_query() into PLy_spi_execute(), because PLy_spi_execute() is its only caller.

Zdenek
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> On 17.02.10 18:39, Peter Eisentraut wrote:
>> FWIW, this is a Sun Studio build that is complaining here.
>
> Yes, it is SS12. I added the volatile keyword and the problem disappeared.

OK, I've applied that change in CVS. Please change codlin_moth back to the higher optimization setting.

regards, tom lane