Weird irreproducible behaviors in plpython tests - Mailing list pgsql-hackers

From Tom Lane
Subject Weird irreproducible behaviors in plpython tests
Date
Msg-id 10578.1460322657@sss.pgh.pa.us
Whole thread Raw
Responses Re: Weird irreproducible behaviors in plpython tests  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Buildfarm member tick failed today with what appears to be a bug
introduced by (or at least exposed by) the recent plpython ereport
patch.  Presumably the fact that it's failing and no other critters are
has to do with its use of -DCLOBBER_CACHE_ALWAYS and/or
-DRANDOMIZE_ALLOCATED_MEMORY.  However, I cannot reproduce the failure
by using those switches, even though tick's CentOS platform should be
pretty much identical to my RHEL6 machine.  Can anyone else reproduce it?

After failing at that, it occurred to me to try it under valgrind,
and kaboom!  I found a *different* bug, which has apparently been
there a long time.  (I say different because I don't see how this
one could produce tick's symptoms; it's a reference to already-freed
strings, but not an attempt to pfree one.)  I'll be fixing this one
shortly, but now we have another puzzle: why isn't buildfarm member
skink seeing the same failure?  It is running the plpython tests.
Can anyone else reproduce a failure by valgrind'ing the plpython
tests?  It looks here like

==00:00:00:29.670 24996== Invalid read of size 1
==00:00:00:29.670 24996==    at 0x4A07F92: strlen (mc_replace_strmem.c:403)
==00:00:00:29.670 24996==    by 0x826860: MemoryContextStrdup (mcxt.c:1158)
==00:00:00:29.670 24996==    by 0x800441: set_errdata_field (elog.c:1230)
==00:00:00:29.670 24996==    by 0x803EE8: err_generic_string (elog.c:1210)
==00:00:00:29.670 24996==    by 0xDFEF382: PLy_elog (plpy_elog.c:117)
==00:00:00:29.670 24996==    by 0xDFF049D: PLy_procedure_call (plpy_exec.c:1084)
==00:00:00:29.670 24996==    by 0xDFF1BBF: PLy_exec_function (plpy_exec.c:105)
==00:00:00:29.670 24996==    by 0xDFF25C6: plpython_inline_handler (plpy_main.c:345)
==00:00:00:29.670 24996==    by 0x809737: OidFunctionCall1Coll (fmgr.c:1596)
==00:00:00:29.670 24996==    by 0x70E97F: standard_ProcessUtility (utility.c:515)
==00:00:00:29.670 24996==    by 0x70A856: PortalRunUtility (pquery.c:1175)
==00:00:00:29.670 24996==    by 0x70B8FF: PortalRunMulti (pquery.c:1306)
==00:00:00:29.670 24996==  Address 0xefe2894 is 36 bytes inside a block of size 49 free'd
==00:00:00:29.670 24996==    at 0x4A06430: free (vg_replace_malloc.c:446)
==00:00:00:29.670 24996==    by 0x398AE90D5C: ??? (in /usr/lib64/libpython2.6.so.1.0)
==00:00:00:29.670 24996==    by 0x398AE79E3A: ??? (in /usr/lib64/libpython2.6.so.1.0)
==00:00:00:29.670 24996==    by 0x398AE5C586: ??? (in /usr/lib64/libpython2.6.so.1.0)
==00:00:00:29.670 24996==    by 0x398AE5C5BF: ??? (in /usr/lib64/libpython2.6.so.1.0)
==00:00:00:29.670 24996==    by 0x398AE9A704: ??? (in /usr/lib64/libpython2.6.so.1.0)
==00:00:00:29.670 24996==    by 0xDFEECFE: PLy_traceback (plpy_elog.c:358)
==00:00:00:29.670 24996==    by 0xDFEF21E: PLy_elog (plpy_elog.c:83)
==00:00:00:29.670 24996==    by 0xDFF049D: PLy_procedure_call (plpy_exec.c:1084)
==00:00:00:29.670 24996==    by 0xDFF1BBF: PLy_exec_function (plpy_exec.c:105)
==00:00:00:29.670 24996==    by 0xDFF25C6: plpython_inline_handler (plpy_main.c:345)
==00:00:00:29.670 24996==    by 0x809737: OidFunctionCall1Coll (fmgr.c:1596)

the core of the problem being that PLy_get_spi_error_data returns a
pointer to a string that is pointing into the "val" (a/k/a "v") Python
object, which PLy_traceback has carefully released the last refcount on.
So this coding should have failed immediately under any valgrind
testing.  The ereport patch perhaps gave us more scope for the error,
but for me, valgrind'ing the plpython tests fails similarly in 9.5.
So we should have noticed this long since.

I dislike bugs that are platform-dependent with no obvious
reason for it :-(
        regards, tom lane



pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: VS 2015 support in src/tools/msvc
Next
From: Andres Freund
Date:
Subject: Re: Weird irreproducible behaviors in plpython tests