Thread: Re: pgsql: Refactor dlopen() support
Peter Eisentraut <peter_e@gmx.net> writes:
> Refactor dlopen() support

Buildfarm member locust doesn't like this much. I've been able to
reproduce the problem on an old Mac laptop running the same macOS release,
viz 10.5.8. (Note that we're not seeing it on earlier or later releases,
which is odd in itself.)

According to my machine, the crash is happening here:

    #0  _PG_init () at plpy_main.c:98
    98          *plpython_version_bitmask_ptr |= (1 << PY_MAJOR_VERSION);

and the reason is that the rendezvous variable sometimes contains garbage.
Most sessions correctly see it as initially zero, but sometimes it
contains

    (gdb) p plpython_version_bitmask_ptr
    $1 = (int *) 0x1d

and I've also seen

    (gdb) p plpython_version_bitmask_ptr
    $1 = (int *) 0x7f7f7f7f

It's mostly repeatable but not completely so: the 0x1d case seems to come
up every time through the plpython_do test, but I don't always see the
0x7f7f7f7f case. (Maybe that's a timing artifact? It takes a variable
amount of time to recover from the first crash in plpython_do, so the
rest of the plpython test run isn't exactly operating in uniform
conditions.)

No idea what's going on here, and I'm about out of steam for tonight.

regards, tom lane

On 07/09/2018 08:30, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Refactor dlopen() support
>
> Buildfarm member locust doesn't like this much. I've been able to
> reproduce the problem on an old Mac laptop running the same macOS release,
> viz 10.5.8. (Note that we're not seeing it on earlier or later releases,
> which is odd in itself.)

Nothing should have changed on macOS except that the intermediate
functions pg_dl*() were replaced by direct calls to dl*(). Very strange.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 07/09/2018 08:30, Tom Lane wrote:
>> Buildfarm member locust doesn't like this much. I've been able to
>> reproduce the problem on an old Mac laptop running the same macOS release,
>> viz 10.5.8. (Note that we're not seeing it on earlier or later releases,
>> which is odd in itself.)

> Nothing should have changed on macOS except that the intermediate
> functions pg_dl*() were replaced by direct calls to dl*(). Very strange.

Somehow or other, the changes you made in dfmgr.c's #include lines
have made it so that find_rendezvous_variable's local "bool found"
variable is actually of type _Bool (which is word-wide on these
machines). However, hash_search thinks its output variable is
of type pointer to "typedef char bool". The proximate cause of
the observed failure is that find_rendezvous_variable sees "found"
as true when it should not, and thus fails to zero out the variable's
value.

No time to look further right now, but there's something rotten
about the way we're handling bool.

regards, tom lane

On 07/09/2018 16:19, Tom Lane wrote:
> Somehow or other, the changes you made in dfmgr.c's #include lines
> have made it so that find_rendezvous_variable's local "bool found"
> variable is actually of type _Bool (which is word-wide on these
> machines). However, hash_search thinks its output variable is
> of type pointer to "typedef char bool". The proximate cause of
> the observed failure is that find_rendezvous_variable sees "found"
> as true when it should not, and thus fails to zero out the variable's
> value.

Ah, because dlfcn.h includes stdbool.h. Hmm.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 07/09/2018 16:19, Tom Lane wrote:
>> Somehow or other, the changes you made in dfmgr.c's #include lines
>> have made it so that find_rendezvous_variable's local "bool found"
>> variable is actually of type _Bool (which is word-wide on these
>> machines).

> Ah because dlfcn.h includes stdbool.h. Hmm.

Yeah, and that's still true as of current macOS, it seems.

I can make the problem go away with the attached patch (borrowed from
similar code in plperl.h). It's kind of grotty but I'm not sure there's
a better way.

regards, tom lane

diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index c2a2572..4a5cc7c 100644
*** a/src/backend/utils/fmgr/dfmgr.c
--- b/src/backend/utils/fmgr/dfmgr.c
***************
*** 18,24 ****
--- 18,34 ----
  
  #ifdef HAVE_DLOPEN
  #include <dlfcn.h>
+ 
+ /*
+  * On macOS, <dlfcn.h> insists on including <stdbool.h>.  If we're not
+  * using stdbool, undef bool to undo the damage.
+  */
+ #ifndef USE_STDBOOL
+ #ifdef bool
+ #undef bool
  #endif
+ #endif
+ #endif   /* HAVE_DLOPEN */
  
  #include "fmgr.h"
  #include "lib/stringinfo.h"