Thread: RTLD_GLOBAL (& JIT inlining)
Hi Peter, All, First question: Why do we currently use RTLD_GLOBAL loading extension libraries, but take pains ([1]) to make things work without RTLD_GLOBAL. It seems like it'd be both safer to RTLD_LOCAL on platforms supporting it (or equivalent), as well as less error-prone because we'd be less likely to depend on it like we did e.g. during transforms development. It seems error prone because on systmes I'm aware of (IOW linux ;)) conflicting symbols will not cause errors. Neither when there's a conflict between a .so and the main binary, nor between different shlibs. My reading of glibc's ld is that the search order for unresolved references in a .so is main binary, and then other RTLD_GLOBAL shared libraries in order of time they're loaded. Right now it appears that e.g. hstore_plperl has some issue, building with RTLD_LOCAL makes its test fail. Which means it'll probably not work on some platforms. I think using RTLD_LOCAL on most machines would be a much better idea. I've not found proper explanations why GLOBAL is used. We started using it ages ago, with [2], but that commit contains no explanation, and a quick search didn't show up anything either. Peter? Secondly: For JITing internal & C operators, and other functions referenced in JITed code, can be inlined if the definition is available in a suitable form. The details how that works don't matter much here. For my jitting work I've so far treated symbols of binaries and all referenced extensions as being part of a single namespace. Currently the order in which symbols with multiple definitions were looked up was in essentially in filesystem order. While I'm not aware of any observable problems, reviewing that decision I think that's a terrible idea. First and foremost it seems obviously wrong to not load symbols from the main postgres binary first. But leaving that aside, I'm uncomfortable doing inlining between shlibs for JITing, because that can lead to symbols beeing reloaded differently between non JITed code (where all external libraries loaded in the current library are searched in load/usage order) and JITed code (where the symbols would be resolved according to some static order). I'm about to rewrite it so that functions are inlined just from the main binary, or the basename of the shared library explicitly referenced in the function definition. That's still different, but at least easy to understand. But that still means there's inconsistent order between different methods of execution, which isn't great. Comments? Greetings, Andres Freund [1] http://archives.postgresql.org/message-id/2652.1475512158%40sss.pgh.pa.us [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c87bc779d4e9f109e92f8b8f1dfad5d6739f8e97
Hi, On 2018-02-22 23:11:07 -0800, Andres Freund wrote: > I think using RTLD_LOCAL on most machines would be a much better > idea. I've not found proper explanations why GLOBAL is used. We started > using it ages ago, with [2], but that commit contains no explanation, > and a quick search didn't show up anything either. Peter? Hm. No idea if that was the reason back then, but I think it's unfortunately not easy to change. The reason why some plperlu tests fail, is that plperl extension libraries rely on plperl to be loaded into the global namespace. When plperl gets loaded with RTLD_LOCAL all the dependant shared libaries (i.e. libperl.so) also get loaded with it. That works perfectly for plperl itself, but once per loads an extension library is loaded, it fails to resolve libperl.so symbols... I can "fix" this by dlopen(RTLD_GLOBAL) libperl.so, but that's obviously not a practial solution. FWIW, something in plperl's error handling appears to be busted. Instead of a helpful error report we get: ERROR: 22021: invalid byte sequence for encoding "UTF8": 0x00 which doesn't help much to nail down the issue. With a breakpoint it becomes clear what the issue is: #0 report_invalid_encoding (encoding=6, mbstr=0x557ae880416b "", len=252) at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1997 #1 0x0000557ae73d4473 in pg_verify_mbstr_len (encoding=6, mbstr=0x557ae880416b "", len=252, noError=0 '\000') at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1936 #2 0x0000557ae73d4354 in pg_verify_mbstr (encoding=6, mbstr=0x557ae8804080 "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module List::Util:/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pmline 96.\n", len=487, noError=0 '\000') at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1879 #3 0x0000557ae73d119a in pg_any_to_server ( s=0x557ae8804080 "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module List::Util: /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so:undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pmline 96.\n", len=487, encoding=6) at /home/andres/src/postgresql/src/backend/utils/mb/mbutils.c:572 #4 0x00007f4cabb6b123 in utf_u2e ( utf8_str=0x557ae8804080 "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module List::Util:/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pmline 96.\n", len=487) at /home/andres/src/postgresql/src/pl/plperl/plperl_helpers.h:16 #5 0x00007f4cabb6b2f6 in sv2cstr (sv=0x557ae8769568) at /home/andres/src/postgresql/src/pl/plperl/plperl_helpers.h:96 #6 0x00007f4cabb732dc in plperl_call_perl_func (desc=0x557ae86cf6b0, fcinfo=0x557ae875fc20) at /home/andres/src/postgresql/src/pl/plperl/plperl.c:2250 #7 0x00007f4cabb74a3b in plperl_func_handler (fcinfo=0x557ae875fc20) at /home/andres/src/postgresql/src/pl/plperl/plperl.c:2438 I don't now perl apis at all, but it's clearly wrong that fram 3,4 show len=487, which comes from SvPVutf8() in sv2cstr(). It appears that somehow perl throws out two error messages separated by a null byte... (gdb) p *(char[487]*)utf8_str $42 = "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module List::Util: /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so:undefined symbol: PL_memory_wrap at /usr/share/perl/5.26/XSLoader.pmline 96.\n\000 at /usr/lib/x86_64-linux-gnu/perl5/5.26/List/Util.pm line 23.\nCompilationfailed in require at /usr/lib/x86_64-linux-gnu/perl5/5.26/Scalar/Util.pm line 23.\nCompilation failed inrequire at /usr/lib/x86_64-linux-gnu/perl/5.26/Data/Dumper.pm line 303.\n" Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > First question: > Why do we currently use RTLD_GLOBAL loading extension libraries, but > take pains ([1]) to make things work without RTLD_GLOBAL. As mentioned in that very message, the point was better missing-symbol error detection, not RTLD_GLOBAL/LOCAL per se. > I think using RTLD_LOCAL on most machines would be a much better > idea. I've not found proper explanations why GLOBAL is used. We started > using it ages ago, with [2], but that commit contains no explanation, > and a quick search didn't show up anything either. Peter? https://www.postgresql.org/message-id/7142.1277772335@sss.pgh.pa.us My position is the same as then: I'm happy to remove it if it doesn't break things anywhere ... but it seems like it would cause problems for plpython, unless their behavior has changed since 2001 which is surely possible. regards, tom lane
On 2018-02-23 11:05:23 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I think using RTLD_LOCAL on most machines would be a much better > > idea. I've not found proper explanations why GLOBAL is used. We started > > using it ages ago, with [2], but that commit contains no explanation, > > and a quick search didn't show up anything either. Peter? > > https://www.postgresql.org/message-id/7142.1277772335@sss.pgh.pa.us > > My position is the same as then: I'm happy to remove it if it doesn't > break things anywhere ... but it seems like it would cause problems for > plpython, unless their behavior has changed since 2001 which is > surely possible. It hasn't, and it's not just plpython, at least plperl is affected as well. I'd guess most libraries that have their own dynamic loading support are affected. So RTLD_LOCAL is out of the question, but I think we can get a good bit of the benefit by either specifying -Wl,-Bsymbolic at shlib build time, or RTLD_DEEPBIND at dlopen() time. Either leads to the opened shared library effectively being put at the beginning of the search path, therefore avoiding the issue that an earlier loaded shared library or symbols from the main binary can accidentally overwrite things in the shared library itself. Which incidentally also makes loading a bit faster. A bit of googling suggests -Bsymbolic is likely to be more portable. A quick test confirms that under linux our tests pass with it, after patching Makefile.shlib. Greetings, Andres Freund
On Mon, Feb 26, 2018 at 11:28 PM, Andres Freund <andres@anarazel.de> wrote: > So RTLD_LOCAL is out of the question, but I think we can get a good bit > of the benefit by either specifying -Wl,-Bsymbolic at shlib build time, > or RTLD_DEEPBIND at dlopen() time. Either leads to the opened shared > library effectively being put at the beginning of the search path, > therefore avoiding the issue that an earlier loaded shared library or > symbols from the main binary can accidentally overwrite things in the > shared library itself. Which incidentally also makes loading a bit > faster. I think this would also fix oracle_fdw crashing when postgres is compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1] [1] https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com Ants Aasma
Hello. At Mon, 26 Feb 2018 23:50:41 +0200, Ants Aasma <ants.aasma@eesti.ee> wrote in <CA+CSw_uD6dZdg9BbOLCx+oY0rufsRSp5V5W8teonCB=OK=74aw@mail.gmail.com> > On Mon, Feb 26, 2018 at 11:28 PM, Andres Freund <andres@anarazel.de> wrote: > > So RTLD_LOCAL is out of the question, but I think we can get a good bit > > of the benefit by either specifying -Wl,-Bsymbolic at shlib build time, > > or RTLD_DEEPBIND at dlopen() time. Either leads to the opened shared > > library effectively being put at the beginning of the search path, > > therefore avoiding the issue that an earlier loaded shared library or > > symbols from the main binary can accidentally overwrite things in the > > shared library itself. Which incidentally also makes loading a bit > > faster. > > I think this would also fix oracle_fdw crashing when postgres is > compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1] > > [1] https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com I saw several cases of the misbinding specifically among extensions using different versions of the same library or a part of which was just transplanted from another extension. Now I'm seeing a case nearby again and found this thread. Some pl-libraries (plpython does but plperl doesn't for me) mandates to export their symbols for external modules. So RTLD_GLOBAL is needed for such extensions but not needed in most cases. We need a means to instruct whether a module wants to expose symbols or not. The extension control file doesn't seem the place. The most appropriate way seems to be another magic data. With the attached patch, external modules are loaded RTLD_LOCAL by default. PG_MODULE_EXPORT_SYMBOL in a module source files let it loaded RTLD_GLOBAL. I suppose this is the change with the least impact on existing modules because I believe most of them don't need to export symbols. I think this works for all platforms that have dlopen or shl_load. Windows doesn't the equivalent but anyway we should explicitly declare using dllexport. Any opinions or thoughts? regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 114692655060bf74d01e0f5452e89f3bd332dfa1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Thu, 24 Jan 2019 13:35:19 +0900 Subject: [PATCH] Add control on wheter symbols in external module is exported We load exteral modlues with RTLD_GLOBAL, which is problematic when two or more modules share the same symbol. On the other hand some modules mandates to export its symbols. This patch enables modules to decide whether to export them or not. They is defaultly hidden. plpython module is modified along with as it needs to export symbols. --- doc/src/sgml/xfunc.sgml | 8 ++++++++ src/backend/utils/fmgr/dfmgr.c | 13 ++++++++++++- src/include/fmgr.h | 8 ++++++++ src/pl/plpython/plpy_main.c | 1 + 4 files changed, 29 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index e18272c33a..9a65e76eda 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -1875,6 +1875,14 @@ PG_MODULE_MAGIC; (Presently, unloads are disabled and will never occur, but this may change in the future.) </para> + <para> + The dynamic loader loads a file as its symbols are hidden from external + modules by default. A file is loaded exporting the symbols by writhing + this in one of the module source files. +<programlisting> +PG_MODULE_EXPORT_SYMBOL; +</programlisting> + </para> </sect2> diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c index 456297a531..98ac1cc438 100644 --- a/src/backend/utils/fmgr/dfmgr.c +++ b/src/backend/utils/fmgr/dfmgr.c @@ -236,7 +236,7 @@ internal_load_library(const char *libname) #endif file_scanner->next = NULL; - file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_GLOBAL); + file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_LOCAL); if (file_scanner->handle == NULL) { load_error = dlerror(); @@ -281,6 +281,17 @@ internal_load_library(const char *libname) errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro."))); } + /* Check if the module wants to export symbols */ + if (dlsym(file_scanner->handle, PG_MODULE_EXPORT_SYMBOL_NAME_STRING)) + { + elog(DEBUG3, + "loaded dynamic-link library \"%s\" with RTLD_GLOBAL", libname); + /* want to export, reload it the way */ + dlclose(file_scanner->handle); + file_scanner->handle = dlopen(file_scanner->filename, + RTLD_NOW | RTLD_GLOBAL); + } + /* * If the library has a _PG_init() function, call it. */ diff --git a/src/include/fmgr.h b/src/include/fmgr.h index ead17f0e44..c072850646 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -456,6 +456,14 @@ PG_MAGIC_FUNCTION_NAME(void) \ } \ extern int no_such_variable +#define PG_MODULE_EXPORT_SYMBOL_NAME Pg_module_exrpot_symbols +#define PG_MODULE_EXPORT_SYMBOL_NAME_STRING "Pg_module_exrpot_symbols" + +#define PG_MODULE_EXPORT_SYMBOL \ +extern PGDLLEXPORT void PG_MODULE_EXPORT_SYMBOL_NAME(void); \ +extern PGDLLEXPORT void PG_MODULE_EXPORT_SYMBOL_NAME(void){} + + /*------------------------------------------------------------------------- * Support routines and macros for callers of fmgr-compatible functions diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c index 6a66eba176..9165b5f455 100644 --- a/src/pl/plpython/plpy_main.c +++ b/src/pl/plpython/plpy_main.c @@ -42,6 +42,7 @@ extern void _PG_init(void); PG_MODULE_MAGIC; +PG_MODULE_EXPORT_SYMBOL; PG_FUNCTION_INFO_V1(plpython_validator); PG_FUNCTION_INFO_V1(plpython_call_handler); -- 2.16.3
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > With the attached patch, external modules are loaded RTLD_LOCAL > by default. PG_MODULE_EXPORT_SYMBOL in a module source files let > it loaded RTLD_GLOBAL. I suppose this is the change with the > least impact on existing modules because I believe most of them > don't need to export symbols. This seems really hacky :-( In the PL cases, at least, it's not so much the PL itself that wants to export symbols as the underlying language library (libpython, libperl, etc). You're assuming that an action on the PL .so will propagate to the underlying language .so, which seems like a pretty shaky assumption. I wonder also whether closing and reopening the DL handle will really do anything at all for already-loaded libraries ... regards, tom lane
At Thu, 24 Jan 2019 01:02:14 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <24744.1548309734@sss.pgh.pa.us> > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > With the attached patch, external modules are loaded RTLD_LOCAL > > by default. PG_MODULE_EXPORT_SYMBOL in a module source files let > > it loaded RTLD_GLOBAL. I suppose this is the change with the > > least impact on existing modules because I believe most of them > > don't need to export symbols. > > This seems really hacky :-( > > In the PL cases, at least, it's not so much the PL itself that > wants to export symbols as the underlying language library > (libpython, libperl, etc). You're assuming that an action on > the PL .so will propagate to the underlying language .so, which (If I understand you correctly) I'm not assuming that the propagation happens or even the contray. As can be seen from it actually works, symbols of underlying libraries referred from pl*.so are resoved using dependency list in pl*.so itself independently of the RTLD flags given for it. The problem here is while underlying language libraries cannot ferer symbols back in the pl*.so when it is loaded with RTLD_LOCAL, RTLD_GLOBAL makes some extensions unhappy. I think that the assumption that RTLD_GLOBAL resolves all was proved rather shaky. RTLD_DEEPBIND is not available on some platforms so we need to choose between LOCAL and GLOBAL per loading module. > seems like a pretty shaky assumption. I wonder also whether > closing and reopening the DL handle will really do anything > at all for already-loaded libraries ... internal_load_library() avoids duplicate loading so I believe it cannot happen. Regarding duplicate dlopen, RTLD_GLOBAL just promotes RTLD_LOCAL. In other words, the former wins. So it's not a problem if the dlclose() there did nothing. We could RTLD_NOLOAD for almost all platforms but win32 implement needs additinal change to make work it as expected. If you are saying hacky about that it uses only the symbol of unused function, we can actually call it to get the flag. regards. -- Kyotaro Horiguchi NTT Open Source Software Center