Thread: RTLD_GLOBAL (& JIT inlining)

RTLD_GLOBAL (& JIT inlining)

From
Andres Freund
Date:
Hi Peter, All,


First question:

Why do we currently use RTLD_GLOBAL loading extension libraries, but
take pains ([1]) to make things work without RTLD_GLOBAL. It seems like
it'd be both safer to RTLD_LOCAL on platforms supporting it (or
equivalent), as well as less error-prone because we'd be less likely to
depend on it like we did e.g. during transforms development.


It seems error prone because on systmes I'm aware of (IOW linux ;))
conflicting symbols will not cause errors. Neither when there's a
conflict between a .so and the main binary, nor between different
shlibs.  My reading of glibc's ld is that the search order for
unresolved references in a .so is main binary, and then other
RTLD_GLOBAL shared libraries in order of time they're loaded.

Right now it appears that e.g. hstore_plperl has some issue, building
with RTLD_LOCAL makes its test fail. Which means it'll probably not work
on some platforms.


I think using RTLD_LOCAL on most machines would be a much better
idea. I've not found proper explanations why GLOBAL is used. We started
using it ages ago, with [2], but that commit contains no explanation,
and a quick search didn't show up anything either. Peter?


Secondly:

For JITing internal & C operators, and other functions referenced in
JITed code, can be inlined if the definition is available in a suitable
form. The details how that works don't matter much here.


For my jitting work I've so far treated symbols of binaries and all
referenced extensions as being part of a single namespace. Currently the
order in which symbols with multiple definitions were looked up was in
essentially in filesystem order.

While I'm not aware of any observable problems, reviewing that decision
I think that's a terrible idea. First and foremost it seems obviously
wrong to not load symbols from the main postgres binary first. But
leaving that aside, I'm uncomfortable doing inlining between shlibs for
JITing, because that can lead to symbols beeing reloaded differently
between non JITed code (where all external libraries loaded in the
current library are searched in load/usage order) and JITed code (where
the symbols would be resolved according to some static order).

I'm about to rewrite it so that functions are inlined just from the main
binary, or the basename of the shared library explicitly referenced in
the function definition.  That's still different, but at least easy to
understand.  But that still means there's inconsistent order between
different methods of execution, which isn't great.

Comments?

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/2652.1475512158%40sss.pgh.pa.us
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c87bc779d4e9f109e92f8b8f1dfad5d6739f8e97


Re: RTLD_GLOBAL (& JIT inlining)

From
Andres Freund
Date:
Hi,

On 2018-02-22 23:11:07 -0800, Andres Freund wrote:
> I think using RTLD_LOCAL on most machines would be a much better
> idea. I've not found proper explanations why GLOBAL is used. We started
> using it ages ago, with [2], but that commit contains no explanation,
> and a quick search didn't show up anything either. Peter?

Hm. No idea if that was the reason back then, but I think it's
unfortunately not easy to change.  The reason why some plperlu tests
fail, is that plperl extension libraries rely on plperl to be loaded
into the global namespace.

When plperl gets loaded with RTLD_LOCAL all the dependant shared
libaries (i.e. libperl.so) also get loaded with it. That works perfectly
for plperl itself, but once per loads an extension library is loaded, it
fails to resolve libperl.so symbols...  I can "fix" this by
dlopen(RTLD_GLOBAL) libperl.so, but that's obviously not a practial
solution.


FWIW, something in plperl's error handling appears to be busted. Instead
of a helpful error report we get:
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0x00
which doesn't help much to nail down the issue.

With a breakpoint it becomes clear what the issue is:

#0  report_invalid_encoding (encoding=6, mbstr=0x557ae880416b "", len=252)
    at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1997
#1  0x0000557ae73d4473 in pg_verify_mbstr_len (encoding=6, mbstr=0x557ae880416b "", len=252, noError=0 '\000')
    at /home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1936
#2  0x0000557ae73d4354 in pg_verify_mbstr (encoding=6, 
    mbstr=0x557ae8804080 "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module
List::Util:/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: undefined symbol: PL_memory_wrap at
/usr/share/perl/5.26/XSLoader.pmline 96.\n", len=487, noError=0 '\000') at
/home/andres/src/postgresql/src/backend/utils/mb/wchar.c:1879
#3  0x0000557ae73d119a in pg_any_to_server (
    s=0x557ae8804080 "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module List::Util:
/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so:undefined symbol: PL_memory_wrap at
/usr/share/perl/5.26/XSLoader.pmline 96.\n", len=487, encoding=6) at
/home/andres/src/postgresql/src/backend/utils/mb/mbutils.c:572
#4  0x00007f4cabb6b123 in utf_u2e (
    utf8_str=0x557ae8804080 "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module
List::Util:/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so: undefined symbol: PL_memory_wrap at
/usr/share/perl/5.26/XSLoader.pmline 96.\n", len=487) at /home/andres/src/postgresql/src/pl/plperl/plperl_helpers.h:16
 
#5  0x00007f4cabb6b2f6 in sv2cstr (sv=0x557ae8769568) at /home/andres/src/postgresql/src/pl/plperl/plperl_helpers.h:96
#6  0x00007f4cabb732dc in plperl_call_perl_func (desc=0x557ae86cf6b0, fcinfo=0x557ae875fc20)
    at /home/andres/src/postgresql/src/pl/plperl/plperl.c:2250
#7  0x00007f4cabb74a3b in plperl_func_handler (fcinfo=0x557ae875fc20)
    at /home/andres/src/postgresql/src/pl/plperl/plperl.c:2438

I don't now perl apis at all, but it's clearly wrong that fram 3,4 show
len=487, which comes from SvPVutf8() in sv2cstr().  It appears that
somehow perl throws out two error messages separated by a null byte...

(gdb) p *(char[487]*)utf8_str
$42 = "Can't load '/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so' for module List::Util:
/usr/lib/x86_64-linux-gnu/perl5/5.26/auto/List/Util/Util.so:undefined symbol: PL_memory_wrap at
/usr/share/perl/5.26/XSLoader.pmline 96.\n\000 at /usr/lib/x86_64-linux-gnu/perl5/5.26/List/Util.pm line
23.\nCompilationfailed in require at /usr/lib/x86_64-linux-gnu/perl5/5.26/Scalar/Util.pm line 23.\nCompilation failed
inrequire at /usr/lib/x86_64-linux-gnu/perl/5.26/Data/Dumper.pm line 303.\n"
 


Greetings,

Andres Freund


Re: RTLD_GLOBAL (& JIT inlining)

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> First question:
> Why do we currently use RTLD_GLOBAL loading extension libraries, but
> take pains ([1]) to make things work without RTLD_GLOBAL.

As mentioned in that very message, the point was better missing-symbol
error detection, not RTLD_GLOBAL/LOCAL per se.

> I think using RTLD_LOCAL on most machines would be a much better
> idea. I've not found proper explanations why GLOBAL is used. We started
> using it ages ago, with [2], but that commit contains no explanation,
> and a quick search didn't show up anything either. Peter?

https://www.postgresql.org/message-id/7142.1277772335@sss.pgh.pa.us

My position is the same as then: I'm happy to remove it if it doesn't
break things anywhere ... but it seems like it would cause problems for
plpython, unless their behavior has changed since 2001 which is
surely possible.

            regards, tom lane


Re: RTLD_GLOBAL (& JIT inlining)

From
Andres Freund
Date:
On 2018-02-23 11:05:23 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think using RTLD_LOCAL on most machines would be a much better
> > idea. I've not found proper explanations why GLOBAL is used. We started
> > using it ages ago, with [2], but that commit contains no explanation,
> > and a quick search didn't show up anything either. Peter?
> 
> https://www.postgresql.org/message-id/7142.1277772335@sss.pgh.pa.us
> 
> My position is the same as then: I'm happy to remove it if it doesn't
> break things anywhere ... but it seems like it would cause problems for
> plpython, unless their behavior has changed since 2001 which is
> surely possible.

It hasn't, and it's not just plpython, at least plperl is affected as
well. I'd guess most libraries that have their own dynamic loading
support are affected.

So RTLD_LOCAL is out of the question, but I think we can get a good bit
of the benefit by either specifying -Wl,-Bsymbolic at shlib build time,
or RTLD_DEEPBIND at dlopen() time.  Either leads to the opened shared
library effectively being put at the beginning of the search path,
therefore avoiding the issue that an earlier loaded shared library or
symbols from the main binary can accidentally overwrite things in the
shared library itself. Which incidentally also makes loading a bit
faster.

A bit of googling suggests -Bsymbolic is likely to be more portable.

A quick test confirms that under linux our tests pass with it, after
patching Makefile.shlib.

Greetings,

Andres Freund


Re: RTLD_GLOBAL (& JIT inlining)

From
Ants Aasma
Date:
On Mon, Feb 26, 2018 at 11:28 PM, Andres Freund <andres@anarazel.de> wrote:
> So RTLD_LOCAL is out of the question, but I think we can get a good bit
> of the benefit by either specifying -Wl,-Bsymbolic at shlib build time,
> or RTLD_DEEPBIND at dlopen() time.  Either leads to the opened shared
> library effectively being put at the beginning of the search path,
> therefore avoiding the issue that an earlier loaded shared library or
> symbols from the main binary can accidentally overwrite things in the
> shared library itself. Which incidentally also makes loading a bit
> faster.

I think this would also fix oracle_fdw crashing when postgres is
compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1]

[1] https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com

Ants Aasma


Re: RTLD_GLOBAL (& JIT inlining)

From
Kyotaro HORIGUCHI
Date:
Hello.

At Mon, 26 Feb 2018 23:50:41 +0200, Ants Aasma <ants.aasma@eesti.ee> wrote in
<CA+CSw_uD6dZdg9BbOLCx+oY0rufsRSp5V5W8teonCB=OK=74aw@mail.gmail.com>
> On Mon, Feb 26, 2018 at 11:28 PM, Andres Freund <andres@anarazel.de> wrote:
> > So RTLD_LOCAL is out of the question, but I think we can get a good bit
> > of the benefit by either specifying -Wl,-Bsymbolic at shlib build time,
> > or RTLD_DEEPBIND at dlopen() time.  Either leads to the opened shared
> > library effectively being put at the beginning of the search path,
> > therefore avoiding the issue that an earlier loaded shared library or
> > symbols from the main binary can accidentally overwrite things in the
> > shared library itself. Which incidentally also makes loading a bit
> > faster.
> 
> I think this would also fix oracle_fdw crashing when postgres is
> compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1]
> 
> [1] https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com

I saw several cases of the misbinding specifically among
extensions using different versions of the same library or a part
of which was just transplanted from another extension. Now I'm
seeing a case nearby again and found this thread.

Some pl-libraries (plpython does but plperl doesn't for me)
mandates to export their symbols for external modules. So
RTLD_GLOBAL is needed for such extensions but not needed in most
cases. We need a means to instruct whether a module wants to
expose symbols or not.

The extension control file doesn't seem the place. The most
appropriate way seems to be another magic data.

With the attached patch, external modules are loaded RTLD_LOCAL
by default. PG_MODULE_EXPORT_SYMBOL in a module source files let
it loaded RTLD_GLOBAL. I suppose this is the change with the
least impact on existing modules because I believe most of them
don't need to export symbols.

I think this works for all platforms that have dlopen or
shl_load. Windows doesn't the equivalent but anyway we should
explicitly declare using dllexport.

Any opinions or thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 114692655060bf74d01e0f5452e89f3bd332dfa1 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 24 Jan 2019 13:35:19 +0900
Subject: [PATCH] Add control on wheter symbols in external module is exported

We load exteral modlues with RTLD_GLOBAL, which is problematic when
two or more modules share the same symbol. On the other hand some
modules mandates to export its symbols. This patch enables modules
to decide whether to export them or not. They is defaultly hidden.

plpython module is modified along with as it needs to export symbols.
---
 doc/src/sgml/xfunc.sgml        |  8 ++++++++
 src/backend/utils/fmgr/dfmgr.c | 13 ++++++++++++-
 src/include/fmgr.h             |  8 ++++++++
 src/pl/plpython/plpy_main.c    |  1 +
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index e18272c33a..9a65e76eda 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -1875,6 +1875,14 @@ PG_MODULE_MAGIC;
     (Presently, unloads are disabled and will never occur, but this may
     change in the future.)
    </para>
+   <para>
+     The dynamic loader loads a file as its symbols are hidden from external
+     modules by default. A file is loaded exporting the symbols by writhing
+     this in one of the module source files.
+<programlisting>
+PG_MODULE_EXPORT_SYMBOL;
+</programlisting>
+   </para>
 
   </sect2>
 
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 456297a531..98ac1cc438 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -236,7 +236,7 @@ internal_load_library(const char *libname)
 #endif
         file_scanner->next = NULL;
 
-        file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_GLOBAL);
+        file_scanner->handle = dlopen(file_scanner->filename, RTLD_NOW | RTLD_LOCAL);
         if (file_scanner->handle == NULL)
         {
             load_error = dlerror();
@@ -281,6 +281,17 @@ internal_load_library(const char *libname)
                      errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro.")));
         }
 
+        /* Check if the module wants to export symbols */
+        if (dlsym(file_scanner->handle, PG_MODULE_EXPORT_SYMBOL_NAME_STRING))
+        {
+            elog(DEBUG3,
+                 "loaded dynamic-link library \"%s\" with RTLD_GLOBAL", libname);
+            /* want to export, reload it the way */
+            dlclose(file_scanner->handle);
+            file_scanner->handle = dlopen(file_scanner->filename,
+                                          RTLD_NOW | RTLD_GLOBAL);
+        }
+
         /*
          * If the library has a _PG_init() function, call it.
          */
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ead17f0e44..c072850646 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -456,6 +456,14 @@ PG_MAGIC_FUNCTION_NAME(void) \
 } \
 extern int no_such_variable
 
+#define PG_MODULE_EXPORT_SYMBOL_NAME Pg_module_exrpot_symbols
+#define PG_MODULE_EXPORT_SYMBOL_NAME_STRING "Pg_module_exrpot_symbols"
+
+#define PG_MODULE_EXPORT_SYMBOL    \
+extern PGDLLEXPORT void PG_MODULE_EXPORT_SYMBOL_NAME(void); \
+extern PGDLLEXPORT void PG_MODULE_EXPORT_SYMBOL_NAME(void){}
+
+
 
 /*-------------------------------------------------------------------------
  *        Support routines and macros for callers of fmgr-compatible functions
diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c
index 6a66eba176..9165b5f455 100644
--- a/src/pl/plpython/plpy_main.c
+++ b/src/pl/plpython/plpy_main.c
@@ -42,6 +42,7 @@
 extern void _PG_init(void);
 
 PG_MODULE_MAGIC;
+PG_MODULE_EXPORT_SYMBOL;
 
 PG_FUNCTION_INFO_V1(plpython_validator);
 PG_FUNCTION_INFO_V1(plpython_call_handler);
-- 
2.16.3


Re: RTLD_GLOBAL (& JIT inlining)

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> With the attached patch, external modules are loaded RTLD_LOCAL
> by default. PG_MODULE_EXPORT_SYMBOL in a module source files let
> it loaded RTLD_GLOBAL. I suppose this is the change with the
> least impact on existing modules because I believe most of them
> don't need to export symbols.

This seems really hacky :-(

In the PL cases, at least, it's not so much the PL itself that
wants to export symbols as the underlying language library
(libpython, libperl, etc).  You're assuming that an action on
the PL .so will propagate to the underlying language .so, which
seems like a pretty shaky assumption.  I wonder also whether
closing and reopening the DL handle will really do anything
at all for already-loaded libraries ...

            regards, tom lane


Re: RTLD_GLOBAL (& JIT inlining)

From
Kyotaro HORIGUCHI
Date:
At Thu, 24 Jan 2019 01:02:14 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <24744.1548309734@sss.pgh.pa.us>
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > With the attached patch, external modules are loaded RTLD_LOCAL
> > by default. PG_MODULE_EXPORT_SYMBOL in a module source files let
> > it loaded RTLD_GLOBAL. I suppose this is the change with the
> > least impact on existing modules because I believe most of them
> > don't need to export symbols.
> 
> This seems really hacky :-(
> 
> In the PL cases, at least, it's not so much the PL itself that
> wants to export symbols as the underlying language library
> (libpython, libperl, etc).  You're assuming that an action on
> the PL .so will propagate to the underlying language .so, which

(If I understand you correctly) I'm not assuming that the
propagation happens or even the contray.  As can be seen from it
actually works, symbols of underlying libraries referred from
pl*.so are resoved using dependency list in pl*.so itself
independently of the RTLD flags given for it. The problem here is
while underlying language libraries cannot ferer symbols back in
the pl*.so when it is loaded with RTLD_LOCAL, RTLD_GLOBAL makes
some extensions unhappy.

I think that the assumption that RTLD_GLOBAL resolves all was
proved rather shaky. RTLD_DEEPBIND is not available on some
platforms so we need to choose between LOCAL and GLOBAL per
loading module.

> seems like a pretty shaky assumption.  I wonder also whether
> closing and reopening the DL handle will really do anything
> at all for already-loaded libraries ...

internal_load_library() avoids duplicate loading so I believe it
cannot happen.  Regarding duplicate dlopen, RTLD_GLOBAL just
promotes RTLD_LOCAL. In other words, the former wins. So it's not
a problem if the dlclose() there did nothing.

We could RTLD_NOLOAD for almost all platforms but win32 implement
needs additinal change to make work it as expected.

If you are saying hacky about that it uses only the symbol of
unused function, we can actually call it to get the flag.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center