Thread: Re: [COMMITTERS] pgsql: Add transforms feature

Re: [COMMITTERS] pgsql: Add transforms feature

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 4/26/15 12:36 PM, Tom Lane wrote:
>> I don't know why this patch is fooling around with compile/link flags,
>> but it's broken at least prairiedog

> The addition of the link flag -undefined dynamic_lookup is so that
> plugins can refer to symbols from other plugins.  This is how other
> platforms work by default, and it is standard for OS X in GNU libtool
> and many other projects.  I'm confused why prairiedog is complaining
> about that.

We could look into a fix for prairiedog's situation, but TBH I think that
that is going in 100% the wrong direction.  I'm pretty desperately unhappy
with the design choice you've made here, for two reasons:

1. Preventing undefined-symbol errors at link time might be standard
according to some platforms, but it is not the universal default, and
I think there is no very good reason to assume that it is possible
everywhere.  So I'm afraid that prairiedog is just the canary in the coal
mine, warning us about portability problems we're going to have with
other non-mainstream platforms not represented in the buildfarm.

2. Preventing undefined-symbol errors at link time will hide actual coding
errors, not only the intended cross-plugin reference cases.  We have
repeatedly seen the buildfarm members that complain about this find actual
bugs that were missed on laxer platforms.  Therefore I think that this
approach is going to lead to masking bugs that we'll find only by much
more expensive/painful means than noticing a buildfarm failure.

Surely there are other ways to deal with this requirement that don't
require forcing undefined symbols to be treated as valid.  For instance,
our existing find_rendezvous_variable() infrastructure might offer a
usable solution for passing a function pointer from plugin A to plugin B.
Indeed, that's pretty much what it was invented for.
        regards, tom lane



Re: [COMMITTERS] pgsql: Add transforms feature

From
Peter Eisentraut
Date:
On 5/3/15 2:27 PM, Tom Lane wrote:
> 1. Preventing undefined-symbol errors at link time might be standard
> according to some platforms, but it is not the universal default, and
> I think there is no very good reason to assume that it is possible
> everywhere.  So I'm afraid that prairiedog is just the canary in the coal
> mine, warning us about portability problems we're going to have with
> other non-mainstream platforms not represented in the buildfarm.

I don't think this is an issue, for the following reasons:

Any dlopenable module will have undefined symbols, namely those that it
uses to call back into the executable that loads the module.  The only
way it can know to not complain about those symbols is if we tell it
about that executable at build time.  Our current shared library build
setup makes reference to the postgres executable only on three
platforms: Windows, OS X, AIX.  All other platforms necessarily accept
all undefined symbols at build time.  We already know that it can also
be made to work on Windows and OS X.  I expect that we might need some
tweaking on AIX, but I'm quite sure it can be made to work there also,
because ...

More generally, the way to build a dlopenable module with GNU libtool is
   libtool --mode=link gcc -module -o hello.la foo.lo hello.lo

This works on "all" platforms, for a very large value of "all".  There
isn't even an option to make a reference to the loading executable.

As a side note, Perl and Python themselves don't even build without this
option, so even if such platforms existed, they couldn't install Perl or
Python as we know it, and so wouldn't be able to use these features.


> 2. Preventing undefined-symbol errors at link time will hide actual coding
> errors, not only the intended cross-plugin reference cases.  We have
> repeatedly seen the buildfarm members that complain about this find actual
> bugs that were missed on laxer platforms.  Therefore I think that this
> approach is going to lead to masking bugs that we'll find only by much
> more expensive/painful means than noticing a buildfarm failure.

I appreciate this issue, and I have actually done quite a bit of
research in the hope of being able to provide similar functionality on
other platforms.  But on platforms such as Linux, there is no equivalent
option at all.  That is, you cannot build a dlopenable module with the
provision to error on all undefined symbols
except those found, say, in the postgres binary.  So there is no way to
make all platforms reasonably similar here.

Which leads to a social problem.  This is a feature intended for
extension authors.  Extension authors will just build their code on
Linux, and if it runs, they ship it.  Not everyone (hardly anyone) has
the option of doing platform testing for extensions.  So if some
non-mainstream platform is more picky than others, then the inevitable
result is that extensions are more likely to be broken on non-mainstream
platforms.  This is already frequently the case for other reasons, with
"non-mainstream" effectively meaning anything other than the author's
own machine in some cases, it appears.  We're not going to make this
better by maintaining platform-specific traps.

This issue already exists independent of this feature.  If I want to
create an extension that adds some functionality to hstore or hll or
postgis, I'll just call their symbols, it works, I'll ship it.  We don't
have any way to prevent this, because many mainstream platforms don't
have the required fine-grained undefined symbol controls.

And the extension authors' path of least resistance when faced with a
build failure report on OS X, say, is to just add an option to the link
command in their extension, it works, it ships.

Moreover, I'm not sure this error checking actually buys us much in
practice.  A typoed symbol will be flagged by a compiler warning, and
any undefined symbol will be caught be the test suite as soon as the
module is loaded. So I can't imagine what those buildfarm complaints
are, although I'd be interested look into them the analyze the
circumstances if someone could point me to some concrete cases.

Nonetheless, if there is so much attachment to this behavior, there
might be a simple compromise.  We keep the error checking on by default,
but allow a pgxs-level variable setting like SHLIB_ALLOW_UNDEFINED.
This would essentially be equivalent to the
I'll-just-hack-my-extension's-makefile approach mentioned above that
will inevitably happen, but that way we can document it and maintain
some level of control over it.


> Surely there are other ways to deal with this requirement that don't
> require forcing undefined symbols to be treated as valid.  For instance,
> our existing find_rendezvous_variable() infrastructure might offer a
> usable solution for passing a function pointer from plugin A to plugin B.
> Indeed, that's pretty much what it was invented for.

The rendezvous variables were invented for a different problem, namely
that you can load modules in arbitrary order, and only the first guy
initializes a variable.  That's not the functionality that we need.

Also, any such approach would likely entail casting all functions and
variables through common pointer types, which would lose all kinds of
compiler checking.  (All rendezvous variables are void pointers.)

It would put quite a bit of extra burden on the authors of modules such
as hstore or postgis to not only maintain a list of exported functions
but also test those interfaces.  Either they'd have to dogfood those
interfaces and replace calls like hstoreCheckKeyLen(...) by
((somecrazycast) myfunctionlookup(hstoreCheckKeyLen))(...), or they'd
have to maintain a parallel set of interfaces that they could only test
by implementing a separate mock user of those interfaces.

At the end of the day, you'll end up implementing a bad version of half
the dynamic linker again in anticipation of a problem we know nothing about.



Re: [COMMITTERS] pgsql: Add transforms feature

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 5/3/15 2:27 PM, Tom Lane wrote:
>> 2. Preventing undefined-symbol errors at link time will hide actual coding
>> errors, not only the intended cross-plugin reference cases.  We have
>> repeatedly seen the buildfarm members that complain about this find actual
>> bugs that were missed on laxer platforms.  Therefore I think that this
>> approach is going to lead to masking bugs that we'll find only by much
>> more expensive/painful means than noticing a buildfarm failure.

> I appreciate this issue, and I have actually done quite a bit of
> research in the hope of being able to provide similar functionality on
> other platforms.

I agree that it's probably impractical to persuade toolchains to do this
if they don't want to.  But that doesn't mean that it's a good idea to
remove the whole class of error checks on platforms where it *is* possible
to make the check.  The entire reason that we have a buildfarm is, in
essence, to get the union of all checks that are possible on any supported
platform.

> Moreover, I'm not sure this error checking actually buys us much in
> practice.  A typoed symbol will be flagged by a compiler warning, and
> any undefined symbol will be caught be the test suite as soon as the
> module is loaded. So I can't imagine what those buildfarm complaints
> are, although I'd be interested look into them the analyze the
> circumstances if someone could point me to some concrete cases.

On many platforms, undefined symbols will only be complained of if you
actually try to call them, so I think this position places undue faith
in the code coverage of our buildfarm tests.  (As a counterexample,
consider SSL, which I don't think we're exercising at all in the
buildfarm because of the difficulty of setup.  And we have had bugs of
this sort versus OpenSSL, cf commit c9e1ad7fa.)

>> Surely there are other ways to deal with this requirement that don't
>> require forcing undefined symbols to be treated as valid.  For instance,
>> our existing find_rendezvous_variable() infrastructure might offer a
>> usable solution for passing a function pointer from plugin A to plugin B.

> The rendezvous variables were invented for a different problem, namely
> that you can load modules in arbitrary order, and only the first guy
> initializes a variable.  That's not the functionality that we need.

Sure it is.  Look at what plpgsql uses it for: to export a
"PLpgSQL_plugin" struct containing function pointers that might be used
by other extensions.  That seems to me to be exactly the same case as
what we're talking about here.

> Also, any such approach would likely entail casting all functions and
> variables through common pointer types, which would lose all kinds of
> compiler checking.  (All rendezvous variables are void pointers.)

Only with very poor coding style.  Properly declared function pointers
are pretty type-safe.

> It would put quite a bit of extra burden on the authors of modules such
> as hstore or postgis to not only maintain a list of exported functions
> but also test those interfaces.

Well, that actually gets to the core of my point: if a module author
exports a struct full of function pointers then he has put a stake in the
ground as to exactly what his exported API *is*.  With what you've done
for transforms, what is the exported API of hstore or plpython or plperl?
Conceivably, any non-static function in those libraries is now fair game
for somebody else to call.  We have this problem already with respect to
the core code, and it's nasty.  We should not be encouraging the same
situation to develop for extensions.

I do not understand the second part of your point.  Surely, if an
extension author expects other extensions to call function X, he's got the
same testing problem regardless of how the other extensions would reach X
exactly.
        regards, tom lane



TRANSFORM modules vs. AIX

From
Noah Misch
Date:
(was Re: [COMMITTERS] pgsql: Add transforms feature)

On Tue, May 05, 2015 at 10:24:03PM -0400, Peter Eisentraut wrote:
> Any dlopenable module will have undefined symbols, namely those that it
> uses to call back into the executable that loads the module.  The only
> way it can know to not complain about those symbols is if we tell it
> about that executable at build time.  Our current shared library build
> setup makes reference to the postgres executable only on three
> platforms: Windows, OS X, AIX.  All other platforms necessarily accept
> all undefined symbols at build time.  We already know that it can also
> be made to work on Windows and OS X.  I expect that we might need some
> tweaking on AIX, but I'm quite sure it can be made to work there also,

We can address AIX almost exactly like we address Windows, attached.
Relocating an AIX installation will then require adding $(pkglibdir) to
LD_LIBRARY_PATH.  Windows, too, must need the reference from ltree_plpython to
plpython; I'll add it separately.  (It is for PLyUnicode_FromStringAndSize(),
which we omit under Python 2.  The buildfarm's Windows+GCC+Python animal,
frogmouth, uses Python 2.)


contrib/hstore_plperl test coverage revealed a PL/Perl bug found in all
supported branches.  Any import of a native-code module crashes:

  create language plperlu;
  CREATE FUNCTION test1none() RETURNS int LANGUAGE plperlu
    AS $$ use IO; return 0; $$;

"man perlaix" gives the essential clue:

       Note that starting from Perl 5.7.2 (and consequently 5.8.0) and AIX
       4.3 or newer Perl uses the AIX native dynamic loading interface in
       the so called runtime linking mode instead of the emulated interface
       that was used in Perl releases 5.6.1 and earlier or, for AIX releases
       4.2 and earlier.  This change does break backward compatibility with
       compiled modules from earlier perl releases.  The change was made to
       make Perl more compliant with other applications like Apache/mod_perl
       which are using the AIX native interface. This change also enables
       the use of C++ code with static constructors and destructors in perl
       extensions, which was not possible using the emulated interface.

PostgreSQL uses AIX default linking, not AIX runtime linking.  Perl module
shared objects (e.g. IO.so) contain undefined symbols and rely on the runtime
linker to resolve them.  In an executable free from the runtime linker, those
symbols just remain NULL.  I am inclined to back-patch the following:

--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -90,3 +90,3 @@ ifeq ($(PORTNAME), aix)
 postgres: $(POSTGRES_IMP)
-       $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(call expand_subsys,$(OBJS))
-Wl,-bE:$(top_builddir)/src/backend/$(POSTGRES_IMP)$(LIBS) -o $@ 
+       $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(call expand_subsys,$(OBJS))
-Wl,-bE:$(top_builddir)/src/backend/$(POSTGRES_IMP)$(LIBS) -Wl,-brtllib -o $@ 

That allows the runtime linker to process modules dlopen()'ed within the
postgres executable.  To my knowledge, it changes nothing else.


At some point, we might switch from AIX default linking to runtime linking.
(Concretely, that entails building modules with -Wl,-G and the postgres
executable with -Wl,-brtl.)  Runtime linking treats undefined symbols and
duplicate symbols much more like other modern systems treat them.  I don't
personally wish to rock this boat to that degree.  "ld -lfoo" ignores foo.so,
but "ld -brtl -lfoo" prefers foo.so over foo.a.  Therefore, such a change
wouldn't be back-patch material.  Details on AIX shared library linking:

http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.genprogc/shared_object_runtime_linking.htm
http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds3/ld.htm

Attachment