Thread: Re: [COMMITTERS] pgsql: Add transforms feature
Peter Eisentraut <peter_e@gmx.net> writes: > On 4/26/15 12:36 PM, Tom Lane wrote: >> I don't know why this patch is fooling around with compile/link flags, >> but it's broken at least prairiedog > The addition of the link flag -undefined dynamic_lookup is so that > plugins can refer to symbols from other plugins. This is how other > platforms work by default, and it is standard for OS X in GNU libtool > and many other projects. I'm confused why prairiedog is complaining > about that. We could look into a fix for prairiedog's situation, but TBH I think that that is going in 100% the wrong direction. I'm pretty desperately unhappy with the design choice you've made here, for two reasons: 1. Preventing undefined-symbol errors at link time might be standard according to some platforms, but it is not the universal default, and I think there is no very good reason to assume that it is possible everywhere. So I'm afraid that prairiedog is just the canary in the coal mine, warning us about portability problems we're going to have with other non-mainstream platforms not represented in the buildfarm. 2. Preventing undefined-symbol errors at link time will hide actual coding errors, not only the intended cross-plugin reference cases. We have repeatedly seen the buildfarm members that complain about this find actual bugs that were missed on laxer platforms. Therefore I think that this approach is going to lead to masking bugs that we'll find only by much more expensive/painful means than noticing a buildfarm failure. Surely there are other ways to deal with this requirement that don't require forcing undefined symbols to be treated as valid. For instance, our existing find_rendezvous_variable() infrastructure might offer a usable solution for passing a function pointer from plugin A to plugin B. Indeed, that's pretty much what it was invented for. regards, tom lane
On 5/3/15 2:27 PM, Tom Lane wrote: > 1. Preventing undefined-symbol errors at link time might be standard > according to some platforms, but it is not the universal default, and > I think there is no very good reason to assume that it is possible > everywhere. So I'm afraid that prairiedog is just the canary in the coal > mine, warning us about portability problems we're going to have with > other non-mainstream platforms not represented in the buildfarm. I don't think this is an issue, for the following reasons: Any dlopenable module will have undefined symbols, namely those that it uses to call back into the executable that loads the module. The only way it can know to not complain about those symbols is if we tell it about that executable at build time. Our current shared library build setup makes reference to the postgres executable only on three platforms: Windows, OS X, AIX. All other platforms necessarily accept all undefined symbols at build time. We already know that it can also be made to work on Windows and OS X. I expect that we might need some tweaking on AIX, but I'm quite sure it can be made to work there also, because ... More generally, the way to build a dlopenable module with GNU libtool is libtool --mode=link gcc -module -o hello.la foo.lo hello.lo This works on "all" platforms, for a very large value of "all". There isn't even an option to make a reference to the loading executable. As a side note, Perl and Python themselves don't even build without this option, so even if such platforms existed, they couldn't install Perl or Python as we know it, and so wouldn't be able to use these features. > 2. Preventing undefined-symbol errors at link time will hide actual coding > errors, not only the intended cross-plugin reference cases. We have > repeatedly seen the buildfarm members that complain about this find actual > bugs that were missed on laxer platforms. Therefore I think that this > approach is going to lead to masking bugs that we'll find only by much > more expensive/painful means than noticing a buildfarm failure. I appreciate this issue, and I have actually done quite a bit of research in the hope of being able to provide similar functionality on other platforms. But on platforms such as Linux, there is no equivalent option at all. That is, you cannot build a dlopenable module with the provision to error on all undefined symbols except those found, say, in the postgres binary. So there is no way to make all platforms reasonably similar here. Which leads to a social problem. This is a feature intended for extension authors. Extension authors will just build their code on Linux, and if it runs, they ship it. Not everyone (hardly anyone) has the option of doing platform testing for extensions. So if some non-mainstream platform is more picky than others, then the inevitable result is that extensions are more likely to be broken on non-mainstream platforms. This is already frequently the case for other reasons, with "non-mainstream" effectively meaning anything other than the author's own machine in some cases, it appears. We're not going to make this better by maintaining platform-specific traps. This issue already exists independent of this feature. If I want to create an extension that adds some functionality to hstore or hll or postgis, I'll just call their symbols, it works, I'll ship it. We don't have any way to prevent this, because many mainstream platforms don't have the required fine-grained undefined symbol controls. And the extension authors' path of least resistance when faced with a build failure report on OS X, say, is to just add an option to the link command in their extension, it works, it ships. Moreover, I'm not sure this error checking actually buys us much in practice. A typoed symbol will be flagged by a compiler warning, and any undefined symbol will be caught be the test suite as soon as the module is loaded. So I can't imagine what those buildfarm complaints are, although I'd be interested look into them the analyze the circumstances if someone could point me to some concrete cases. Nonetheless, if there is so much attachment to this behavior, there might be a simple compromise. We keep the error checking on by default, but allow a pgxs-level variable setting like SHLIB_ALLOW_UNDEFINED. This would essentially be equivalent to the I'll-just-hack-my-extension's-makefile approach mentioned above that will inevitably happen, but that way we can document it and maintain some level of control over it. > Surely there are other ways to deal with this requirement that don't > require forcing undefined symbols to be treated as valid. For instance, > our existing find_rendezvous_variable() infrastructure might offer a > usable solution for passing a function pointer from plugin A to plugin B. > Indeed, that's pretty much what it was invented for. The rendezvous variables were invented for a different problem, namely that you can load modules in arbitrary order, and only the first guy initializes a variable. That's not the functionality that we need. Also, any such approach would likely entail casting all functions and variables through common pointer types, which would lose all kinds of compiler checking. (All rendezvous variables are void pointers.) It would put quite a bit of extra burden on the authors of modules such as hstore or postgis to not only maintain a list of exported functions but also test those interfaces. Either they'd have to dogfood those interfaces and replace calls like hstoreCheckKeyLen(...) by ((somecrazycast) myfunctionlookup(hstoreCheckKeyLen))(...), or they'd have to maintain a parallel set of interfaces that they could only test by implementing a separate mock user of those interfaces. At the end of the day, you'll end up implementing a bad version of half the dynamic linker again in anticipation of a problem we know nothing about.
Peter Eisentraut <peter_e@gmx.net> writes: > On 5/3/15 2:27 PM, Tom Lane wrote: >> 2. Preventing undefined-symbol errors at link time will hide actual coding >> errors, not only the intended cross-plugin reference cases. We have >> repeatedly seen the buildfarm members that complain about this find actual >> bugs that were missed on laxer platforms. Therefore I think that this >> approach is going to lead to masking bugs that we'll find only by much >> more expensive/painful means than noticing a buildfarm failure. > I appreciate this issue, and I have actually done quite a bit of > research in the hope of being able to provide similar functionality on > other platforms. I agree that it's probably impractical to persuade toolchains to do this if they don't want to. But that doesn't mean that it's a good idea to remove the whole class of error checks on platforms where it *is* possible to make the check. The entire reason that we have a buildfarm is, in essence, to get the union of all checks that are possible on any supported platform. > Moreover, I'm not sure this error checking actually buys us much in > practice. A typoed symbol will be flagged by a compiler warning, and > any undefined symbol will be caught be the test suite as soon as the > module is loaded. So I can't imagine what those buildfarm complaints > are, although I'd be interested look into them the analyze the > circumstances if someone could point me to some concrete cases. On many platforms, undefined symbols will only be complained of if you actually try to call them, so I think this position places undue faith in the code coverage of our buildfarm tests. (As a counterexample, consider SSL, which I don't think we're exercising at all in the buildfarm because of the difficulty of setup. And we have had bugs of this sort versus OpenSSL, cf commit c9e1ad7fa.) >> Surely there are other ways to deal with this requirement that don't >> require forcing undefined symbols to be treated as valid. For instance, >> our existing find_rendezvous_variable() infrastructure might offer a >> usable solution for passing a function pointer from plugin A to plugin B. > The rendezvous variables were invented for a different problem, namely > that you can load modules in arbitrary order, and only the first guy > initializes a variable. That's not the functionality that we need. Sure it is. Look at what plpgsql uses it for: to export a "PLpgSQL_plugin" struct containing function pointers that might be used by other extensions. That seems to me to be exactly the same case as what we're talking about here. > Also, any such approach would likely entail casting all functions and > variables through common pointer types, which would lose all kinds of > compiler checking. (All rendezvous variables are void pointers.) Only with very poor coding style. Properly declared function pointers are pretty type-safe. > It would put quite a bit of extra burden on the authors of modules such > as hstore or postgis to not only maintain a list of exported functions > but also test those interfaces. Well, that actually gets to the core of my point: if a module author exports a struct full of function pointers then he has put a stake in the ground as to exactly what his exported API *is*. With what you've done for transforms, what is the exported API of hstore or plpython or plperl? Conceivably, any non-static function in those libraries is now fair game for somebody else to call. We have this problem already with respect to the core code, and it's nasty. We should not be encouraging the same situation to develop for extensions. I do not understand the second part of your point. Surely, if an extension author expects other extensions to call function X, he's got the same testing problem regardless of how the other extensions would reach X exactly. regards, tom lane
(was Re: [COMMITTERS] pgsql: Add transforms feature) On Tue, May 05, 2015 at 10:24:03PM -0400, Peter Eisentraut wrote: > Any dlopenable module will have undefined symbols, namely those that it > uses to call back into the executable that loads the module. The only > way it can know to not complain about those symbols is if we tell it > about that executable at build time. Our current shared library build > setup makes reference to the postgres executable only on three > platforms: Windows, OS X, AIX. All other platforms necessarily accept > all undefined symbols at build time. We already know that it can also > be made to work on Windows and OS X. I expect that we might need some > tweaking on AIX, but I'm quite sure it can be made to work there also, We can address AIX almost exactly like we address Windows, attached. Relocating an AIX installation will then require adding $(pkglibdir) to LD_LIBRARY_PATH. Windows, too, must need the reference from ltree_plpython to plpython; I'll add it separately. (It is for PLyUnicode_FromStringAndSize(), which we omit under Python 2. The buildfarm's Windows+GCC+Python animal, frogmouth, uses Python 2.) contrib/hstore_plperl test coverage revealed a PL/Perl bug found in all supported branches. Any import of a native-code module crashes: create language plperlu; CREATE FUNCTION test1none() RETURNS int LANGUAGE plperlu AS $$ use IO; return 0; $$; "man perlaix" gives the essential clue: Note that starting from Perl 5.7.2 (and consequently 5.8.0) and AIX 4.3 or newer Perl uses the AIX native dynamic loading interface in the so called runtime linking mode instead of the emulated interface that was used in Perl releases 5.6.1 and earlier or, for AIX releases 4.2 and earlier. This change does break backward compatibility with compiled modules from earlier perl releases. The change was made to make Perl more compliant with other applications like Apache/mod_perl which are using the AIX native interface. This change also enables the use of C++ code with static constructors and destructors in perl extensions, which was not possible using the emulated interface. PostgreSQL uses AIX default linking, not AIX runtime linking. Perl module shared objects (e.g. IO.so) contain undefined symbols and rely on the runtime linker to resolve them. In an executable free from the runtime linker, those symbols just remain NULL. I am inclined to back-patch the following: --- a/src/backend/Makefile +++ b/src/backend/Makefile @@ -90,3 +90,3 @@ ifeq ($(PORTNAME), aix) postgres: $(POSTGRES_IMP) - $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(call expand_subsys,$(OBJS)) -Wl,-bE:$(top_builddir)/src/backend/$(POSTGRES_IMP)$(LIBS) -o $@ + $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(call expand_subsys,$(OBJS)) -Wl,-bE:$(top_builddir)/src/backend/$(POSTGRES_IMP)$(LIBS) -Wl,-brtllib -o $@ That allows the runtime linker to process modules dlopen()'ed within the postgres executable. To my knowledge, it changes nothing else. At some point, we might switch from AIX default linking to runtime linking. (Concretely, that entails building modules with -Wl,-G and the postgres executable with -Wl,-brtl.) Runtime linking treats undefined symbols and duplicate symbols much more like other modern systems treat them. I don't personally wish to rock this boat to that degree. "ld -lfoo" ignores foo.so, but "ld -brtl -lfoo" prefers foo.so over foo.a. Therefore, such a change wouldn't be back-patch material. Details on AIX shared library linking: http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.genprogc/shared_object_runtime_linking.htm http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds3/ld.htm