Re: narwhal and PGDLLIMPORT - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: narwhal and PGDLLIMPORT |
Date | |
Msg-id | 20141020202447.GH7176@awork2.anarazel.de |
In response to | Re: narwhal and PGDLLIMPORT (Noah Misch <noah@leadboat.com>) |
Responses | Re: narwhal and PGDLLIMPORT |
List | pgsql-hackers |
On 2014-10-20 01:03:31 -0400, Noah Misch wrote:
> On Wed, Oct 15, 2014 at 12:53:03AM -0400, Noah Misch wrote:
> > On Tue, Oct 14, 2014 at 07:07:17PM -0400, Tom Lane wrote:
> > > Dave Page <dpage@pgadmin.org> writes:
> > > > On Tue, Oct 14, 2014 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >> I think we're hoping that somebody will step up and investigate how
> > > >> narwhal's problem might be fixed.
> >
> > I have planned to look at reproducing narwhal's problem once the dust settles
> > on orangutan, but I wouldn't mind if narwhal went away instead.
> >
> > > No argument here. I would kind of like to have more than zero
> > > understanding of *why* it's failing, just in case there's more to it
> > > than "oh, probably a bug in this old toolchain". But finding that out
> > > might well take significant time, and in the end not tell us anything
> > > very useful.
> >
> > Agreed on all those points.
>
> I reproduced narwhal's problem using its toolchain on another 32-bit Windows
> Server 2003 system. The crash happens at the SHGetFolderPath() call in
> pqGetHomeDirectory(). A program can acquire that function via shfolder.dll or
> via shell32.dll; we've used the former method since commit 889f038, for better
> compatibility[1] with Windows NT 4.0. On this system, shfolder.dll's version
> loads and unloads shell32.dll. In PostgreSQL built using this older compiler,
> shfolder.dll:SHGetFolderPath() unloads libpq in addition to unloading shell32!
> That started with commit 846e91e. I don't expect to understand the mechanism
> behind it, but I recommend we switch back to linking libpq with shell32.dll.
> The MSVC build already does that in all supported branches, and it feels right
> for the MinGW build to follow suit in 9.4+. Windows versions that lack the
> symbol in shell32.dll are now ancient history.

Ick. Nice detective work of an ugly situation.
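For readers who have not looked at this corner of libpq, the shell32.dll
route discussed above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in pqGetHomeDirectory():
the function name sketch_get_home_directory() is hypothetical, CSIDL_APPDATA
is picked as an example folder ID, and the non-Windows branch exists only so
the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <shlobj.h>		/* declares SHGetFolderPathA() */
#endif

/*
 * Hypothetical helper (not libpq's pqGetHomeDirectory()): copy a per-user
 * directory into buf.  Returns 1 on success, 0 on failure.
 */
int
sketch_get_home_directory(char *buf, size_t bufsize)
{
	assert(buf != NULL && bufsize > 0);

#ifdef _WIN32
	{
		char		path[MAX_PATH];

		/*
		 * When the program is linked against shell32 (-lshell32 with MinGW),
		 * this symbol is resolved from shell32.dll directly, so shfolder.dll
		 * is never involved.
		 */
		if (SHGetFolderPathA(NULL, CSIDL_APPDATA, NULL, 0, path) != S_OK)
			return 0;
		if (path[0] == '\0' || strlen(path) >= bufsize)
			return 0;
		strcpy(buf, path);
		return 1;
	}
#else
	{
		/* Assumption for portability of the sketch: fall back to $HOME. */
		const char *home = getenv("HOME");

		if (home == NULL || home[0] == '\0' || strlen(home) >= bufsize)
			return 0;
		strcpy(buf, home);
		return 1;
	}
#endif
}
```

A caller would pass a buffer of at least MAX_PATH bytes on Windows and check
the return value before using the result. Which DLL supplies
SHGetFolderPathA() is decided purely at link time, which is why the proposal
above is a build change rather than a source change.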
> I happened to try the same contrib/dblink test suite on PostgreSQL built with
> modern MinGW-w64 (i686-4.9.1-release-win32-dwarf-rt_v3-rev1). That, too, gave
> a crash-like symptom starting with commit 846e91e. Specifically, a backend
> that LOADed any module linked to libpq (libpqwalreceiver, dblink,
> postgres_fdw) would suffer this after calling exit(0):
>
> ===
> 3056 2014-10-20 00:40:15.163 GMT LOG: disconnection: session time: 0:00:00.515 user=cyg_server database=template1 host=127.0.0.1 port=3936
>
> This application has requested the Runtime to terminate it in an unusual way.
> Please contact the application's support team for more information.
>
> This application has requested the Runtime to terminate it in an unusual way.
> Please contact the application's support team for more information.
> 9300 2014-10-20 00:40:15.163 GMT LOG: server process (PID 3056) exited with exit code 3
> ===
>
> The mechanism turned out to be disjoint from the mechanism behind the
> ancient-compiler crash. Based on the functions called from exit(), my best
> guess is that exit() encountered recursion and used something like an abort()
> to escape.

Hm.

> (I can send the gdb transcript if anyone is curious to see the
> gory details.)

That would be interesting.

> The proximate cause was commit 846e91e allowing modules to use
> shared libgcc. A 32-bit libpq acquires 64-bit integer division from libgcc.
> Passing -static-libgcc to the link restores the libgcc situation as it stood
> before commit 846e91e. The main beneficiary of shared libgcc is C++/Java
> exception handling, so PostgreSQL doesn't care. No doubt there's some deeper
> bug in libgcc or in PostgreSQL; loading a module that links with shared libgcc
> should not disrupt exit(). I'm content with this workaround.

I'm unconvinced by this reasoning. Popular postgres extensions like postgis
do use C++. It's imo not hard to imagine situations where switching to a
statically linked libgcc could cause problems.
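As background on why a plain C library such as libpq depends on libgcc at
all, here is a small sketch of the kind of code involved. The function names
are hypothetical and nothing here is taken from libpq; the point is only
that, on a 32-bit x86 target, GCC typically lowers 64-bit division into
calls to libgcc helper routines (__divdi3 and __udivdi3) instead of emitting
inline instructions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical examples.  On a 32-bit x86 build, GCC usually implements the
 * division below by calling __divdi3 from libgcc.
 */
int64_t
sketch_div_s64(int64_t num, int64_t den)
{
	assert(den != 0);
	return num / den;
}

/* Unsigned variant; on 32-bit x86 this usually becomes a __udivdi3 call. */
uint64_t
sketch_div_u64(uint64_t num, uint64_t den)
{
	assert(den != 0);
	return num / den;
}
```

Whether those helpers come from a shared libgcc DLL or are copied into the
module is a link-time choice: passing -static-libgcc to GCC selects the
latter, which is the workaround under discussion. C++ code that throws
exceptions across module boundaries is the usual reason to prefer the shared
variant.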
Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services