Thread: Re: BUG #5862: Postgres dumps core upon a connection attempt

"Matt Zinicola" wrote:

> PostgreSQL version: 9.0.3
> Operating system: Linux (Fedora 14, kernel 2.6.35-10-74), 64-bit
> Description: Postgres dumps core upon a connection attempt
> Details:
>
> A simple compile from source and install (as per usual) on Fedora
> 14 yielded crashes of client applications attempting to connect.
>
> I first observed this with Archiveopteryx. As a sanity check, I
> then attempted a connection with psql itself, which also crashed.
>
> Please let me know if further information is needed.

Build options? Error messages? Contents of log files? Backtrace
from the core file you mentioned?

-Kevin
Apologies for lack of detail. Although I've been using Postgres for
years, this is the first time I've had such an issue.

Build options were only --with-perl and --with-python

Below is the output when two different applications attempt to connect
to my 9.0.3 server (note, the second is psql itself):

[root@infinity postgres]# /etc/init.d/archiveopteryx start
Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on backend 1)
/etc/init.d/archiveopteryx: line 24: 4240 Segmentation fault /usr/local/archiveopteryx/bin/aox start
done.

[postgres@infinity scripts]$ psql template1
Segmentation fault (core dumped)

Kevin suggested doing a 'make check'. I did so, and it ended with the
following:

mkdir ./testtablespace
./pg_regress --inputdir=. --dlpath=. --multibyte=SQL_ASCII --temp-install=./tmp_check --top-builddir=../../.. --schedule=./parallel_schedule
make[2]: *** [check] Segmentation fault (core dumped)
make[2]: Leaving directory `/usr/local/src/postgresql-9.0.3/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/usr/local/src/postgresql-9.0.3/src/test'
make: *** [check] Error 2

Also, my server doesn't seem to be logging anything, either (although
I'm using the same configuration and start script as 9.0.2)

Lastly, I don't see any 'core' files in the places I would expect.
If/when I find them, I can send them along.

- Matt

On Wed, 2011-02-02 at 15:56 -0600, Kevin Grittner wrote:
> "Matt Zinicola" wrote:
>
> > PostgreSQL version: 9.0.3
> > Operating system: Linux (Fedora 14, kernel 2.6.35-10-74), 64-bit
> > Description: Postgres dumps core upon a connection attempt
> > Details:
> >
> > A simple compile from source and install (as per usual) on Fedora
> > 14 yielded crashes of client applications attempting to connect.
> >
> > I first observed this with Archiveopteryx. As a sanity check, I
> > then attempted a connection with psql itself, which also crashed.
> >
> > Please let me know if further information is needed.
>
> Build options? Error messages? Contents of log files? Backtrace
> from the core file you mentioned?
>
> -Kevin
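On the missing core files: a minimal sketch, assuming a Linux system with /proc mounted, of how one might check where a core dump could have gone. The helper names `describe_core_limit` and `read_core_pattern` are hypothetical. When the kernel's core_pattern begins with "|", the core is piped to a helper program instead of being written as a file, which is one possible reason for not finding a 'core' file in the working directory.

```python
import resource


def describe_core_limit(soft_limit):
    """Explain what a soft RLIMIT_CORE value means for core files."""
    if soft_limit == 0:
        return "core dumps disabled (limit is 0)"
    if soft_limit == resource.RLIM_INFINITY:
        return "core dumps enabled (no size limit)"
    return "core dumps enabled, truncated at %d bytes" % soft_limit


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the kernel's core_pattern, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # The limit seen here is inherited from the shell that starts Python.
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(describe_core_limit(soft))
    pattern = read_core_pattern()
    if pattern and pattern.startswith("|"):
        print("cores are piped to a helper:", pattern[1:])
    else:
        print("core_pattern:", pattern)
```

Run it from the same shell and as the same user that ran the crashing program, since the limit is per process and inherited.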
On 03/02/11 09:53, Matt Zinicola wrote:
> Apologies for lack of detail. Although I've been using Postgres for
> years, this is the first time I've had such an issue.
>
> Build options were only --with-perl and --with-python
>
> Below is the output when two different applications attempt to connect
> to my 9.0.3 server (note, the second is psql itself):
>
> [root@infinity postgres]# /etc/init.d/archiveopteryx start
> Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on backend 1)
> /etc/init.d/archiveopteryx: line 24: 4240 Segmentation fault /usr/local/archiveopteryx/bin/aox start
> done.
>
> [postgres@infinity scripts]$ psql template1
> Segmentation fault (core dumped)

OK, so it's not the PostgreSQL backend that's crashing, it's psql.

You almost certainly have conflicting libraries lurking around
somewhere, so psql was built against one libpq but ends up getting
linked to another at runtime.

--
System & Network Administrator
POST Newspapers
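To look for the conflicting libpq copies suggested above, something like the following could be used. It is only a sketch: the function name `find_libs` is hypothetical, and the directories searched in the example are assumptions about where a distribution copy and a source-built copy might live.

```python
import fnmatch
import os


def find_libs(roots, pattern="libpq.so*"):
    """Return sorted paths under `roots` whose file name matches `pattern`."""
    hits = []
    for root in roots:
        # os.walk silently skips roots that do not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if fnmatch.fnmatch(name, pattern):
                    hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed locations; adjust to the system being checked.
    for path in find_libs(["/usr/lib64", "/usr/lib", "/usr/local"]):
        # Show where each name finally points, since libpq.so.5 is
        # normally a symlink to a versioned file.
        print(path, "->", os.path.realpath(path))
```

More than one distinct real file in the output would be consistent with the "built against one, linked to another" theory, though it does not prove it.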
Hrm.

I did see the Fedora stashed copies of libpq.so.5 and libpq.so.5.2 in
/usr/lib64. I looked everywhere on the system for libpq.so*, and saw that the
only remaining copies were those in my source directory... so I re-built
9.0.3. A 'make check' still died in the same place within the regression
tests. I did a 'make install' anyhow. I cleaned out my data directory and
attempted a new initdb with 9.0.3. That seg faulted as well:

[postgres@infinity local]$ /usr/local/pgsql/bin/initdb -D /data/postgres
Segmentation fault (core dumped)

Any other suggestions?

- Matt

Quoting Craig Ringer <craig@postnewspapers.com.au>:

> On 03/02/11 09:53, Matt Zinicola wrote:
> > Apologies for lack of detail. Although I've been using Postgres for
> > years, this is the first time I've had such an issue.
> >
> > Build options were only --with-perl and --with-python
> >
> > Below is the output when two different applications attempt to connect
> > to my 9.0.3 server (note, the second is psql itself):
> >
> > [root@infinity postgres]# /etc/init.d/archiveopteryx start
> > Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on backend 1)
> > /etc/init.d/archiveopteryx: line 24: 4240 Segmentation fault /usr/local/archiveopteryx/bin/aox start
> > done.
> >
> > [postgres@infinity scripts]$ psql template1
> > Segmentation fault (core dumped)
>
> OK, so it's not the PostgreSQL backend that's crashing, it's psql.
>
> You almost certainly have conflicting libraries lurking around
> somewhere, so psql was built against one libpq but ends up getting
> linked to another at runtime.
>
> --
> System & Network Administrator
> POST Newspapers
>
On 03/02/11 10:33, Matt Zinicola wrote:
>
> Hrm.
>
> I did see the Fedora stashed copies of libpq.so.5 and libpq.so.5.2 in
> /usr/lib64. I looked everywhere on the system for libpq.so*, and saw that the
> only remaining copies were those in my source directory... so I re-built
> 9.0.3. A 'make check' still died in the same place within the regression
> tests. I did a 'make install' anyhow. I cleaned out my data directory and
> attempted a new initdb with 9.0.3. That seg faulted as well:
>
> [postgres@infinity local]$ /usr/local/pgsql/bin/initdb -D /data/postgres
> Segmentation fault (core dumped)

What does:

ldd /usr/local/pgsql/bin/initdb

say?

--
System & Network Administrator
POST Newspapers
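For reading the ldd output, a small parser along these lines can make it easier to see which file each library name resolved to. This is a sketch only: `parse_ldd` is a hypothetical helper, and `SAMPLE` is made-up text in the usual glibc ldd layout, not the output from Matt's machine.

```python
def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path (or None)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        name, _, rest = line.partition("=>")
        target = rest.strip()
        if target.startswith("not found"):
            resolved[name.strip()] = None
            continue
        # Drop the trailing load address, "(0x...)".
        path = target.split("(")[0].strip()
        resolved[name.strip()] = path or None
    return resolved


# Invented sample for illustration only.
SAMPLE = (
    "\tlinux-vdso.so.1 =>  (0x00007fff5a1ff000)\n"
    "\tlibpq.so.5 => /usr/local/pgsql/lib/libpq.so.5 (0x00007f3c2a000000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f3c29c00000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f3c2a400000)\n"
)

if __name__ == "__main__":
    for lib, path in sorted(parse_ldd(SAMPLE).items()):
        print(lib, "->", path)
```

A libpq.so.5 entry pointing somewhere other than the freshly installed lib directory would be the thing to look for.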
On 03/02/11 11:11, Matt Zinicola wrote:

OK, it doesn't seem to be a simple problem of linking to the wrong
library then. psql is linking to the correct libpq. initdb isn't linking
to anything much at all, but still crashes for no apparent reason.
Something else may be going on. Please supply the full command line you
used to "./configure" when compiling postgres. If you're not sure what
it was, you can find it at the top of "config.log" in your compile
directory.

Is there any chance you can get us a backtrace of one of the crashing
programs? Try this:

gdb --args psql

Once it loads, it'll drop you to a

(gdb)

prompt. Enter "run" then press enter.

(gdb) run

Psql will then load for a while, crash, and drop you back to a (gdb)
prompt after printing out a message like:

Program received signal SIGSEGV, Segmentation fault.

Enter the "bt" command at the (gdb) prompt and press enter.

(gdb) bt

... then copy and paste everything from "gdb --args psql" through to the
end of the output printed by "bt", put it on http://pastebin.com/ and
send a link to that in your reply email here.

I've created a sample to give you the idea, by starting psql then
intentionally crashing it by sending it a manual SIGSEGV. See:

http://pastebin.com/b8D9i2tb

--
System & Network Administrator
POST Newspapers
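The interactive gdb steps above can also be run non-interactively. The following is a sketch, not part of Craig's instructions: it uses gdb's `--batch` and `-ex` options to issue "run" and then "bt" automatically, and the helper names `gdb_backtrace_command` and `capture_backtrace` are hypothetical. It assumes gdb is installed and the program crashes on its own without needing input.

```python
import subprocess


def gdb_backtrace_command(program, *args):
    """Build a gdb command line that runs `program` and prints a backtrace."""
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt",
            "--args", program] + list(args)


def capture_backtrace(program, *args):
    """Run the program under gdb and return everything gdb printed."""
    result = subprocess.run(
        gdb_backtrace_command(program, *args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Example target taken from the thread; any crashing binary would do.
    print(capture_backtrace("psql", "template1"))
```

The backtrace is only readable if the binary was built with debug symbols.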
As far as the configure options -- Originally, they were merely --with-perl and
--with-python, but just to rule out problems there, I've since just been going
with a straight compile (no additional options).

I will get the backtrace, etc. within the next hour or so.

Thanks!

- Matt

Quoting Craig Ringer <craig@postnewspapers.com.au>:

> On 03/02/11 11:11, Matt Zinicola wrote:
>
> OK, it doesn't seem to be a simple problem of linking to the wrong
> library then. psql is linking to the correct libpq. initdb isn't linking
> to anything much at all, but still crashes for no apparent reason.
> Something else may be going on. Please supply the full command line you
> used to "./configure" when compiling postgres. If you're not sure what
> it was, you can find it at the top of "config.log" in your compile
> directory.
>
> Is there any chance you can get us a backtrace of one of the crashing
> programs? Try this:
>
> gdb --args psql
>
> Once it loads, it'll drop you to a
>
> (gdb)
>
> prompt. Enter "run" then press enter.
>
> (gdb) run
>
> Psql will then load for a while, crash, and drop you back to a (gdb)
> prompt after printing out a message like:
>
> Program received signal SIGSEGV, Segmentation fault.
>
> Enter the "bt" command at the (gdb) prompt and press enter.
>
> (gdb) bt
>
> ... then copy and paste everything from "gdb --args psql" through to the
> end of the output printed by "bt", put it on http://pastebin.com/ and
> send a link to that in your reply email here.
>
> I've created a sample to give you the idea, by starting psql then
> intentionally crashing it by sending it a manual SIGSEGV. See:
>
> http://pastebin.com/b8D9i2tb
>
>
> --
> System & Network Administrator
> POST Newspapers
>
On 02/03/2011 11:15 PM, Matt Zinicola wrote:
>
> I re-compiled with '--enable-debug' and got the symbols. The pastebin is at
> http://pastebin.com/xMhEHFdT

That's really interesting. It's getting a NULL path pointer when - I
think - it tries to determine the location of the executables.

Presumably this is something bizarre in your environment - but I have no
idea what it might be. Maybe someone else reading will have an idea.

(Please reply-to-all on further messages so the -bugs list sees things)

--
Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes:
> On 02/03/2011 11:15 PM, Matt Zinicola wrote:
>> I re-compiled with '--enable-debug' and got the symbols. The pastebin is at
>> http://pastebin.com/xMhEHFdT

> That's really interesting. It's getting a NULL path pointer when - I
> think - it tries to determine the location of the executables.

Hmm ... gdb is evidently lying to us to some extent, because some of
those variables can't possibly be NULL, and control wouldn't have got
to where it says if others of them were. However, it seems clear that
it's dying while trying to determine the actual location of the initdb
executable. Are there any symlinks involved in the path
/usr/local/pgsql/bin/initdb ? Is that located on an unusual filesystem?

			regards, tom lane
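To answer the symlink question for a given path, each leading component can be checked in turn. This is a sketch with a hypothetical helper name, `symlinks_in_path`; it only reports symlinks and does not try to reproduce how PostgreSQL itself locates its executables.

```python
import os


def symlinks_in_path(path):
    """Return (component, target) pairs for each symlink along `path`."""
    path = os.path.abspath(path)
    found = []
    current = os.sep
    for part in path.split(os.sep):
        if not part:
            continue  # skip the empty piece before the leading separator
        current = os.path.join(current, part)
        if os.path.islink(current):
            found.append((current, os.readlink(current)))
    return found


if __name__ == "__main__":
    # Path taken from the thread.
    links = symlinks_in_path("/usr/local/pgsql/bin/initdb")
    if not links:
        print("no symlinks in path")
    for component, target in links:
        print(component, "->", target)
```

An empty result means every component of the path is a real directory or file (or does not exist).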
On Thu, 2011-02-03 at 22:23 -0500, Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
> > On 02/03/2011 11:15 PM, Matt Zinicola wrote:
> >> I re-compiled with '--enable-debug' and got the symbols. The pastebin is at
> >> http://pastebin.com/xMhEHFdT
>
> > That's really interesting. It's getting a NULL path pointer when - I
> > think - it tries to determine the location of the executables.
>
> Hmm ... gdb is evidently lying to us to some extent, because some of
> those variables can't possibly be NULL, and control wouldn't have got
> to where it says if others of them were. However, it seems clear that
> it's dying while trying to determine the actual location of the initdb
> executable. Are there any symlinks involved in the path
> /usr/local/pgsql/bin/initdb ? Is that located on an unusual filesystem?
>
> 			regards, tom lane

It wasn't an unusual filesystem (other than being within a logical
volume). Nothing out of the ordinary -- a local ext3 filesystem. I
did a clean re-install of Fedora from scratch, and boom! Postgres
compiled and installed just fine.

Two interesting tidbits here (perhaps of note) -- 1) Against my
judgment, I had been using Fedora's upgrade process the last two times I
updated (from F12 to F13, and from F13 to F14). I wonder if that
botched something in my environment and 2) Nothing else on the system
seemed to have trouble (at least up until that point in time).

I suspect it was definitely something underneath Postgres, as when I
deleted everything in /usr/local/pgsql and my cluster (/data/postgres)
and started anew, it still had the very same problem.

Just some additional points of info. In any case... a "clean" install
of Fedora 14 did not yield this problem, so feel free to close this
issue if/when you feel appropriate.

Thanks to everyone that was lending assistance. It's much appreciated.

- Matt
On 04/02/11 15:11, Craig Ringer wrote:
> On 02/03/2011 11:15 PM, Matt Zinicola wrote:
>>
>> I re-compiled with '--enable-debug' and got the symbols. The
>> pastebin is at
>> http://pastebin.com/xMhEHFdT
>
> That's really interesting. It's getting a NULL path pointer when - I
> think - it tries to determine the location of the executables.
>
> Presumably this is something bizarre in your environment - but I have
> no idea what it might be. Maybe someone else reading will have an idea.
>

(Coming in too late, but...)

I'd be interested to see what happens if you do:

$ export PATH=/usr/local/pgsql/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/pgsql/lib
$ initdb -D /data/postgres
$ pg_ctl -D /data/postgres start
$ psql

I'm guessing that there are older libraries or binaries earlier in your
various env paths, and these are tripping up postgres.

Cheers

Mark
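One way to test the guess about older binaries sitting earlier in PATH is to list every match for a program name in lookup order, not just the first. A minimal sketch, with a hypothetical helper name `which_all`:

```python
import os


def which_all(name, path=None):
    """Return every executable called `name` on `path`, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for tool in ("initdb", "pg_ctl", "psql"):
        # The first entry printed is the one the shell would run.
        print(tool, which_all(tool))
```

If the first entry for a tool is not under /usr/local/pgsql/bin, an older copy is shadowing the new build.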
Matthew Zinicola <matt@zinicola.com> writes:
> It wasn't an unusual filesystem (other than being within a logical
> volume). Nothing out of the ordinary -- a local ext3 filesystem. I
> did a clean re-install of Fedora from scratch, and boom! Postgres
> compiled and installed just fine.

> Two interesting tidbits here (perhaps of note) -- 1) Against my
> judgment, I had been using Fedora's upgrade process the last two times I
> updated (from F12 to F13, and from F13 to F14). I wonder if that
> botched something in my environment and 2) Nothing else on the system
> seemed to have trouble (at least up until that point in time).

Hmm. Given that you couldn't reproduce it on a clean system, I'd have
to agree that it sounds like something was a bit wacko about the
upgraded system. One does hear of people having trouble with that
process from time to time.

			regards, tom lane