Thread: Re: BUG #5862: Postgres dumps core upon a connection attempt

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
"Kevin Grittner"
Date:
"Matt Zinicola"  wrote:

> PostgreSQL version: 9.0.3
> Operating system: Linux (Fedora 14, kernel 2.6.35-10-74), 64-bit
> Description: Postgres dumps core upon a connection attempt
> Details:
>
> A simple compile from source and install (as per usual) on Fedora
> 14 yielded crashes of client applications attempting to connect.
>
> I first observed this with Archiveopteryx. As a sanity check, I
> then attempted a connection with psql itself, which also crashed.
>
> Please let me know if further information is needed.

Build options?  Error messages?  Contents of log files?  Backtrace
from the core file you mentioned?

-Kevin

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Matt Zinicola
Date:
Apologies for lack of detail.  Although I've been using Postgres for
years, this is the first time I've had such an issue.

Build options were only --with-perl and --with-python

Below is the output when two different applications attempt to connect
to my 9.0.3 server (note, the second is psql itself):

[root@infinity postgres]# /etc/init.d/archiveopteryx start
Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on backend 1)
/etc/init.d/archiveopteryx: line 24:  4240 Segmentation fault      /usr/local/archiveopteryx/bin/aox start
done.

[postgres@infinity scripts]$ psql template1
Segmentation fault (core dumped)



Kevin suggested doing a 'make check'.  I did so, and it ended with the
following:

mkdir ./testtablespace
./pg_regress --inputdir=. --dlpath=. --multibyte=SQL_ASCII --temp-install=./tmp_check --top-builddir=../../.. --schedule=./parallel_schedule
make[2]: *** [check] Segmentation fault (core dumped)
make[2]: Leaving directory `/usr/local/src/postgresql-9.0.3/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/usr/local/src/postgresql-9.0.3/src/test'
make: *** [check] Error 2

My server doesn't seem to be logging anything, either (although
I'm using the same configuration and start script as 9.0.2).

Lastly, I don't see any 'core' files in the places I would expect.
If/when I find them, I can send along.
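
A note on the missing core files: on Linux, where a core lands is governed
by the shell's core size limit and by the kernel's core_pattern setting. If
core_pattern starts with "|", the dump is piped to a handler program rather
than written as a file named "core", which is one possible reason for not
finding any. The helper below is only a sketch; the function name is made
up for illustration, and nothing in this thread confirms which setting was
active on the affected machine.

```shell
# Describe where the kernel will send a core dump, given a core_pattern string.
# (describe_core_pattern is a hypothetical helper name, not a standard tool.)
describe_core_pattern() {
    case $1 in
        '|'*) printf 'piped to handler: %s\n' "${1#|}" ;;
        /*)   printf 'written to absolute path template: %s\n' "$1" ;;
        *)    printf 'written relative to the crashing process cwd: %s\n' "$1" ;;
    esac
}

# Usage (Linux only):
#   ulimit -c     # 0 means core files are disabled for this shell
#   describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```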

- Matt





Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Craig Ringer
Date:
On 03/02/11 09:53, Matt Zinicola wrote:
> Apologies for lack of detail.  Although I've been using Postgres for
> years, this is the first time I've had such an issue.
>
> Build options were only --with-perl and --with-python
>
> Below is the output when two different applications attempt to connect
> to my 9.0.3 server (note, the second is psql itself):
>
> [root@infinity postgres]# /etc/init.d/archiveopteryx start
> Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on
> backend 1)
> /etc/init.d/archiveopteryx: line 24:  4240 Segmentation
> fault      /usr/local/archiveopteryx/bin/aox start
> done.
>
> [postgres@infinity scripts]$ psql template1
> Segmentation fault (core dumped)

OK, so it's not the PostgreSQL backend that's crashing, it's psql.

You almost certainly have conflicting libraries lurking around
somewhere, so psql was built against one libpq but lands up getting
linked to another at runtime.
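
One way to check this hypothesis is to ask the dynamic linker which libpq a
binary would actually load. The sketch below filters `ldd` output for a
library name prefix; the function name is invented for illustration, and the
psql path in the usage comment is simply the install prefix mentioned
elsewhere in this thread.

```shell
# Read `ldd` output on stdin and print the resolved path of each library
# whose name starts with the given prefix (for example "libpq").
# (resolved_lib is a hypothetical helper name.)
resolved_lib() {
    awk -v prefix="$1" 'index($1, prefix) == 1 && $2 == "=>" { print $3 }'
}

# Usage:
#   ldd /usr/local/pgsql/bin/psql | resolved_lib libpq
```

If the printed path is not under the lib directory of the build you just
installed, the binary is picking up a different copy at run time.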

--
System & Network Administrator
POST Newspapers

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Matt Zinicola
Date:
Hrm.

I did see the Fedora stashed copies of libpq.so.5 and libpq.so.5.2 in
/usr/lib64.  I looked everywhere on the system for libpq.so*, and saw that the
only remaining copies were those in my source directory... so I re-built
9.0.3.  A 'make check' still died in the same place within the regression
tests.  I did a 'make install' anyhow.  I cleaned out my data directory and
attempted a new initdb with 9.0.3.  That seg faulted as well:

[postgres@infinity local]$ /usr/local/pgsql/bin/initdb -D /data/postgres
Segmentation fault (core dumped)

Any other suggestions?
- Matt




Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Craig Ringer
Date:
On 03/02/11 10:33, Matt Zinicola wrote:
>
> Hrm.
>
> I did see the Fedora stashed copies of libpq.so.5 and libpq.so.5.2 in
> /usr/lib64.  I looked everywhere on the system for libpq.so*, and saw that the
> only remaining copies were those in my source directory... so I re-built
> 9.0.3.  A 'make check' still died in the same place within the regression
> tests.  I did a 'make install' anyhow.  I cleaned out my data directory and
> attempted a new initdb with 9.0.3.  That seg faulted as well:
>
> [postgres@infinity local]$ /usr/local/pgsql/bin/initdb -D /data/postgres
> Segmentation fault (core dumped)

What does:

  ldd /usr/local/pgsql/bin/initdb

say?

--
System & Network Administrator
POST Newspapers

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Craig Ringer
Date:
On 03/02/11 11:11, Matt Zinicola wrote:

OK, it doesn't seem to be a simple problem of linking to the wrong
library then. psql is linking to the correct libpq. initdb isn't linking
to anything much at all, but still crashes for no apparent reason.
Something else may be going on. Please supply the full command line you
used to "./configure" when compiling postgres. If you're not sure what
it was, you can find it in the top of "config.log" in your compile
directory.

Is there any chance you can get us a backtrace of one of the crashing
programs? Try this:

  gdb --args psql

Once it loads, it'll drop you to a

  (gdb)

prompt. Enter "run" then press enter.

  (gdb) run

Psql will then load for a while, crash, and drop you back to a (gdb)
prompt after printing out a message like:

 Program received signal SIGSEGV, Segmentation fault.

Enter the "bt" command at the (gdb) prompt and press enter.

  (gdb) bt

... then copy and paste everything from "gdb --args psql" through to the
end of the output printed by "bt", put it on http://pastebin.com/  and
send a link to that in your reply email here.

I've created a sample to give you the idea, by starting psql then
intentionally crashing it by sending it a manual SIGSEGV. See:

  http://pastebin.com/b8D9i2tb
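
The interactive steps above can also be run in one go, which makes the
output easier to paste. This is a sketch only: the wrapper name and the GDB
override variable are invented here, while -batch, -ex and --args are
standard gdb options.

```shell
# Run a program under gdb non-interactively and print a backtrace if it crashes.
# (capture_backtrace and the GDB variable are hypothetical conveniences.)
capture_backtrace() {
    # -batch exits when the commands finish; each -ex runs one gdb command.
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@" 2>&1
}

# Usage:
#   capture_backtrace psql template1 > backtrace.txt
```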


--
System & Network Administrator
POST Newspapers

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Matt Zinicola
Date:
As for the configure options: originally they were just --with-perl and
--with-python, but to rule out problems there, I've since been doing a
plain compile (no additional options).

I will get the backtrace, etc. within the next hour or so.  Thanks!

- Matt



Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Craig Ringer
Date:
On 02/03/2011 11:15 PM, Matt Zinicola wrote:
>
> I re-compiled with '--enable-debug' and got the symbols.  The pastebin is at
> http://pastebin.com/xMhEHFdT

That's really interesting. It's getting a NULL path pointer when - I
think - it tries to determine the location of the executables.

Presumably this is something bizarre in your environment - but I have no
idea what it might be. Maybe someone else reading will have an idea.

(Please reply-to-all on further messages so the -bugs list sees things)

--
Craig Ringer

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> On 02/03/2011 11:15 PM, Matt Zinicola wrote:
>> I re-compiled with '--enable-debug' and got the symbols.  The pastebin is at
>> http://pastebin.com/xMhEHFdT

> That's really interesting. It's getting a NULL path pointer when - I
> think - it tries to determine the location of the executables.

Hmm ... gdb is evidently lying to us to some extent, because some of
those variables can't possibly be NULL, and control wouldn't have got
to where it says if others of them were.  However, it seems clear that
it's dying while trying to determine the actual location of the initdb
executable.  Are there any symlinks involved in the path
/usr/local/pgsql/bin/initdb ?  Is that located on an unusual filesystem?
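
A quick way to answer the symlink question is to walk the path one
component at a time and report each component that is a symbolic link. The
function below is an illustrative sketch, not part of PostgreSQL; the name
is made up, and it assumes an absolute path and a `readlink` command.

```shell
# Print every component of an absolute path that is a symbolic link,
# together with its target. (symlinks_in_path is a hypothetical helper.)
symlinks_in_path() {
    rest=${1#/}
    current=""
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        current="$current/$part"
        if [ -L "$current" ]; then
            printf '%s -> %s\n' "$current" "$(readlink "$current")"
        fi
        # Drop the component just handled; stop when none remain.
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest="" ;;
        esac
    done
}

# Usage:
#   symlinks_in_path /usr/local/pgsql/bin/initdb
#   df -T /usr/local/pgsql/bin      # shows the filesystem type (GNU df)
```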

            regards, tom lane

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Matthew Zinicola
Date:
On Thu, 2011-02-03 at 22:23 -0500, Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
> > On 02/03/2011 11:15 PM, Matt Zinicola wrote:
> >> I re-compiled with '--enable-debug' and got the symbols.  The pastebin is at
> >> http://pastebin.com/xMhEHFdT
>
> > That's really interesting. It's getting a NULL path pointer when - I
> > think - it tries to determine the location of the executables.
>
> Hmm ... gdb is evidently lying to us to some extent, because some of
> those variables can't possibly be NULL, and control wouldn't have got
> to where it says if others of them were.  However, it seems clear that
> it's dying while trying to determine the actual location of the initdb
> executable.  Are there any symlinks involved in the path
> /usr/local/pgsql/bin/initdb ?  Is that located on an unusual filesystem?
>
>             regards, tom lane

It wasn't an unusual filesystem (other than being within a logical
volume).  Nothing out of the ordinary -- a local ext3 filesystem.  I
did a clean re-install of Fedora from scratch, and boom!  Postgres
compiled and installed just fine.

Two interesting tidbits here (perhaps of note): 1) Against my better
judgment, I had been using Fedora's upgrade process the last two times I
updated (from F12 to F13, and from F13 to F14), and I wonder if that
botched something in my environment. 2) Nothing else on the system
seemed to have trouble (at least up until that point in time).  I
strongly suspect it was something underneath Postgres, because when I
deleted everything in /usr/local/pgsql and my cluster (/data/postgres)
and started anew, it still had the very same problem.  Just some
additional points of info.

In any case... a "clean" install of Fedora 14 did not yield this
problem, so feel free to close this issue if/when you feel appropriate.

Thanks to everyone that was lending assistance.  It's much appreciated.

- Matt

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Mark Kirkwood
Date:
On 04/02/11 15:11, Craig Ringer wrote:
> On 02/03/2011 11:15 PM, Matt Zinicola wrote:
>>
>> I re-compiled with '--enable-debug' and got the symbols.  The
>> pastebin is at
>> http://pastebin.com/xMhEHFdT
>
> That's really interesting. It's getting a NULL path pointer when - I
> think - it tries to determine the location of the executables.
>
> Presumably this is something bizarre in your environment - but I have
> no idea what it might be. Maybe someone else reading will have an idea.
>

(Coming in too late, but...)

I'd be interested to see what happens if you do:

$ export PATH=/usr/local/pgsql/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/pgsql/lib
$ initdb -D /data/postgres
$ pg_ctl -D /data/postgres start
$ psql

I'm guessing that there are older libraries or binaries earlier in your
various env paths, and these are tripping up postgres.
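
To see whether an older binary really does come first, it can help to list
every match for a command name in PATH order rather than just the first
one. The function below is a sketch with an invented name; in bash,
`type -a psql` gives similar information.

```shell
# List every executable called NAME found in $PATH, in search order.
# The first line printed is the one the shell would actually run.
# (all_in_path is a hypothetical helper name.)
all_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        if [ -n "$dir" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Usage:
#   all_in_path psql
#   all_in_path initdb
```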

Cheers

Mark

Re: BUG #5862: Postgres dumps core upon a connection attempt

From
Tom Lane
Date:
Matthew Zinicola <matt@zinicola.com> writes:
> It wasn't an unusual filesystem (other than being within a logical
> volume).  Nothing out of the ordinary -- a local ext3 filesystem.  I
> did a clean re-install of Fedora from scratch, and boom!  Postgres
> compiled and installed just fine.

> Two interesting tidbits here (perhaps of note)  -- 1) Against my
> judgment, I had been using Fedora's upgrade process the last two times I
> updated (from F12 to F13, and from F13 to F14).  I wonder if that
> botched something in my environment and 2) Nothing else on the system
> seemed to have trouble (at least up until that point in time).

Hmm.  Given that you couldn't reproduce it on a clean system, I'd have
to agree that it sounds like something was a bit wacko about the
upgraded system.  One does hear of people having trouble with that
process from time to time.

            regards, tom lane