Thread: Postgresql Segfault in 8.1

Postgresql Segfault in 8.1

From
Benjamin Smith
Date:
I'm running PostgreSQL 8.1 on CentOS 4.2, Dual proc Athlon 64 w/4 GB RAM.

I'm trying to get a PHP app to work, but the failure happens when the command
is copy/pasted into pgsql. Trying to run a large insert statement, and I get:

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
#

in /var/log/messages, I see

Jan 24 17:00:04 kepler kernel: postmaster[26185]: segfault at 000000002516d728
rip 000000000043c82c rsp 0000007fbfffddd0 error 4

The insert statement is long, but doesn't seem to violate anything strange -
no weird characters, and all the fields have been properly escaped with
pg_escape().

What information do you need to help figure this out?

-Ben
--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Postgresql Segfault in 8.1

From
Doug McNaught
Date:
Benjamin Smith <lists@benjamindsmith.com> writes:

> in /var/log/messages, I see
>
> Jan 24 17:00:04 kepler kernel: postmaster[26185]: segfault at 000000002516d728
> rip 000000000043c82c rsp 0000007fbfffddd0 error 4
>
> The insert statement is long, but doesn't seem to violate anything strange -
> no weird characters, and all the fields have been properly escaped with
> pg_escape().
>
> What information do you need to help figure this out?

Reproduce it with gdb attached to the backend process and post the
backtrace...  You may need to recompile PG with debugging symbols to
get the most info.

-Doug

Re: Postgresql Segfault in 8.1

From
Tom Lane
Date:
Benjamin Smith <lists@benjamindsmith.com> writes:
> What information do you need to help figure this out?

Best is to provide a recipe by which someone else can reproduce it
from a standing start.  You can find some useful hints at
http://www.postgresql.org/docs/8.1/static/bug-reporting.html

            regards, tom lane

Re: Postgresql Segfault in 8.1

From
Benjamin Smith
Date:
Thanks,

What's the best way to do this? Take PG down (normally started as a service)
and run directly in a single-user mode?

I've never reallly worked with gdb...

-Ben

On Tuesday 24 January 2006 17:27, you wrote:
> > What information do you need to help figure this out?
>
> Reproduce it with gdb attached to the backend process and post the
> backtrace...  You may need to recompile PG with debugging symbols to
> get the most info.

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978


Re: Postgresql Segfault in 8.1

From
Tom Lane
Date:
Benjamin Smith <lists@benjamindsmith.com> writes:
> What's the best way to do this? Take PG down (normally started as a service)
> and run directly in a single-user mode?

No, just start a psql session in one window, then in another window
determine the PID of the backend process it's connected to, and attach
gdb to that process.  Something like

        ps auxww | grep postgres:
        ... eyeball determination of correct PID ...
        gdb /path/to/postgres-executable PID
        gdb> continue

Now, in the psql window, do what's needed to provoke the crash.  gdb
should trap at the instant of the segfault and give you another gdb>
prompt.  Type "bt" to get the backtrace, then "q" to disconnect.

            regards, tom lane

Re: Postgresql Segfault in 8.1

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Benjamin Smith <lists@benjamindsmith.com> writes:
> > What's the best way to do this? Take PG down (normally started as a service)
> > and run directly in a single-user mode?
>
> No, just start a psql session in one window, then in another window
> determine the PID of the backend process it's connected to, and attach
> gdb to that process.  Something like
>
>         ps auxww | grep postgres:
>         ... eyeball determination of correct PID ...

You can also do 'select pg_backend_pid();' from psql...  I know that's
there on 8.1, though I recall this was an 8.0 discussion and I *think*
it's there too but not 100% sure.

    Enjoy,

        Stephen

Attachment

Re: Postgresql Segfault in 8.1

From
Benjamin Smith
Date:
OK, here's the output:

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000000000043c82c in heap_modifytuple ()
(gdb)

// not very hopeful, I'd think //

Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled? Unfortunately, I
don't think I can give out a dump of the DB (heavily constrained) because of
private customer information... and the query works *FINE* with different
datasets. There's something specific about THIS QUERY that's causing the
failure.

I'm going to try to get this to fail on another system that's not in
production use, though it's a uniprocessor P4.

-Ben

On Wednesday 25 January 2006 07:52, you wrote:
> Benjamin Smith <lists@benjamindsmith.com> writes:
> > What's the best way to do this? Take PG down (normally started as a
service)
> > and run directly in a single-user mode?
>
> No, just start a psql session in one window, then in another window
> determine the PID of the backend process it's connected to, and attach
> gdb to that process.  Something like
>
>         ps auxww | grep postgres:
>         ... eyeball determination of correct PID ...
>         gdb /path/to/postgres-executable PID
>         gdb> continue
>
> Now, in the psql window, do what's needed to provoke the crash.  gdb
> should trap at the instant of the segfault and give you another gdb>
> prompt.  Type "bt" to get the backtrace, then "q" to disconnect.
>
>             regards, tom lane
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Postgresql Segfault in 8.1

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> ps auxww | grep postgres:
>> ... eyeball determination of correct PID ...

> You can also do 'select pg_backend_pid();' from psql...  I know that's
> there on 8.1, though I recall this was an 8.0 discussion and I *think*
> it's there too but not 100% sure.

Good thought.  I've been doing it via ps since forever, but
pg_backend_pid is more foolproof (and it seems to have been in since
7.3, so that's not a problem).

            regards, tom lane

Re: Postgresql Segfault in 8.1

From
Tom Lane
Date:
Benjamin Smith <lists@benjamindsmith.com> writes:
> OK, here's the output:
> (gdb) continue
> Continuing.

> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000043c82c in heap_modifytuple ()
> (gdb)

> // not very hopeful, I'd think //

You forgot the "bt" part ... although I'm not sure we'd learn a whole
lot more without debug symbols.

> Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled?

Current Red Hat practice is to put the debug symbols into separate
"debuginfo" RPMs.  Hopefully you can find the debuginfo RPM wherever
you got the postgres RPM from.

            regards, tom lane

Re: Postgresql Segfault in 8.1

From
Benjamin Smith
Date:
Tom,

Since we host customer data, I have to get OK from the company attorney before
I can give you a full "howto create". I've been unable to recreate it without
a full database dump. I'm waiting for a call back on that. I also can't
recreate it on IA32.

I tried to replicate the issue on a uniproc P4/32, but it worked fine there,
so it does seem to be something specific about the fact that it's either
X86/64 or that it's dual proc. The production server has 4GB of ECC RAM.

I can consistently create the problem by dumping and reloading the database to
a different PG database, and running it there, so AFAICT I'm not bugging
anybody when I run this query.

In the meantime, I found the "debuginfo" rpm, and installed it without a
hitch. Luckily, it seems to "take effect" without having to restart the PG
daemon. (which is busy serving 10-20 people at any given moment...)

Again, here's the output from gdb. This looks a bit more useful, I hope this
helps!

Program received signal SIGSEGV, Segmentation fault.
slot_deform_tuple (slot=0xa669c8, natts=36) at heaptuple.c:1262
1262                    off = att_addlength(off, thisatt->attlen, tp + off);
(gdb) bt
#0  slot_deform_tuple (slot=0xa669c8, natts=36) at heaptuple.c:1262
#1  0x000000000043c8f5 in slot_getattr (slot=0xa669c8, attnum=36,
isnull=0x7fbfffde87 "")
    at heaptuple.c:1367
#2  0x000000000047a50a in FormIndexDatum (indexInfo=0xa66b60, slot=0xa669c8,
    estate=0xa61190, values=0x7fbfffdf10, isnull=0x7fbfffdef0 "") at
index.c:962
#3  0x00000000004ebee3 in ExecInsertIndexTuples (slot=0xa669c8,
tupleid=0xa6efc4,
    estate=0xa61190, is_vacuum=0 '\0') at execUtils.c:925
#4  0x00000000004e5265 in ExecutorRun (queryDesc=Variable "queryDesc" is not
available.
) at execMain.c:1437
#5  0x0000000000564312 in ProcessQuery (parsetree=Variable "parsetree" is not
available.
) at pquery.c:174
#6  0x0000000000565287 in PortalRun (portal=0xa5ed70,
count=9223372036854775807,
    dest=0xa596f8, altdest=0xa596f8, completionTag=0x7fbfffe380 "") at
pquery.c:1076
#7  0x0000000000560f8b in exec_simple_query (
    query_string=0xa440e0 "INSERT INTO lcclasses (id, schoolyear, modified,
entrydate, creator, status, name, location, city, maxclasssize,
prerequisites, cost, costnote, coursecode, section, credits, whytake,
materialsnote, te"...) at postgres.c:1014
#8  0x0000000000562e0e in PostgresMain (argc=4, argv=0xa0cca0,
    username=0xa0cc60 "cworksdev") at postgres.c:3168
#9  0x000000000053d316 in ServerLoop () at postmaster.c:2852
#10 0x000000000053ea59 in PostmasterMain (argc=5, argv=0x9ea510) at
postmaster.c:943
#11 0x00000000005033c3 in main (argc=5, argv=0x9ea510) at main.c:256
(gdb) continue
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

##################################################
Postgresql.conf
listen_addresses = '127.0.0.1'
port = 5432
max_connections = 96
shared_buffers=250000
temp_buffers = 10000
max_prepared_transactions = 0
work_mem = 1024                 # min 64, size in KB
maintenance_work_mem = 16384            # min 1024, size in KB
max_stack_depth = 9240
redirect_stderr = on                    # Enable capturing of stderr into log
log_directory = 'pg_log'                # Directory where log files are
written
log_truncate_on_rotation = on           # If on, any existing log file of the
same
log_rotation_age = 1440                 # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
autovacuum = on
autovacuum_naptime = 600
lc_messages = 'en_US.UTF-8'                     # locale for system error
message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary
formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
add_missing_from = on

-Ben

On Wednesday 25 January 2006 11:18, you wrote:
> Benjamin Smith <lists@benjamindsmith.com> writes:
> > OK, here's the output:
> > (gdb) continue
> > Continuing.
>
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x000000000043c82c in heap_modifytuple ()
> > (gdb)
>
> > // not very hopeful, I'd think //
>
> You forgot the "bt" part ... although I'm not sure we'd learn a whole
> lot more without debug symbols.
>
> > Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled?
>
> Current Red Hat practice is to put the debug symbols into separate
> "debuginfo" RPMs.  Hopefully you can find the debuginfo RPM wherever
> you got the postgres RPM from.
>
>             regards, tom lane
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Postgresql Segfault in 8.1

From
Tom Lane
Date:
Benjamin Smith <lists@benjamindsmith.com> writes:
> Aha, yep. Sorry:
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000043c82c in heap_modifytuple ()
> (gdb) bt
> #0  0x000000000043c82c in heap_modifytuple ()
> #1  0x000000000043c8f5 in slot_getattr ()
> #2  0x000000000047a50a in FormIndexDatum ()
> #3  0x00000000004ebee3 in ExecInsertIndexTuples ()
> #4  0x00000000004e5265 in ExecutorRun ()
> #5  0x0000000000564312 in FreeQueryDesc ()
> #6  0x0000000000565287 in PortalRun ()
> #7  0x0000000000560f8b in pg_parse_query ()
> #8  0x0000000000562e0e in PostgresMain ()
> #9  0x000000000053d316 in ClosePostmasterPorts ()
> #10 0x000000000053ea59 in PostmasterMain ()
> #11 0x00000000005033c3 in main ()

Oh, so this is happening during index entry creation?  (The reference to
heap_modifytuple is misleading, but in a debug-symbol-free backend it's
not so surprising.)

This suddenly looks a whole lot like a known bug:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php

Which version did you say you were using exactly?  That bug is fixed
in 8.1.1 ...

            regards, tom lane

Re: Postgresql Segfault in 8.1

From
Benjamin Smith
Date:
Version: postgresql-8.1.0-4.c4

I'll have to see about getting an update...

Thanks a TON,

-Ben

On Wednesday 25 January 2006 13:11, you wrote:
> Benjamin Smith <lists@benjamindsmith.com> writes:
> > Aha, yep. Sorry:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x000000000043c82c in heap_modifytuple ()
> > (gdb) bt
> > #0  0x000000000043c82c in heap_modifytuple ()
> > #1  0x000000000043c8f5 in slot_getattr ()
> > #2  0x000000000047a50a in FormIndexDatum ()
> > #3  0x00000000004ebee3 in ExecInsertIndexTuples ()
> > #4  0x00000000004e5265 in ExecutorRun ()
> > #5  0x0000000000564312 in FreeQueryDesc ()
> > #6  0x0000000000565287 in PortalRun ()
> > #7  0x0000000000560f8b in pg_parse_query ()
> > #8  0x0000000000562e0e in PostgresMain ()
> > #9  0x000000000053d316 in ClosePostmasterPorts ()
> > #10 0x000000000053ea59 in PostmasterMain ()
> > #11 0x00000000005033c3 in main ()
>
> Oh, so this is happening during index entry creation?  (The reference to
> heap_modifytuple is misleading, but in a debug-symbol-free backend it's
> not so surprising.)
>
> This suddenly looks a whole lot like a known bug:
> http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php
>
> Which version did you say you were using exactly?  That bug is fixed
> in 8.1.1 ...
>
>             regards, tom lane
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

Re: Postgresql Segfault in 8.1

From
Aly Dharshi
Date:
Hello Ben,

    Unless I am wrong there is a set of RPMs available via the PostgreSQL site for
8.1.1/2 http://www.postgresql.org/ftp/binary/v8.1.2/linux/rpms/

    Aly.

Benjamin Smith wrote:
> Version: postgresql-8.1.0-4.c4
>
> I'll have to see about getting an update...
>
> Thanks a TON,
>
> -Ben
>
> On Wednesday 25 January 2006 13:11, you wrote:
>> Benjamin Smith <lists@benjamindsmith.com> writes:
>>> Aha, yep. Sorry:
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000000000043c82c in heap_modifytuple ()
>>> (gdb) bt
>>> #0  0x000000000043c82c in heap_modifytuple ()
>>> #1  0x000000000043c8f5 in slot_getattr ()
>>> #2  0x000000000047a50a in FormIndexDatum ()
>>> #3  0x00000000004ebee3 in ExecInsertIndexTuples ()
>>> #4  0x00000000004e5265 in ExecutorRun ()
>>> #5  0x0000000000564312 in FreeQueryDesc ()
>>> #6  0x0000000000565287 in PortalRun ()
>>> #7  0x0000000000560f8b in pg_parse_query ()
>>> #8  0x0000000000562e0e in PostgresMain ()
>>> #9  0x000000000053d316 in ClosePostmasterPorts ()
>>> #10 0x000000000053ea59 in PostmasterMain ()
>>> #11 0x00000000005033c3 in main ()
>> Oh, so this is happening during index entry creation?  (The reference to
>> heap_modifytuple is misleading, but in a debug-symbol-free backend it's
>> not so surprising.)
>>
>> This suddenly looks a whole lot like a known bug:
>> http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php
>>
>> Which version did you say you were using exactly?  That bug is fixed
>> in 8.1.1 ...
>>
>>             regards, tom lane
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by MailScanner, and is
>> believed to be clean.
>>
>

--
Aly S.P Dharshi
aly.dharshi@telus.net

     "A good speech is like a good dress
      that's short enough to be interesting
      and long enough to cover the subject"

Re: Postgresql Segfault in 8.1

From
Benjamin Smith
Date:
// FIXED //

Tom, thank you so much for your help! Now running 8.1.2, the query now works
quickly and properly.

-Ben

On Wednesday 25 January 2006 13:17, Benjamin Smith wrote:
> Version: postgresql-8.1.0-4.c4
>
> I'll have to see about getting an update...
>
> Thanks a TON,
>
> -Ben
>
> On Wednesday 25 January 2006 13:11, you wrote:
> > Benjamin Smith <lists@benjamindsmith.com> writes:
> > > Aha, yep. Sorry:
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x000000000043c82c in heap_modifytuple ()
> > > (gdb) bt
> > > #0  0x000000000043c82c in heap_modifytuple ()
> > > #1  0x000000000043c8f5 in slot_getattr ()
> > > #2  0x000000000047a50a in FormIndexDatum ()
> > > #3  0x00000000004ebee3 in ExecInsertIndexTuples ()
> > > #4  0x00000000004e5265 in ExecutorRun ()
> > > #5  0x0000000000564312 in FreeQueryDesc ()
> > > #6  0x0000000000565287 in PortalRun ()
> > > #7  0x0000000000560f8b in pg_parse_query ()
> > > #8  0x0000000000562e0e in PostgresMain ()
> > > #9  0x000000000053d316 in ClosePostmasterPorts ()
> > > #10 0x000000000053ea59 in PostmasterMain ()
> > > #11 0x00000000005033c3 in main ()
> >
> > Oh, so this is happening during index entry creation?  (The reference to
> > heap_modifytuple is misleading, but in a debug-symbol-free backend it's
> > not so surprising.)
> >
> > This suddenly looks a whole lot like a known bug:
> > http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php
> >
> > Which version did you say you were using exactly?  That bug is fixed
> > in 8.1.1 ...
> >
> >             regards, tom lane
> >
> > --
> > This message has been scanned for viruses and
> > dangerous content by MailScanner, and is
> > believed to be clean.
> >
>
> --
> "The best way to predict the future is to invent it."
> - XEROX PARC slogan, circa 1978
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>

--
"I kept looking around for somebody to solve the problem.
Then I realized I am somebody"
   -Anonymous

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978