Thread: Postgresql Segfault in 8.1
I'm running PostgreSQL 8.1 on CentOS 4.2, Dual proc Athlon 64 w/4 GB RAM. I'm trying to get a PHP app to work, but the failure happens when the command is copy/pasted into pgsql. Trying to run a large insert statement, and I get: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. # in /var/log/messages, I see Jan 24 17:00:04 kepler kernel: postmaster[26185]: segfault at 000000002516d728 rip 000000000043c82c rsp 0000007fbfffddd0 error 4 The insert statement is long, but doesn't seem to violate anything strange - no weird characters, and all the fields have been properly escaped with pg_escape(). What information do you need to help figure this out? -Ben -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978
Benjamin Smith <lists@benjamindsmith.com> writes: > in /var/log/messages, I see > > Jan 24 17:00:04 kepler kernel: postmaster[26185]: segfault at 000000002516d728 > rip 000000000043c82c rsp 0000007fbfffddd0 error 4 > > The insert statement is long, but doesn't seem to violate anything strange - > no weird characters, and all the fields have been properly escaped with > pg_escape(). > > What information do you need to help figure this out? Reproduce it with gdb attached to the backend process and post the backtrace... You may need to recompile PG with debugging symbols to get the most info. -Doug
Benjamin Smith <lists@benjamindsmith.com> writes: > What information do you need to help figure this out? Best is to provide a recipe by which someone else can reproduce it from a standing start. You can find some useful hints at http://www.postgresql.org/docs/8.1/static/bug-reporting.html regards, tom lane
Thanks, What's the best way to do this? Take PG down (normally started as a service) and run directly in a single-user mode? I've never reallly worked with gdb... -Ben On Tuesday 24 January 2006 17:27, you wrote: > > What information do you need to help figure this out? > > Reproduce it with gdb attached to the backend process and post the > backtrace... You may need to recompile PG with debugging symbols to > get the most info. -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978
Benjamin Smith <lists@benjamindsmith.com> writes: > What's the best way to do this? Take PG down (normally started as a service) > and run directly in a single-user mode? No, just start a psql session in one window, then in another window determine the PID of the backend process it's connected to, and attach gdb to that process. Something like ps auxww | grep postgres: ... eyeball determination of correct PID ... gdb /path/to/postgres-executable PID gdb> continue Now, in the psql window, do what's needed to provoke the crash. gdb should trap at the instant of the segfault and give you another gdb> prompt. Type "bt" to get the backtrace, then "q" to disconnect. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Benjamin Smith <lists@benjamindsmith.com> writes: > > What's the best way to do this? Take PG down (normally started as a service) > > and run directly in a single-user mode? > > No, just start a psql session in one window, then in another window > determine the PID of the backend process it's connected to, and attach > gdb to that process. Something like > > ps auxww | grep postgres: > ... eyeball determination of correct PID ... You can also do 'select pg_backend_pid();' from psql... I know that's there on 8.1, though I recall this was an 8.0 discussion and I *think* it's there too but not 100% sure. Enjoy, Stephen
Attachment
OK, here's the output: (gdb) continue Continuing. Program received signal SIGSEGV, Segmentation fault. 0x000000000043c82c in heap_modifytuple () (gdb) // not very hopeful, I'd think // Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled? Unfortunately, I don't think I can give out a dump of the DB (heavily constrained) because of private customer information... and the query works *FINE* with different datasets. There's something specific about THIS QUERY that's causing the failure. I'm going to try to get this to fail on another system that's not in production use, though it's a uniprocessor P4. -Ben On Wednesday 25 January 2006 07:52, you wrote: > Benjamin Smith <lists@benjamindsmith.com> writes: > > What's the best way to do this? Take PG down (normally started as a service) > > and run directly in a single-user mode? > > No, just start a psql session in one window, then in another window > determine the PID of the backend process it's connected to, and attach > gdb to that process. Something like > > ps auxww | grep postgres: > ... eyeball determination of correct PID ... > gdb /path/to/postgres-executable PID > gdb> continue > > Now, in the psql window, do what's needed to provoke the crash. gdb > should trap at the instant of the segfault and give you another gdb> > prompt. Type "bt" to get the backtrace, then "q" to disconnect. > > regards, tom lane > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> ps auxww | grep postgres: >> ... eyeball determination of correct PID ... > You can also do 'select pg_backend_pid();' from psql... I know that's > there on 8.1, though I recall this was an 8.0 discussion and I *think* > it's there too but not 100% sure. Good thought. I've been doing it via ps since forever, but pg_backend_pid is more foolproof (and it seems to have been in since 7.3, so that's not a problem). regards, tom lane
Benjamin Smith <lists@benjamindsmith.com> writes: > OK, here's the output: > (gdb) continue > Continuing. > Program received signal SIGSEGV, Segmentation fault. > 0x000000000043c82c in heap_modifytuple () > (gdb) > // not very hopeful, I'd think // You forgot the "bt" part ... although I'm not sure we'd learn a whole lot more without debug symbols. > Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled? Current Red Hat practice is to put the debug symbols into separate "debuginfo" RPMs. Hopefully you can find the debuginfo RPM wherever you got the postgres RPM from. regards, tom lane
Tom, Since we host customer data, I have to get OK from the company attorney before I can give you a full "howto create". I've been unable to recreate it without a full database dump. I'm waiting for a call back on that. I also can't recreate it on IA32. I tried to replicate the issue on a uniproc P4/32, but it worked fine there, so it does seem to be something specific about the fact that it's either X86/64 or that it's dual proc. The production server has 4GB of ECC RAM. I can consistently create the problem by dumping and reloading the database to a different PG database, and running it there, so AFAICT I'm not bugging anybody when I run this query. In the meantime, I found the "debuginfo" rpm, and installed it without a hitch. Luckily, it seems to "take effect" without having to restart the PG daemon. (which is busy serving 10-20 people at any given moment...) Again, here's the output from gdb. This looks a bit more useful, I hope this helps! Program received signal SIGSEGV, Segmentation fault. slot_deform_tuple (slot=0xa669c8, natts=36) at heaptuple.c:1262 1262 off = att_addlength(off, thisatt->attlen, tp + off); (gdb) bt #0 slot_deform_tuple (slot=0xa669c8, natts=36) at heaptuple.c:1262 #1 0x000000000043c8f5 in slot_getattr (slot=0xa669c8, attnum=36, isnull=0x7fbfffde87 "") at heaptuple.c:1367 #2 0x000000000047a50a in FormIndexDatum (indexInfo=0xa66b60, slot=0xa669c8, estate=0xa61190, values=0x7fbfffdf10, isnull=0x7fbfffdef0 "") at index.c:962 #3 0x00000000004ebee3 in ExecInsertIndexTuples (slot=0xa669c8, tupleid=0xa6efc4, estate=0xa61190, is_vacuum=0 '\0') at execUtils.c:925 #4 0x00000000004e5265 in ExecutorRun (queryDesc=Variable "queryDesc" is not available. ) at execMain.c:1437 #5 0x0000000000564312 in ProcessQuery (parsetree=Variable "parsetree" is not available. ) at pquery.c:174 #6 0x0000000000565287 in PortalRun (portal=0xa5ed70, count=9223372036854775807, dest=0xa596f8, altdest=0xa596f8, completionTag=0x7fbfffe380 "") at pquery.c:1076 #7 0x0000000000560f8b in exec_simple_query ( query_string=0xa440e0 "INSERT INTO lcclasses (id, schoolyear, modified, entrydate, creator, status, name, location, city, maxclasssize, prerequisites, cost, costnote, coursecode, section, credits, whytake, materialsnote, te"...) at postgres.c:1014 #8 0x0000000000562e0e in PostgresMain (argc=4, argv=0xa0cca0, username=0xa0cc60 "cworksdev") at postgres.c:3168 #9 0x000000000053d316 in ServerLoop () at postmaster.c:2852 #10 0x000000000053ea59 in PostmasterMain (argc=5, argv=0x9ea510) at postmaster.c:943 #11 0x00000000005033c3 in main (argc=5, argv=0x9ea510) at main.c:256 (gdb) continue Continuing. Program terminated with signal SIGSEGV, Segmentation fault. The program no longer exists. ################################################## Postgresql.conf listen_addresses = '127.0.0.1' port = 5432 max_connections = 96 shared_buffers=250000 temp_buffers = 10000 max_prepared_transactions = 0 work_mem = 1024 # min 64, size in KB maintenance_work_mem = 16384 # min 1024, size in KB max_stack_depth = 9240 redirect_stderr = on # Enable capturing of stderr into log log_directory = 'pg_log' # Directory where log files are written log_truncate_on_rotation = on # If on, any existing log file of the same log_rotation_age = 1440 # Automatic rotation of logfiles will log_rotation_size = 0 # Automatic rotation of logfiles will autovacuum = on autovacuum_naptime = 600 lc_messages = 'en_US.UTF-8' # locale for system error message lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8' # locale for time formatting add_missing_from = on -Ben On Wednesday 25 January 2006 11:18, you wrote: > Benjamin Smith <lists@benjamindsmith.com> writes: > > OK, here's the output: > > (gdb) continue > > Continuing. > > > Program received signal SIGSEGV, Segmentation fault. > > 0x000000000043c82c in heap_modifytuple () > > (gdb) > > > // not very hopeful, I'd think // > > You forgot the "bt" part ... although I'm not sure we'd learn a whole > lot more without debug symbols. > > > Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled? > > Current Red Hat practice is to put the debug symbols into separate > "debuginfo" RPMs. Hopefully you can find the debuginfo RPM wherever > you got the postgres RPM from. > > regards, tom lane > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978
Benjamin Smith <lists@benjamindsmith.com> writes: > Aha, yep. Sorry: > Program received signal SIGSEGV, Segmentation fault. > 0x000000000043c82c in heap_modifytuple () > (gdb) bt > #0 0x000000000043c82c in heap_modifytuple () > #1 0x000000000043c8f5 in slot_getattr () > #2 0x000000000047a50a in FormIndexDatum () > #3 0x00000000004ebee3 in ExecInsertIndexTuples () > #4 0x00000000004e5265 in ExecutorRun () > #5 0x0000000000564312 in FreeQueryDesc () > #6 0x0000000000565287 in PortalRun () > #7 0x0000000000560f8b in pg_parse_query () > #8 0x0000000000562e0e in PostgresMain () > #9 0x000000000053d316 in ClosePostmasterPorts () > #10 0x000000000053ea59 in PostmasterMain () > #11 0x00000000005033c3 in main () Oh, so this is happening during index entry creation? (The reference to heap_modifytuple is misleading, but in a debug-symbol-free backend it's not so surprising.) This suddenly looks a whole lot like a known bug: http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php Which version did you say you were using exactly? That bug is fixed in 8.1.1 ... regards, tom lane
Version: postgresql-8.1.0-4.c4 I'll have to see about getting an update... Thanks a TON, -Ben On Wednesday 25 January 2006 13:11, you wrote: > Benjamin Smith <lists@benjamindsmith.com> writes: > > Aha, yep. Sorry: > > Program received signal SIGSEGV, Segmentation fault. > > 0x000000000043c82c in heap_modifytuple () > > (gdb) bt > > #0 0x000000000043c82c in heap_modifytuple () > > #1 0x000000000043c8f5 in slot_getattr () > > #2 0x000000000047a50a in FormIndexDatum () > > #3 0x00000000004ebee3 in ExecInsertIndexTuples () > > #4 0x00000000004e5265 in ExecutorRun () > > #5 0x0000000000564312 in FreeQueryDesc () > > #6 0x0000000000565287 in PortalRun () > > #7 0x0000000000560f8b in pg_parse_query () > > #8 0x0000000000562e0e in PostgresMain () > > #9 0x000000000053d316 in ClosePostmasterPorts () > > #10 0x000000000053ea59 in PostmasterMain () > > #11 0x00000000005033c3 in main () > > Oh, so this is happening during index entry creation? (The reference to > heap_modifytuple is misleading, but in a debug-symbol-free backend it's > not so surprising.) > > This suddenly looks a whole lot like a known bug: > http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php > > Which version did you say you were using exactly? That bug is fixed > in 8.1.1 ... > > regards, tom lane > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978
Hello Ben, Unless I am wrong there is a set of RPMs available via the PostgreSQL site for 8.1.1/2 http://www.postgresql.org/ftp/binary/v8.1.2/linux/rpms/ Aly. Benjamin Smith wrote: > Version: postgresql-8.1.0-4.c4 > > I'll have to see about getting an update... > > Thanks a TON, > > -Ben > > On Wednesday 25 January 2006 13:11, you wrote: >> Benjamin Smith <lists@benjamindsmith.com> writes: >>> Aha, yep. Sorry: >>> Program received signal SIGSEGV, Segmentation fault. >>> 0x000000000043c82c in heap_modifytuple () >>> (gdb) bt >>> #0 0x000000000043c82c in heap_modifytuple () >>> #1 0x000000000043c8f5 in slot_getattr () >>> #2 0x000000000047a50a in FormIndexDatum () >>> #3 0x00000000004ebee3 in ExecInsertIndexTuples () >>> #4 0x00000000004e5265 in ExecutorRun () >>> #5 0x0000000000564312 in FreeQueryDesc () >>> #6 0x0000000000565287 in PortalRun () >>> #7 0x0000000000560f8b in pg_parse_query () >>> #8 0x0000000000562e0e in PostgresMain () >>> #9 0x000000000053d316 in ClosePostmasterPorts () >>> #10 0x000000000053ea59 in PostmasterMain () >>> #11 0x00000000005033c3 in main () >> Oh, so this is happening during index entry creation? (The reference to >> heap_modifytuple is misleading, but in a debug-symbol-free backend it's >> not so surprising.) >> >> This suddenly looks a whole lot like a known bug: >> http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php >> >> Which version did you say you were using exactly? That bug is fixed >> in 8.1.1 ... >> >> regards, tom lane >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. >> > -- Aly S.P Dharshi aly.dharshi@telus.net "A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
// FIXED // Tom, thank you so much for your help! Now running 8.1.2, the query now works quickly and properly. -Ben On Wednesday 25 January 2006 13:17, Benjamin Smith wrote: > Version: postgresql-8.1.0-4.c4 > > I'll have to see about getting an update... > > Thanks a TON, > > -Ben > > On Wednesday 25 January 2006 13:11, you wrote: > > Benjamin Smith <lists@benjamindsmith.com> writes: > > > Aha, yep. Sorry: > > > Program received signal SIGSEGV, Segmentation fault. > > > 0x000000000043c82c in heap_modifytuple () > > > (gdb) bt > > > #0 0x000000000043c82c in heap_modifytuple () > > > #1 0x000000000043c8f5 in slot_getattr () > > > #2 0x000000000047a50a in FormIndexDatum () > > > #3 0x00000000004ebee3 in ExecInsertIndexTuples () > > > #4 0x00000000004e5265 in ExecutorRun () > > > #5 0x0000000000564312 in FreeQueryDesc () > > > #6 0x0000000000565287 in PortalRun () > > > #7 0x0000000000560f8b in pg_parse_query () > > > #8 0x0000000000562e0e in PostgresMain () > > > #9 0x000000000053d316 in ClosePostmasterPorts () > > > #10 0x000000000053ea59 in PostmasterMain () > > > #11 0x00000000005033c3 in main () > > > > Oh, so this is happening during index entry creation? (The reference to > > heap_modifytuple is misleading, but in a debug-symbol-free backend it's > > not so surprising.) > > > > This suddenly looks a whole lot like a known bug: > > http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php > > > > Which version did you say you were using exactly? That bug is fixed > > in 8.1.1 ... > > > > regards, tom lane > > > > -- > > This message has been scanned for viruses and > > dangerous content by MailScanner, and is > > believed to be clean. > > > > -- > "The best way to predict the future is to invent it." > - XEROX PARC slogan, circa 1978 > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- "I kept looking around for somebody to solve the problem. Then I realized I am somebody" -Anonymous -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978