Thread: backend died
Our customer running Postgres v. 7.3.2 reported a problem, occurring
couple times a week on three different servers, all on Solaris 9.
We enabled debugging in postgresql.conf, now it happened again;
here's the excerpt from the database log:

2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456] LOG: server process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456] LOG: terminating any other active server processes
2004-11-13 10:01:06 [10456] DEBUG: CleanupProc: sending SIGQUIT to process 10876
2004-11-13 10:01:06 [10456] DEBUG: CleanupProc: sending SIGQUIT to process 10482
2004-11-13 10:01:06 [10456] DEBUG: CleanupProc: sending SIGQUIT to process 10481
2004-11-13 10:01:06 [10456] DEBUG: CleanupProc: sending SIGQUIT to process 10478
2004-11-13 10:01:06 [10456] DEBUG: CleanupProc: sending SIGQUIT to process 10472
2004-11-13 10:01:06 [10876] WARNING: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
2004-11-13 10:01:06 [10478] WARNING: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
2004-11-13 10:01:06 [10482] WARNING: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    ... ...

There's no other references to process 19285 in the log file.
If it helps the servers are configured to use UDS.
The socket files are placed in different directories (each db's PGDATA)

Would it be helpful to change the debug level from DEBUG1 to a higher value?
What else should I look at?

Thank you,
Mike
"Brusser, Michael" <Michael.Brusser@matrixone.com> writes: > Our customer running Postgres v. 7.3.2 reported a problem, occurring > couple times a week on three different servers, all on Solaris 9. > 2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was > terminated by signal 10 SIGBUS iirc. > What else should I look at? Find out what query is causing the crash --- enable query logging if you have no other way. And get a debugger stack trace from the core file that the crashed backend left behind. regards, tom lane
> "Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
> > Our customer running Postgres v. 7.3.2 reported a problem, occurring
> > couple times a week on three different servers, all on Solaris 9.
>
> > 2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was
> > terminated by signal 10
>
> SIGBUS iirc.

> > What else should I look at?
>
> Find out what query is causing the crash --- enable query
> logging if you have no other way. And get a debugger stack trace from the core file
> that the crashed backend left behind.
> regards, tom lane

==================================================
The log-statements option was already enabled,
here's what I see prior to the crash:

2004-11-13 09:49:46 [10876] DEBUG: StartTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876] LOG: query: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876] DEBUG: CommitTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876] DEBUG: StartTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: commit
2004-11-13 09:49:46 [10876] LOG: query: commit
2004-11-13 09:49:46 [10876] DEBUG: CommitTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: commit
2004-11-13 09:49:46 [10876] DEBUG: StartTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: begin
2004-11-13 09:49:46 [10876] LOG: query: begin
2004-11-13 09:49:46 [10876] DEBUG: CommitTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: begin
2004-11-13 09:49:46 [10876] DEBUG: StartTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: commit
2004-11-13 09:49:46 [10876] LOG: query: commit
2004-11-13 09:49:46 [10876] DEBUG: CommitTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: commit
2004-11-13 09:49:46 [10876] DEBUG: StartTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: begin
2004-11-13 09:49:46 [10876] LOG: query: begin
2004-11-13 09:49:46 [10876] DEBUG: CommitTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: begin
2004-11-13 09:49:46 [10876] DEBUG: StartTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: commit
2004-11-13 09:49:46 [10876] LOG: query: commit
2004-11-13 09:49:46 [10876] DEBUG: CommitTransactionCommand
2004-11-13 09:49:46 [10876] LOG: statement: commit
2004-11-13 09:54:21 [10456] DEBUG: child process (pid 19282) exited with exit code 0
2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456] LOG: server process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456] LOG: terminating any other active server processes
2004-11-13 10:01:06 [10456] DEBUG: CleanupProc: sending SIGQUIT to process 10876
... ... ...

The same app. is running for other customers, seemingly steady...
I will ask for the core file.
This is a brand new machine. Is it likely that a bad memory chip may cause this?
Thank you,
Mike
"Brusser, Michael" <Michael.Brusser@matrixone.com> writes: > 2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was > terminated by signal 10 > The log-statements option was already enabled, > here's what I see prior to the crash: That's no help. What were the last few lines from process 19285? > This is a brand new machine. Is it likely that a bad memory chip may cause > this? Possibly, but it would not do to point fingers at the hardware when you're running an obsolete version of PG ;-). At least get it updated to 7.3.8. regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, November 15, 2004 3:34 PM
> To: Brusser, Michael
> Cc: Pgsql-Hackers (E-mail)
> Subject: Re: [HACKERS] backend died
>
>
> "Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
> > 2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was
> > terminated by signal 10
>
> > The log-statements option was already enabled,
> > here's what I see prior to the crash:
>
> That's no help. What were the last few lines from process 19285?

That's the strangest thing: the log file begins with
2004-11-12 11:49:14 [10456] DEBUG: FindExec: found
"/lsi/soft/synchronicity/latest/syncinc/bin.sol2/postgres" using argv[0]

 - it continues with all SQL statements until it crashes; but the only
reference to pid 19285 is:

2004-11-13 10:01:06 [10456] DEBUG: child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456] LOG: server process (pid 19285) was terminated by signal 10

there are no prior occurrences of token 19285 in the file.

> ... you're running an obsolete version of PG ;-).
> At least get it updated to 7.3.8.

I'd love to, but this is not something I can do. Have to live with that,
as well as with the fact that many of our customers are running on NFS
(yes, I know...)
"Brusser, Michael" <Michael.Brusser@matrixone.com> writes: >> That's no help. What were the last few lines from process 19285? > there are no prior occurrences of token 19285 in the file. Hmm, so it seems 19285 died during startup. That does make a hardware problem seem a bit plausible --- the backend start sequence is pretty well tested ;-). regards, tom lane