Thread: backend died

backend died

From
"Brusser, Michael"
Our customer running Postgres v. 7.3.2 reported a problem, occurring
a couple of times a week on three different servers, all on Solaris 9.
We enabled debugging in postgresql.conf, and now it has happened again;
here's the excerpt from the database log:

2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  terminating any other active server processes
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10876
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10482
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10481
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10478
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10472
2004-11-13 10:01:06 [10876]  WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2004-11-13 10:01:06 [10478]  WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2004-11-13 10:01:06 [10482]  WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        ... ...

There are no other references to process 19285 in the log file.
If it helps, the servers are configured to use Unix-domain sockets (UDS).
The socket files are placed in different directories (each db's PGDATA).

Would it be helpful to change the debug level from DEBUG1 to a higher value?
What else should I look at?

Thank you,
Mike

Re: backend died

From
Tom Lane
"Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
> Our customer running Postgres v. 7.3.2 reported a problem, occurring
> couple times a week on three different servers, all on Solaris 9.

> 2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was
> terminated by signal 10

SIGBUS iirc.

> What else should I look at?

Find out what query is causing the crash --- enable query logging if you
have no other way.  And get a debugger stack trace from the core file
that the crashed backend left behind.
        regards, tom lane
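A minimal sketch for decoding the postmaster's crash report. Signal numbers are platform-specific: on Solaris, signal 10 is SIGBUS (as Tom recalls), whereas on Linux/x86 signal 10 is SIGUSR1 and SIGBUS is 7, so the number must be read against the signal table of the machine that crashed. `SOLARIS_SIGNALS` below is a hand-copied subset of Solaris `<sys/signal.h>`, and `describe_crash` is a hypothetical helper, not part of PostgreSQL.

```python
import re

# Subset of Solaris signal numbers (from Solaris <sys/signal.h>).
# Numbering differs by platform: on Linux/x86, 10 is SIGUSR1 and SIGBUS is 7.
SOLARIS_SIGNALS = {
    6: "SIGABRT",
    9: "SIGKILL",
    10: "SIGBUS",
    11: "SIGSEGV",
    15: "SIGTERM",
}

# Matches the postmaster's report, e.g.
# "... LOG:  server process (pid 19285) was terminated by signal 10"
CRASH_RE = re.compile(r"\(pid (\d+)\) was terminated by signal (\d+)")


def describe_crash(line, table=SOLARIS_SIGNALS):
    """Return (pid, signal number, signal name), or None if the line is not a crash report."""
    m = CRASH_RE.search(line)
    if m is None:
        return None
    pid, signo = int(m.group(1)), int(m.group(2))
    return pid, signo, table.get(signo, "unknown signal %d" % signo)


if __name__ == "__main__":
    line = ("2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) "
            "was terminated by signal 10")
    print(describe_crash(line))  # (19285, 10, 'SIGBUS')
```

Passing a different `table` lets the same helper decode logs from another platform; the point is that the bare number in the log is meaningless without knowing which OS produced it.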


Re: backend died

From
"Brusser, Michael"
> "Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
> > Our customer running Postgres v. 7.3.2 reported a problem, occurring
> > couple times a week on three different servers, all on Solaris 9.
>
> > 2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was
> > terminated by signal 10
>
> SIGBUS iirc.

> > What else should I look at?
>
> Find out what query is causing the crash --- enable query
> logging if you have no other way.  And get a debugger stack trace from the core file
> that the crashed backend left behind.
>                       regards, tom lane

==================================================
The log-statements option was already enabled;
here's what I see prior to the crash:

2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876]  LOG:  query: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: SELECT SUM(C0),SUM(C1),SUM(C2),SUM(C3),SUM(C4),SUM(C5),SUM(C6),SUM(C7),SUM(C8),SUM(C9),SUM(C10) FROM cache_refreshes
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  LOG:  query: commit
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  LOG:  query: begin
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  LOG:  query: commit
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  LOG:  query: begin
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: begin
2004-11-13 09:49:46 [10876]  DEBUG:  StartTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:49:46 [10876]  LOG:  query: commit
2004-11-13 09:49:46 [10876]  DEBUG:  CommitTransactionCommand
2004-11-13 09:49:46 [10876]  LOG:  statement: commit
2004-11-13 09:54:21 [10456]  DEBUG:  child process (pid 19282) exited with exit code 0
2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  terminating any other active server processes
2004-11-13 10:01:06 [10456]  DEBUG:  CleanupProc: sending SIGQUIT to process 10876
... ... ...

The same app is running for other customers, apparently without problems...
I will ask for the core file.
This is a brand-new machine. Is it likely that a bad memory chip could cause this?

Thank you,
Mike

Re: backend died

From
Tom Lane
"Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
> 2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was
> terminated by signal 10

> The log-statements option was already enabled,
> here's what I see prior to the crash:

That's no help.  What were the last few lines from process 19285?

> This is a brand new machine. Is it likely that a bad memory chip may cause
> this?

Possibly, but it would not do to point fingers at the hardware when
you're running an obsolete version of PG ;-).  At least get it updated
to 7.3.8.
        regards, tom lane
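Tom's question ("What were the last few lines from process 19285?") can be answered mechanically by filtering on the `[pid]` prefix rather than searching for the number anywhere in the line, since the postmaster's own messages also mention the child's pid. A minimal sketch, assuming the timestamp-plus-`[pid]` layout seen in the excerpts above; `last_lines_for_pid` is a hypothetical helper, not a PostgreSQL tool.

```python
import re
from collections import deque

# Assumes the layout seen in this thread:
# "2004-11-13 10:01:06 [10876]  LOG:  statement: commit"
PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\d+)\]")


def last_lines_for_pid(lines, pid, n=10):
    """Return the last n log lines written by process `pid`.

    Lines without a timestamp/pid prefix (e.g. the indented continuation
    lines of a multi-line WARNING) are attributed to the preceding entry.
    """
    tail = deque(maxlen=n)  # keeps only the most recent n matches
    current = None          # pid of the entry the current line belongs to
    for line in lines:
        m = PREFIX_RE.match(line)
        if m:
            current = int(m.group(1))
        if current == pid:
            tail.append(line.rstrip("\n"))
    return list(tail)
```

Because it accepts any iterable of lines, an open log file can be passed directly. Run against the log in this thread for pid 19285, it would return an empty list, since no line carries the `[19285]` prefix.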


Re: backend died

From
"Brusser, Michael"
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, November 15, 2004 3:34 PM
> To: Brusser, Michael
> Cc: Pgsql-Hackers (E-mail)
> Subject: Re: [HACKERS] backend died
>
>
> "Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
> > 2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was
> > terminated by signal 10
>
> > The log-statements option was already enabled,
> > here's what I see prior to the crash:
>
> That's no help.  What were the last few lines from process 19285?

That's the strangest thing: the log file begins with
2004-11-12 11:49:14 [10456]  DEBUG:  FindExec: found "/lsi/soft/synchronicity/latest/syncinc/bin.sol2/postgres" using argv[0]

- it continues with all SQL statements until it crashes, but the only
references to pid 19285 are:

2004-11-13 10:01:06 [10456]  DEBUG:  child process (pid 19285) was terminated by signal 10
2004-11-13 10:01:06 [10456]  LOG:  server process (pid 19285) was terminated by signal 10

There are no prior occurrences of the token 19285 in the file.

> ... you're running an obsolete version of PG ;-).
>     At least get it updated to 7.3.8.

I'd love to, but this is not something I can do. I have to live with that,
as well as with the fact that many of our customers are running on NFS
(yes, I know...)

Re: backend died

From
Tom Lane
"Brusser, Michael" <Michael.Brusser@matrixone.com> writes:
>> That's no help.  What were the last few lines from process 19285?

> there are no prior occurrences of token 19285 in the file.

Hmm, so it seems 19285 died during startup.  That does make a hardware
problem seem a bit plausible --- the backend start sequence is pretty
well tested ;-).
        regards, tom lane
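The inference above rests on a simple check: with statement logging enabled, a backend that ran even one query would have left lines tagged with its own pid, so a pid that appears only inside other processes' messages (here, the postmaster's 10456) never got far enough to log anything. That is consistent with dying during startup, though it does not prove it on its own. A minimal sketch of the check, assuming the same log layout as the excerpts; `classify_pid` and `died_before_logging` are hypothetical helpers.

```python
import re

# Assumes lines start with "YYYY-MM-DD HH:MM:SS [pid]", as in this thread.
PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\d+)\]")


def classify_pid(lines, pid):
    """Return (lines carrying pid's own prefix, other lines that mention pid)."""
    own, mentioned_by_others = 0, 0
    token = re.compile(r"\b%d\b" % pid)  # whole-number match, so 1928 != 19285
    for line in lines:
        m = PREFIX_RE.match(line)
        if m and int(m.group(1)) == pid:
            own += 1
        elif token.search(line):
            mentioned_by_others += 1
    return own, mentioned_by_others


def died_before_logging(lines, pid):
    """True if pid is mentioned in the log but never wrote a line itself."""
    own, mentioned = classify_pid(lines, pid)
    return own == 0 and mentioned > 0
```

For pid 19285 in the log described above, this would report zero lines of its own and two postmaster mentions, which is exactly the pattern Mike found by hand.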