Thread: How to deal with crashes?

How to deal with crashes?

From
"Andrey Mishchenko"
Date:
Hello,

I'm posting to pgsql-general since I'm not sure that this should be a bug
report.
Please correct me if I'm wrong.

I have PostgreSQL 7.2, compiled by GCC egcs-2.91.66, Linux version 2.2.20,
i686

Periodically, Postgres crashes. The following lines are added to the log
file:
2002-04-17 13:55:18 [17524]  DEBUG:  server process (pid 23600) was
terminated by signal 11
2002-04-17 13:55:18 [17524]  DEBUG:  terminating any other active server
processes
2002-04-17 13:55:18 [17524]  DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-04-17 13:55:18 [17524]  DEBUG:  startup process (pid 23605) was
terminated by signal 11
2002-04-17 13:55:18 [17524]  DEBUG:  aborting startup due to startup process
failure

When I do pg_ctl start, the following is logged:
2002-04-17 14:00:04 [26188]  DEBUG:  database system was interrupted at
2002-04-17 13:48:34 MSD
2002-04-17 14:00:04 [26188]  DEBUG:  checkpoint record is at 0/7F9D0C4
2002-04-17 14:00:04 [26188]  DEBUG:  redo record is at 0/7F9D0C4; undo
record is at 0/0; shutdown FALSE
2002-04-17 14:00:04 [26188]  DEBUG:  next transaction id: 551189; next oid:
210621
2002-04-17 14:00:04 [26188]  DEBUG:  database system was not properly shut
down; automatic recovery in progress
2002-04-17 14:00:04 [26188]  DEBUG:  ReadRecord: record with zero length at
0/7F9D104
2002-04-17 14:00:04 [26188]  DEBUG:  redo is not required
2002-04-17 14:00:08 [26258]  FATAL 1:  The database system is starting up
2002-04-17 14:00:08 [26262]  FATAL 1:  The database system is starting up
2002-04-17 14:00:13 [26365]  FATAL 1:  The database system is starting up
2002-04-17 14:00:13 [26375]  FATAL 1:  The database system is starting up
2002-04-17 14:00:24 [26188]  DEBUG:  database system is ready

I do not run a debug version, so a core dump is not available (I'm not sure
that it's a good idea to run a debug version on the real web site, and this
problem occurs only on that particular server).

Please tell me, what should I do to solve my problem?
Is it safe to run a debug version on a public web server?
What would the penalties of doing that be? (The server runs ~10000 queries
daily.)

Thank you in advance,
Andrey


Re: How to deal with crashes?

From
Martijn van Oosterhout
Date:
On Wed, Apr 24, 2002 at 05:30:33PM +0400, Andrey Mishchenko wrote:
> Periodically, Postgres crashes. The following lines are added to the log
> file:
> 2002-04-17 13:55:18 [17524]  DEBUG:  server process (pid 23600) was
> terminated by signal 11
> 2002-04-17 13:55:18 [17524]  DEBUG:  terminating any other active server
> processes

etc...

Are you logging the queries? It would be helpful if you could identify the
query actually causing the problem.

> I do not run a debug version, so a core dump is not available (I'm not sure
> that it's a good idea to run a debug version on the real web site, and this
> problem occurs only on that particular server).

Debug info doesn't actually cost anything speed-wise or memory-wise. And
that's all you need to get useful info out of a core dump.

> Please tell me, what should I do to solve my problem?
> Is it safe to run debug version on the public web server?
> What will be the penalties of doing that? (server runs ~10000 queries
> daily)?

Logging the queries is only really an issue if you don't rotate the logs on
a regular basis.

HTH,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America

Re: How to deal with crashes?

From
"Andrey"
Date:
Yes, I'm logging queries.

Here are the log records that precede the Postgres crash.
Postgres crashes only after a new session is opened (DEBUG:  connection: ...).
It seems to me that there is no correlation between the crash and the queries
executed before the new session is opened (the crash occurs after different
SELECT or UPDATE queries).

As for the debug version, I know that Windows programs may run several times
slower in a debug build. Isn't this the case for Postgres on Linux?

2002-04-17 13:55:00 [23382]  DEBUG:  connection: host=[local] user=spa-www
database=spa
2002-04-17 13:55:01 [23382]  DEBUG:  query: SELECT count(*) FROM skus
2002-04-17 13:55:01 [23382]  DEBUG:  query: COMMIT
2002-04-17 13:55:01 [23382]  DEBUG:  ProcessUtility: COMMIT
2002-04-17 13:55:01 [23382]  NOTICE:  COMMIT: no transaction in progress
2002-04-17 13:55:01 [23386]  DEBUG:  connection: host=[local] user=spa-www
database=spa
2002-04-17 13:55:01 [23386]  DEBUG:  query: SELECT count(*) FROM skus
2002-04-17 13:55:01 [23386]  DEBUG:  query: COMMIT
2002-04-17 13:55:01 [23386]  DEBUG:  ProcessUtility: COMMIT
2002-04-17 13:55:01 [23386]  NOTICE:  COMMIT: no transaction in progress
2002-04-17 13:55:18 [23600]  DEBUG:  connection: host=[local] user=spa-www
database=spa
2002-04-17 13:55:18 [17524]  DEBUG:  server process (pid 23600) was
terminated by signal 11
2002-04-17 13:55:18 [17524]  DEBUG:  terminating any other active server
processes
2002-04-17 13:55:18 [17524]  DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-04-17 13:55:18 [17524]  DEBUG:  startup process (pid 23605) was
terminated by signal 11
2002-04-17 13:55:18 [17524]  DEBUG:  aborting startup due to startup process
failure
2002-04-17 14:00:04 [26188]  DEBUG:  database system was interrupted at
2002-04-17 13:48:34 MSD
2002-04-17 14:00:04 [26188]  DEBUG:  checkpoint record is at 0/7F9D0C4
2002-04-17 14:00:04 [26188]  DEBUG:  redo record is at 0/7F9D0C4; undo
record is at 0/0; shutdown FALSE
2002-04-17 14:00:04 [26188]  DEBUG:  next transaction id: 551189; next oid:
210621
2002-04-17 14:00:04 [26188]  DEBUG:  database system was not properly shut
down; automatic recovery in progress
2002-04-17 14:00:04 [26188]  DEBUG:  ReadRecord: record with zero length at
0/7F9D104
2002-04-17 14:00:04 [26188]  DEBUG:  redo is not required
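
The excerpt above can also be checked mechanically. What follows is only a
sketch, assuming the log format quoted in this thread (timestamp, PID in
square brackets, level, message) and entries wrapped across lines by the mail
client; the names join_wrapped and activity_before_crash are made up for
illustration. It lists what each crashed backend logged itself before the
postmaster reported the signal.

```python
import re
from collections import defaultdict

# Start of a log entry in the format quoted in this thread, e.g.
# "2002-04-17 13:55:18 [23600]  DEBUG:  connection: host=[local] ..."
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\]\s+(\w+(?: \d+)?):\s+(.*)$"
)
# Postmaster message reporting that a backend died from a signal.
CRASH = re.compile(r"server process \(pid (\d+)\) was terminated by signal (\d+)")


def join_wrapped(lines):
    """Re-join entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY.match(line):
            entries.append(line)
        elif entries and line.strip():
            entries[-1] += " " + line.strip()
    return entries


def activity_before_crash(lines):
    """Map each crashed backend PID to the messages that PID logged."""
    by_pid = defaultdict(list)
    crashed = {}
    for entry in join_wrapped(lines):
        _timestamp, pid, _level, message = ENTRY.match(entry).groups()
        found = CRASH.search(message)
        if found:
            crashed[int(found.group(1))] = int(found.group(2))
        else:
            by_pid[int(pid)].append(message)
    return {pid: by_pid.get(pid, []) for pid in crashed}


# A few lines copied from the excerpt above, wrapped as in the mail.
sample = """\
2002-04-17 13:55:01 [23386]  DEBUG:  query: SELECT count(*) FROM skus
2002-04-17 13:55:18 [23600]  DEBUG:  connection: host=[local] user=spa-www
database=spa
2002-04-17 13:55:18 [17524]  DEBUG:  server process (pid 23600) was
terminated by signal 11
""".splitlines()

print(activity_before_crash(sample))
```

For the sample lines, the only message attributed to PID 23600 is its
connection record, which matches the observation that no query was logged
for that backend before it died.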

Thank you,

Andrey


Re: How to deal with crashes?

From
Tom Lane
Date:
"Andrey" <am@NOSPAM.netactor.net> writes:
> As for the debug version, I know that Windows programs may run several times
> slower in a debug build. Isn't this the case for Postgres on Linux?

The gcc boys say that -g makes no difference in the generated code.
I've never bothered to test the assertion; I assume they know what
they're talking about.

> 2002-04-17 13:55:18 [23600]  DEBUG:  connection: host=[local] user=spa-www
> database=spa
> 2002-04-17 13:55:18 [17524]  DEBUG:  server process (pid 23600) was
> terminated by signal 11

This is most curious --- PID 23600 evidently crashed during startup,
before it had received any queries.

You definitely need to prepare a debug version and get a stack backtrace
from one of the coredumps before we'll be able to say much more than
that.
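
As a rough sketch of the backtrace step, assuming gdb is installed, the
server was built with debugging symbols (for example with PostgreSQL's
--enable-debug configure option), and the core size limit (ulimit -c)
allowed a core file to be written: the paths below are placeholders, and
the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def backtrace_command(postgres_binary, core_file):
    """Build a gdb command line that prints a backtrace and then exits."""
    return ["gdb", "-batch", "-ex", "bt", str(postgres_binary), str(core_file)]


def print_backtrace(postgres_binary, core_file):
    """Run gdb if it and both files are present; return its output, else None."""
    if shutil.which("gdb") is None:
        return None
    if not (Path(postgres_binary).is_file() and Path(core_file).is_file()):
        return None
    result = subprocess.run(
        backtrace_command(postgres_binary, core_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Placeholder paths; substitute the real binary and core file locations.
cmd = backtrace_command("/path/to/postgres", "/path/to/core")
print(" ".join(cmd))
```

The same thing can be done interactively by starting gdb with the binary
and the core file and typing "bt" at its prompt.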

            regards, tom lane

Re: How to deal with crashes?

From
Curt Sampson
Date:
On Wed, 24 Apr 2002, Andrey Mishchenko wrote:

> I do not run a debug version, so a core dump is not available (I'm not sure
> that it's a good idea to run a debug version on the real web site, and this
> problem occurs only on that particular server).

If by "debug version" we mean a version compiled with -g, it will
be fine.  It shouldn't affect performance at all. All it will do
is make the binary bigger on the disk (not in memory) and make it
take longer to compile.

> What will be the penalties of doing that? (server runs ~10000 queries
> daily)?

This server runs ten thousand queries per day? This is a very low
volume of queries, so unless the queries are hideously complex,
you shouldn't need to worry about performance.

86,400 seconds per day / 10,000 queries works out to one query
every 8.64 seconds. Of course the load will not be perfectly even,
but even so I doubt you'll be seeing a short-term average of more
than one query every two seconds. I recommend you use at least a
386SX-16 system for this. :-)
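
The arithmetic above, written out as a small sketch. The two-second figure
is the guess made in this message, not a measurement, and the variable
names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
queries_per_day = 10_000         # figure given in the original post

# Average gap between queries if the load were perfectly even.
average_interval = SECONDS_PER_DAY / queries_per_day

# Guessed short-term worst case from the message: one query every 2 seconds.
guessed_peak_interval = 2.0

# How much busier than average the peak would have to be to reach that guess.
peak_to_average = average_interval / guessed_peak_interval

print(f"average interval: {average_interval:.2f} s")
print(f"implied peak-to-average ratio: {peak_to_average:.2f}")
```

In other words, the guessed peak corresponds to a load a little over four
times the daily average.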

BTW, there are tools like "top" and "iostat" that can tell you how
loaded your system is. Or you can probably even just look at the
disk lights.
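
For a quick scripted check, a minimal sketch that reads the same load
averages "top" prints in its header line. It assumes a Unix-like system
where Python's os.getloadavg() is available, and it says nothing about
disk I/O, for which "iostat" is still the tool to use. The function name
load_summary is made up for illustration.

```python
import os


def load_summary():
    """Return the 1, 5 and 15 minute load averages, or None if unsupported."""
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):
        return None
    return {"1min": one, "5min": five, "15min": fifteen}


summary = load_summary()
if summary is None:
    print("load average not available on this platform")
else:
    print(summary)
```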

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC