Thread: FATAL 1

FATAL 1

From
newsreader@mediaone.net
Date:
I found a couple of entries
in my dmesg output stating
that Linux (kernel 2.4) killed
two postmasters because the system
ran out of memory.

I think that the postmaster should
log such instances as FATAL 1. I am
trying to recover the approximate
date and time at which these FATAL 1
entries were made so that I can figure
out what went wrong.  But the postgres
log file I have doesn't have time stamps.

Does anyone know how I can recover
these time stamps?  If not, is there a log
level at which time stamps will be written?

Thanks


Re: FATAL 1

From
Peter Eisentraut
Date:
newsreader@mediaone.net writes:

> I found a couple of entries in my dmesg which stated that linux
> (kernel 2.4) killed two postmasters because the system ran out of
> memory.
>
> I think that postmaster should log such instances as FATAL 1.

IIRC, the kernel sends a SIGKILL signal in that case, so the affected
application doesn't have a chance to react, it just gets terminated
immediately.  If you want to monitor these events better you need to ask
your kernel for help.  If the postmaster gets terminated normally, for
various definitions of normal, you will get log entries.
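A minimal sketch (not PostgreSQL code) of why a SIGKILL leaves no trace: a throwaway `sh` loop stands in for a backend and installs a SIGTERM handler that writes to a temporary log file. The `demo` function is hypothetical. A normal `kill` gives the handler a chance to run; SIGKILL, which Peter recalls is what the out-of-memory killer sends, ends the process before it can write anything.

```shell
#!/bin/sh
# Minimal sketch: a process can react to SIGTERM but not to SIGKILL.
# The background "sh -c" loop is a stand-in for a PostgreSQL process (assumption).

demo() {
  sig=$1
  log=$(mktemp)
  # Child installs a TERM handler that appends to the log file, then idles.
  sh -c 'trap "echo caught TERM >> \"$1\"; exit 0" TERM
         while :; do sleep 1; done' sh "$log" >/dev/null 2>&1 &
  pid=$!
  sleep 1                      # give the child time to install its trap
  kill -s "$sig" "$pid"
  status=0
  wait "$pid" || status=$?     # 128+9 = 137 when killed by SIGKILL
  printf '%s: exit status %s, log: %s\n' "$sig" "$status" "$(cat "$log")"
  rm -f "$log"
}

demo TERM   # the handler runs and writes to the log
demo KILL   # no handler can run; the log stays empty
```

With SIGTERM the child exits with status 0 and its log line is present; with SIGKILL the parent sees status 137 (128 + 9) and the log is empty.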

> Does anyone know how I can recover these time stamps?  If not is there
> a log level for which time stamps will be made?

You can turn on time stamps in the postgresql.conf file, but that won't
help you in this case.
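Time stamps will not recover the missing times for this incident, but they would help correlate future log entries with the kernel log. A minimal sketch, assuming the 7.1-era boolean parameter `log_timestamp` in postgresql.conf (later releases replaced it with `log_line_prefix`, so check the documentation for your version); `enable_log_timestamp` is a hypothetical helper that rewrites an existing, possibly commented-out, setting or appends one.

```shell
#!/bin/sh
# Hypothetical helper: turn on log_timestamp in a postgresql.conf file.
# Assumes the 7.1-era boolean parameter name "log_timestamp".
enable_log_timestamp() {
  conf=$1
  if grep -q '^[#[:space:]]*log_timestamp' "$conf"; then
    # Replace an existing (possibly commented-out) setting in place.
    sed 's/^[#[:space:]]*log_timestamp.*/log_timestamp = true/' "$conf" > "$conf.tmp" &&
      mv "$conf.tmp" "$conf"
  else
    # No setting present: append one.
    echo 'log_timestamp = true' >> "$conf"
  fi
}

# Usage (assumes $PGDATA is the data directory); restart the postmaster afterwards:
# enable_log_timestamp "$PGDATA/postgresql.conf"
```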

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


Re: FATAL 1

From
newsreader@mediaone.net
Date:
On Fri, Aug 10, 2001 at 12:42:33AM +0200, Peter Eisentraut wrote:
> newsreader@mediaone.net writes:
>
> > I think that postmaster should log such instances as FATAL 1.
>
> IIRC, the kernel sends a SIGKILL signal in that case, so the affected
> application doesn't have a chance to react, it just gets terminated
> immediately.  If you want to monitor these events better you need to ask

OK, here is what I find in dmesg:

------------
Out of Memory: Killed process 17534 (postmaster).
Out of Memory: Killed process 18228 (postmaster)
-----------
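The dmesg output above carries no time stamps. If klogd/syslogd are running, the same kernel messages are usually also written to a syslog file with a date prefix. The file names vary by distribution (`/var/log/messages` below is an assumption), and `find_oom_kills` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print time-stamped OOM-killer lines from syslog files.
# Assumes syslogd writes kernel messages to the given files.
find_oom_kills() {
  for f in "$@"; do
    [ -r "$f" ] || continue            # skip missing or unreadable rotations
    grep 'Out of Memory: Killed process' "$f" || :
  done
}

# Usage:
# find_oom_kills /var/log/messages /var/log/messages.1
```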

I think backends got killed instead of the postmaster.
In fact, the postmaster did not die; it is still
running now and apparently survived the
out-of-memory event.
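One way to check that guess is to compare the PIDs in the out-of-memory lines with the postmaster's own PID; in 7.1 the backends are forked from the postmaster, so they are likely to appear under the same name in kernel messages. A minimal sketch, assuming the message format shown above and that the first line of `$PGDATA/postmaster.pid` holds the postmaster's PID; `classify_oom_pids` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and report whether each
# process killed by the out-of-memory killer was the postmaster itself.
# $1 is the postmaster's PID.
classify_oom_pids() {
  postmaster_pid=$1
  sed -n 's/.*Out of Memory: Killed process \([0-9][0-9]*\).*/\1/p' |
  while read -r pid; do
    if [ "$pid" = "$postmaster_pid" ]; then
      echo "$pid: postmaster"
    else
      echo "$pid: not the postmaster (probably a backend)"
    fi
  done
}

# Usage (assumes $PGDATA is set and the postmaster is running):
# dmesg | classify_oom_pids "$(head -n 1 "$PGDATA/postmaster.pid")"
```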


Re: FATAL 1

From
newsreader@mediaone.net
Date:
On Thu, Aug 09, 2001 at 11:19:14PM -0400, Tom Lane wrote:
> newsreader@mediaone.net writes:
> > I think backends got killed instead of postmaster
>
>
> Assuming that it was a backend that got killed, the postmaster should
> certainly have seen and logged that event.  What are you doing with
> the postmaster log?

pg 7.1.2.

kernel 2.4.4

I have the log, but the entries don't
make any sense to me.

I deliberately killed a backend
on my development box and noticed
a FATAL 1 entry in the log,
so I did

$ grep 'FATAL 1' log

and got a number of entries,
but they are not very informative.
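A bare grep shows only the matching lines. Asking for a few lines of context around each match may make the FATAL 1 entries easier to interpret. A minimal sketch, assuming GNU grep as found on Linux (`-B` and `-A` are not in POSIX grep); the two-line window and the `fatal_with_context` name are arbitrary choices.

```shell
#!/bin/sh
# Show each "FATAL 1" line with its line number and two lines of context
# on either side. Context lines are marked "N-", matching lines "N:".
fatal_with_context() {
  grep -n -B 2 -A 2 'FATAL 1' "${1:-log}"
}

# Usage, with the log file written by "pg_ctl ... start -l log":
# fatal_with_context log
```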

I start my postmaster like this:

$ /usr/local/pg7.1/bin/pg_ctl -o "-F -i -h 192.168.0.1" start -l log

and I have not adjusted the debug/log level;
it is at the default.

Thanks

kz

Re: FATAL 1

From
newsreader@mediaone.net
Date:
After looking carefully at the pg log,
I don't think any of the FATAL 1 entries
coincide with those in
dmesg.

But I am not sure.



Re: FATAL 1

From
Peter Eisentraut
Date:
Tom Lane writes:

> Probably.  If your *postmaster* is running out of memory then you have
> some serious problems.  (But: we have had memory-leak problems in the
> past in certain authentication paths.  What PG version are you running,
> anyway?)

I think the behaviour is that if the system as a whole runs out of memory
(physical + swap or whatever) the kernel randomly kills processes to make
room.  So a disappearing postmaster is not necessarily a sign of a fault
on PostgreSQL's part.  I'm not sure how to configure the kernel in this
area; I just recall the discussion on the kernel mailing list about
whether the init process would be allowed to be randomly killed as well.
(I kid you not.)
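Whether the system as a whole is close to that point can be checked from `/proc/meminfo`. A minimal sketch that sums the `MemFree` and `SwapFree` fields (in kB); it ignores buffers and page cache the kernel can reclaim, so it understates what is really available. `free_mem_and_swap_kb` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: free RAM plus free swap, in kB, from /proc/meminfo-style input.
# Ignores reclaimable buffers/cache, so the figure is a lower bound.
free_mem_and_swap_kb() {
  awk '$1 == "MemFree:" || $1 == "SwapFree:" { sum += $2 } END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Usage:
# free_mem_and_swap_kb
```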
