Thread: PostgreSQL 7.3.4 gets killed by SIG_KILL

PostgreSQL 7.3.4 gets killed by SIG_KILL

From: "Magnus Naeslund(t)"
Date: 03 December 2003, 22:01:36

I have this big table running on an old linux install (kernel 2.2.25).
I've COPYed some tcpip logs into a table created as such:

create table ipstats (time timestamp, src inet, dst inet, npackets int8, 
nbytes int8);

Big:
select count(*) from ipstats;
  count
----------
 99173733

When I run a pair of selects against that table multiple times, the 
backend doing the selects gets killed by signal 9.

The select pair looks like:
select sum(nbytes) from ipstats where dst = '10.10.10.170';
select sum(nbytes) from ipstats where src = '10.10.10.170';

This is what the serverlog says:

LOG:  server process (pid 20308) was terminated by signal 9
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing shared memory and 
semaphores
FATAL:  The database system is starting up
LOG:  database system was interrupted at 2003-12-03 23:21:49 CET
FATAL:  The database system is starting up
LOG:  checkpoint record is at 3/9095BC20
LOG:  redo record is at 3/9095BC20; undo record is at 0/0; shutdown TRUE
LOG:  next transaction id: 8716399; next oid: 141842933
LOG:  database system was not properly shut down; automatic recovery in 
progress
LOG:  ReadRecord: record with zero length at 3/9095BC60
LOG:  redo is not required
LOG:  database system is ready
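
The "terminated by signal 9" line above is what a parent process sees when it reaps a child that died from a signal. A minimal sketch of that mechanism, assuming a POSIX system; the function name and message wording are hypothetical and only mimic the log format, this is not PostgreSQL's code:

```python
import os
import signal


def child_death_report() -> str:
    """Fork a child that receives SIGKILL and describe how it died."""
    pid = os.fork()
    if pid == 0:
        # Child: SIGKILL cannot be caught, blocked, or ignored.
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not reached
    # Parent: the wait status records the terminating signal,
    # but not which process (or the kernel) sent it.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return f"process (pid {pid}) was terminated by signal {os.WTERMSIG(status)}"
    return f"process (pid {pid}) exited with code {os.WEXITSTATUS(status)}"
```

Calling `print(child_death_report())` prints a line of the same shape as the log entry. The wait status carries only the signal number, which is why the log cannot say where the signal came from.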

Attaching gdb to the process doesn't help; it exits immediately 
anyway.
I believe this is because SIG_KILL is "unstoppable"...

Any ideas as to what to do?

Regards
Magnus

Re: PostgreSQL 7.3.4 gets killed by SIG_KILL

From: Doug McNaught
Date: 03 December 2003, 22:37:19

"Magnus Naeslund(t)" <mag@fbab.net> writes:

> I have this big table running on an old linux install (kernel 2.2.25).
> I've COPYed some tcpip logs into a table created as such:

Linux is probably killing your process because it (the kernel) is low
on memory.  Unfortunately, this happens more often with older versions
of the kernel.  Add more RAM/swap or figure out how to make your query
use less memory...

-Doug

Re: PostgreSQL 7.3.4 gets killed by SIG_KILL

From: "Magnus Naeslund(t)"
Date: 04 December 2003, 01:36:20

Doug McNaught wrote:
> "Magnus Naeslund(t)" <mag@fbab.net> writes:
> 
> 
>>I have this big table running on an old linux install (kernel 2.2.25).
>>I've COPYed some tcpip logs into a table created as such:
> 
> 
> Linux is probably killing your process because it (the kernel) is low
> on memory.  Unfortunately, this happens more often with older versions
> of the kernel.  Add more RAM/swap or figure out how to make your query
> use less memory...
> 
> -Doug

Well this just isn't the case.
There is no printout in kernel logs/dmesg (as it would be if the kernel 
killed it in an OOM situation).
I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).

When running the query I have about 850 MB sitting in kernel cache, the 
postgres process takes about 40 MB of memory, and the ipcs -m command 
shows that postgresql is taking 41508864 bytes of shared memory.

There is no sorting or index lookups going on, the query is simple.
I just had a power outage; I'll check whether it wised up after the reboot or something, but I doubt it.

Is it possible to somehow find out what process sent the KILL (or if 
it's the kernel) ?

I find this very weird to say the least...

Magnus

Re: PostgreSQL 7.3.4 gets killed by SIG_KILL

From: Jeff
Date: 04 December 2003, 12:01:10

On Thu, 04 Dec 2003 03:35:49 +0100
"Magnus Naeslund(t)" <mag@fbab.net> wrote:

> 
> Well this just isn't the case.
> There is no printout in kernel logs/dmesg (as it would be if the
> kernel killed it in an OOM situation).
> I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).
> 

Do you have any system monitoring scripts that may be killing it as it
may look like a "runaway" process?

We've had this happen to us before. You tend to forget about things like
that.

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

Re: PostgreSQL 7.3.4 gets killed by SIG_KILL

From: Doug McNaught
Date: 04 December 2003, 12:03:19

"Magnus Naeslund(t)" <mag@fbab.net> writes:

> Doug McNaught wrote:

>> Linux is probably killing your process because it (the kernel) is low
>> on memory.  Unfortunately, this happens more often with older versions
>> of the kernel.  Add more RAM/swap or figure out how to make your query
>> use less memory...
>> -Doug
>
> Well this just isn't the case.
> There is no printout in kernel logs/dmesg (as it would be if the
> kernel killed it in an OOM situation).
> I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).

Ahh, that's an additional piece of information that you didn't supply
earlier.  ;)  

Though your system memory is ample, is it possible that you're hitting
a ulimit() on the stack size or heap size or something?  I'm not sure
what signal you'd get in such a case, though.
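
One way to check for such limits is to print the soft and hard values that a process actually inherits. A minimal sketch using Python's standard `resource` module, assuming a Unix system; the `LIMITS` table and `describe_limits` name are illustrative, and the script has to be started the same way as the server (same user, same login path) to see the same values:

```python
import resource

# Limits worth checking for a long-running backend (illustrative selection).
LIMITS = {
    "cpu (seconds)": resource.RLIMIT_CPU,
    "stack (bytes)": resource.RLIMIT_STACK,
    "data (bytes)": resource.RLIMIT_DATA,
    "file size (bytes)": resource.RLIMIT_FSIZE,
}


def fmt(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limits() -> dict:
    """Return {name: (soft, hard)} for the current process."""
    report = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[name] = (fmt(soft), fmt(hard))
    return report


for name, (soft, hard) in describe_limits().items():
    print(f"{name:20s} soft={soft:>12s} hard={hard:>12s}")
```

A finite CPU value here would be suspicious for a database server, since CPU time accumulates over the life of the process rather than per query.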

> Is it possible to somehow find out what process sent the KILL (or if
> it's the kernel) ?

Not that I know of, unless it's in a logfile somewhere.  You could try
strace(8) on the backend running the query--that might give you some
more info.

>
> I find this very weird to say the least...

Yah.  You might also consider running a more recent kernel, especially
with such a big machine.  2.2.X never did play that well with large
amounts of RAM...

-Doug

Re: PostgreSQL 7.3.4 gets killed by SIG_KILL [SOLVED]

From: "Magnus Naeslund(t)"
Date: 04 December 2003, 19:49:38

Jeff wrote:
> 
> 
> Do you have any system monitoring scripts that may be killing it as it
> may look like a "runaway" process?
> 
> We've had this happen to us before. You tend to forget about things like
> that.
> 

This got me thinking, and I rechecked all possibilities.
It turned out that we had changed our rlimit policies earlier, and the "default" 
CPU time limits bled over to postgres because it didn't have an overriding 
entry in the PAM limits configuration.
Since the startup scripts use "su - postgres -c cmd", postgres "logged in" and 
so got the now-default CPU time limits.
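
The effect can be reproduced outside PostgreSQL. The sketch below assumes Linux, where current kernels send SIGXCPU at the soft CPU limit and SIGKILL at the hard limit; whether kernel 2.2 behaved identically is an assumption based on the outcome in this thread. The function name is hypothetical. It starts a CPU-bound child with soft and hard RLIMIT_CPU set to the same value and reports how the child died:

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(seconds: int) -> int:
    """Run a busy-looping child under RLIMIT_CPU; return its return code."""

    def set_limit():
        # Runs in the child between fork and exec.
        # soft == hard, so there is no SIGXCPU grace period on Linux.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))

    child = subprocess.run(
        [sys.executable, "-c", "while True: pass"],
        preexec_fn=set_limit,
    )
    # subprocess reports death by signal N as return code -N.
    return child.returncode


rc = run_with_cpu_limit(1)
if rc < 0:
    print("child terminated by", signal.Signals(-rc).name)
```

Nothing is written to the kernel log in this case, which matches the earlier observation that dmesg showed no OOM message.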

So it was only a mindbug, and that's good :)

Magnus