Thread: PostgreSQL 7.3.4 gets killed by SIG_KILL
I have this big table running on an old Linux install (kernel 2.2.25).
I've COPYed some TCP/IP logs into a table created as such:

    create table ipstats (
        time timestamp,
        src inet,
        dst inet,
        npackets int8,
        nbytes int8
    );

Big:

    select count(*) from ipstats;
      count
    ----------
     99173733

When I run a pair of selects against that table multiple times, the backend doing the selects gets killed by signal 9.
The select pair looks like:

    select sum(nbytes) from ipstats where dst = '10.10.10.170';
    select sum(nbytes) from ipstats where src = '10.10.10.170';

This is what the serverlog says:

    LOG: server process (pid 20308) was terminated by signal 9
    LOG: terminating any other active server processes
    LOG: all server processes terminated; reinitializing shared memory and semaphores
    FATAL: The database system is starting up
    LOG: database system was interrupted at 2003-12-03 23:21:49 CET
    FATAL: The database system is starting up
    LOG: checkpoint record is at 3/9095BC20
    LOG: redo record is at 3/9095BC20; undo record is at 0/0; shutdown TRUE
    LOG: next transaction id: 8716399; next oid: 141842933
    LOG: database system was not properly shut down; automatic recovery in progress
    LOG: ReadRecord: record with zero length at 3/9095BC60
    LOG: redo is not required
    LOG: database system is ready

When I attach gdb to the process it doesn't help; it exits immediately anyway.
I believe this is because SIGKILL is "unstoppable"...

Any ideas as to what to do?

Regards
Magnus
"Magnus Naeslund(t)" <mag@fbab.net> writes: > I have this big table running on an old linux install (kernel 2.2.25). > I've COPYed some tcpip logs into a table created as such: Linux is probably killing your process because it (the kernel) is low on memory. Unfortunately, this happens more often with older versions of the kernel. Add more RAM/swap or figure out how to make your query use less memory... -Doug
Doug McNaught wrote:
> "Magnus Naeslund(t)" <mag@fbab.net> writes:
>
>>I have this big table running on an old linux install (kernel 2.2.25).
>>I've COPYed some tcpip logs into a table created as such:
>
> Linux is probably killing your process because it (the kernel) is low
> on memory. Unfortunately, this happens more often with older versions
> of the kernel. Add more RAM/swap or figure out how to make your query
> use less memory...
>
> -Doug

Well, this just isn't the case.
There is no printout in the kernel logs/dmesg (as there would be if the kernel had killed it in an OOM situation).
I have 1 GB of RAM and 1.5 GB of swap (swap never touched).
When running the query I have about 850 MB sitting in kernel cache, the postgres process takes about 40 MB of memory, and the ipcs -m command shows that PostgreSQL is taking 41508864 bytes of shared memory.
There is no sorting or index lookups going on; the query is simple.

I just had a power outage. I'll check whether it maybe wised up after the reboot or something, but I doubt it.

Is it possible to somehow find out what process sent the KILL (or if it's the kernel)?

I find this very weird to say the least...

Magnus
On Thu, 04 Dec 2003 03:35:49 +0100 "Magnus Naeslund(t)" <mag@fbab.net> wrote:
>
> Well this just isn't the case.
> There is no printout in kernel logs/dmesg (as it would be if the
> kernel killed it in an OOM situation).
> I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).
>

Do you have any system monitoring scripts that may be killing it as it may look like a "runaway" process?

We've had this happen to us before. You tend to forget about things like that.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
"Magnus Naeslund(t)" <mag@fbab.net> writes: > Doug McNaught wrote: >> Linux is probably killing your process because it (the kernel) is low >> on memory. Unfortunately, this happens more often with older versions >> of the kernel. Add more RAM/swap or figure out how to make your query >> use less memory... >> -Doug > > Well this just isn't the case. > There is no printout in kernel logs/dmesg (as it would be if the > kernel killed it in an OOM situation). > I have 1 GB of RAM, and 1.5 GB of swap (swap never touched). Ahh, that's an additional piece of information hat you didn't supply earlier. ;) Though your system memory is ample, is it possible that you're hitting a ulimit() on the stack size or heap size or something? I'm not sure what signal you'd get in such a case, though. > Is it possible to somehow find out what process sent the KILL (or if > it's the kernel) ? Not that I know of, unless it's in a logfile somewhere. You could try strace(8) on the backend running the query--that might give you some more info. > > I find this very weird to say the least... Yah. You might also consider running a more recent kernel, especially with such a big machine. 2.2.X never did play that well with large amounts of RAM... -Doug
Jeff wrote:
>
> Do you have any system monitoring scripts that may be killing it as it
> may look like a "runaway" process?
>
> We've had this happen to us before. You tend to forget about things like
> that.
>

This got me thinking, and I rechecked all possibilities.
It turned out that we had changed rlimit policies earlier, and the "default" CPU time limits bled over to postgres since it didn't have a negating entry in the PAM limits configuration.
Since the startup scripts use "su - postgres -c cmd", it "logged in" and so got the now-default CPU time values.

So it was only a mindbug, and that's good :)

Magnus
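The failure mode can be reproduced in isolation. The sketch below is not the setup from this thread: it assumes Python's standard `resource` and `subprocess` modules on Linux, and `run_with_cpu_limit` is a hypothetical helper name. It starts a busy-looping child with the soft and hard CPU-time limits set to the same value and reports which signal ended it. setrlimit(2) on Linux describes SIGXCPU at the soft limit and SIGKILL at the hard limit, so with both equal a SIGKILL is the likely result; other systems may deliver SIGXCPU instead.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(seconds):
    """Run a busy-looping child under a CPU-time limit.

    Returns the number of the signal that terminated the child.
    """

    def apply_limit():
        # Runs in the child just before exec. Soft == hard, so there is
        # no grace period between the two limits.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))

    child = subprocess.run(
        [sys.executable, "-c", "while True: pass"],
        preexec_fn=apply_limit,
    )
    # A negative return code -N means the child was terminated by signal N.
    return -child.returncode


if __name__ == "__main__":
    signum = run_with_cpu_limit(1)
    print("child terminated by", signal.Signals(signum).name)
```

Nothing is logged by the kernel in this case, which matches the empty dmesg reported earlier in the thread: the process simply disappears once it has used up its CPU-time budget.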