Thread: Back end dies...
On running a query which really taxes the system, the backend dies after a day or two and I get this notice...

This is with PG 7.2.

What's up?

NOTICE: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
connection to server was lost

--Arsalan
-------------------------------------------------------------------
People often hate those things which they do not know, or cannot understand.
--Ali Ibn Abi Talib (A.S.)
On Sat, 16 Mar 2002, Arsalan Zaidi wrote:

> On running a query which really taxes the system, the backend dies after a
> day or two and I get this notice...
>
> This is with PG 7.2.
>
> What's up?

Does your server log have any entries at around that point for the actual backend death? Do you have a core file?
> Does your server log have any entries at around that point for the actual
> backend death? Do you have a core file?

No core file, but here's the stuff from the log. Strange, I don't remember kill -9'ing the backend...

DEBUG: server process (pid 26460) was terminated by signal 9
DEBUG: terminating any other active server processes
NOTICE: Message from PostgreSQL backend:
    The Postmaster has informed me that some other backend
    died abnormally and possibly corrupted shared memory.
    I have rolled back the current transaction and am
    going to terminate your database system connection and exit.
    Please reconnect to the database system and repeat your query.
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2002-03-15 20:56:17 IST
DEBUG: checkpoint record is at 33/CBC5968
DEBUG: redo record is at 33/CBC5968; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 5946; next oid: 783000267
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 33/CBC59A8
DEBUG: redo is not required
DEBUG: database system is ready
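For anyone digging through a long server log for the same symptom, here is a minimal sketch that pulls out the "terminated by signal" lines and names the signal. It assumes the log wording matches the line quoted above; the function name `find_signal_deaths` and the regular expression are illustrative, not part of PostgreSQL.

```python
import re
import signal

# Assumed wording, copied from the postmaster line in the log above:
# "DEBUG: server process (pid 26460) was terminated by signal 9"
TERMINATED = re.compile(
    r"server process \(pid (\d+)\) was terminated by signal (\d+)"
)


def find_signal_deaths(log_text):
    """Return (pid, signal number, signal name) for each backend killed by a signal."""
    deaths = []
    for match in TERMINATED.finditer(log_text):
        pid = int(match.group(1))
        signum = int(match.group(2))
        try:
            # Signal numbering is platform-specific; this uses the local platform's table.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        deaths.append((pid, signum, name))
    return deaths


sample = "DEBUG: server process (pid 26460) was terminated by signal 9"
print(find_signal_deaths(sample))
```

On a typical Unix system signal 9 is SIGKILL, which a process cannot catch. That would explain why there is no core file and no error from the backend itself.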
Could this be happening if I'm running out of Mem+Swap in the middle of a query?

--Arsalan.
On Sat, 2002-03-16 at 07:32, Arsalan Zaidi wrote:

> > Does your server log have any entries at around that point for the actual
> > backend death? Do you have a core file?
>
> No core file, but here's the stuff from the log. Strange, I don't remember
> kill -9'ing the backend...

And this happened after a couple of days running a single query? I would suspect that your system[*] is imposing resource limits, and the backend exceeded its limit for CPU time. (It would be better for the resource limiting to try an ordinary kill first before using kill -9, though...)

Or as you suggest, you may be running out of some other resource - I would check ordinary syslog/kernel messages - out of memory process killing might be reported there.

Regards
John

[*] or sysadmin.
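The two checks suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: `cpu_time_limit` reports the limits of the process that runs it, so it would have to be run in the same environment the postmaster is started from to say anything about the backend. The `OOM_HINT` pattern is a guess, since the wording of kernel out-of-memory messages varies between systems and versions. Both helper names are hypothetical.

```python
import re
import resource  # Unix-only standard library module


def cpu_time_limit():
    """Return (soft, hard) CPU-time limits in seconds for this process.

    None stands for "unlimited".
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    unlimited = resource.RLIM_INFINITY
    return (
        None if soft == unlimited else soft,
        None if hard == unlimited else hard,
    )


# Assumed pattern: adjust it to whatever your kernel actually logs.
OOM_HINT = re.compile(r"out of memory|oom[-_ ]?kill", re.IGNORECASE)


def oom_lines(kernel_log_text):
    """Return the lines of a syslog/kernel log that look like out-of-memory kills."""
    return [line for line in kernel_log_text.splitlines() if OOM_HINT.search(line)]


print("CPU time limit (soft, hard):", cpu_time_limit())
```

If both limits come back as None, a CPU-time limit is unlikely to be the cause, and the kernel log is the next place to look.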
> And this happened after a couple of days running a single query? I would
> suspect that your system[*] is imposing resource limits, and the backend
> exceeded its limit for CPU time. (It would be better for the resource
> limiting to try an ordinary kill first before using kill -9, though...)
>
> Or as you suggest, you may be running out of some other resource - I
> would check ordinary syslog/kernel messages - out of memory process
> killing might be reported there.

Bang on... It seems I'm running out of mem during the query... Looks like I'll have to add some swap space :-)

Thanks.

--Arsalan.