More postmaster troubles - Mailing list pgsql-hackers

From Daryl W. Dunbar
Subject More postmaster troubles
Msg-id 002e01be569d$e4debf50$1445e59b@ddunbar.eni.net
Responses Re: [HACKERS] More postmaster troubles  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
Hello again,

Thanks again to those who pointed me to the semaphore problem.
Unfortunately, I have another problem:

I am running PostgreSQL 6.4.2 on Solaris 7 on a Sparc20.
Occasionally (once or twice a day), under a very light load,
brain-dead child processes begin to accumulate on my system.  If
left unchecked, the parent process eventually runs out of resources
and dies, orphaning all the lost processes.  (Now that I have solved
the semaphore error, the resource it runs out of appears to be the
backend limit of 64 processes.)

Here is a snapshot of truss on some of the processes:
# truss -p 5879
semop(259915776, 0xEFFFC560, 1) (sleeping...)
# truss -p 5912
semop(259915776, 0xEFFFC190, 1) (sleeping...)
# truss -p 5915
semop(259915776, 0xEFFFC190, 1) (sleeping...)
# truss -p 5931
semop(259915776, 0xEFFFC280, 1) (sleeping...)
# truss -p 5926
semop(259915776, 0xEFFFC280, 1) (sleeping...)

They all appear to be waiting on a semaphore operation which
apparently never happens.  The number of stalled processes grows
rapidly (it has gone from 12 to 21 while I wrote this e-mail).
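
One way to confirm that the stalled backends are all blocked on the
same semaphore set is to group the truss output by the first argument
to semop().  The following is only a minimal Python sketch: it assumes
the snapshot above has been saved as text, and the function name
group_by_semid is made up for this illustration.

```python
import re
from collections import defaultdict

# "# truss -p PID" prompt line, then the semop() line printed by truss.
PID_RE = re.compile(r"^#\s*truss\s+-p\s+(\d+)")
SEMOP_RE = re.compile(r"^semop\((\d+),")

# The truss snapshot quoted in this message.
SNAPSHOT = """\
# truss -p 5879
semop(259915776, 0xEFFFC560, 1) (sleeping...)
# truss -p 5912
semop(259915776, 0xEFFFC190, 1) (sleeping...)
# truss -p 5915
semop(259915776, 0xEFFFC190, 1) (sleeping...)
# truss -p 5931
semop(259915776, 0xEFFFC280, 1) (sleeping...)
# truss -p 5926
semop(259915776, 0xEFFFC280, 1) (sleeping...)
"""


def group_by_semid(text):
    """Map semaphore set id -> list of PIDs seen sleeping in semop() on it."""
    groups = defaultdict(list)
    pid = None
    for line in text.splitlines():
        line = line.strip()
        m = PID_RE.match(line)
        if m:
            pid = int(m.group(1))  # remember which process the next line belongs to
            continue
        m = SEMOP_RE.match(line)
        if m and pid is not None:
            groups[int(m.group(1))].append(pid)
            pid = None
    return dict(groups)


if __name__ == "__main__":
    for semid, pids in group_by_semid(SNAPSHOT).items():
        print(semid, pids)
```

For the snapshot above, every PID lands under the single semaphore set
id 259915776, which is consistent with all of them waiting on the same
set.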

The stalled processes all started between 6:57am PST and 7:18am PST.
Here is what postmaster wrote to the log during that time:
Feb 12 06:56:46 constantinople POSTMASTER: FATAL: pq_putnchar:
fputc() failed: errno=32
Feb 12 06:57:42 constantinople POSTMASTER: NOTICE:  Deadlock
detected -- See the lock(l) manual page for a possible cause.
Feb 12 06:57:42 constantinople POSTMASTER: ERROR:  WaitOnLock: error
on wakeup - Aborting this transaction
Feb 12 06:57:42 constantinople POSTMASTER: NOTICE:  Deadlock
detected -- See the lock(l) manual page for a possible cause.
Feb 12 06:57:42 constantinople POSTMASTER: ERROR:  WaitOnLock: error
on wakeup - Aborting this transaction
Feb 12 07:02:18 constantinople POSTMASTER: FATAL: pq_putnchar:
fputc() failed: errno=32
Feb 12 07:02:19 constantinople last message repeated 2 times
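
For reference, errno 32 is EPIPE ("broken pipe") on Solaris, which
usually means the backend was writing to a client connection that had
already gone away.  The Python sketch below decodes the errno values
and tallies the messages by severity.  It is an illustration only:
the helper names are made up, and errno numbering is
platform-specific, so the lookup reflects the machine the script runs
on (32 is also EPIPE on Linux).

```python
import errno
import re
from collections import Counter

# The postmaster log excerpt quoted in this message.
LOG = """\
Feb 12 06:56:46 constantinople POSTMASTER: FATAL: pq_putnchar:
fputc() failed: errno=32
Feb 12 06:57:42 constantinople POSTMASTER: NOTICE:  Deadlock
detected -- See the lock(l) manual page for a possible cause.
Feb 12 06:57:42 constantinople POSTMASTER: ERROR:  WaitOnLock: error
on wakeup - Aborting this transaction
Feb 12 06:57:42 constantinople POSTMASTER: NOTICE:  Deadlock
detected -- See the lock(l) manual page for a possible cause.
Feb 12 06:57:42 constantinople POSTMASTER: ERROR:  WaitOnLock: error
on wakeup - Aborting this transaction
Feb 12 07:02:18 constantinople POSTMASTER: FATAL: pq_putnchar:
fputc() failed: errno=32
Feb 12 07:02:19 constantinople last message repeated 2 times
"""


def decode_errnos(log):
    """Return the symbolic name of every errno=N value found in the log."""
    return [errno.errorcode.get(int(n), "unknown")
            for n in re.findall(r"errno=(\d+)", log)]


def count_severities(log):
    """Count explicitly logged messages by severity.

    syslog's "last message repeated N times" lines are not expanded.
    """
    return Counter(re.findall(r"POSTMASTER: (FATAL|NOTICE|ERROR):", log))


if __name__ == "__main__":
    print(decode_errnos(LOG))
    print(count_severities(LOG))
```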

Most of the time, things just work, but it appears that once
something has gone awry, I experience a spiraling death.

Thoughts?  Suggestions?  Help? :)

DwD
--
Daryl W. Dunbar
http://www.com, Where the Web Begins!
mailto:daryl@www.com
