RE: [HACKERS] Major bug, possible, with Solaris 7? - Mailing list pgsql-hackers
From | Daryl W. Dunbar |
---|---|
Subject | RE: [HACKERS] Major bug, possible, with Solaris 7? |
Date | |
Msg-id | 002201be5c58$8928c320$1445e59b@ddunbar.eni.net |
In response to | Major bug, possible, with Solaris 7? (The Hermit Hacker <scrappy@hub.org>) |
Responses | RE: [HACKERS] Major bug, possible, with Solaris 7? |
List | pgsql-hackers |
Oh, sorry. 6.4.2 with a backend patch to prevent the parent death in
the event of MaxBackendID being reached.

I know it is in semop() because I did a truss on the child processes.
From a small sample, it looks like they may all be trying to operate
on the same semaphore. I'm recompiling with the -g flag to gain more
insight...

DwD

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of The Hermit
> Hacker
> Sent: Friday, February 19, 1999 12:46 PM
> To: pgsql-hackers@postgreSQL.org
> Cc: Daryl W. Dunbar
> Subject: [HACKERS] Major bug, possible, with Solaris 7?
>
> Can someone please take a minute to look at this?
>
> I've gzip'd and moved his errorlog to
> ftp.postgresql.org:/pub/debugging...one thing that appears to be
> lacking...what version of PostgreSQL are you using?
>
> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
>
> ---------- Forwarded message ----------
> Date: Thu, 18 Feb 1999 18:23:25 -0500
> From: Daryl W. Dunbar <daryl@www.com>
> To: The Hermit Hacker <scrappy@hub.org>
> Subject: RE: Interested?
>
> Thanks Marc, We exchanged an e-mail or two last week, along with
> Tatsuo Ishii and Tom Lane. You suggested I truss the process.
>
> Anyway, periodically, the backends spiral out of control with hung
> up children until I hit MaxBackendID (which I compiled in to be
> 128). Initially, I was running out of semaphores on Solaris 7 and
> changed /etc/system to add these lines:
> set shmsys:shminfo_shmmax=16777216
> set shmsys:shminfo_shmmin=1
> set shmsys:shminfo_shmmni=128
> set shmsys:shminfo_shmseg=51
> *
> set semsys:seminfo_semmap=128
> set semsys:seminfo_semmni=128
> set semsys:seminfo_semmns=8192
> set semsys:seminfo_semmnu=8192
> set semsys:seminfo_semmsl=64
> set semsys:seminfo_semopm=32
> set semsys:seminfo_semume=32
>
> I increased shared memory so I could start more backends...
>
> OK, so now, everything is running fine and boom, the backends start
> to hang on semop, eventually reaching MaxBackendID and refusing
> connections.
>
> Attached is a log file from a hang up today. Debug is set to 3.
> All times are PST. I have carved out a bunch of normal operation
> from the beginning (about 21,000 lines) and redundant 'too many
> backends' (about 1,000 lines, while I was eating lunch :) signified
> by {SNIP SNIP}. I pick the log back up with the birth of pid 2828
> and left several 'normal' cycles in until...
>
> You can see that process 2840 is the first child to hang. It was
> started at 11:39:23 and did not die until sent a 15 by the parent at
> 14:12:16. All of the hung processes fall between 2840 and 3454.
>
> Sorry the file is so big. Here are some 'keys' you can use:
> Startup is the first line (obviously).
> You can find child startup by looking for [2840] (pid in brackets)
> You can find child exits by looking for '2840 exited'
> You can find where I send the kill signal by looking for 'pmdie 15'
>
> I think that's a good start. :)
>
> Don't hesitate to contact me if I can shed any more light. I'm wide
> open to ideas at the moment. I'm in EST, but tend to work until
> 10-11 at night, so e-mail anytime.
>
> Thanks,
>
> DwD
>
> > -----Original Message-----
> > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > Sent: Thursday, February 18, 1999 5:36 PM
> > To: Daryl W. Dunbar
> > Subject: Re: Interested?
> >
> > Hi Daryl...
> >
> > I'm not the strongest at internal code, so may not be of any help
> > at all. I just went through my -hackers email, and can't seem to
> > find anything from you in there. Can you tell me what your problem
> > is, as well as version of PostgreSQL you are using, and we'll see
> > what we can do?
> >
> > Marc
> >
> > On Thu, 18 Feb 1999, Daryl W. Dunbar wrote:
> >
> > > Marc,
> > >
> > > I know that you put considerable volunteer time into PostgreSQL.
> > > If I am not too bold in asking, and you are comfortable with it,
> > > I am prepared to compensate you for your time if you can assist
> > > me in tracking down this rather nasty bug I have been e-mailing
> > > Hackers about. Please let me know if you are interested and if
> > > so, at what rate.
> > >
> > > We are in the process of launching a pretty exciting site and a
> > > database is an integral part of it. I really want to use
> > > PostgreSQL, but can not take it into production on Solaris with
> > > this problem going on. I'm in the process of installing a test
> > > site on Linux to see if the problem exists there, but I expect it
> > > is limited to Solaris.
> > >
> > > I anxiously await your response.
> > >
> > > Thanks,
> > >
> > > DwD
> > >
> > > --
> > > Daryl W. Dunbar
> > > VP of Engineering/Chief Technology Officer
> > > http://www.com, Where the Web Begins!
> > > mailto:daryl@www.com
> >
> > Marc G. Fournier
> > Systems Administrator @ hub.org
> > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
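
The log 'keys' given in the forwarded message (a pid in brackets for child startup, '<pid> exited' for child exit, and 'pmdie 15' for the kill signal) could be used to list the hung children mechanically rather than by eye. The following is only a rough sketch under assumptions: the exact layout of the postmaster debug log lines is not shown in the thread, so the patterns are guesses built from those three keys, and the names `find_hung_pids`, `START_RE`, `EXIT_RE`, and `KILL_MARK` as well as the sample lines are hypothetical.

```python
import re

# Patterns derived from the 'keys' in the message. The real log line
# layout is an assumption; adjust these to match the actual file.
START_RE = re.compile(r"\[(\d+)\]")      # pid in brackets, e.g. "[2840]"
EXIT_RE = re.compile(r"\b(\d+) exited")  # e.g. "2840 exited"
KILL_MARK = "pmdie 15"                   # where the kill signal was sent


def find_hung_pids(lines):
    """Return sorted pids seen in brackets that had not exited
    by the time the first 'pmdie 15' line appears.

    If no 'pmdie 15' line is present, the whole input is scanned.
    """
    alive = set()
    for line in lines:
        if KILL_MARK in line:
            break  # exits after this point are caused by the kill
        exited = EXIT_RE.search(line)
        if exited:
            alive.discard(int(exited.group(1)))
            continue
        for started in START_RE.finditer(line):
            alive.add(int(started.group(1)))
    return sorted(alive)


# Hypothetical log excerpt, invented for illustration only.
sample = [
    "[2828] started",
    "2828 exited",
    "[2840] started",
    "[2841] started",
    "2841 exited",
    "pmdie 15",
    "2840 exited",
]
print(find_hung_pids(sample))  # prints [2840]
```

Run against the real log, a list like this would be expected to fall in the 2840 to 3454 range mentioned in the message, provided the assumed patterns actually match the log format.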