RE: [HACKERS] Major bug, possible, with Solaris 7? - Mailing list pgsql-hackers
From | The Hermit Hacker |
---|---|
Subject | RE: [HACKERS] Major bug, possible, with Solaris 7? |
Date | |
Msg-id | Pine.BSF.4.05.9902200047300.59717-100000@thelab.hub.org |
In response to | RE: [HACKERS] Major bug, possible, with Solaris 7? ("Daryl W. Dunbar" <daryl@www.com>) |
Responses | RE: [HACKERS] Major bug, possible, with Solaris 7? |
List | pgsql-hackers |
On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:

> At this point, I'm willing to try anything. I'm in production (live
> site), but we have not announced the site. What that means is that
> I have the weekend to debug/fix/decide what to do. I'll take
> whatever version you suggest and load it.

Apologies for the delay...there is a copy of postgresql-6.4.3beta.tar.gz
available in the test directory...try that and please report back here...

> DwD
>
> > -----Original Message-----
> > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > Sent: Friday, February 19, 1999 10:39 PM
> > To: Daryl W. Dunbar
> > Cc: pgsql-hackers@postgreSQL.org
> > Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
> >
> > On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
> >
> > > Oh, sorry. 6.4.2 with a backend patch to prevent the parent death
> > > in the event of MaxBackendID being reached.
> > >
> > > I know it is in semop() because I did a truss on the child
> > > processes. From a small sample, it looks like they may all be
> > > trying to operate on the same semaphore. I'm recompiling with
> > > the -g flag to gain more insight...
> >
> > I'm just curious, but is this being used in production yet? If not,
> > would you be willing to try out the current snapshot, which is soon
> > to become 6.5-BETA? If this apparent bug still exists there, I think
> > it's sufficient a bug to prevent v6.5 coming out until this is fixed.
> > Then again, something this reproducible will most likely hold up
> > v6.4.3 from being released also, so if we are planning a v6.4.3 (I
> > thought we were), we'll have to get this fixed in the 6.4 line also.
> >
> > Actually, with that in mind, I'm putting together a very quick tar
> > ball of what v6.4.3 is looking like so far. This is *not* a release,
> > but I'd like to see if this problem exists in the most current STABLE
> > tree or not...I know there have been quite a few fixes put into it...
> >
> > Check in about a half hour or so, under the 'test' directory of
> > ftp.postgresql.org .. should be there then...
> >
> > > > -----Original Message-----
> > > > From: owner-pgsql-hackers@postgreSQL.org
> > > > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of The Hermit Hacker
> > > > Sent: Friday, February 19, 1999 12:46 PM
> > > > To: pgsql-hackers@postgreSQL.org
> > > > Cc: Daryl W. Dunbar
> > > > Subject: [HACKERS] Major bug, possible, with Solaris 7?
> > > >
> > > > Can someone please take a minute to look at this?
> > > >
> > > > I've gzip'd and moved his errorlog to
> > > > ftp.postgresql.org:/pub/debugging...one thing that appears to be
> > > > lacking...what version of PostgreSQL are you using?
> > > >
> > > > Marc G. Fournier
> > > > Systems Administrator @ hub.org
> > > > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> > > >
> > > > ---------- Forwarded message ----------
> > > > Date: Thu, 18 Feb 1999 18:23:25 -0500
> > > > From: Daryl W. Dunbar <daryl@www.com>
> > > > To: The Hermit Hacker <scrappy@hub.org>
> > > > Subject: RE: Interested?
> > > >
> > > > Thanks Marc, We exchanged an e-mail or two last week, along with
> > > > Tatsuo Ishii and Tom Lane. You suggested I truss the process.
> > > >
> > > > Anyway, periodically, the backends spiral out of control with hung
> > > > up children until I hit MaxBackendID (which I compiled in to be
> > > > 128). Initially, I was running out of semaphores on Solaris 7 and
> > > > changed /etc/system to add these lines:
> > > >
> > > > set shmsys:shminfo_shmmax=16777216
> > > > set shmsys:shminfo_shmmin=1
> > > > set shmsys:shminfo_shmmni=128
> > > > set shmsys:shminfo_shmseg=51
> > > > *
> > > > set semsys:seminfo_semmap=128
> > > > set semsys:seminfo_semmni=128
> > > > set semsys:seminfo_semmns=8192
> > > > set semsys:seminfo_semmnu=8192
> > > > set semsys:seminfo_semmsl=64
> > > > set semsys:seminfo_semopm=32
> > > > set semsys:seminfo_semume=32
> > > >
> > > > I increased shared memory so I could start more backends...
> > > >
> > > > OK, so now, everything is running fine and boom, the backends
> > > > start to hang on semop, eventually reaching MaxBackendID and
> > > > refusing connections.
> > > >
> > > > Attached is a log file from a hang up today. Debug is set to 3.
> > > > All times are PST. I have carved out a bunch of normal operation
> > > > from the beginning (about 21,000 lines) and redundant 'too many
> > > > backends' (about 1,000 lines, while I was eating lunch :)
> > > > signified by {SNIP SNIP}. I pick the log back up with the birth
> > > > of pid 2828 and left several 'normal' cycles in until...
> > > >
> > > > You can see that process 2840 is the first child to hang. It was
> > > > started at 11:39:23 and did not die until sent a 15 by the parent
> > > > at 14:12:16. All of the hung processes fall between 2840 and 3454.
> > > >
> > > > Sorry the file is so big. Here are some 'keys' you can use:
> > > > Startup is the first line (obviously).
> > > > You can find child startup by looking for [2840] (pid in brackets)
> > > > You can find child exits by looking for '2840 exited'
> > > > You can find where I send the kill signal by looking for 'pmdie 15'
> > > >
> > > > I think that's a good start. :)
> > > >
> > > > Don't hesitate to contact me if I can shed any more light. I'm
> > > > wide open to ideas at the moment. I'm in EST, but tend to work
> > > > until 10-11 at night, so e-mail anytime.
> > > >
> > > > Thanks,
> > > >
> > > > DwD
> > > >
> > > > > -----Original Message-----
> > > > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > > > Sent: Thursday, February 18, 1999 5:36 PM
> > > > > To: Daryl W. Dunbar
> > > > > Subject: Re: Interested?
> > > > >
> > > > > Hi Daryl...
> > > > >
> > > > > I'm not the strongest at internal code, so may not be of any
> > > > > help at all. I just went through my -hackers email, and can't
> > > > > seem to find anything from you in there. Can you tell me what
> > > > > your problem is, as well as version of PostgreSQL you are
> > > > > using, and we'll see what we can do?
> > > > >
> > > > > Marc
> > > > >
> > > > > On Thu, 18 Feb 1999, Daryl W. Dunbar wrote:
> > > > >
> > > > > > Marc,
> > > > > >
> > > > > > I know that you put considerable volunteer time into
> > > > > > PostgreSQL. If I am not too bold in asking, and you are
> > > > > > comfortable with it, I am prepared to compensate you for your
> > > > > > time if you can assist me in tracking down this rather nasty
> > > > > > bug I have been e-mailing Hackers about. Please let me know
> > > > > > if you are interested and if so, at what rate.
> > > > > >
> > > > > > We are in the process of launching a pretty exciting site and
> > > > > > a database is an integral part of it. I really want to use
> > > > > > PostgreSQL, but can not take it into production on Solaris
> > > > > > with this problem going on. I'm in the process of installing
> > > > > > a test site on Linux to see if the problem exists there, but
> > > > > > I expect it is limited to Solaris.
> > > > > >
> > > > > > I anxiously await your response.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > DwD
> > > > > >
> > > > > > --
> > > > > > Daryl W. Dunbar
> > > > > > VP of Engineering/Chief Technology Officer
> > > > > > http://www.com, Where the Web Begins!
> > > > > > mailto:daryl@www.com
> > > > >
> > > > > Marc G. Fournier
> > > > > Systems Administrator @ hub.org
> > > > > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> >
> > Marc G. Fournier
> > Systems Administrator @ hub.org
> > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
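
For anyone working through the errorlog with the 'keys' Daryl lists in the thread, here is a rough Python sketch of how the search for hung children might be automated. It only assumes what the message states: that a backend pid shows up in brackets (for example [2840]) and that a clean exit is logged as '<pid> exited'. The function name and the exact layout of the surrounding log lines are hypothetical, so the patterns may need adjusting against the real file.

```python
import re

# Patterns based only on the search keys described in the thread;
# the real postmaster log layout is an assumption.
BRACKET_PID = re.compile(r"\[(\d+)\]")        # pid in brackets, e.g. [2840]
EXITED_PID = re.compile(r"\b(\d+) exited\b")  # e.g. "2840 exited"


def find_unexited_backends(lines):
    """Return sorted pids that appear as [pid] but never as '<pid> exited'."""
    started = set()
    exited = set()
    for line in lines:
        started.update(int(pid) for pid in BRACKET_PID.findall(line))
        exited.update(int(pid) for pid in EXITED_PID.findall(line))
    return sorted(started - exited)
```

Passing an open file object for the log (it iterates line by line) should list the pids that never logged an exit. Those are only candidates for the hung children, not a confirmed list, since a backend that was still running normally when the log was cut would show up too.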