RE: [HACKERS] Major bug, possible, with Solaris 7? - Mailing list pgsql-hackers

From The Hermit Hacker
Subject RE: [HACKERS] Major bug, possible, with Solaris 7?
Date
Msg-id Pine.BSF.4.05.9902200047300.59717-100000@thelab.hub.org
In response to RE: [HACKERS] Major bug, possible, with Solaris 7?  ("Daryl W. Dunbar" <daryl@www.com>)
Responses RE: [HACKERS] Major bug, possible, with Solaris 7?
List pgsql-hackers
On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:

> At this point, I am willing to try anything.  I'm in production (live
> site), but we have not announced the site.  What that means is that
> I have the weekend to debug/fix/decide what to do.  I'll take
> whatever version you suggest and load it.

Apologies for the delay...there is a copy of postgresql-6.4.3beta.tar.gz
available in the test directory...try that and please report back here...


> 
> DwD
> 
> > -----Original Message-----
> > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > Sent: Friday, February 19, 1999 10:39 PM
> > To: Daryl W. Dunbar
> > Cc: pgsql-hackers@postgreSQL.org
> > Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
> >
> >
> > On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
> >
> > > Oh, sorry.  6.4.2 with a backend patch to prevent the parent death
> > > in the event of MaxBackendID being reached.
> > >
> > > I know it is in semop() because I did a truss on the child
> > > processes.  From a small sample, it looks like they may all be
> > > trying to operate on the same semaphore.  I'm recompiling with
> > > the -g flag to gain more insight...
> >
> > I'm just curious, but is this being used in production yet?  If not,
> > would you be willing to try out the current snapshot, which is soon to
> > become 6.5-BETA?  If this apparent bug still exists there, I think it's
> > sufficient a bug to prevent v6.5 coming out until this is fixed.  Then
> > again, something this reproducible will most likely hold up v6.4.3 from
> > being released also, so if we are planning a v6.4.3 (I thought we were),
> > we'll have to get this fixed in the 6.4 line also.
> >
> > Actually, with that in mind, I'm putting together a very quick tar ball
> > of what v6.4.3 is looking like so far.  This is *not* a release, but I'd
> > like to see if this problem exists in the most current STABLE tree or
> > not...I know there have been quite a few fixes put into it...
> >
> > Check in about a half hour or so, under the 'test' directory of
> > ftp.postgresql.org .. should be there then...
> >
> >
> > > > -----Original Message-----
> > > > From: owner-pgsql-hackers@postgreSQL.org
> > > > [mailto:owner-pgsql-hackers@postgreSQL.org] On Behalf Of The Hermit Hacker
> > > > Sent: Friday, February 19, 1999 12:46 PM
> > > > To: pgsql-hackers@postgreSQL.org
> > > > Cc: Daryl W. Dunbar
> > > > Subject: [HACKERS] Major bug, possible, with Solaris 7?
> > > >
> > > >
> > > >
> > > > Can someone please take a minute to look at this?
> > > >
> > > > I've gzip'd and moved his errorlog to
> > > > ftp.postgresql.org:/pub/debugging...one thing that appears to be
> > > > lacking...what version of PostgreSQL are you using?
> > > >
> > > > Marc G. Fournier
> > > > Systems Administrator @ hub.org
> > > > primary: scrappy@hub.org           secondary:
> > > > scrappy@{freebsd|postgresql}.org
> > > >
> > > > ---------- Forwarded message ----------
> > > > Date: Thu, 18 Feb 1999 18:23:25 -0500
> > > > From: Daryl W. Dunbar <daryl@www.com>
> > > > To: The Hermit Hacker <scrappy@hub.org>
> > > > Subject: RE: Interested?
> > > >
> > > > Thanks Marc,  We exchanged an e-mail or two last week, along with
> > > > Tatsuo Ishii and Tom Lane.  You suggested I truss the process.
> > > >
> > > > Anyway, periodically, the backends spiral out of control with hung
> > > > up children until I hit MaxBackendID (which I compiled in to be
> > > > 128).  Initially, I was running out of semaphores on Solaris 7 and
> > > > changed /etc/system to add these lines:
> > > > set shmsys:shminfo_shmmax=16777216
> > > > set shmsys:shminfo_shmmin=1
> > > > set shmsys:shminfo_shmmni=128
> > > > set shmsys:shminfo_shmseg=51
> > > > *
> > > > set semsys:seminfo_semmap=128
> > > > set semsys:seminfo_semmni=128
> > > > set semsys:seminfo_semmns=8192
> > > > set semsys:seminfo_semmnu=8192
> > > > set semsys:seminfo_semmsl=64
> > > > set semsys:seminfo_semopm=32
> > > > set semsys:seminfo_semume=32
> > > >
> > > > I increased shared memory so I could start more backends...
> > > >
> > > > OK, so now, everything is running fine and boom, the backends start
> > > > to hang on semop, eventually reaching MaxBackendID and refusing
> > > > connections.
> > > > Attached is a log file from a hang up today.  Debug is set to 3.
> > > > All times are PST.  I have carved out a bunch of normal operation
> > > > from the beginning (about 21,000 lines) and redundant 'too many
> > > > backends' (about 1,000 lines, while I was eating lunch :) signified
> > > > by {SNIP SNIP}.  I pick the log back up with the birth of pid 2828
> > > > and left several 'normal' cycles in until...
> > > >
> > > > You can see that process 2840 is the first child to hang.  It was
> > > > started at 11:39:23 and did not die until sent a 15 by the parent at
> > > > 14:12:16.  All of the hung processes fall between 2840 and 3454.
> > > >
> > > > Sorry the file is so big.  Here are some 'keys' you can use:
> > > > Startup is the first line (obviously).
> > > > You can find child startup by looking for [2840] (pid in brackets)
> > > > You can find child exits by looking for '2840 exited'
> > > > You can find where I send the kill signal by looking for 'pmdie 15'
> > > > I think that's a good start. :)
> > > >
> > > > Don't hesitate to contact me if I can shed any more light.  I'm wide
> > > > open to ideas at the moment.  I'm in EST, but tend to work until
> > > > 10-11 at night, so e-mail anytime.
> > > >
> > > > Thanks,
> > > >
> > > > DwD
> > > >
> > > > > -----Original Message-----
> > > > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > > > Sent: Thursday, February 18, 1999 5:36 PM
> > > > > To: Daryl W. Dunbar
> > > > > Subject: Re: Interested?
> > > > >
> > > > >
> > > > >
> > > > > Hi Daryl...
> > > > >
> > > > >     I'm not the strongest at internal code, so may not be of any help
> > > > > at all.  I just went through my -hackers email, and can't seem to find
> > > > > anything from you in there.  Can you tell me what your problem is, as well
> > > > > as version of PostgreSQL you are using, and we'll see what we can do?
> > > > >
> > > > > Marc
> > > > >
> > > > > On Thu, 18 Feb 1999, Daryl W. Dunbar wrote:
> > > > >
> > > > > > Marc,
> > > > > >
> > > > > > I know that you put considerable volunteer time into PostgreSQL.  If
> > > > > > I am not too bold in asking, and you are comfortable with it, I am
> > > > > > prepared to compensate you for your time if you can assist me in
> > > > > > tracking down this rather nasty bug I have been e-mailing Hackers
> > > > > > about.  Please let me know if you are interested and if so, at what
> > > > > > rate.
> > > > > >
> > > > > > We are in the process of launching a pretty exciting site and a
> > > > > > database is an integral part of it.  I really want to use PostgreSQL,
> > > > > > but cannot take it into production on Solaris with this problem
> > > > > > going on.  I'm in the process of installing a test site on Linux to
> > > > > > see if the problem exists there, but I expect it is limited to
> > > > > > Solaris.
> > > > > >
> > > > > > I anxiously await your response.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > DwD
> > > > > >
> > > > > > --
> > > > > > Daryl W. Dunbar
> > > > > > VP of Engineering/Chief Technology Officer
> > > > > > http://www.com, Where the Web Begins!
> > > > > > mailto:daryl@www.com
> > > > > >
> > > > > >
> > > > >
> > > > > Marc G. Fournier
> > > > > Systems Administrator @ hub.org
> > > > > primary: scrappy@hub.org           secondary:
> > > > > scrappy@{freebsd|postgresql}.org
> > > > >
> > > >
> > > >
> > >
> >
> > Marc G. Fournier
> > Systems Administrator @ hub.org
> > primary: scrappy@hub.org           secondary:
> > scrappy@{freebsd|postgresql}.org
> >
> 

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 
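
A quick cross-check of the semaphore limits quoted in the /etc/system excerpt, offered as a minimal sketch and not as anything taken from the thread itself. It assumes the usual System V meanings of the tunables (semmni: number of semaphore sets, semmsl: semaphores per set, semmns: semaphores system-wide). The helper name parse_settings is made up for illustration.

```python
# Semaphore settings copied from the /etc/system excerpt quoted in the message.
ETC_SYSTEM = """\
set semsys:seminfo_semmap=128
set semsys:seminfo_semmni=128
set semsys:seminfo_semmns=8192
set semsys:seminfo_semmnu=8192
set semsys:seminfo_semmsl=64
set semsys:seminfo_semopm=32
set semsys:seminfo_semume=32
"""


def parse_settings(text):
    """Parse 'set module:name=value' lines into a {name: int} dict.

    Hypothetical helper for this sketch; lines that do not start with
    'set ' (such as '*' comment lines or blanks) are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("set "):
            continue
        key, _, value = line[4:].partition("=")
        name = key.split(":", 1)[-1].strip()
        settings[name] = int(value)
    return settings


sem = parse_settings(ETC_SYSTEM)

# Upper bound implied by the per-set limits: sets * semaphores per set.
max_by_sets = sem["seminfo_semmni"] * sem["seminfo_semmsl"]

# The system-wide cap (semmns) applies as well, so the smaller value wins.
usable = min(max_by_sets, sem["seminfo_semmns"])

print(max_by_sets, sem["seminfo_semmns"], usable)
```

With the quoted values, 128 sets of 64 semaphores come to 8192, which matches semmns, so neither limit undercuts the other. This says nothing about why the backends hang in semop; it only confirms the settings are mutually consistent.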


