RE: [HACKERS] Major bug, possible, with Solaris 7? - Mailing list pgsql-hackers

From Daryl W. Dunbar
Subject RE: [HACKERS] Major bug, possible, with Solaris 7?
Msg-id 004501be5d06$66c81c50$1445e59b@ddunbar.eni.net
In response to RE: [HACKERS] Major bug, possible, with Solaris 7?  ("Daryl W. Dunbar" <daryl@www.com>)
Responses Re: [HACKERS] Major bug, possible, with Solaris 7?
List pgsql-hackers
Problem still exists in 6.4.3.

I am wondering: since gdb cannot give me any information on the
location of my hang (I get lots of ??'s) and all I can see is
semsys(), am I spinning in a system library?  Does anyone have
access to the Solaris 7 patches?  I see one kernel patch out there,
but I cannot read its description or download it, because it is
not on the recommended or security patch lists.  I'm talking to my
rep about this on Monday!
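To picture what a backend stuck in semsys() is doing: on Solaris, as far as I know, semop() enters the kernel through the semsys system call, which would explain why truss and gdb show nothing but semsys. A decrement on a System V semaphore whose value is 0 sleeps until some other process increments it, so a backend waiting on a semaphore that is never released looks exactly like a hang. The sketch below is a hypothetical standalone probe (probe_blocking_semop and semctl_arg are illustrative names, not PostgreSQL code) that performs that same decrement with IPC_NOWAIT, so it fails with EAGAIN instead of blocking.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* semctl() takes a union for SETVAL; many systems make the caller
 * declare it, so a private (hypothetical) name avoids clashing with
 * any system-provided union semun. */
union semctl_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-semaphore set with value 0, then try a "P" (decrement).
 * Without IPC_NOWAIT this semop() would sleep forever, like a hung
 * backend; with it, the call returns at once and errno says why.
 * Returns the errno from semop() (0 if it unexpectedly succeeded),
 * or -1 if the semaphore set could not be set up. */
int probe_blocking_semop(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) {
        perror("semget");
        return -1;
    }

    union semctl_arg arg;
    arg.val = 0;                      /* nobody has released this semaphore */
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        perror("semctl(SETVAL)");
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = -1;                   /* "P": wait until the value is > 0 */
    op.sem_flg = IPC_NOWAIT;          /* fail with EAGAIN instead of sleeping */

    int result = 0;
    if (semop(semid, &op, 1) < 0)
        result = errno;               /* capture before any other call */

    semctl(semid, 0, IPC_RMID);       /* remove the private set again */
    return result;
}
```

Dropping IPC_NOWAIT from sem_flg turns the probe into a process that blocks inside semop() indefinitely, which is one way to compare its truss output against that of a hung backend.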

For reference, I can provide a syslog and truss of the 6.4.3
failure, but I expect it looks just about like the 6.4.2 one.

Thanks,

DwD

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of
> Daryl W. Dunbar
> Sent: Saturday, February 20, 1999 11:26 AM
> To: The Hermit Hacker
> Cc: pgsql-hackers@postgreSQL.org
> Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
>
>
> OK.  I'm running 6.4.3beta (after patching the code to compile -
> patches attached).  Now we wait to see if it breaks again...
>
> DwD
>
>
> > -----Original Message-----
> > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > Sent: Friday, February 19, 1999 11:48 PM
> > To: Daryl W. Dunbar
> > Cc: pgsql-hackers@postgreSQL.org
> > Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
> >
> >
> > On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
> >
> > > At this point, I'm willing to try anything.  I'm in production
> > > (live site), but we have not announced the site.  That means
> > > I have the weekend to debug/fix/decide what to do.  I'll take
> > > whatever version you suggest and load it.
> >
> > Apologies for the delay...there is a copy of postgresql-6.4.3beta.tar.gz
> > available in the test directory...try that and please report back here...
> >
> >
> > >
> > > DwD
> > >
> > > > -----Original Message-----
> > > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > > Sent: Friday, February 19, 1999 10:39 PM
> > > > To: Daryl W. Dunbar
> > > > Cc: pgsql-hackers@postgreSQL.org
> > > > Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
> > > >
> > > >
> > > > On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
> > > >
> > > > > Oh, sorry.  6.4.2 with a backend patch to prevent the parent
> > > > > death in the event of MaxBackendID being reached.
> > > > >
> > > > > I know it is in semop() because I did a truss on the child
> > > > > processes.  From a small sample, it looks like they may all be
> > > > > trying to operate on the same semaphore.  I'm recompiling with
> > > > > the -g flag to gain more insight...
> > > >
> > > > I'm just curious, but is this being used in production yet?  If
> > > > not, would you be willing to try out the current snapshot, which
> > > > is soon to become 6.5-BETA?  If this apparent bug still exists
> > > > there, I think it's a serious enough bug to prevent v6.5 coming
> > > > out until this is fixed...then again, something this reproducible
> > > > will most likely hold up the release of v6.4.3 as well, so if we
> > > > are planning a v6.4.3 (I thought we were), we'll have to get this
> > > > fixed in the 6.4 line also.
> > > >
> > > > Actually, with that in mind, I'm putting together a very quick
> > > > tarball of what v6.4.3 is looking like so far.  This is *not* a
> > > > release, but I'd like to see if this problem exists in the most
> > > > current STABLE tree or not...I know there have been quite a few
> > > > fixes put into it...
> > > >
> > > > Check in about a half hour or so, under the 'test' directory of
> > > > ftp.postgresql.org .. should be there then...
> > > >
> > > >
> > > > > > -----Original Message-----
> > > > > > From: owner-pgsql-hackers@postgreSQL.org
> > > > > > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of
> > > > > > The Hermit Hacker
> > > > > > Sent: Friday, February 19, 1999 12:46 PM
> > > > > > To: pgsql-hackers@postgreSQL.org
> > > > > > Cc: Daryl W. Dunbar
> > > > > > Subject: [HACKERS] Major bug, possible, with Solaris 7?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Can someone please take a minute to look at this?
> > > > > >
> > > > > > I've gzip'd and moved his errorlog to
> > > > > > ftp.postgresql.org:/pub/debugging...one thing that appears
> > > > > > to be lacking...what version of PostgreSQL are you using?
> > > > > >
> > > > > > Marc G. Fournier
> > > > > > Systems Administrator @ hub.org
> > > > > > primary: scrappy@hub.org           secondary:
> > > > > > scrappy@{freebsd|postgresql}.org
> > > > > >
> > > > > > ---------- Forwarded message ----------
> > > > > > Date: Thu, 18 Feb 1999 18:23:25 -0500
> > > > > > From: Daryl W. Dunbar <daryl@www.com>
> > > > > > To: The Hermit Hacker <scrappy@hub.org>
> > > > > > Subject: RE: Interested?
> > > > > >
> > > > > > Thanks Marc.  We exchanged an e-mail or two last week, along
> > > > > > with Tatsuo Ishii and Tom Lane.  You suggested I truss the
> > > > > > process.
> > > > > >
> > > > > > Anyway, periodically the backends spiral out of control with
> > > > > > hung-up children until I hit MaxBackendID (which I compiled in
> > > > > > to be 128).  Initially, I was running out of semaphores on
> > > > > > Solaris 7 and changed /etc/system to add these lines:
> > > > > > set shmsys:shminfo_shmmax=16777216
> > > > > > set shmsys:shminfo_shmmin=1
> > > > > > set shmsys:shminfo_shmmni=128
> > > > > > set shmsys:shminfo_shmseg=51
> > > > > > *
> > > > > > set semsys:seminfo_semmap=128
> > > > > > set semsys:seminfo_semmni=128
> > > > > > set semsys:seminfo_semmns=8192
> > > > > > set semsys:seminfo_semmnu=8192
> > > > > > set semsys:seminfo_semmsl=64
> > > > > > set semsys:seminfo_semopm=32
> > > > > > set semsys:seminfo_semume=32
> > > > > >
> > > > > > I increased shared memory so I could start more backends...
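Because /etc/system is, as far as I know, only read at boot, one quick way to confirm that new semaphore limits are actually in effect is to try allocating semaphore sets directly. The following is a hypothetical C probe (count_allocatable_sem_sets and its parameters are illustrative names, not PostgreSQL code). The per-set size is an assumption; match it to however many semaphores per set your server build actually requests.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

#define MAX_PROBE_SETS 1024   /* bound on the bookkeeping array */

/* Try to create `nsets` private semaphore sets of `per_set` semaphores
 * each, and return how many the kernel granted before refusing.
 * semget() typically fails with ENOSPC when the system-wide set or
 * semaphore limits (semmni, semmns) are exhausted, and with EINVAL when
 * per_set exceeds the per-set limit (semmsl).  Every set created here
 * is removed again before returning, so no IPC objects are leaked. */
int count_allocatable_sem_sets(int nsets, int per_set)
{
    int ids[MAX_PROBE_SETS];
    int created = 0;

    if (nsets > MAX_PROBE_SETS)
        nsets = MAX_PROBE_SETS;

    while (created < nsets) {
        int id = semget(IPC_PRIVATE, per_set, IPC_CREAT | 0600);
        if (id < 0) {
            perror("semget");        /* errno explains which limit was hit */
            break;
        }
        ids[created++] = id;
    }

    for (int i = 0; i < created; i++)
        semctl(ids[i], 0, IPC_RMID);  /* clean up everything we created */

    return created;
}
```

For example, count_allocatable_sem_sets(8, 16) asks for 8 sets of 16 semaphores; a result below 8 suggests one of the semaphore limits is tighter than the /etc/system values above would imply.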
> > > > > >
> > > > > > OK, so now everything is running fine and, boom, the backends
> > > > > > start to hang on semop, eventually reaching MaxBackendID and
> > > > > > refusing connections.
> > > > > > Attached is a log file from a hang-up today.  Debug is set to
> > > > > > 3.  All times are PST.  I have carved out a bunch of normal
> > > > > > operation from the beginning (about 21,000 lines) and redundant
> > > > > > 'too many backends' messages (about 1,000 lines, while I was
> > > > > > eating lunch :), signified by {SNIP SNIP}.  I pick the log back
> > > > > > up with the birth of pid 2828 and left several 'normal' cycles
> > > > > > in until...
> > > > > >
> > > > > > You can see that process 2840 is the first child to hang.  It
> > > > > > was started at 11:39:23 and did not die until sent signal 15 by
> > > > > > the parent at 14:12:16.  All of the hung processes fall between
> > > > > > 2840 and 3454.
> > > > > >
> > > > > > Sorry the file is so big.  Here are some 'keys' you can use:
> > > > > > Startup is the first line (obviously).
> > > > > > You can find child startup by looking for [2840] (pid in brackets).
> > > > > > You can find child exits by looking for '2840 exited'.
> > > > > > You can find where I sent the kill signal by looking for
> > > > > > 'pmdie 15'.
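The keys above are enough to list every child that started but never exited, which is where the hung backends should show up. A minimal sketch in C, assuming a child is marked by "[pid]" somewhere in a line and its exit by "pid exited"; the exact log layout, the function names, and the fixed table size are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PIDS 4096

typedef struct {
    long pid;
    int exited;   /* 1 once a "pid exited" line has been seen */
} child_t;

static child_t children[MAX_PIDS];
static int nchildren = 0;

/* Forget everything recorded so far. */
void reset_log_scan(void)
{
    nchildren = 0;
}

/* Look up a pid, adding it as still running if it is new.
 * Returns NULL only when the table is full. */
static child_t *find_or_add(long pid)
{
    for (int i = 0; i < nchildren; i++)
        if (children[i].pid == pid)
            return &children[i];
    if (nchildren == MAX_PIDS)
        return NULL;
    children[nchildren].pid = pid;
    children[nchildren].exited = 0;
    return &children[nchildren++];
}

/* Record one log line: "[pid]" marks a child as seen,
 * "pid exited" marks that child as gone. */
void scan_log_line(const char *line)
{
    for (const char *p = strchr(line, '['); p != NULL; p = strchr(p + 1, '[')) {
        if (!isdigit((unsigned char)p[1]))
            continue;
        char *end;
        long pid = strtol(p + 1, &end, 10);
        if (*end == ']')
            find_or_add(pid);
    }

    const char *ex = strstr(line, " exited");
    if (ex != NULL) {
        /* walk back over the digits immediately before " exited" */
        const char *start = ex;
        while (start > line && isdigit((unsigned char)start[-1]))
            start--;
        if (start < ex) {
            child_t *c = find_or_add(strtol(start, NULL, 10));
            if (c != NULL)
                c->exited = 1;
        }
    }
}

/* Print each child that started but never exited; return how many. */
int report_hung_children(FILE *out)
{
    int hung = 0;
    for (int i = 0; i < nchildren; i++) {
        if (!children[i].exited) {
            if (out != NULL)
                fprintf(out, "pid %ld never exited\n", children[i].pid);
            hung++;
        }
    }
    return hung;
}
```

Each line read with fgets() from the saved log would be passed to scan_log_line(), then report_hung_children(stdout) lists the suspects. Stopping the scan at the 'pmdie 15' line keeps the forced shutdown from marking every hung child as exited.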
> > > > > >
> > > > > > I think that's a good start. :)
> > > > > >
> > > > > > Don't hesitate to contact me if I can shed any more light.  I'm
> > > > > > wide open to ideas at the moment.  I'm in EST, but tend to work
> > > > > > until 10-11 at night, so e-mail anytime.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > DwD
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > > > > > Sent: Thursday, February 18, 1999 5:36 PM
> > > > > > > To: Daryl W. Dunbar
> > > > > > > Subject: Re: Interested?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi Daryl...
> > > > > > >
> > > > > > >     I'm not the strongest at internal code, so may not be of
> > > > > > > any help at all.  I just went through my -hackers email, and
> > > > > > > can't seem to find anything from you in there.  Can you tell
> > > > > > > me what your problem is, as well as the version of PostgreSQL
> > > > > > > you are using, and we'll see what we can do?
> > > > > > >
> > > > > > > Marc
> > > > > > >
> > > > > > > On Thu, 18 Feb 1999, Daryl W. Dunbar wrote:
> > > > > > >
> > > > > > > > Marc,
> > > > > > > >
> > > > > > > > I know that you put considerable volunteer time into
> > > > > > > > PostgreSQL.  If I am not too bold in asking, and you are
> > > > > > > > comfortable with it, I am prepared to compensate you for your
> > > > > > > > time if you can assist me in tracking down this rather nasty
> > > > > > > > bug I have been e-mailing Hackers about.  Please let me know
> > > > > > > > if you are interested and, if so, at what rate.
> > > > > > > >
> > > > > > > > We are in the process of launching a pretty exciting site, and
> > > > > > > > a database is an integral part of it.  I really want to use
> > > > > > > > PostgreSQL, but cannot take it into production on Solaris
> > > > > > > > with this problem going on.  I'm in the process of installing
> > > > > > > > a test site on Linux to see if the problem exists there, but
> > > > > > > > I expect it is limited to Solaris.
> > > > > > > >
> > > > > > > > I anxiously await your response.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > DwD
> > > > > > > >
> > > > > > > > --
> > > > > > > > Daryl W. Dunbar
> > > > > > > > VP of Engineering/Chief Technology Officer
> > > > > > > > http://www.com, Where the Web Begins!
> > > > > > > > mailto:daryl@www.com
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Marc G. Fournier
> > > > > > > Systems Administrator @ hub.org
> > > > > > > primary: scrappy@hub.org           secondary:
> > > > > > > scrappy@{freebsd|postgresql}.org
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > > Marc G. Fournier
> > > > Systems Administrator @ hub.org
> > > > primary: scrappy@hub.org           secondary:
> > > > scrappy@{freebsd|postgresql}.org
> > > >
> > >
> >
> > Marc G. Fournier
> > Systems Administrator @ hub.org
> > primary: scrappy@hub.org           secondary:
> > scrappy@{freebsd|postgresql}.org
> >
>


