Thread: crash help, pgsql 7.2.1 on RH7.3

From: "Tim Lynch"
running pgsql 7.2.1 on redhat7.3 SMP. installed a hacked glibc to fix the
mktime() timezone problem for dates < 1970
(http://rpms.arvin.dk/glibc/rh73/i686/)

three times now the backend process has unexpectedly quit. what happens is
the postmaster process and the stats processes disappear and only the client
connection processes remain.

i don't see a core file. nothing interesting is mentioned in the logs except
for the usual redo post-mortem on startup, "database system was interrupted
at ...". i'm new to postgres admin, so i'm hoping folks can give some
direction as to where to begin solving this problem. it's happened about 3
times in two months.

now that 7.2.3 is out and fixes the mktime() problem i should probably
upgrade to that and revert to stock redhat glibc stuff. my concern is that i
have no idea what caused these events and i don't know what to do to ensure
that when it happens again i'll be able to determine the cause. what type of
logging and monitoring is recommended to report the health of a running
postgres?



Re: crash help, pgsql 7.2.1 on RH7.3

From: Tom Lane
"Tim Lynch" <admin+pgsqladmin@thirdage.com> writes:
> running pgsql 7.2.1 on redhat7.3 SMP. installed a hacked glibc to fix the
> mktime() timezone problem for dates < 1970
> (http://rpms.arvin.dk/glibc/rh73/i686/)

> three times now the backend process has unexpectedly quit. what happens is
> the postmaster process and the stats processes disappear and only the client
> connection processes remain.

Really!?  That would seem to indicate a postmaster crash.  (The stats
processes are designed to quit automatically when the parent postmaster
exits, so it's no surprise they'd exit too.)  This is highly unusual,
and worth looking into more closely.

> i don't see a core file.

Check that you are starting the postmaster with "ulimit -c unlimited";
this is not the default on most Linuxen, so you may have to add that to
the start script.  Also note that the postmaster never does a chdir,
so if it drops core it will be in the same directory the start script
was running in.

> now that 7.2.3 is out and fixes the mktime() problem i should probably
> upgrade to that and revert to stock redhat glibc stuff.

Probably.  But I do not think the postmaster ever calls mktime(), so the
odds are that your glibc hack is unrelated.

            regards, tom lane

Re: crash help, pgsql 7.2.1 on RH7.3

From: Tom Lane
I said:
> "Tim Lynch" <admin+pgsqladmin@thirdage.com> writes:
>> i don't see a core file.

> Check that you are starting the postmaster with "ulimit -c unlimited";
> this is not the default on most Linuxen, so you may have to add that to
> the start script.  Also note that the postmaster never does a chdir,
> so if it drops core it will be in the same directory the start script
> was running in.

Drat, I forgot to mention an important corollary: make sure the
postmaster is started in a directory that's writable by the postgres
user, else you'll get no corefile.
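
A minimal sketch of how the two points above (the core limit and a writable start directory) might be checked from a start script. The helper names `core_dir_ok` and `enable_core_dumps` are hypothetical, as are the PGDATA variable and the pg_ctl line in the comments. Adapt them to your own init script.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for a postmaster start script.
# Nothing here comes from the PostgreSQL distribution itself.

# Succeed if the given directory exists and is writable by the current
# user, so that a core file could be created there.
core_dir_ok() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Ask for unlimited core size; if the hard limit forbids that, settle
# for raising the soft limit as far as the hard limit allows.
enable_core_dumps() {
    ulimit -Sc unlimited 2>/dev/null || ulimit -Sc "$(ulimit -Hc)"
}

# Possible use, run as the postgres user (paths are assumptions):
#   cd "$PGDATA" || exit 1
#   core_dir_ok "$PWD" || echo "warning: $PWD not writable, no core file" >&2
#   enable_core_dumps
#   pg_ctl -D "$PGDATA" start
```

The limit has to be raised in the same shell that launches the postmaster, because ulimit settings are inherited by child processes only.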

(For completeness I'll mention here that when individual backends dump
core, it's in the $PGDATA/base/nnn/ directory of the database they're
connected to.  So you can easily distinguish a postmaster core from
a backend core, just by where it was dropped.)
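
The location rule above could be turned into a small check like the following. This is only a sketch: `classify_core` is a hypothetical helper, and the data directory and start directory passed to it are whatever applies on your system.

```shell
# Classify a core file by the directory it was dropped in.
#   $1 = PGDATA, $2 = directory the postmaster was started in, $3 = core file
# Prints "backend", "postmaster", or "unknown".
classify_core() {
    pgdata=$1
    startdir=$2
    dir=$(dirname "$3")
    case "$dir" in
        "$pgdata"/base/*[!0-9]*) ;;   # not a plain numeric database directory
        "$pgdata"/base/?*) echo backend; return ;;
    esac
    if [ "$dir" = "$startdir" ]; then
        echo postmaster
    else
        echo unknown
    fi
}

# Example with assumed paths:
#   classify_core /var/lib/pgsql/data /var/lib/pgsql /var/lib/pgsql/core
```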

            regards, tom lane

Re: crash help, pgsql 7.2.1 on RH7.3

From: "Tim Lynch"
okay, argh, after messing around with /etc/security/limits.conf, it would
have been nice to know that pam_limits.so and limits.conf do not change the
default ulimit, just the bounds within which the user can change it. so the
user still has to run ulimit -c unlimited. i expected a regular user to never
be able to increase their ulimits - call me old fashioned... what's next,
regular user negative renice?!? anyways...
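
The distinction described above is the shell's soft limit (the value in effect) versus the hard limit (the ceiling). A quick way to see both, and to raise the soft one as an unprivileged user, might look like this:

```shell
# Show the soft (current) and hard (ceiling) core-size limits.
echo "soft: $(ulimit -Sc)  hard: $(ulimit -Hc)"

# An unprivileged user may raise the soft limit up to the hard limit,
# but not beyond it.
ulimit -Sc "$(ulimit -Hc)"
echo "soft is now: $(ulimit -Sc)"
```

The change applies only to the current shell and the processes it starts afterwards.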

but, uh, what am i going to do with a core file? i would need a non-stripped
postgres binary first, right?

i checked out the cwd in /proc, it is /var/lib/pgsql (actually i symlinked it
into another fs) which is postgres:postgres mode 700.
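
For anyone repeating the /proc check, a sketch of it on Linux follows. `proc_cwd` is a hypothetical helper, and the postmaster.pid path in the comment assumes the Red Hat layout and that the first line of that file holds the PID.

```shell
# Print the current working directory of a running process (Linux /proc).
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Example (path is an assumption):
#   proc_cwd "$(head -n 1 /var/lib/pgsql/data/postmaster.pid)"
```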


Re: crash help, pgsql 7.2.1 on RH7.3

From: Lamar Owen
On Thursday 21 November 2002 20:05, Tim Lynch wrote:
> increase their ulimits - call me old fashioned... what's next, regular user
> negative renice?!? anyways...

Actually.... yes.

> but, uh, what am i going to do with a core file? i would need a
> non-stripped postgres binary first, right?

If you have the RPM, you have no debugging symbols.  You can rebuild it with
debugging -- the PGDG RPM set can have debugging symbols enabled with a
simple macro define close to the top of the spec file.

> i checked out the cwd in /proc, it is /var/lib/pgsql (actually i symlinked
> it into another fs) which is postgres:postgres mode 700.

That's the standard place for PGDATA in Red Hat.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Re: crash help, pgsql 7.2.1 on RH7.3

From: Tom Lane
"Tim Lynch" <admin@thirdage.com> writes:
> but, uh, what am i going to do with a core file? i would need a non-stripped
> postgres binary first, right?

Yup, you would.  I'd recommend building from source so that you can add
both --enable-debug and --enable-cassert to the configure flags.  (It
may actually be possible to do that with the SRPM distro, but I don't
know how...)

            regards, tom lane

Re: crash help, pgsql 7.2.1 on RH7.3

From: Lamar Owen
On Saturday 23 November 2002 12:10, Tom Lane wrote:
> "Tim Lynch" <admin@thirdage.com> writes:
> > but, uh, what am i going to do with a core file? i would need a
> > non-stripped postgres binary first, right?

> Yup, you would.  I'd recommend building from source so that you can add
> both --enable-debug and --enable-cassert to the configure flags.  (It
> may actually be possible to do that with the SRPM distro, but I don't
> know how...)

Install the source RPM (.src.rpm), then edit
/usr/src/redhat/SPECS/postgresql.spec, changing the line near the top that
says:
%define beta 0
to
%define beta 1

Save and exit, then run 'rpmbuild -ba postgresql.spec', then install the RPMs
found in /usr/src/redhat/RPMS/<arch> (probably i386).

beta=1 enables both --enable-debug and --enable-cassert and allows full
debugging, AFAIK.  If it doesn't, then we need to look a little closer at
it...
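
The spec edit described above could also be scripted. This is a sketch only: `enable_beta` is a hypothetical helper, and it assumes the line reads exactly "%define beta 0" (modulo whitespace) as quoted in this message.

```shell
# Rewrite "%define beta 0" to "%define beta 1" in the given spec file.
# Goes through a temporary file rather than relying on sed -i.
enable_beta() {
    sed 's/^%define[[:space:]]\{1,\}beta[[:space:]]\{1,\}0[[:space:]]*$/%define beta 1/' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Then, using the paths given above:
#   enable_beta /usr/src/redhat/SPECS/postgresql.spec
#   rpmbuild -ba /usr/src/redhat/SPECS/postgresql.spec
```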
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Re: crash help, pgsql 7.2.1 on RH7.3

From: Tom Lane
Lamar Owen <lamar.owen@wgcr.org> writes:
> On Saturday 23 November 2002 12:10, Tom Lane wrote:
>> Yup, you would.  I'd recommend building from source so that you can add
>> both --enable-debug and --enable-cassert to the configure flags.  (It
>> may actually be possible to do that with the SRPM distro, but I don't
>> know how...)

> Install the source RPM (.src.rpm), then edit
> /usr/src/redhat/SPECS/postgresql.spec, changing the line near the top that
> says:
> %define beta 0
> to
> %define beta 1

Cool.  Thanks for the tip.

            regards, tom lane