Thread: Backend crashes in 7.0.3

Backend crashes in 7.0.3

From: Dirk Lutzebaeck
Hello,

I observe occasional crashes on 7.0.3 under medium load:

Backend message type 0x49 arrived while idle
Backend message type 0x44 arrived while idle
Backend message type 0x54 arrived while idle

I recently upgraded from 7.0.2 to 7.0.3 on RH6.0, Linux 2.2.10 and I
haven't observed these messages before. I have
compiled the source myself (egcs 2.91.66).

Can I downgrade from 7.0.3 to 7.0.2 without dump/restore?

Dirk
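
For reference, the type codes in the three messages above are single ASCII bytes, so they can be decoded directly. The sketch below does that and maps each letter to a message name. The names are an assumption taken from the version 2.0 frontend/backend protocol used by the 7.0 series and should be checked against the protocol documentation; `REPORTED_TYPES`, `V2_MESSAGE_NAMES`, and `describe` are illustrative names, not part of PostgreSQL.

```python
# Message-type bytes exactly as reported in the three messages above.
REPORTED_TYPES = [0x49, 0x44, 0x54]

# Assumed meanings (protocol version 2.0); verify against the protocol docs.
V2_MESSAGE_NAMES = {
    "I": "EmptyQueryResponse",
    "D": "AsciiRow",
    "T": "RowDescription",
}


def describe(type_byte: int) -> str:
    """Render a message-type byte as hex, ASCII letter, and assumed name."""
    letter = chr(type_byte)
    name = V2_MESSAGE_NAMES.get(letter, "unknown")
    return f"0x{type_byte:02x} = '{letter}' ({name})"


for type_byte in REPORTED_TYPES:
    print(describe(type_byte))
```

If that mapping holds, all three are ordinary query-result messages, which would suggest a client receiving leftover results on a connection it considered idle rather than garbage on the wire.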

Re: Backend crashes in 7.0.3

From: Tom Lane
Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
> I observe occasional crashes on 7.0.3 under medium load:

> Backend message type 0x49 arrived while idle
> Backend message type 0x44 arrived while idle
> Backend message type 0x54 arrived while idle

> I recently upgraded from 7.0.2 to 7.0.3 on RH6.0, Linux 2.2.10 and I
> haven't observed these messages before. I have
> compiled the source myself (egcs 2.91.66).

Strange.  I don't recall any 7.0.2 -> 7.0.3 changes that might affect
the frontend/backend protocol behavior.  Did you compile 7.0.2 the same
way as 7.0.3?

> Can I downgrade from 7.0.3 to 7.0.2 without dump/restore?

You can, but in the long run it'd be more useful to figure out what's
going wrong.  The above is not much info --- what are you doing when
this happens, and what if anything appears in the postmaster log?

            regards, tom lane

Re: Backend crashes in 7.0.3

From: Dirk Lutzebaeck
Tom Lane writes:
 > Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
 > > I observe occasional crashes on 7.0.3 under medium load:
 >
 > > Backend message type 0x49 arrived while idle
 > > Backend message type 0x44 arrived while idle
 > > Backend message type 0x54 arrived while idle
 >
 > > I recently upgraded from 7.0.2 to 7.0.3 on RH6.0, Linux 2.2.10 and I
 > > haven't observed these messages before. I have
 > > compiled the source myself (egcs 2.91.66).
 >
 > You can, but in the long run it'd be more useful to figure out what's
 > going wrong.  The above is not much info --- what are you doing when
 > this happens, and what if anything appears in the postmaster log?


It may be that some kernel corruption is showing up here. I'm
using kernel NFS on Linux 2.2.10 with a Solaris8 i86pc client. I saw
some weird NFS error messages on the Linux system which are related to
the Solaris client. I suspect the kernel NFS daemon is corrupting the memory
areas where the postgres shared memory resides. I'm currently trying to dig
further into the problem. Could this be possible? Strangely, stopping and
restarting the postmaster does not help; the crashes occur again. When I
kill the children, some of them stay alive. Sending them another SIGTERM
leaves them in a constantly running state (R), and strace -p on the
child shows nothing. I can then only kill the child with SIGKILL.
I haven't started the postmaster with debugging on yet. I have now shut
off the Solaris client and restarted the machine. Currently it looks
fine.

Dirk

Re: Backend crashes in 7.0.3

From: Tom Lane
Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
> It may be that there is some kernel corruption appearing here. I'm
> using kernel nfs on Linux 2.2.10 with a Solaris8 i86pc client. I saw
> some weird NFS error messages on the Linux system which are related to
> the solaris client. I suspect the kernel nfs daemon corrupting memory
> areas where postgres shared mem resides. I'm currently trying to dig more into
> the problem. Could this be possible?

Seems like a bizarre theory.  In particular, why would 7.0.3 be affected
and not 7.0.2?

            regards, tom lane

Re: Backend crashes in 7.0.3

From: Dirk Lutzebaeck
Tom Lane writes:
 > Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
 > > It may be that there is some kernel corruption appearing here. I'm
 > > using kernel nfs on Linux 2.2.10 with a Solaris8 i86pc client. I saw
 > > some weird NFS error messages on the Linux system which are related to
 > > the solaris client. I suspect the kernel nfs daemon corrupting memory
 > > areas where postgres shared mem resides. I'm currently trying to dig more into
 > > the problem. Could this be possible?
 >
 > Seems like a bizarre theory.  In particular, why would 7.0.3 be affected
 > and not 7.0.2?

Sorry, it seems I was on the wrong track. It has nothing to do with
NFS, I just panicked... I'm currently looking into ApacheDBI, a mod_perl
utility that reuses open DB connections. In any case the backend messages
are odd. It seems I mistook ApacheDBI losing a DB connection
for a backend crash.

Dirk
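
The failure mode described in the last message, a connection cache handing back a handle whose connection has gone away, is usually guarded against by pinging the cached handle before reuse. Below is a minimal sketch of that general pattern. It is not ApacheDBI's implementation: `CachedConnection` is a hypothetical helper, and an in-memory `sqlite3` database stands in for the real backend so the example runs on its own.

```python
import sqlite3


class CachedConnection:
    """Hypothetical helper: reuse one connection, pinging it before each reuse."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument callable that opens a connection
        self._conn = None

    def _alive(self) -> bool:
        """Return True if the cached handle still answers a trivial query."""
        if self._conn is None:
            return False
        try:
            self._conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def get(self):
        """Return the cached connection, reconnecting if the ping fails."""
        if not self._alive():
            self._conn = self._connect()
        return self._conn


cache = CachedConnection(lambda: sqlite3.connect(":memory:"))
first = cache.get()
reused = cache.get()   # healthy handle: the same object comes back
first.close()          # simulate a lost connection
second = cache.get()   # ping fails, so a fresh connection is opened
```

With a check like this, a dropped connection shows up as a quiet reconnect instead of an error that looks like a backend crash.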