Thread: System lockup

System lockup

From
DeJuan Jackson
Date:
Two systems, both Red Hat, running Postgres 7.3.2.
Postgres (installed from tar) data is 1.8G total space (half of which is
static data touched once a night when I vacuum).  There are web pages
(Apache 2.0.44, PHP 4.3.2) and scripts (also PHP 4.3.2) interacting
with both boxes continuously.  The only postgresql.conf settings that are
changed from default are:
tcpip_socket = true
max_connections = 75
shared_buffers = 150

log_connections = true
log_pid = false
log_statement = true
log_duration = true
log_timestamp = true

stats_command_string = true
stats_row_level = true
stats_block_level = true


System 1:
  Red Hat Linux release 8.0 (Psyche) [default server config without X]
  kernel version 2.4.18
  total system RAM 1.5G
  only two services running on the system: Postgres and sshd
  OpenSSL 0.9.6b [engine] 9 Jul 2001
  output from vmstat after 20 hours of uptime (look at the memory):
     procs                      memory    swap          io     system
  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us
  0  0  0      0  46080  60348 1382608   0   0    10     6  265    11   0


System 2:
  Red Hat Linux release 7.1 (Seawolf) [default server config without X]
  kernel version 2.4.2
  total system RAM 1G
  only three services running on the system: Postgres, sshd, and sendmail
(hadn't turned it off yet)
  OpenSSL 0.9.6b [engine] 9 Jul 2001
  output from vmstat after 20 hours of uptime (look at the memory):
    procs                      memory    swap          io     system
  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us
  0  0  0      0   5988  25544 915880   0   0     3     4   58    34   2


I've been running postgres for years on Red Hat systems without a hitch,
but I can't for the life of me figure this one out.  So, any help would
be appreciated.
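
A note on reading the two vmstat samples above: on Linux, the `buff` and `cache` columns are memory the kernel is using for buffers and the page cache, and it can generally release much of that when processes need it, so a small `free` figure on its own does not show that memory is exhausted. The rough sketch below just adds up the columns from the two samples; the function name and the dictionary keys are made up for illustration, and the column order is assumed from the header lines quoted above.

```python
# Sketch: summarize the memory columns of one vmstat data line (values in kB).
# Assumed column order, taken from the headers quoted in this thread:
#   r b w swpd free buff cache si so bi bo in cs us

def memory_summary(vmstat_line):
    fields = vmstat_line.split()
    swpd, free, buff, cache = (int(x) for x in fields[3:7])
    reclaimable = buff + cache  # buffers + page cache, mostly reclaimable
    return {
        "swpd_kb": swpd,
        "free_kb": free,
        "reclaimable_kb": reclaimable,
        "free_plus_reclaimable_kb": free + reclaimable,
    }

system1 = memory_summary(
    "0  0  0      0  46080  60348 1382608   0   0    10     6  265    11   0")
system2 = memory_summary(
    "0  0  0      0   5988  25544 915880   0   0     3     4   58    34   2")

for name, summary in (("System 1", system1), ("System 2", system2)):
    print(name, summary)
```

In both samples `swpd`, `si`, and `so` are zero, meaning nothing was swapped out when the sample was taken.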


Re: System lockup

From
Tom Lane
Date:
DeJuan Jackson <djackson@speedfc.com> writes:
> [ lots of details about system configuration ]

Uh, you didn't actually say what was going wrong ...

            regards, tom lane


Re: System lockup

From
DeJuan Jackson
Date:
Both systems lock up: no faults, no core dumps.  They just completely stop responding and have to be hard booted...  Beyond that your guess is as good as mine.


Re: System lockup

From
Tom Lane
Date:
DeJuan Jackson <djackson@speedfc.com> writes:
> Both systems lock up: no faults, no core dumps.  They just completely stop
> responding and have to be hard booted...  Beyond that your guess is as
> good as mine.

The whole system freezes, not only Postgres?  In that case I don't think
you need to guess very hard: you've got a hardware problem.  Well, maybe
it could be a kernel bug, but I'd bet hardware.  In any case, Postgres
is an unprivileged user process: it can *not* lock up the rest of the
system.  It could tickle a kernel bug that results in a lockup, but that
doesn't make it Postgres' fault.

            regards, tom lane


Re: System lockup

From
DeJuan Jackson
Date:
Then I think it's a kernel (Red Hat kernel 2.4) bug, because I've had the same issue on 4 different machines of different ages and configurations, using the same or similar datasets and using different ones.  I wanted to eliminate postgres as a culprit because it would be the easiest thing to correct. 

As noted in my previous post I'm currently experiencing the problem on a Red Hat 8.0 and 7.2 (fully errata patched) (and am currently testing a RH7.1 install without patching the kernel to see what I get [so far no lockups {4 days production load}]). 

So, I think it's time to switch to FreeBSD, or move my question to a Red Hat list.  I'll give the Red Hat geeks a chance to answer before becoming a trait..., I mean moving to FreeBSD.


Re: System lockup

From
"scott.marlowe"
Date:
On Mon, 31 Mar 2003, DeJuan Jackson wrote:

> Then I think it's a kernel (Red Hat kernel 2.4) bug, because I've had
> the same issue on 4 different machines of different ages and
> configurations, using the same or similar datasets and using different
> ones.  I wanted to eliminate postgres as a culprit because it would be
> the easiest thing to correct.
>
> As noted in my previous post I'm currently experiencing the problem on a
> Red Hat 8.0 and 7.2 (fully errata patched) (and am currently testing a
> RH7.1 install without patching the kernel to see what I get [so far no
> lockups {4 days production load}]).
>
> So, I think it's time to switch to FreeBSD, or move my question to a Red
> Hat list.  I'll give the Red Hat geeks a chance to answer before
> becoming a trait..., I mean moving to FreeBSD.

I use RH 7.2 for VERY heavy database/apache/ldap load and it never ever
locks up.  How's your setup for things in the BIOS?  I've found that
settings like SMP mode 1.1 versus 1.4 in some BIOSes make a difference
(lockups with 1.1, works fine in 1.4).  I'd take a look at what's common
on the boxes hardware-wise, i.e. NICs, motherboards, video cards, hard
drives, SCSI controllers, BIOS settings for the PCI bus or memory speed,
etc...
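
One way to act on this suggestion is to save a device list from each box and intersect them. The sketch below assumes each machine's `lspci` output has been saved as text; the function names, host names, and device lines are invented for illustration and are not taken from the machines in this thread.

```python
# Sketch: find PCI devices that appear on every host, given saved lspci lines.

def strip_address(line):
    """Drop the leading bus address ("02:04.0") so devices match across hosts."""
    parts = line.strip().split(None, 1)
    return parts[1] if len(parts) == 2 else None

def common_devices(inventories):
    """inventories maps a host name to the lines of its saved lspci output."""
    per_host = []
    for lines in inventories.values():
        devices = {strip_address(line) for line in lines}
        devices.discard(None)  # ignore blank or malformed lines
        per_host.append(devices)
    return set.intersection(*per_host) if per_host else set()

# Hypothetical inventories, for illustration only.
inventories = {
    "box-a": [
        "00:00.0 Host bridge: ExampleCorp Host Bridge",
        "02:04.0 Ethernet controller: ExampleCorp Gigabit NIC",
        "03:01.0 RAID bus controller: ExampleCorp RAID 100",
    ],
    "box-b": [
        "00:00.0 Host bridge: OtherCorp Host Bridge",
        "01:08.0 Ethernet controller: ExampleCorp Gigabit NIC",
        "04:02.0 SCSI storage controller: OtherCorp SCSI 200",
    ],
}

shared = common_devices(inventories)
print(sorted(shared))
```

Anything that shows up in the intersection (here, the made-up NIC) is a candidate for a closer look at its driver and known errata. BIOS settings are not covered by this and still need checking by hand.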


Re: System lockup

From
DeJuan Jackson
Date:
The boxes that I've tested and gotten this problem on are as different
as night and day; it's easier to sum up the similarities [they are all
SMP systems with 2 processors (so your BIOS suggestion might fix things,
I'll try to take a look the next time one of them drops)].
  I'm not sure if their BIOS/chipset maker is the same or not (one's a
Dell and the other's an IBM).


Re: System lockup

From
"scott.marlowe"
Date:
Yeah, I've been surprised how many times things like this are caused by
having the same person set up the machines making the same assumptions.
Especially the number of times it's been me setting up the machines. :-)

Also, look for things like kernel settings that may be causing this kind
of problem.  I've seen a lot of issues with DMA settings, or odd cards.
Do they share the same kind of, say, NICs or RAID controllers?  Could be a
common bug in a driver there too.  Good luck on finding and fixing it...

On Mon, 31 Mar 2003, DeJuan Jackson wrote:

> The boxes that I've tested and gotten this problem on are as different
> as night and day; it's easier to sum up the similarities [they are all
> SMP systems with 2 processors (so your BIOS suggestion might fix things,
> I'll try to take a look the next time one of them drops)].
>   I'm not sure if their BIOS/chipset maker is the same or not (one's a
> Dell and the other's an IBM).


Re: System lockup

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> Also, look for things like kernel settings that may be causing this kind
> of problem.

My ears pricked up immediately at the mention of SMP.  Look hard for
kernel errata affecting SMP, driver problems, etc.

            regards, tom lane


Re: System lockup

From
Mark Kirkwood
Date:
Might be worth checking that your hardware is on the Red Hat hardware
compatibility list.

Cheers

Mark