Re: Postmaster hangs - Mailing list pgsql-bugs

From Karen Pease
Subject Re: Postmaster hangs
Date
Msg-id 1256528847.25178.25.camel@localhost.localdomain
In response to Re: Postmaster hangs  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
kill -9 does kill postmaster (or at least seems to).  But I can't figure
out a way to get it restarted without a reboot -- I don't know what I'm
missing.  The Fedora postgres restart scripts don't do the trick, and I
couldn't get it to work with pg_ctl either.

kill -9 doesn't work on the locked-up httpd processes, so clearing those
requires a system restart.
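
A process that survives kill -9 is usually sitting in uninterruptible
disk wait (state "D"), which is the condition the quoted message below
asks about.  The following is only a minimal sketch, assuming a Linux
/proc layout; the function names parse_stat and
uninterruptible_processes are made up for illustration and are not part
of any tool mentioned in this thread.  It lists every process currently
in state "D":

```python
from pathlib import Path


def parse_stat(stat_line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    pid_part, rest = stat_line.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    # The process state is the first field after the command name.
    return int(pid_part), comm, after.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for processes in uninterruptible sleep."""
    stuck = []
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-read, or unexpected format
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Processes pass through state "D" briefly during normal I/O, so a single
hit proves little; an httpd or postgres entry that stays in the list
across repeated runs is the interesting case.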

[meme@chmmr]$ cat /proc/version
Linux version 2.6.27.37-170.2.104.fc10.i686
(mockbuild@xenbuilder4.fedora.phx.redhat.com) (gcc version 4.3.2
20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Mon Oct 12 22:01:53 EDT 2009

Postgres is by default in /var/lib/pgsql.  When / started running out of
space, I moved it to /scratch and symlinked:

lrwxrwxrwx 1 root       root         15 2009-09-11 16:57 pgsql -> /scratch/pgsql//

/ is on md0 and is RAID-1.  /scratch is on md1 and is RAID-6:

[meme@chmmr]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0               64G   42G   18G  71% /
/dev/md1              2.5T  2.2T  239G  91% /scratch
/dev/sdb1             190M   38M  143M  21% /boot
/dev/sde1             190M   86M   95M  48% /boot2
/dev/sdd1             190M   86M   95M  48% /boot3
/dev/sda1             190M   86M   95M  48% /boot4
/dev/sdc1             190M   86M   95M  48% /boot5
tmpfs                1000M     0 1000M   0% /dev/shm
[meme@chmmr]$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid6 sde4[0] sdc4[4] sda4[3] sdb4[2] sdd4[1]
      2722005120 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

md0 : active raid1 sde3[0] sdc3[4] sda3[3] sdb3[2] sdd3[1]
      67119488 blocks [5/5] [UUUUU]

unused devices: <none>
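
Since disk hardware is one suspect, the [5/5] [UUUUU] fields above are
worth checking mechanically after each hang.  This is a rough sketch
only: degraded_arrays is a hypothetical helper, and it assumes the
/proc/mdstat layout shown above, where [n/m] gives configured versus
active members and "_" marks a missing member.  It would catch a member
dropping out of an array, not an unreadable block on a disk that is
still listed as up.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays that report a missing member device."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md1 : active raid6 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with e.g. "[5/5] [UUUUU]".
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            configured, active, members = status.groups()
            if int(configured) != int(active) or "_" in members:
                degraded.append(current)
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        print(degraded_arrays(f.read()))
```

An empty list means every array reports all of its members as up, which
matches the output pasted above.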

Both filesystems are ext4.

Thanks for your help!

    - Karen

On Sun, 2009-10-25 at 23:13 -0400, Tom Lane wrote:
> Karen Pease <meme@daughtersoftiresias.org> writes:
> > It'll get through about three or four of them (out of hundreds) before
> > it locks up.  Now, before lockup, postmaster is very active.  It shows
> > up on top.  The computer's hard drives clack nonstop.  Etc.  But once it
> > locks up (without warning), all of that stops.  Postmaster does nothing.
> > The computer goes silent.  I can't ctrl-break the psql process.  If I
> > try to start a new psql process, it won't get past the password prompt
> > -- psql will hang.  All Apache processes involving postgres queries
> > hang.  The postgres server cannot be restarted by any normal means (the
> > only solution I've found that works is a reboot).  And so forth.
>
> This sounds to me like it's a kernel problem, possibly triggered by
> misbehaving disk hardware.  What you might try to confirm is a kill -9
> on whichever postgres backend seems to be stuck.  If that fails to
> remove the process, then it's definitely a kernel issue --- try googling
> "uninterruptible disk wait" and similar phrases.
>
> The cases that I've run into personally have been due to poor error
> handling for a disk failure condition in a kernel-level disk driver.
> If that's what it is for you, the bottom-level problem might be an
> unreadable disk block somewhere.  Or it might just be a garden variety
> kernel bug.  What's the platform?
>
>             regards, tom lane
>
