Severe Badness On My Server: psql: FATAL: the database system is starting up - Mailing list pgsql-admin

From Mitchell Laks
Subject Severe Badness On My Server: psql: FATAL: the database system is starting up
Msg-id 200503131112.01120.mlaks@verizon.net
List pgsql-admin
Dear Gurus:
My server and I have had a very bad weekend, starting Friday afternoon.

I am running Debian Sarge, PostgreSQL 7.4.6, with Linux kernel 2.6.8.

I am running a PostgreSQL-backed application on a remote server. The system
has a system drive, on which the PostgreSQL database runs, and a RAID 1
drive on which the application stores data.

Well, the raid1 failed (or is failing - or is trying its hardest to fail, not
clear yet...). This should not have affected the Postgresql database as it is
safely on a separate drive.

However, when I logged onto the system, I found that I could not turn off
PostgreSQL. I logged in as postgres and ran pg_ctl stop; it printed ....... and
then could not stop (presumably because hanging client applications were not
logged off the database).

So then I killed all the application clients (kill -9 on each of them), and
still pg_ctl stop did not want to stop.

So I looked in ps aux, and the client applications appeared to be in D
status:

wustl    18232  0.0  0.2  4872 1920 ?        D    Mar11   0:00 /usr/local/ctn/bi
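
For anyone following along, here is a minimal sketch of how to pick such processes out of a long listing. D means uninterruptible sleep, usually a process waiting on I/O, and kill -9 has no effect until that wait returns. The helper name filter_d_state is made up for illustration, and the ps column options in the usage comment assume a procps-style ps.

```shell
# filter_d_state is a hypothetical helper name, not a standard tool.
# It reads ps-style output on stdin, where column 2 is the process state,
# and keeps the header line plus every row whose state starts with "D".
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (assumes a procps-style ps; wchan shows where the kernel is waiting):
#   ps -eo pid,stat,wchan:32,args | filter_d_state
```

If the wait channel of the stuck processes points at the failing disk, that would suggest they are blocked on the RAID device rather than on the database.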


I then tried to reboot the system remotely by logging in as root and running
shutdown -r now and even shutdown -h now. Interestingly enough, the system
refused to shut down (I have never ever seen this!!!!!!!).

I was floored! Well what to do? I decided to sleep on it.

Well, I then logged in on Saturday night and the system was still hanging in
this bizarre state. I now saw queued shutdown requests in ps aux. And nothing
was happening fast.

I thought. I read a little. I tried pg_ctl stop -m fast. It did nothing. I
prayed. I tried pg_dump LTA_IDB >lta_idb.dump to dump the database in
question. It didn't do anything.

I was desperate. I decided to try desperate measures. I then pulled the gun:

pg_ctl stop -m i

OK, so it stopped. Then I said, let me try to dump the database, and so I did
pg_ctl start. It started:

postgres@A1:~$ pg_ctl status
pg_ctl: postmaster is running (PID: 21195)
Command line was:
/usr/lib/postgresql/bin/postmaster

Then I tried to dump the database and got a message to the effect of
"FATAL: the database system is starting up". I waited a while and then tried
again: same message. I then tried psql LTA_IDB as the user of the database and
got the same FATAL message.

Then I tried psql LTA_IDB once more and got FATAL, the database system is
starting up.
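
Rather than retrying by hand, the wait can be scripted. The following is only a sketch: wait_until is a hypothetical helper name, the retry count and delay are arbitrary, and it assumes the connection attempt fails quickly (as it did here) instead of hanging.

```shell
# wait_until is a hypothetical helper: run a command up to MAX times,
# sleeping DELAY seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    max=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        # The command exited 0, so stop polling.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is coming.
        [ "$i" -le "$max" ] && sleep "$delay"
    done
    return 1
}

# Usage (database name taken from this post; 30 tries, 10 seconds apart):
#   wait_until 30 10 psql -c 'SELECT 1' LTA_IDB
```

If the loop gives up, the server never left the starting-up state within that window.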

I waited. Then I did pg_ctl stop (I don't know why I did it. Perversity, I
think.)

It then said to me
................ something about unable to stop.

Then I did

postgres@A1:~$ pg_dump LTA_IDB>lta_idb.dump
2005-03-13 10:56:33 [21481] LOG:  connection received: host=[local] port=
2005-03-13 10:56:33 [21481] FATAL:  the database system is shutting down
pg_dump: [archiver (db)] connection to database "LTA_IDB" failed: FATAL:  the
dn

Now I did pg_ctl status:
postgres@A1:~$ pg_ctl status
pg_ctl: postmaster is running (PID: 21195)
Command line was:
/usr/lib/postgresql/bin/postmaster

OK I feel like I am in the twilight zone.

Next I did as root
cd /var/log
ls postg*

A1:/var/log# ls post*
postgres.log        postgres.log.2.gz  postgres.log.5.gz  postgres.log.8.gz
postgres.log.1      postgres.log.3.gz  postgres.log.6.gz  postgres.log.9.gz
postgres.log.10.gz  postgres.log.4.gz  postgres.log.7.gz
A1:/var/log# less postgres.log
postgres.log: No such file or directory

WHAT????????
df -h
/dev/sda2             9.2G  2.8G  6.0G  32% /
tmpfs                 443M     0  443M   0% /dev/shm
/dev/sda1              89M   11M   74M  13% /boot
/dev/sda3             7.4G  273M  6.7G   4% /home
/dev/sda8              11G   33M  9.9G   1% /mirror
/dev/sda7             449M  8.1M  417M   2% /tmp
/dev/sda6             7.4G  4.7G  2.4G  67% /var
/dev/md0              230G  139G   80G  64% /home/big0

I am in the twilight zone. My sanity is suspect. Any ideas on what to do next?
Pull the plug????
Mitchell
