pg_xlog disk full error, i need help - Mailing list pgsql-general

From Janning Vygen
Subject pg_xlog disk full error, i need help
Date
Msg-id 009001c5337f$ab294940$56a4fea9@oemcomputer
List pgsql-general
Hi,

I run a nightly CLUSTER and VACUUM on one of my production databases.

Yesterday morning the VACUUM process was still running after 8 hours.
That was very unusual and I didn't know exactly what to do, so I tried to stop
the process. When that didn't work, I killed the VACUUM process with kill -9.
I restarted the database and everything worked fine again. I knew that this
was NOT a good idea, but I had to find a quick solution, and it did work at
least.

Tonight something very strange happened before or while the clustering was
running:

PANIC:  could not write to file "/home/postgres/data/pg_xlog/xlogtemp.6434":
No space left on device
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while
processing the request. connection to server was lost
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while
processing the request. connection to server was lost

My disk had filled up with 100 GB (!) of data/pg_xlog/ files. I deleted
some files on the same partition, after which I had 3 GB of free space
again. Then I tried to start the postmaster.

The startup process logged this:

LOG:  database system shutdown was interrupted at 2005-03-28 09:33:15 CEST
LOG:  checkpoint record is at F/EE0F0010
LOG:  redo record is at F/EC007900; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 46558173; next OID: 58970
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at F/EC007900

This looks fine, as it says "automatic recovery in progress", but there have
been no more log entries since startup, and my process table says:
 8495 pts/0    S      0:00 /usr/local/pgsql/bin/postmaster -D /home/postgres/data
 8498 pts/0    S      0:00 postgres: stats buffer process
 8499 pts/0    S      0:00 postgres: stats collector process
 8500 pts/0    D      5:15 postgres: startup subprocess

and top says
 8500 postgres  15   0  131m 131m 131m D 18.9  6.5   5:18.26 postmaster

So the postmaster is still working.

How long will it work on this problem? Can I expect everything to work
correctly once this startup process finishes, or should I stop it and restore
from a backup (which I hope is usable and not corrupt)?

I am rather helpless in this situation, as I don't know much about the
underlying storage, WAL and xlog internals. Maybe I could just delete all the
files in the pg_xlog directory?
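
In case it helps anyone looking at the numbers: a minimal, read-only sketch
of how the size of the pg_xlog directory and the free space on its partition
can be measured from Python without touching any files. The function names
(dir_size_bytes, usage_report, format_report) are made up for illustration,
and the path in the comment at the end is the one from the PANIC message
above. It only counts files directly inside the directory, not
subdirectories.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of the regular files directly inside `path`.

    Subdirectories are not descended into, and nothing is modified.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def usage_report(path):
    """Return (bytes used by files in `path`, bytes free on its partition)."""
    used = dir_size_bytes(path)
    free = shutil.disk_usage(path).free
    return used, free


def format_report(path):
    """Format the report in GiB for quick reading."""
    used, free = usage_report(path)
    gib = 2 ** 30
    return "%s: %.2f GiB in files, %.2f GiB free on partition" % (
        path, used / gib, free / gib)


# Example call (path assumed from the PANIC message):
#   print(format_report("/home/postgres/data/pg_xlog"))
```

Running it every few minutes would at least show whether the directory is
still growing while the startup subprocess runs.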

Can anybody give me some hints on what to do or how to ask?

I am really desperate at the moment.

Kind regards,
Janning

Please excuse the bad English and typos. I am rather nervous at the moment.

