Hi Tom,
Yes, that's what I suspect too; it looks more like a hardware/OS issue. I really have no evidence against PostgreSQL, other
than my daily dump, which crashes *sometimes*.
I didn't know about badblocks; I'll try it. Last time I ran a full fsck on all the volumes, and everything was clean.
But this server has a SATA RAID. Would a RAID actually replicate bad blocks to the other disk?
Thanks for your advice!
Tom Lane wrote:
Eric Rousse <eric.rousse@telmatik.com> writes:
...
2006-11-16 04:00:39 [8763] LOG: connection received: host=10.1.1.54 port=4894
2006-11-16 04:00:40 [8763] LOG: pq_recvbuf: unexpected EOF on client connection
2006-11-16 04:00:40 [8763] LOG: incomplete startup packet
2006-11-16 04:02:26 [2534] LOG: database system was interrupted at 2006-11-16 03:57:36 EST
2006-11-16 04:02:26 [2534] LOG: checkpoint record is at C/6733EB68
...
I think what you're seeing here is probably a kernel-level crash and
system reboot. It's not any normal sort of Postgres problem, because
if it were you'd see the postmaster bleating about crash of one of its
child processes. Here it appears that the postmaster and all its
children died at once leaving no messages behind --- and that just
doesn't happen without either manual intervention or a system crash.
If it seems to be triggered by running a PG backup, it could be that
you've got a disk hardware problem that only manifests when you try to
read a particular data block :-(. Have you tried running "badblocks"?
regards, tom lane
--
Eric Rousse
514-655-1001
Telmatik inc.
204 Montarville, suite 250
Boucherville, QC, Canada
J4B 6S2
www.telmatik.com