Thread: A nightmare
Hi,

I'm starting to become desperate. On Saturday I dumped all databases, wiped the whole PostgreSQL installation, installed the newest RPMs for Fedora 1, restored the databases, and recompiled the client libraries and binaries. I restarted, and after five hours of operation:

May 1 21:34:19 claymountain postgres[6337]: [2-1] ERROR: could not access status of transaction 4250811410
May 1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory

Is there anything else I could do other than dump, wipe, and restore again? Is there anything I could do to get more detailed, meaningful information on why this is happening? All that the log shows before the error are a few unexpected client disconnections.

Mauri Sahlberg <Mauri.Sahlberg@claymountain.com> writes:
> I'm starting to become desperate. On saturday I dumped all databases,
> wiped whole postgresql installation. Installed newest rpms for Fedora 1,
> restored databases. Recompiled client libraries and binaries. Restarted
> and after five hours of operation:
> May 1 21:34:19 claymountain postgres[6337]: [2-1] ERROR: could not
> access status of transaction 4250811410
> May 1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL: could not
> open file
> "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory

Which exactly are the "newest rpms for Fedora 1" ... what PG version
and where did you get them from?

> Is there anything else I could do than dump, wipe, restore again? Isn't
> there anything I could do to try to get more detailed meaningful
> information on why this is happening?

It looks like a corrupt-data issue to me. You could follow the usual
sorts of procedures to try to isolate and get rid of the bad data
(see the list archives for details). But I think first you need to
question what caused it. Could your disk drive be failing (or other
hardware problem)? How much do you trust the specific kernel version
you are currently running?

regards, tom lane
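
One way to see why the error points at corrupt data rather than a genuinely missing file is to work out which pg_clog segment a transaction ID maps to. The sketch below is only an illustration. It assumes the default 8 kB block size and the usual clog layout (2 status bits per transaction, 32 pages per segment file), and the function name clog_segment_name is made up for this example. Under those assumptions, transaction 4250811410 lands in segment 0FD5, the file named in the log. A freshly restored cluster is very unlikely to have consumed over four billion transaction IDs in five hours, so the ID was most likely read from a damaged tuple header.

```python
# Assumed PostgreSQL defaults (not stated in the thread): 8 kB blocks,
# 2 status bits per transaction, 32 pages per clog segment file.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4
SLRU_PAGES_PER_SEGMENT = 32

XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
XACTS_PER_SEGMENT = XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Return the pg_clog file name (4 hex digits) that would hold xid's status."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


if __name__ == "__main__":
    # Transaction ID taken from the error message in the log above.
    print(clog_segment_name(4250811410))
```

Comparing the result with the files actually present in pg_clog (on a fresh cluster typically just 0000) shows how far the reported ID is from any transaction the server could really have run.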

On Mon, 2005-05-02 at 10:52 -0400, Tom Lane wrote:
> Mauri Sahlberg <Mauri.Sahlberg@claymountain.com> writes:
> > I'm starting to become desperate. On saturday I dumped all databases,
> > wiped whole postgresql installation. Installed newest rpms for Fedora 1,
> > restored databases. Recompiled client libraries and binaries. Restarted
> > and after five hours of operation:
> > May 1 21:34:19 claymountain postgres[6337]: [2-1] ERROR: could not
> > access status of transaction 4250811410
> > May 1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL: could not
> > open file
> > "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory
>
> Which exactly are the "newest rpms for Fedora 1" ... what PG version
> and where did you get them from?

Name     : postgresql-server    Relocations: (not relocateable)
Version  : 7.4.7                Vendor: (none)
Release  : 2PGDG                Build Date: Fri 25 Feb 2005 01:42:54 PM EET

I got them from http://www.postgresql.org/ftp/binary/v7.4.7/rpms/fedora/fedora-core-1/

> It looks like a corrupt-data issue to me. You could follow the usual
> sorts of procedures to try to isolate and get rid of the bad data
> (see the list archives for details). But I think first you need to
> question what caused it. Could your disk drive be failing (or other
> hardware problem)? How much do you trust the specific kernel version
> you are currently running?

I have no control over the kernel version I am running. The server runs on a virtual machine, and the kernel reports itself as:

Linux claymountain.planeetta.com 2.4.20-021stab028.5.777-enterprise #1 SMP Tue Feb 22 17:44:46 MSK 2005 i686 i686 i386 GNU/Linux

I have no particular reason to trust or distrust it. I've tried to contact the virtual server provider, but so far the person who is supposed to know something about virtual servers has not been in and is not returning my calls. As far as I can tell, the hardware "looks" fine, at least as seen from inside a virtual server.

I moved the database that seemed to cause the corruption to another machine, and now both servers have been running happily for more than 24 hours without any indication of data corruption. I am happy but scared.

I would still like to know what caused the corruption. My current guess is that it could be network related: the corruption occurred when the data was collected on a different machine from the one hosting the database. The collector is a C++ application using libpq++-4.0. The corruption could have something to do with locales and network errors.

Regards,
Mauri Sahlberg

* Mauri Sahlberg <Mauri.Sahlberg@claymountain.com> wrote:

<snip>
> I have no control over the kernel version I am running. The server is
> located on virtual machine and the kernel version claims to be Linux
> claymountain.planeetta.com 2.4.20-021stab028.5.777-enterprise #1 SMP Tue
> Feb 22 17:44:46 MSK 2005 i686 i686 i386 GNU/Linux. I have no trust or
> distrust against it.

You probably have a quota?

cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
---------------------------------------------------------------------
  Realtime Forex/Stock Exchange trading powered by postgresSQL :))
                                            http://www.fxignal.net/
---------------------------------------------------------------------