Thread: A nightmare

A nightmare

From: Mauri Sahlberg

Hi,

I'm starting to become desperate. On Saturday I dumped all databases,
wiped the whole PostgreSQL installation, installed the newest rpms for Fedora 1,
restored the databases, recompiled the client libraries and binaries, and
restarted. After five hours of operation:
May  1 21:34:19 claymountain postgres[6337]: [2-1] ERROR:  could not access status of transaction 4250811410
May  1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL:  could not open file "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory

Is there anything I could do other than dump, wipe, and restore again? Is
there anything I could do to get more detailed, meaningful information on
why this is happening?

The only thing the log shows before this happens is a few unexpected
client disconnections.


Re: A nightmare

From: Tom Lane

Mauri Sahlberg <Mauri.Sahlberg@claymountain.com> writes:
> I'm starting to become desperate. On saturday I dumped all databases,
> wiped whole postgresql installation. Installed newest rpms for Fedora 1,
> restored databases. Recompiled client libraries and binaries. Restarted
> and after five hours of operation:
> May  1 21:34:19 claymountain postgres[6337]: [2-1] ERROR:  could not access status of transaction 4250811410
> May  1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL:  could not open file "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory

Which exactly are the "newest rpms for Fedora 1" ... what PG version
and where did you get them from?

> Is there anything else I could do than dump, wipe, restore again? Isn't
> there anything I could do to try to get more detailed meaningful
> information on why this is happening?

It looks like a corrupt-data issue to me.  You could follow the usual
sorts of procedures to try to isolate and get rid of the bad data
(see the list archives for details).  But I think first you need to
question what caused it.  Could your disk drive be failing (or other
hardware problem)?  How much do you trust the specific kernel version
you are currently running?

            regards, tom lane
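
One way to see where the file name in the DETAIL line comes from: the pg_clog segment name is the transaction ID divided by the number of transactions stored per segment, printed as four hex digits. The sketch below assumes the default PostgreSQL 7.4 layout (8 kB pages, 2 status bits per transaction, 32 pages per segment); the helper names are hypothetical, not PostgreSQL functions.

```python
# Sketch: map a transaction ID to its pg_clog segment file, assuming the
# default PostgreSQL 7.4 layout. Helper names are illustrative only.
BLCKSZ = 8192                  # default page size in bytes
CLOG_XACTS_PER_BYTE = 4        # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32    # pages per pg_clog segment file

XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE                  # 32768
XACTS_PER_SEGMENT = XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT    # 1048576


def clog_segment_name(xid: int) -> str:
    """Return the pg_clog file name (four hex digits) holding xid's status."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def clog_location(xid: int) -> tuple:
    """Return (page within segment, byte within page) for xid's status bits."""
    page = xid // XACTS_PER_PAGE
    byte = (xid % XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    return page % SLRU_PAGES_PER_SEGMENT, byte


if __name__ == "__main__":
    xid = 4250811410  # transaction ID from the ERROR line in the log
    print(clog_segment_name(xid))   # prints 0FD5
    print(clog_location(xid))
    print(f"{xid / 2**32:.1%} of the 32-bit transaction ID space")
```

The ID sits close to the 2^32 limit. A cluster freshly initialized and restored five hours earlier is unlikely to have assigned anywhere near that many transactions, so the value most likely came from a damaged tuple header rather than a real transaction, which is consistent with the corrupt-data diagnosis above.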

Re: A nightmare

From: Mauri Sahlberg

On Mon, 2005-05-02 at 10:52 -0400, Tom Lane wrote:
> Which exactly are the "newest rpms for Fedora 1" ... what PG version
> and where did you get them from?
>
Name        : postgresql-server            Relocations: (not relocateable)
Version     : 7.4.7                             Vendor: (none)
Release     : 2PGDG                         Build Date: Fri 25 Feb 2005 01:42:54 PM EET

I got them from
http://www.postgresql.org/ftp/binary/v7.4.7/rpms/fedora/fedora-core-1/


> It looks like a corrupt-data issue to me.  You could follow the usual
> sorts of procedures to try to isolate and get rid of the bad data
> (see the list archives for details).  But I think first you need to
> question what caused it.  Could your disk drive be failing (or other
> hardware problem)?  How much do you trust the specific kernel version
> you are currently running?

I have no control over the kernel version I am running. The server runs
on a virtual machine, and the kernel identifies itself as:

Linux claymountain.planeetta.com 2.4.20-021stab028.5.777-enterprise #1 SMP Tue Feb 22 17:44:46 MSK 2005 i686 i686 i386 GNU/Linux

I have no particular reason to either trust or distrust it.

I've tried to contact the virtual server provider, but so far the person
who is supposed to know about the virtual servers has not been in and is
not returning my calls. As far as I can tell, the hardware "looks" fine,
at least as far as can be seen from inside a virtual server.

I moved the database that seemed to cause the corruption to another
machine, and now both servers have been running happily for more than 24
hours without any sign of data corruption.

I am happy but scared.

I would still like to know what caused the corruption. My current guess
is that it is network related: the corruption occurred when the data was
collected on a different machine from the one hosting the database. The
collector is a C++ application using libpq++-4.0. The corruption could
have something to do with locales and network errors.

Regards,
Mauri Sahlberg


Re: A nightmare

From: Enrico Weigelt

* Mauri Sahlberg <Mauri.Sahlberg@claymountain.com> wrote:

<snip>
> I have no control over the kernel version I am running. The server is
> located on virtual machine and the kernel version claims to be Linux
> claymountain.planeetta.com 2.4.20-021stab028.5.777-enterprise #1 SMP Tue
> Feb 22 17:44:46 MSK 2005 i686 i686 i386 GNU/Linux. I have no trust or
> distrust against it.

could you be running into a quota?


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
---------------------------------------------------------------------
  Realtime Forex/Stock Exchange trading powered by postgresSQL :))
                                            http://www.fxignal.net/
---------------------------------------------------------------------