Re: A nightmare - Mailing list pgsql-admin

From Mauri Sahlberg
Subject Re: A nightmare
Msg-id 1115200077.11341.84.camel@localhost.localdomain
In response to Re: A nightmare  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
On Mon, 2005-05-02 at 10:52 -0400, Tom Lane wrote:
> Mauri Sahlberg <Mauri.Sahlberg@claymountain.com> writes:
> > I'm starting to become desperate. On saturday I dumped all databases,
> > wiped whole postgresql installation. Installed newest rpms for Fedora 1,
> > restored databases. Recompiled client libraries and binaries. Restarted
> > and after five hours of operation:
> > May  1 21:34:19 claymountain postgres[6337]: [2-1] ERROR:  could not
> > access status of transaction 4250811410
> > May  1 21:34:19 claymountain postgres[6337]: [2-2] DETAIL:  could not
> > open file
> > "/var/lib/pgsql/data/pg_clog/0FD5": No such file or directory
>
> Which exactly are the "newest rpms for Fedora 1" ... what PG version
> and where did you get them from?
>
Name        : postgresql-server   Relocations: (not relocateable)
Version     : 7.4.7               Vendor: (none)
Release     : 2PGDG               Build Date: Fri 25 Feb 2005 01:42:54 PM EET

Got them from
http://www.postgresql.org/ftp/binary/v7.4.7/rpms/fedora/fedora-core-1/


> It looks like a corrupt-data issue to me.  You could follow the usual
> sorts of procedures to try to isolate and get rid of the bad data
> (see the list archives for details).  But I think first you need to
> question what caused it.  Could your disk drive be failing (or other
> hardware problem)?  How much do you trust the specific kernel version
> you are currently running?

I have no control over the kernel version I am running. The server is
located on a virtual machine, and the kernel version claims to be Linux
claymountain.planeetta.com 2.4.20-021stab028.5.777-enterprise #1 SMP Tue
Feb 22 17:44:46 MSK 2005 i686 i686 i386 GNU/Linux. I neither trust nor
distrust it.

I've tried to contact the virtual server provider, but so far the guy who
is supposed to know something about virtual servers has not been in and
is not returning my calls. As far as I can tell, the hardware "looks"
fine, at least as seen from inside the virtual server.

I moved the database that seemed to cause the corruption to another
machine, and now both servers have been happily running for more than 24
hours without any indication of data corruption.

I am happy but scared.

I would still like to know what caused the corruption. My current guess
is that it could be network related. The corruption occurred when the data
was collected on a different machine from the one where the database was
located. The collector is a C++ application using libpq++-4.0. The
corruption could have something to do with locales and network errors.
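
As a side note on the error quoted at the top: the pg_clog file name can
be derived from the transaction ID in the message. The following is a
minimal sketch, assuming the default 8 kB block size, 2 status bits per
transaction (4 transactions per byte) and 32 pages per segment file. The
constant and function names are illustrative and not taken from the
server source.

```python
# Sketch: map a transaction ID to the pg_clog segment file that would
# hold its commit status. All constants below are assumptions about a
# default build (8 kB blocks, 4 xacts per byte, 32 pages per segment).
BLCKSZ = 8192
XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLCKSZ * XACTS_PER_BYTE * PAGES_PER_SEGMENT  # 1048576


def clog_segment_name(xid: int) -> str:
    """Return the four-digit hex segment file name for a transaction ID."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


if __name__ == "__main__":
    # Transaction ID from the logged error message.
    print(clog_segment_name(4250811410))  # prints 0FD5
```

Under these assumptions the result matches the missing file in the log,
so the server was looking up a transaction ID above 4.2 billion. A
cluster restored only five hours earlier is very unlikely to have
reached such an ID, which seems consistent with the suggestion that the
row data itself is corrupt.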

Regards,
Mauri Sahlberg

