Re: (Again) Datacorruption using 7.4.2 on XFS/raid1 - Mailing list pgsql-general

From Brian Hirt
Subject Re: (Again) Datacorruption using 7.4.2 on XFS/raid1
Msg-id C0C38A2F-D438-11D8-9804-000D93AD2E74@mobygames.com
In response to (Again) Datacorruption using 7.4.2 on XFS/raid1  ("Florian G. Pflug" <fgp@phlo.org>)
List pgsql-general
FYI, I have seen Linux software RAID fail to detect failed drives and
cause filesystem corruption on many occasions.  I would recommend
staying away from it.  Maybe what you describe is a problem with PG,
but I doubt it.


On Jul 12, 2004, at 12:31 PM, Florian G. Pflug wrote:

> Hi
>
> We have again experienced data-corruption using 7.4.2 on an XFS
> filesystem on top of a software-raid (md) raid-1.
>
> After a server crash last night (It was a rather strange crash - The
> machine was still pingable, but no login was possible, and postgres
> and apache didn't respond to requests any more) we hard-reset the
> machine. It came up again nicely, but a few hours later the following
> errors occurred when trying to access certain tables. (Those tables
> are updated heavily - each day about 2 million tuples are inserted,
> and the old versions of those tuples deleted).
>
> ERROR:  could not access status of transaction 34048
> DETAIL:  could not open file "/var/lib/postgres/data/pg_clog/0000": No such file or directory
>
> While reading linux-kernel today, I stumbled upon a description of a
> rather strange XFS behaviour. It seems to zero a block if the block
> was updated, and the corresponding metadata-update was flushed to
> disk, but not the data itself.
> It does not happen if the file is fsynced() after the update - but I
> was wondering what would happen if the machine crashed between the
> write() and the fsync().
>
> The lkml thread about this can be found here:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/0359.html
>
> Could this XFS behaviour cause the postgres problems we are seeing?
>
> greetings, Florian Pflug
>
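
For reference, the write()/fsync() window Florian asks about can be
sketched in a few lines of Python. This is only an illustration of the
general pattern, not how PostgreSQL performs its own I/O; the function
name durable_write and the file name used below are made up for the
example. After os.write() returns, the data may exist only in the
kernel's cache; it is only once os.fsync() has returned successfully
that the caller may treat it as being on disk. A crash between the two
calls is the case in question.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, returning only after fsync() has succeeded.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offset = 0
        # os.write() may write fewer bytes than requested, so loop.
        while offset < len(data):
            offset += os.write(fd, data[offset:])
        # Up to this point the data may live only in the page cache.
        # A crash here is the window between write() and fsync().
        os.fsync(fd)
        # Only after fsync() returns is the data expected to be durable.
    finally:
        os.close(fd)
```

A caller would use it as durable_write("block.dat", b"x" * 8192) and
should treat the data as safe only once the call has returned without
raising.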

