Re: Lost rows/data corruption? - Mailing list pgsql-general
From | Andrew Hall |
---|---|
Subject | Re: Lost rows/data corruption? |
Date | |
Msg-id | 021e01c513be$5b20d700$5001010a@bluereef.local |
In response to | Re: Lost rows/data corruption? ("Andrew Hall" <temp02@bluereef.com.au>) |
Responses | Re: Lost rows/data corruption? |
List | pgsql-general |
fsync is on for all these boxes. Our customers run their own hardware, with many different hardware specifications in use. Many of our customers don't have a UPS, although their power is probably pretty reliable (normal city-based utilities), but of course I can't guarantee they don't get an outage once in a while with a thunderstorm etc.

The problem here is that we are consistently seeing the same kind of corruption and symptoms across a fairly large number of customers (52 have reported this problem), so there is something endemic happening here that, to be honest, I'm surprised no one else is seeing.

Fundamentally there is nothing particularly abnormal about our application or data, but regardless, I would have thought these kinds of things (application design, data representation etc) irrelevant to whether the database reliably refuses duplicate data on a primary key.

Something is causing this corruption, and one thing we do know is that it doesn't happen immediately with a new installation; it takes time (several months of usage) before we start to see this condition.

I'd be really surprised if XFS is the problem, as I know there are plenty of other people across the world using it reliably with PG.

We're going to see if we can build a test environment that can forcibly cause this, but I don't hold much hope, as we've tried to isolate it before with little success.
Here's what we tried changing when we originally went searching for the problem, and it's still here:

- the hardware (tried single CPU instead of dual - though that may be an issue with the OS)
- the OS version (tried Linux 2.6.5, 2.6.6, 2.6.7, 2.6.8.1, 2.6.10 and 2.4.22) - all using XFS
- the database table layout (tried changing the way the data is stored)
- the version of Jetty (servlet engine)
- the DB pool manager and PG JDBC driver versions
- the version of PG (tried two or three back from the latest)
- various vacuum regimes

----- Original Message -----
From: "Marco Colombo" <pgsql@esiway.net>
To: "Andrew Hall" <temp02@bluereef.com.au>
Cc: <pgsql-general@postgresql.org>
Sent: Wednesday, February 16, 2005 2:58 AM
Subject: Re: Lost rows/data corruption?

> On Tue, 15 Feb 2005, Andrew Hall wrote:
>
>>>> It sounds like a mess, all right. Do you have a procedure to follow to
>>>> replicate this havoc? Are you sure there's not a hardware problem
>>>> underlying it all?
>>>>
>>>> regards, tom lane
>>>>
>>
>> We haven't been able to isolate what causes it but it's unlikely to be
>> hardware as it happens on quite a few of our customer's boxes. We also use
>> XFS on linux 2.6 as a file system, so the FS should be fairly tolerant to
>> power-outages. Any ideas as to how I might go about isolating this? Have you
>> heard any other reports of this kind and suggested remedies?
>
> Are you running with fsync = off? and did the hosts experience any
> power-outage recently?
>
> .TM.
> --
> ____/ ____/ /
> / / / Marco Colombo
> ___/ ___ / / Technical Manager
> / / / ESI s.r.l.
> _____/ _____/ _/ Colombo@ESI.it