Re: Possible causes for database corruption and solutions - Mailing list pgsql-general
From | Scott Marlowe |
---|---|
Subject | Re: Possible causes for database corruption and solutions |
Date | |
Msg-id | dcc563d10912151707x5e09214bt198f7f6c606edb68@mail.gmail.com Whole thread Raw |
In response to | Possible causes for database corruption and solutions (Michael Clark <codingninja@gmail.com>) |
Responses |
Re: Possible causes for database corruption and solutions
|
List | pgsql-general |
On Tue, Dec 15, 2009 at 3:39 PM, Michael Clark <codingninja@gmail.com> wrote: > Hello all, > Over the past 6 months or so I have posted to the list a couple times > looking for information regarding recovering databases from corruption. At > the time the incidents of corruption among our users was very low, but the > frequency is starting to increase, most likely due to the increased user > base that has upgraded to the version of our software that uses Postgres. > I have a couple questions I am hoping to get some feedback on. > The first, likely causes of the corruption. In response to my emails I > mentioned above, I was told that generally these sort of corruptions are > caused by faulty hardware/hard drives of some sort. Which seems reasonable. > When we started to see the number of corruptions per week on the rise we > did some more digging and came across some information that points directly > to hard drives, specifically their write buffers. > (Specifically this page: > http://www.mail-archive.com/sqlite-users@sqlite.org/msg06502.html) > The solution to the problem seemed to be to change the value for the > wal_sync_method setting to fsync_writethrough from the default of fsync. > I was curious if there were perhaps any other reasons that we should look > at? Or if there may be other alternatives to changing the wal_sync_method > setting. > I should note, our product runs on OS X, and I would say about 95% of the > corruptions happen in a bytea column in a given table which tends to hold > largish data (like email bodies which may or may not have embedded > attachments). OK, remember that a lot of us on this list run medium to large databases with no problems with corruption. My recipe for corruption free pgsql is: HARDWARE: Server grade SAS drives (15K Seagate, Hitachi, Samsung, WD etc) on a quality HW RAID controller (3Ware, Areca) tested for at least two weeks of memtest86 and two weeks of pgbench running full throttle. OS: Centos / RHEL 5.2 or FreeBSD 6 (haven't useed 7 but heard good things) PG version 8.3.latest I've had a few problems that I can't yet track down in pg 8.4 so we're not migrating to it until we find those problems and fix them. They're backend crashes, sig 11 btw. The reason 95% of your corruption like occurs in bytea is that they are likely the biggest columns (95% of the storage size) so it's likely that other columns could induce corruption as well. While hard drives are often the cause of corruption, bad memory / cpu / mobo / RAID controller etc can all cause these problems as well. I'd recommend very thorough and intense acceptance testing where you are trying to break the computer so to speak. Pulling power plugs while pgbench is running, things like that. Running on two of three redundant power supplies. I'd also recommend moving off of OSX as you're using a minority OS as far as databases are concerned, and you won't have a very large community to help out when things do go wrong. Apple's focus is and has been on user oriented OS experiences, not databases. While the linux kernel is maintained by someone who has made it clear what he's interested in is the single user experience, it is built into an OS by several companies who take running servers very seriously (RH, Debian, Ubuntu to a lesser extent).
pgsql-general by date: