Re: database corruption questions - Mailing list pgsql-general

From Craig Ringer
Subject Re: database corruption questions
Date
Msg-id 507A1FE3.9080305@ringerc.id.au
Whole thread Raw
In response to database corruption questions  (Heine Ferreira <heine.ferreira@gmail.com>)
Responses Re: database corruption questions  (John R Pierce <pierce@hogranch.com>)
Re: database corruption questions  (Chris Angelico <rosuav@gmail.com>)
Re: database corruption questions  (Daniel Serodio <dserodio@mandic.com.br>)
List pgsql-general
On 10/14/2012 05:53 AM, Heine Ferreira wrote:
> Hi
>
> Are there any best practices for avoiding database
> corruption?

* Maintain rolling backups with proper ageing. For example, keep one a
day for the last 7 days, then one a week for the last 4 weeks, then one
a month for the rest of the year, then one a year.

* Use warm standby with log shipping and/or replication to maintain a
live copy of the DB.

* If you want point-in-time recovery, keep a few days or weeks worth of
WAL archives and a basebackup around. That'll help you recover from
those "oops I meant DROP TABLE unimportant; not DROP TABLE
vital_financial_records;" issues.

* Keep up to date with the latest PostgreSQL patch releases. Don't be
one of those people still running 9.0.0 when 9.0.10 is out.

* Plug-pull test your system when you're testing it before going live.
Put it under load with something like pgbench, then literally pull the
plug out. If your database doesn't come back up fine you have hardware,
OS or configuration problems.

* Don't `kill -9` the postmaster. It should be fine, but it's still not
smart.

* ABSOLUTELY NEVER DELETE postmaster.pid

* Use good quality hardware with proper cooling and a good quality power
supply. If possible, ECC RAM is a nice extra.

* Never, ever, ever use cheap SSDs. Use good quality hard drives or
(after proper testing) high end SSDs. Read the SSD reviews periodically
posted on this mailing list if considering using SSDs. Make sure the SSD
has a supercapacitor or other reliable option for flushing its write
cache on power loss. Always do repeated plug-pull testing when using SSDs.

* Use a solid, reliable file system. zfs-on-linux, btrfs, etc are not
the right choices for a database you care about. Never, ever, ever use
FAT32.

* If on Windows, do not run an anti-virus program on your
database server. Nobody should be using it for other things or running
programs on it anyway.

* Avoid RAID 5, mostly because the performance is terrible, but also
because I've seen corruption issues with rebuilds from parity on failing
disks.

* Use a good quality hardware RAID controller with a battery backup
cache unit if you're using spinning disks in RAID. This is as much for
performance as reliability; a BBU will make an immense difference to
database performance.

* If you're going to have a UPS (you shouldn't need one as your system
should be crash-safe), don't waste your money on a cheap one. Get a good
online double-conversion unit that does proper power filtering. Cheap
UPSs are just a battery with a fast switch, they provide no power
filtering and what little surge protection they offer is done with a
component that wears out after absorbing a few surges, becoming totally
ineffective. Since your system should be crash-safe a cheap UPS will do
nothing for corruption protection, it'll only help with uptime.

--
Craig Ringer


pgsql-general by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Improve MMO Game Performance
Next
From: John R Pierce
Date:
Subject: Re: database corruption questions