Re: database corruption questions - Mailing list pgsql-general

From Daniel Serodio (lists)
Subject Re: database corruption questions
Date
Msg-id 507EE652.4000101@mandic.com.br
Whole thread Raw
In response to database corruption questions  (Heine Ferreira <heine.ferreira@gmail.com>)
List pgsql-general
Craig Ringer wrote:
> On 10/14/2012 05:53 AM, Heine Ferreira wrote:
>> Hi
>>
>> Are there any best practices for avoiding database
>> corruption?
>
> * Maintain rolling backups with proper ageing. For example, keep one a
> day for the last 7 days, then one a week for the last 4 weeks, then
> one a month for the rest of the year, then one a year.
What kind of "rolling backups"? From pg_basebackup?
>
> * Use warm standby with log shipping and/or replication to maintain a
> live copy of the DB.
>
> * If you want point-in-time recovery, keep a few days or weeks worth
> of WAL archives and a basebackup around. That'll help you recover from
> those "oops I meant DROP TABLE unimportant; not DROP TABLE
> vital_financial_records;" issues.
>
> * Keep up to date with the latest PostgreSQL patch releases. Don't be
> one of those people still running 9.0.0 when 9.0.10 is out.
The problem is that updating the database usually results in downtime.
Or can the downtime be avoided in a replication scenario?
>
> * Plug-pull test your system when you're testing it before going live.
> Put it under load with something like pgbench, then literally pull the
> plug out. If your database doesn't come back up fine you have
> hardware, OS or configuration problems.
>
> * Don't `kill -9` the postmaster. It should be fine, but it's still
> not smart.
>
> * ABSOLUTELY NEVER DELETE postmaster.pid
>
> * Use good quality hardware with proper cooling and a good quality
> power supply. If possible, ECC RAM is a nice extra.
>
> * Never, ever, ever use cheap SSDs. Use good quality hard drives or
> (after proper testing) high end SSDs. Read the SSD reviews
> periodically posted on this mailing list if considering using SSDs.
> Make sure the SSD has a supercapacitor or other reliable option for
> flushing its write cache on power loss. Always do repeated plug-pull
> testing when using SSDs.
>
> * Use a solid, reliable file system. zfs-on-linux, btrfs, etc are not
> the right choices for a database you care about. Never, ever, ever use
> FAT32.
>
> * If on Windows, do not run an anti-virus program on your
> database server. Nobody should be using it for other things or running
> programs on it anyway.
>
> * Avoid RAID 5, mostly because the performance is terrible, but also
> because I've seen corruption issues with rebuilds from parity on
> failing disks.
>
> * Use a good quality hardware RAID controller with a battery backup
> cache unit if you're using spinning disks in RAID. This is as much for
> performance as reliability; a BBU will make an immense difference to
> database performance.
>
> * If you're going to have a UPS (you shouldn't need one as your system
> should be crash-safe), don't waste your money on a cheap one. Get a
> good online double-conversion unit that does proper power filtering.
> Cheap UPSs are just a battery with a fast switch, they provide no
> power filtering and what little surge protection they offer is done
> with a component that wears out after absorbing a few surges, becoming
> totally ineffective. Since your system should be crash-safe a cheap
> UPS will do nothing for corruption protection, it'll only help with
> uptime.
>
> --
> Craig Ringer
>
Thanks,
Daniel Serodio


pgsql-general by date:

Previous
From: Mathew Thomas
Date:
Subject: Re: transaction log file "000000010000097600000051" could not be archived: too many failures
Next
From: "Daniel Serodio (lists)"
Date:
Subject: Re: allow servers to access to the same data