Thread: Index corruption after proper shut down

Index corruption after proper shut down

From
Strahinja Kustudić
Date:
Hi all,

Last week we migrated 200+ of our servers from one rack to another. The procedure was dead simple: shut the server down from the OS, unplug it, move it to the other rack, plug it in, and start it. The problem is that after booting, some of the servers had corrupted indexes.

The servers are Dell PowerEdge R420s with an H700 RAID controller with BBU, running CentOS 5.9 x64 and Postgres 9.1.9 on two Intel 330 120GB SSDs (SSDSC2CT120), one for data and one for indexes, on XFS (noatime,nobarrier,noquota). The relevant Postgres configuration is:

wal_level = minimal
fsync = on
wal_sync_method = fdatasync
full_page_writes = on
synchronous_commit = off
wal_buffers = -1

We also disabled the disk write cache on all drives with the MegaCli64 utility, because the RAID controller has a BBU and should be the only component caching writes.
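
For anyone who wants to compare this setup against their own, here is a minimal Python sketch that parses the settings and mount options listed above and flags the ones that weaken durability. The helper names (`parse_settings`, `flag_risky_settings`, `flag_risky_mount_options`) and the rule tables are illustrative assumptions, not part of Postgres or XFS, and the rules are deliberately incomplete.

```python
# Illustrative sketch only: the helper names and rule tables are hypothetical.

CONFIG_TEXT = """
wal_level = minimal
fsync = on
wal_sync_method = fdatasync
full_page_writes = on
synchronous_commit = off
wal_buffers = -1
"""

MOUNT_OPTIONS = "noatime,nobarrier,noquota"

# (setting, value) pairs worth a second look, with a short reason.
RISKY_SETTINGS = {
    ("fsync", "off"): "writes are not forced to disk; a crash can corrupt data",
    ("full_page_writes", "off"): "torn pages may not be repairable from WAL",
    ("synchronous_commit", "off"): "recent commits can be lost after a crash, "
                                   "but this alone should not corrupt data",
}

# Mount options worth a second look, with a short reason.
RISKY_MOUNT_OPTIONS = {
    "nobarrier": "write barriers are disabled; safe only if every write "
                 "cache in the I/O path is non-volatile",
}


def parse_settings(text):
    """Parse 'name = value' lines into a dict, ignoring blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def flag_risky_settings(settings):
    """Return (name, value, reason) for each setting found in RISKY_SETTINGS."""
    return [
        (name, value, RISKY_SETTINGS[(name, value)])
        for name, value in settings.items()
        if (name, value) in RISKY_SETTINGS
    ]


def flag_risky_mount_options(options):
    """Return (option, reason) for each comma-separated option in the rule table."""
    return [
        (opt, RISKY_MOUNT_OPTIONS[opt])
        for opt in (o.strip() for o in options.split(","))
        if opt in RISKY_MOUNT_OPTIONS
    ]


if __name__ == "__main__":
    for name, value, reason in flag_risky_settings(parse_settings(CONFIG_TEXT)):
        print(f"{name} = {value}: {reason}")
    for opt, reason in flag_risky_mount_options(MOUNT_OPTIONS):
        print(f"mount option {opt}: {reason}")
```

With the values from this message, the sketch flags only synchronous_commit = off and the nobarrier mount option; whether either one is related to the corruption is exactly what is unclear.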

Does anyone have any idea why we could be getting index corruption?

Thanks in advance

Regards,
Strahinja

Re: Index corruption after proper shut down

From
Strahinja Kustudić
Date:
Sorry for replying to myself, but does anyone have any idea what the problem could be? We would like to do some testing based on your suggestions to see what could cause this problem and how to mitigate it.

Regards,
Strahinja

On Fri, Nov 15, 2013 at 11:44 AM, Strahinja Kustudić <strahinjak@nordeus.com> wrote:
[...]

pgpool-II

From
"Joseph Mays"
Date:
I am trying to help someone who wants a dynamic failover server. They have a very active pgsql server. I set up streaming replication to another server so there is always a live backup available, but now they have decided they want to be able to read and write to either server at any time. Essentially they are looking for a cluster, even though that is not the wording they are using.
 
I have been looking at pgpool-II. How well does it work, and how hard would it be to convert a primary server and a streaming replication server into a load-balanced pgpool-II cluster?
 
If the two databases get out of sync (for example, one is down for a while and then comes back up), are they automatically brought back in sync?
 
Joe Mays