Data corruption after SAN snapshot - Mailing list pgsql-admin

From Terry Schmitt
Subject Data corruption after SAN snapshot
Msg-id CAOOcyswLYBfJDuvNBPMkiNCGNKgK=SiexUuTVHCh2O+Y1T-sLw@mail.gmail.com
List pgsql-admin
Hi All,

I have a pretty strange issue that I'm looking for ideas on.
I'm using Postgres Plus Advanced Server 9.1, but I believe this problem is relevant to community Postgres as well. It could certainly be an EDB bug, and I am already working with them on this.

We are migrating 1TB+ from Oracle to PPAS. Our new environment consists of a primary server with two "read-only" clones. We use NetApp SAN storage: we execute a NetApp consistent snapshot on the primary server and then use flex clones for the read-only servers. The clones power up, do a short recovery, and all should be well. We have used this method for two years, but with PPAS 8.4, physical servers, and ext4.
The new environment is RHEL 6.x guests running inside Red Hat Virtualization, using XFS and LVM.

The problem is that after the data load, we take a warm snapshot and the cloned databases come up corrupt.
A classic example is: ERROR:  could not read block 1 in file "base/18511/13872": read only 0 of 8192 bytes. Looking at the data file, it is 8k in size, so block 1 is obviously missing from the file. So far I have identified indexes and sequences as corrupt, but I believe it could be any object.
Since the snapshot is essentially a crash, this system is not crash-safe either.
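
To make the block arithmetic behind that error concrete, here is a minimal sketch in Python. It assumes the default 8192-byte block size (the figure in the error message) and blocks numbered from 0; the helper names are hypothetical and not part of Postgres.

```python
BLOCK_SIZE = 8192  # assumed: the block size reported in the error message


def blocks_in_file(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of complete blocks stored in a file of the given size."""
    return size_bytes // block_size


def has_block(size_bytes: int, block_no: int, block_size: int = BLOCK_SIZE) -> bool:
    """Blocks are numbered from 0, so block N needs (N + 1) * block_size bytes."""
    return size_bytes >= (block_no + 1) * block_size


if __name__ == "__main__":
    size = 8192  # the 8k data file described above
    print(blocks_in_file(size))  # 1
    print(has_block(size, 0))    # True
    print(has_block(size, 1))    # False
```

An 8k file holds only block 0, so a read of block 1 returns 0 of 8192 bytes, which matches the error text.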

Looking through the timeline of events, it is clear that data exists in RAM on the primary server, but is not being written out to the SAN for the snapshot and hence is missing when the clone starts up. My first thought is that fsync is not working. PPAS has fsync on and is using fdatasync.

I ran a rudimentary test using: dd if=/dev/zero of=dd_test2 bs=8k count=128k conv=fdatasync and crashed the server immediately after dd completed.
Everything behaved as expected. With fsync or fdatasync, the file exists after the crash and reboot. Leaving out the sync results in a missing file after the crash/reboot, but that is expected. This simple test shows that fdatasync is working, but does not prove that it works under load.
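
For anyone who wants to repeat this from a script, the following is a rough Python equivalent of the dd test, offered as a sketch only. The function name and the small default block count are assumptions chosen for illustration; os.fdatasync is available on Linux but not on every platform.

```python
import os


def write_synced(path: str, block_size: int = 8192, count: int = 128) -> int:
    """Write `count` zero-filled blocks, then fdatasync once before returning.

    This mirrors dd's conv=fdatasync, which syncs the output file data
    once, just before dd finishes.
    """
    block = bytes(block_size)
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()                  # push Python's buffer to the kernel
        os.fdatasync(f.fileno())   # ask the kernel to push file data to storage
    return block_size * count


if __name__ == "__main__":
    written = write_synced("dd_test2")
    print(f"wrote and synced {written} bytes")
```

As with dd, a single sequential write followed by one sync says little about behavior under concurrent load.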

So, at this point, I don't know whether fdatasync is being issued but not honored by the OS stack, or whether PPAS is issuing the sync at all.

Anyone have a solid method to test if fdatasync is working correctly or thoughts on troubleshooting this? It is extremely time consuming to replicate the problem, but even then the corruption moves around, so it's hard to know immediately if there is corruption at all. I'm hoping to utilize a tool set outside of Postgres to positively eliminate the OS stack.
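
One possible approach outside of Postgres, sketched here under stated assumptions rather than as a proven tool: write numbered, checksummed 8k records, call fdatasync after each one, and report a sequence number as acknowledged only after the sync returns. After a crash or snapshot, every acknowledged record must still be present and intact. The record layout, the function names, and the `ack` callback are all hypothetical; in a real test the acknowledgements would need to be logged on a different machine so they survive the crash.

```python
import os
import struct
import zlib

BLOCK = 8192
HEADER = struct.Struct("<QI")  # sequence number, CRC32 of the packed sequence number


def make_record(seq: int) -> bytes:
    """Build one 8k record: header followed by zero padding."""
    crc = zlib.crc32(struct.pack("<Q", seq))
    return HEADER.pack(seq, crc) + bytes(BLOCK - HEADER.size)


def write_records(path: str, count: int, ack=print) -> None:
    """Write `count` records; acknowledge each only after fdatasync returns."""
    with open(path, "wb") as f:
        for seq in range(count):
            f.write(make_record(seq))
            f.flush()
            os.fdatasync(f.fileno())
            ack(seq)  # assumption: a real test sends this to another host


def verify(path: str, last_acked: int) -> list:
    """Return the sequence numbers <= last_acked that are missing or damaged."""
    bad = []
    with open(path, "rb") as f:
        for seq in range(last_acked + 1):
            block = f.read(BLOCK)
            if len(block) < HEADER.size:
                bad.append(seq)  # file ends before this record
                continue
            got_seq, crc = HEADER.unpack_from(block)
            if got_seq != seq or crc != zlib.crc32(struct.pack("<Q", seq)):
                bad.append(seq)
    return bad
```

A run would call write_records on the primary, take the snapshot or crash the guest mid-run, then call verify on the clone with the highest sequence number that was acknowledged. A non-empty result would mean a synced write was lost somewhere below the application, which would point at the OS stack rather than at Postgres.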

Sorry for the lengthy post, but hopefully it's clear what is going on.

Thanks!
T
