Re: VMWare file system / database corruption - Mailing list pgsql-general

From Greg Smith
Subject Re: VMWare file system / database corruption
Msg-id alpine.GSO.2.01.0909221839300.7275@westnet.com
In response to VMWare file system / database corruption  (Tom Duffey <tduffey@techbydesign.com>)
List pgsql-general
On Mon, 21 Sep 2009, Tom Duffey wrote:

> Does anyone with a better understanding of PostgreSQL and VMWare know if this
> is an unreliable setup for PostgreSQL?  I see things like "NFS" and "VMWare"
> and start to get worried.

PostgreSQL requires one simple guarantee:  when the database writes
something and then calls the OS's fsync, that call must not return
success until the write is on physical disk.  In your case, this
requires:

1) VMWare recognizes fsync and passes that request to the network storage
device
2) The NFS software passes fsync to the SAN
3) The SAN waits until the physical disk write is complete (and not just
sitting in the hard drives' write caches) before reporting that the fsync
has completed.
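
As a rough illustration of the write-then-fsync pattern described above, here
is a minimal Python sketch. The function name durable_write is hypothetical,
and this is not how PostgreSQL itself is implemented; it only shows the
contract the database depends on. Note that a successful return proves
nothing about the layers underneath: it only means every layer reported
success.

```python
import os


def durable_write(path, data):
    """Write `data` (bytes) to `path`; return only after fsync succeeds.

    Hypothetical helper for illustration. If any layer below the OS
    (hypervisor, NFS, SAN, drive cache) acknowledges the flush early,
    this still returns normally and the data may not be on disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offset = 0
        # os.write may write fewer bytes than requested, so loop.
        while offset < len(data):
            offset += os.write(fd, data[offset:])
        # Ask the OS to flush this file to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

Calling durable_write("/some/path", b"commit record") with a path on the
volume in question exercises the same chain of layers listed above.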

In theory, there's no reason this can't be made reliable (albeit slow).
But when you have so many layers of stuff in the middle it's hard to prove
that things are working correctly or find the problem part that's causing
corruption.  You'll need to audit everything from your VM down to the SAN
configuration to make sure there are no non-battery-backed write-back
caches being used (and, no, a UPS doesn't count), and that none of the
software involved has turned off fsync support as a performance
optimization.
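
One crude way to start such an audit is to measure how many synced writes
per second the storage stack will acknowledge. The sketch below is an
assumption-laden probe, not a definitive test: the function name fsync_rate
and the 8 kB block size are choices made for this example. The reasoning is
that a single rotating disk without a battery-backed cache cannot commit to
the same spot more than once per rotation (about 120 times per second at
7200 RPM), so a rate far above that hints that some layer is acknowledging
fsync before the data is on disk. A plausible-looking rate does not prove
the setup is safe.

```python
import os
import time


def fsync_rate(path, count=200):
    """Return acknowledged synced 8 kB writes per second to `path`.

    Hypothetical probe for illustration: rewrites the same block and
    calls fsync after every write, then reports the observed rate.
    """
    block = b"\0" * 8192
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)  # always rewrite the same block
            os.write(fd, block)
            os.fsync(fd)                  # wait for the claimed flush
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero interval on very coarse clocks.
    return count / max(elapsed, 1e-9)
```

Run it against a scratch file on the same volume that holds the database,
and compare the result with what the physical disks could plausibly deliver.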

There's a bunch of additional trivia in this area at
http://www.postgresql.org/docs/current/static/wal-reliability.html and my
article at
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
