Josh Berkus wrote:
> Folks,
>
> What mount options do people use for Ext3, particularly what do you set "data
> = " for a high-transaction database? I'm used to ReiserFS ("noatime,
> notail") and am not really sure where to go with Ext3.
For ReiserFS, I can certainly understand using "noatime", but I'm not
sure why you use "notail" except to allow LILO to operate properly on
it.
The default for ext3 is to do ordered writes: data is written before
the associated metadata transaction commits, but the data itself isn't
journalled. But because PostgreSQL synchronously writes the
transaction log (using fsync() by default, if I'm not mistaken) and
uses sync() during a checkpoint, I would think that ordered writes at
the filesystem level would probably buy you very little in the way of
additional data integrity in the event of a crash.
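To make that concrete, here is a minimal Python sketch of the pattern
described above. It is not PostgreSQL's actual code: the helper
commit_record and the file names are made up for illustration. The log
record is forced to disk with fsync() before the commit counts as done,
while the data file is written with no immediate flush.

```python
import os
import tempfile

def commit_record(log_path, data_path, record):
    """Append record to the log with fsync, then write the data file lazily."""
    # 1. Append to the transaction log and force it to stable storage.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record + b"\n")
        os.fsync(fd)  # returns only after the OS reports the log data flushed
    finally:
        os.close(fd)
    # 2. Update the data file without fsync; the kernel writes it back later
    #    (in the scheme described above, a later sync() would flush it).
    with open(data_path, "ab") as f:
        f.write(record + b"\n")

# Demo in a scratch directory; "wal.log" and "table.dat" are made-up names.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "wal.log")
data_path = os.path.join(workdir, "table.dat")
commit_record(log_path, data_path, b"tx1")
print(os.path.getsize(log_path))
```

Since the log is already forced to disk explicitly, the filesystem's
own data-before-metadata ordering would seem to add little on top.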
So if I'm right about that, you might consider using the
"data=writeback" option for the filesystem that contains the actual
data (usually /usr/local/pgsql/data). For everything else I'd use at
least the default ("data=ordered"). I suppose there's no harm in using
"data=journal" if you're willing to put up with the performance hit,
but it's not clear to me what benefit, if any, there is.
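As a rough way to check which mode each filesystem would get, the
sketch below reads fstab-style lines and reports the "data=" setting
for ext3 entries. The function name ext3_data_mode, the device names,
and the sample entries are assumptions for illustration, and it assumes
"ordered" is the default when no "data=" option is given, as described
above.

```python
def ext3_data_mode(fstab_line):
    """Return the data= mode of an ext3 fstab entry, or None if not ext3."""
    fields = fstab_line.split()
    # Skip blank lines, comments, and entries for other filesystem types.
    if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "ext3":
        return None
    for opt in fields[3].split(","):
        if opt.startswith("data="):
            return opt[len("data="):]
    return "ordered"  # assumed default when no data= option is present

# Hypothetical entries: writeback for the database files, default elsewhere.
SAMPLE_FSTAB = """\
/dev/sda1  /                      ext3  defaults                1 1
/dev/sdb1  /usr/local/pgsql/data  ext3  noatime,data=writeback  1 2
"""

for line in SAMPLE_FSTAB.splitlines():
    print(line.split()[1], ext3_data_mode(line))
```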
I use ReiserFS too, so the above is based on what I know about the
ext3 filesystem and the way PostgreSQL writes data rather than on
firsthand experience with ext3.
The more interesting question in my mind is: if you use PostgreSQL on
an ext3 filesystem with "data=ordered" or "data=journal", can you get
away with turning off PostgreSQL's fsync altogether and still get the
same kind of data integrity that you'd get with fsync enabled? If the
operating system is able to guarantee data integrity, is it still
necessary to worry about it at the database level?
I suspect the answer is that you can safely turn off fsync only if the
operating system guarantees that writes from a process are actually
committed to disk in the order that process issued them. Otherwise,
during a checkpoint, the write to the transaction log could commit
before the writes to the data files. If the system were to crash after
the transaction log write (which marks the checkpoint as completed)
committed but before the data file writes committed, the overall
database would be left in an inconsistent state. And my suspicion is
that the operating system rarely makes any such guarantee, journalled
filesystem or not.
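The ordering concern can be illustrated with a toy model. This is only
a sketch under simplifying assumptions: three writes, a crash that can
land between any two flushes, and made-up names throughout. It says
nothing about how any particular kernel actually schedules writes. It
counts the crash states in which the log record marking completion
(what PostgreSQL calls a checkpoint) is on disk while a data page is
not.

```python
from itertools import permutations

# Writes issued by the database, in program order. The log record that
# marks the flush as completed is issued last.
ISSUED = ["data_page_1", "data_page_2", "log_checkpoint_done"]

def consistent(on_disk):
    """The log may claim completion only if every data page is on disk."""
    if "log_checkpoint_done" not in on_disk:
        return True
    return {"data_page_1", "data_page_2"} <= on_disk

def unsafe_crash_states(flush_orders):
    """Return (flush order, writes on disk at crash) for inconsistent states."""
    bad = []
    for order in flush_orders:
        # A crash may happen before the first flush, between flushes, or after.
        for crash_point in range(len(order) + 1):
            on_disk = set(order[:crash_point])
            if not consistent(on_disk):
                bad.append((order, order[:crash_point]))
    return bad

# Case 1: the OS flushes strictly in the order the process issued the writes.
in_order = unsafe_crash_states([tuple(ISSUED)])
# Case 2: the OS is free to flush the pending writes in any order.
reordered = unsafe_crash_states(permutations(ISSUED))

print(len(in_order), len(reordered))
```

With in-order flushing the model finds no unsafe crash state; once
reordering is allowed, it finds several, each with the log record on
disk ahead of at least one data page.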
--
Kevin Brown kevin@sysexperts.com