Re: Dangers of fsync = off - Mailing list pgsql-general

From Dawid Kuroczko
Subject Re: Dangers of fsync = off
Msg-id 758d5e7f0705100016u49e0abecn78df409c7b3f4de6@mail.gmail.com
In response to Re: Dangers of fsync = off  (Joel Dice <dicej@mailsnare.net>)
List pgsql-general
On 5/8/07, Joel Dice <dicej@mailsnare.net> wrote:
> On Tue, 8 May 2007, Andrew Sullivan wrote:
> > My real question is why you want to turn it off.  If you're using a
> > battery-backed cache on your disk controller, then fsync ought to be
> > pretty close to free.  Are you sure that turning it off will deliver
> > the benefit you think it will?
>
> You may very well be right.  I tend to think in terms of software
> solutions, but a hardware solution may be most appropriate here.  In any
> case, I'm not at all sure this will bring a significant performance
> improvement.  I just want to understand the implications before I start
> fiddling; if fsync=off is dangerous, it doesn't matter what the
> performance benefits may be.

Well, fsync=off makes failures harder to cope with.

Normally, when your operating system crashes or the power fails, your
master server should start up cleanly.  If it doesn't, you've got the slave.

Now, with fsync=off you should promote the slave to master whenever
you experience a crash or power failure, just to be safe.  Having a
battery-backed unit may be cheaper than the cost of failovers (a DBA's
time costs money, and so does downtime ;)).  Do some testing, do some
calculations.
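
As a starting point for those calculations, here is a rough sketch in
Python.  The function names and the cost model (DBA time plus downtime
per failover) are my own assumptions, and every input is a number you
would have to measure or estimate for your own site.

```python
def yearly_failover_cost(crashes_per_year, dba_hours_per_failover,
                         dba_hourly_rate, downtime_hours_per_failover,
                         downtime_cost_per_hour):
    """Estimated yearly cost of failing over after every crash.

    Assumes each crash or power failure forces one failover, as you
    would do with fsync=off "just to be safe".
    """
    cost_per_failover = (dba_hours_per_failover * dba_hourly_rate
                         + downtime_hours_per_failover * downtime_cost_per_hour)
    return crashes_per_year * cost_per_failover


def years_to_break_even(hardware_cost, yearly_cost):
    """Years until a one-off hardware purchase costs less than failovers."""
    if yearly_cost <= 0:
        # No failover cost at all: the hardware never pays for itself.
        return float("inf")
    return hardware_cost / yearly_cost
```

For example, with the made-up inputs of 2 crashes a year, 3 DBA hours
at 80 per hour and half an hour of downtime at 1000 per hour,
yearly_failover_cost(2, 3, 80, 0.5, 1000) comes to 1480 per year, which
you can then compare with the price of a battery-backed controller.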

> >> on Y.  Thus, database corruption on X is irrelevant since our first step
> >> is to drop them.
> >
> > Not if the corruption introduces problems for replication, which is
> > indeed possible.
>
> That's exactly what I want to understand.  How, exactly, is this possible?
> If the danger of fsync=off is that it may leave the on-disk state of the
> database in an inconsistent state after a crash, it would not seem to have
> any implications for activity occurring prior to the crash.  In
> particular, a trigger-based replication system would seem to be immune.
>
> In other words, while there may be ways the master could cause corruption
> on the slave, I don't see how they could be related to the fsync setting.

OK, let's assume you have a machine mdb as the master database
and sdb as the slave database.  mdb has fsync=off and Slony-I is used
as the replication system.

You have a power failure, a system crash, whatever.  mdb goes down.
Your sdb is consistent, but it's missing, let's say, the last 15 seconds
of transactions, which didn't manage to replicate.
You don't do a failover yet.  Your mdb starts up, PostgreSQL replays
its Write Ahead Log.  Everything seems fine, mdb is up and running,
and these 15 seconds of transactions are replicated to sdb.

Oops.  PostgreSQL seemed to be fine, but since fsync was off,
the rows in Money_Transactions weren't flushed to disk, while
PostgreSQL thought they should already be on disk (WAL was
replayed only since the last known CHECKPOINT), so you didn't
actually replicate these transactions.  If you are really unlucky,
you've replicated some old contents of the database, and thus
now both your mdb and sdb contain erroneous data.
Of course sdb is consistent in terms of "internal structure", but
try explaining that to the poor soul who happened to be doing
updates on the Money_Transactions table. ;-)
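
To make the mechanism concrete, here is a toy model in Python.  It is
not how PostgreSQL is implemented; ToyFile, ToyDatabase and
run_scenario are names I made up, and the "unlucky" write ordering is
an assumption picked to show the failure.  Each file has an OS cache
that is lost on a power failure, and recovery replays the WAL only
from the last checkpoint position it finds on disk.

```python
class ToyFile:
    """A file whose writes sit in an OS cache until something flushes them."""

    def __init__(self, initial):
        self.durable = initial   # what survives a power failure
        self.cached = None       # pending write, lost on a crash

    def write(self, value):
        self.cached = value

    def read(self):
        return self.durable if self.cached is None else self.cached

    def flush(self):
        """Stands in for fsync(), or for the OS writing back on its own."""
        if self.cached is not None:
            self.durable = self.cached
            self.cached = None

    def crash(self):
        self.cached = None


class ToyDatabase:
    def __init__(self, fsync):
        self.fsync = fsync
        self.table = ToyFile({})           # the Money_Transactions rows
        self.wal = ToyFile(())             # write-ahead log records
        self.checkpoint_pos = ToyFile(0)   # WAL position of last checkpoint

    def commit(self, key, value):
        self.wal.write(self.wal.read() + ((key, value),))
        if self.fsync:
            self.wal.flush()
        rows = dict(self.table.read())
        rows[key] = value
        self.table.write(rows)

    def checkpoint(self):
        if self.fsync:
            self.table.flush()             # rows really reach the disk
        self.checkpoint_pos.write(len(self.wal.read()))
        if self.fsync:
            self.checkpoint_pos.flush()

    def power_failure(self):
        for f in (self.table, self.wal, self.checkpoint_pos):
            f.crash()

    def recover(self):
        """Replay WAL records written after the last known checkpoint."""
        rows = dict(self.table.read())
        for key, value in self.wal.read()[self.checkpoint_pos.read():]:
            rows[key] = value
        return rows


def run_scenario(fsync):
    db = ToyDatabase(fsync)
    db.commit("txn-1", 100)
    db.commit("txn-2", 250)
    db.checkpoint()
    if not fsync:
        # Assumed unlucky ordering: the OS happened to write the WAL and
        # the checkpoint position to disk, but not the table rows.
        db.wal.flush()
        db.checkpoint_pos.flush()
    db.power_failure()
    return db.recover()
```

In this model run_scenario(True) gives back both rows after recovery,
while run_scenario(False) gives back an empty table: the checkpoint
claims the rows are on disk, so nothing is replayed, and whatever
copies from the recovered master never sees those rows.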

Of course, the likelihood of this happening isn't very high; PostgreSQL
really tries to safeguard your data (an elephant never forgets ;)),
but only as long as you give him a chance. ;)

   Regards,
      Dawid
