On Thu, 2011-08-04 at 18:07 -0400, Bruce Momjian wrote:
> I am confused how generating WAL traffic that is larger than the heap
> file we are fsync'ing can possibly be slower. Are you just throwing out
> an idea to try to make me prove it?
That's worded in a slightly confusing way, but here is the trade-off:
1. If you are using WAL, then regardless of what your transaction does,
only the WAL needs to be fsync'd at commit time. Conveniently, that's
being written sequentially, so it's a single fairly cheap fsync; and all
the data page changes are deferred, collected together, and fsync'd at
checkpoint time (rather than commit time). The cost is that you
double-write the data.
2. If you aren't using WAL, you need to fsync every data file the
transaction touched, which are probably not localized with other
activity. Also, the _entire_ data files needs to be sync'd, so perhaps
many other transactions have made changes to one data file all over, and
it may require _many_ seeks to accomplish the one fsync. The benefit is
that you don't double-write the data.
So, fundamentally, WAL is (in the OLTP case, where a transaction is much
shorter than a checkpoint interval) a big performance _win_, because it
allows us to do nice sequential writing in a single place for all
activities of all transactions; and defer all those random writes to
data pages until the next checkpoint. So we shouldn't treat WAL like a
cost burden that we want to avoid in every case we can.
But in the data load case (where many checkpoints may happen during a
single transaction anyway), it happens that avoiding WAL is a
performance win, because the seeks are not the dominant cost.
Regards,Jeff Davis