Re: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching - Mailing list pgsql-hackers

From Curtis Faith
Subject Re: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
Date
Msg-id DMEEJMCDOJAKPPFACMPMKEDECEAA.curtis@galtair.com
In response to Re: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> You are confusing WALWriteLock with WALInsertLock.  A
> transaction-committing flush operation only holds the former.
> XLogInsert only needs the latter --- at least as long as it
> doesn't need to write.

Well, that makes things better than I thought. We still end up with
a disk write for each transaction, though, and I don't see how this
can ever get better than (disk RPM)/60 transactions per second,
since commit fsyncs are serialized. Every fsync will have to wait
almost a full revolution to reach the end of the log.
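
To put a number on that bound, here is a minimal sketch. It assumes one
serialized fsync per commit and that each fsync waits roughly one full
platter revolution; the function name is made up for illustration.

```python
def max_commits_per_second(disk_rpm: float) -> float:
    """Upper bound on serialized commits: one commit per platter revolution."""
    # RPM / 60 = revolutions per second, so at most that many fsyncs per second.
    return disk_rpm / 60.0


# Print the bound for a few common spindle speeds.
for rpm in (5400, 7200, 10000, 15000):
    bound = max_commits_per_second(rpm)
    print(f"{rpm:>6} RPM -> at most {bound:.0f} commits/sec")
```

Grouping several commits into one write is the only way past this
ceiling, which is what both commit_delay and the LogWriter aim at.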

As a practical matter, then, everyone will use commit_delay to
improve this.

> This will pessimize performance except in the case where WAL traffic
> is very heavy, because it means you don't commit until the block
> containing your commit record is filled.  What if you are the only
> active backend?

We could handle this using a mechanism analogous to the current
commit delay. If there are more than commit_siblings other processes
running, then do the write automatically once commit_delay has elapsed.

This would make things no more pessimistic than the current
implementation but provide the additional benefit of allowing the
LogWriter to write in optimal sizes if there are many transactions.
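
The rule above could be sketched roughly as follows. This is only an
illustration of the idea, not PostgreSQL code: LogWriterState, its
fields, and should_write_now are hypothetical names, and writing
immediately when there are too few other backends is my reading of
"no more pessimistic than the current implementation".

```python
from dataclasses import dataclass


@dataclass
class LogWriterState:
    block_size: int       # bytes in one log block (hypothetical write unit)
    pending_bytes: int    # dirty log bytes not yet written
    waited_us: int        # microseconds the oldest pending commit has waited
    active_siblings: int  # other backends currently running transactions


def should_write_now(state: LogWriterState,
                     commit_delay_us: int,
                     commit_siblings: int) -> bool:
    """Decide whether the log writer should issue its write right now."""
    if state.pending_bytes == 0:
        return False  # nothing to write
    if state.pending_bytes >= state.block_size:
        return True   # a full block is the optimal write size
    # Assumption: "more than commit_siblings" from the text is modeled as ">".
    if state.active_siblings > commit_siblings:
        # Others may still fill the block; wait, but no longer than the delay.
        return state.waited_us >= commit_delay_us
    # Too few other backends to fill the block: waiting would only add latency.
    return True
```

With this rule a lone backend never waits for the block to fill, while
a busy system gets full-block writes whenever traffic allows it.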

The commit_delay method won't be as good in many cases. Consider
an update scenario where a larger commit delay gives better throughput.
A given transaction will flush after commit_delay microseconds. The
delay is very unlikely to result in a scenario where the dirty log
buffers are the optimal size.

As a practical matter, I think a fixed delay would tend to make the
writes larger than they would otherwise have been, and this would
unnecessarily delay the commit of the transaction.

> I do not, however, see any
> value in forcing all the WAL writes to be done by a single process;
> which is essentially what you're saying we should do.  That just adds
> extra process-switch overhead that we don't really need.

I don't think that an fsync will ever NOT cause the process to get
switched out, so I don't see how another process doing the write would
result in more overhead. The fsync'ing process will block on the
fsync, so there will always be at least one process switch (probably
many) while waiting for the fsync to complete, since we are talking
many milliseconds for the fsync in every case.

> > The log file would be opened O_DSYNC, O_APPEND every time.
> 
> Keep in mind that we support platforms without O_DSYNC.  I am not
> sure whether there are any that don't have O_SYNC either, but I am
> fairly sure that we measured O_SYNC to be slower than fsync()s on
> some platforms.

Well, there is no reason that the logwriter couldn't be doing fsyncs
instead of O_DSYNC writes in those cases. I'd leave this switchable
using the current flags. Just change the semantics a bit.
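
As a rough sketch of what "switchable" could look like, written in
Python for brevity rather than backend C: open_log, write_log and the
use_dsync argument are hypothetical names standing in for the existing
setting, and the fallback to fsync when O_DSYNC is missing is an
assumption about how such a switch might behave.

```python
import os


def open_log(path: str, use_dsync: bool):
    """Open an append-only log file; return (fd, needs_fsync)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    dsync = getattr(os, "O_DSYNC", None)  # not defined on every platform
    if use_dsync and dsync is not None:
        # Each write returns only after the data has reached the disk.
        return os.open(path, flags | dsync, 0o600), False
    # No O_DSYNC requested or available: the writer must fsync explicitly.
    return os.open(path, flags, 0o600), True


def write_log(fd: int, data: bytes, needs_fsync: bool) -> None:
    """Write one batch of log data and make it durable before returning."""
    view = memoryview(data)
    while view:  # os.write may write fewer bytes than requested
        written = os.write(fd, view)
        view = view[written:]
    if needs_fsync:
        os.fsync(fd)
```

Either way the caller sees the same contract: when write_log returns,
the batch is on disk, so the choice of method stays a tuning decision.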

- Curtis

