Re: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
Msg-id: 3630.1033830947@sss.pgh.pa.us
In response to: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching ("Curtis Faith" <curtis@galtair.com>)
Responses: Re: Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching ("Curtis Faith" <curtis@galtair.com>)
List: pgsql-hackers
"Curtis Faith" <curtis@galtair.com> writes:
> Assume Transaction A which writes a lot of buffers and XLog entries,
> so the Commit forces a relatively lengthy fsync.

> Transactions B - E block not on the kernel lock from fsync but on
> the WALWriteLock. 

You are confusing WALWriteLock with WALInsertLock.  A
transaction-committing flush operation only holds the former.
XLogInsert only needs the latter --- at least as long as it
doesn't need to write.

Thus, given adequate space in the WAL buffers, transactions B-E do not
get blocked by someone else who is writing/syncing in order to commit.
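The two-lock split can be sketched with a pair of mutexes. This is an illustrative model only, not PostgreSQL's actual code: the lock names, buffer layout, and function signatures are hypothetical stand-ins for the real WALInsertLock/WALWriteLock machinery.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical model of the split Tom describes: appending a WAL record
 * takes only the insert lock; flushing for a commit takes only the
 * write lock.  So inserters are not blocked by a concurrent flush. */

#define WAL_BUF_SIZE 8192

static pthread_mutex_t wal_insert_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wal_write_lock  = PTHREAD_MUTEX_INITIALIZER;

static char   wal_buf[WAL_BUF_SIZE];
static size_t insert_pos  = 0;  /* next free byte (insert lock) */
static size_t flushed_pos = 0;  /* bytes durably written (write lock) */

/* Append a record; blocks other inserters but NOT a concurrent flush. */
size_t xlog_insert(const char *rec, size_t len)
{
    pthread_mutex_lock(&wal_insert_lock);
    memcpy(wal_buf + insert_pos, rec, len);
    insert_pos += len;
    size_t end = insert_pos;     /* point a committer must get flushed */
    pthread_mutex_unlock(&wal_insert_lock);
    return end;
}

/* Flush up to 'upto'; blocks other flushers but NOT inserters. */
void xlog_flush(size_t upto)
{
    pthread_mutex_lock(&wal_write_lock);
    if (flushed_pos < upto)
        flushed_pos = upto;      /* stand-in for write() + fsync() */
    pthread_mutex_unlock(&wal_write_lock);
}
```

The point of the sketch is only that `xlog_insert` never touches `wal_write_lock`, so (buffer space permitting) transactions B-E keep inserting while A's commit flush is in progress.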

Now, as the code stands at the moment there is no event other than
commit or full-buffers that prompts a write; that means that we are
likely to run into the full-buffer case more often than is good for
performance.  But a background writer task would fix that.

> Back-end servers would not issue fsync calls. They would simply block
> waiting until the LogWriter had written their record to the disk, i.e.
> until the sync'd block # was greater than the block that contained the
> XLOG_XACT_COMMIT record. The LogWriter could wake up committed back-
> ends after its log write returns.

This will pessimize performance except in the case where WAL traffic
is very heavy, because it means you don't commit until the block
containing your commit record is filled.  What if you are the only
active backend?
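The wait-until-flushed handshake Curtis proposes could be modeled with a condition variable: a committing backend sleeps until the flushed block number passes the block holding its commit record, and the log writer broadcasts after each write returns. Again a hypothetical sketch with made-up names, not a proposed implementation:

```c
#include <pthread.h>

/* Illustrative model of the proposed handshake.  flushed_blkno is the
 * highest log block durably on disk; names are not PostgreSQL's. */

static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flush_cv   = PTHREAD_COND_INITIALIZER;
static long flushed_blkno = 0;

/* Backend side: block until the sync'd block # exceeds the block
 * containing our XLOG_XACT_COMMIT record. */
void wait_for_commit(long commit_blkno)
{
    pthread_mutex_lock(&flush_lock);
    while (flushed_blkno <= commit_blkno)
        pthread_cond_wait(&flush_cv, &flush_lock);
    pthread_mutex_unlock(&flush_lock);
}

/* Log-writer side: after a log write returns, advance the flushed
 * position and wake every waiting backend. */
void report_flush(long new_blkno)
{
    pthread_mutex_lock(&flush_lock);
    if (new_blkno > flushed_blkno)
        flushed_blkno = new_blkno;
    pthread_cond_broadcast(&flush_cv);
    pthread_mutex_unlock(&flush_lock);
}
```

The sketch also makes Tom's objection concrete: with a lone backend, nothing ever calls `report_flush` for a partially filled block, so the committer sleeps until the block fills or some other event forces a write.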

My view of this is that backends would wait for the background writer
only when they encounter a full-buffer situation, or indirectly when
they are trying to do a commit write and the background guy has the
WALWriteLock.  The latter serialization is unavoidable: in that
scenario, the background guy is writing/flushing an earlier page of
the WAL log, and we *must* have that down to disk before we can declare
our transaction committed.  So any scheme that tries to eliminate the
serialization of WAL writes will fail.  I do not, however, see any
value in forcing all the WAL writes to be done by a single process;
which is essentially what you're saying we should do.  That just adds
extra process-switch overhead that we don't really need.
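That view can be sketched too: each backend does its own commit flush, but first checks under the write lock whether an earlier flush (by the background writer or another committer) already covered its record, in which case it returns without doing any I/O. This is a hedged illustration with invented names, not actual PostgreSQL code:

```c
#include <pthread.h>

/* Illustrative backend commit path: serialize on the write lock, but
 * skip the I/O entirely if a prior flush already pushed our record to
 * disk -- group commit without funneling writes through one process. */

static pthread_mutex_t wal_write_lock = PTHREAD_MUTEX_INITIALIZER;
static long flushed_upto = 0;   /* log position durably on disk */

/* Returns 1 if this call had to do I/O, 0 if a prior flush covered us. */
int xlog_flush_for_commit(long commit_lsn)
{
    pthread_mutex_lock(&wal_write_lock);
    if (flushed_upto >= commit_lsn) {
        /* Someone else's write/fsync already covered our record. */
        pthread_mutex_unlock(&wal_write_lock);
        return 0;
    }
    flushed_upto = commit_lsn;  /* stand-in for write() + fsync() */
    pthread_mutex_unlock(&wal_write_lock);
    return 1;
}
```

Any backend whose commit record was carried down by someone else's flush pays only a brief lock acquisition, with no process switch to a dedicated writer.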

> The log file would be opened O_DSYNC, O_APPEND every time.

Keep in mind that we support platforms without O_DSYNC.  I am not
sure whether there are any that don't have O_SYNC either, but I am
fairly sure that we measured O_SYNC to be slower than fsync()s on
some platforms.
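A portability sketch of that point: prefer O_DSYNC at open() time where it exists, so each write() is synchronous, and otherwise fall back to a plain open() plus an explicit fsync() after writing (which, as noted above, measured faster than O_SYNC on some platforms). The path, flags, and function name here are illustrative assumptions:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open a log file for synchronous appends.  If O_DSYNC is unavailable,
 * fall back to buffered appends and tell the caller to fsync().
 * Hypothetical helper, not PostgreSQL's wal_sync_method logic. */
int open_wal(const char *path, int *need_explicit_fsync)
{
#ifdef O_DSYNC
    *need_explicit_fsync = 0;
    return open(path, O_WRONLY | O_APPEND | O_CREAT | O_DSYNC, 0600);
#else
    *need_explicit_fsync = 1;   /* caller must fsync() after write() */
    return open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
#endif
}
```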

> The nice part is that the WALWriteLock semantics could be changed to
> allow the LogWriter to write to disk while WALWriteLocks are acquired
> by back-end servers.

As I said, we already have that; you are confusing WALWriteLock
with WALInsertLock.

> Many transactions would commit on the same fsync (now really a write
> with O_DSYNC) and we would get optimal write throughput for the log
> system.

How are you going to avoid pessimizing the few-transactions case?
        regards, tom lane

