Analysis of ganged WAL writes - Mailing list pgsql-hackers

From Tom Lane
Subject Analysis of ganged WAL writes
Date
Msg-id 6433.1033863379@sss.pgh.pa.us
Responses Re: Analysis of ganged WAL writes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I do not think the situation for ganging of multiple commit-record
writes is quite as dire as has been painted.  There is a simple error
in the current code that is easily corrected: in XLogFlush(), the
wait to acquire WALWriteLock should occur before, not after, we try
to acquire WALInsertLock and advance our local copy of the write
request pointer.  (To be exact, xlog.c lines 1255-1269 in CVS tip
ought to be moved down to before line 1275, inside the "if" that
tests whether we are going to call XLogWrite.)

Given that change, the sequence of events during heavy commit activity
will be as follows:

1. Transaction A is ready to commit.  It calls XLogInsert to insert
its commit record into the WAL buffers (thereby transiently acquiring
WALInsertLock) and then it calls XLogFlush to write and sync the
log through the commit record.  XLogFlush acquires WALWriteLock and
begins issuing the needed I/O request(s).

2. Transaction B is ready to commit.  It gets through XLogInsert
and then blocks on WALWriteLock inside XLogFlush.

3. Transactions C, D, E likewise insert their commit records
and then block on WALWriteLock.

4. Eventually, transaction A finishes its I/O, advances the "known
flushed" pointer past its own commit record, and releases the
WALWriteLock.

5. Transaction B now acquires WALWriteLock.  Given the code change I
recommend, it will choose to flush the WAL *through the last queued
commit record as of this instant*, not the WAL endpoint as of when it
started to wait.  Therefore, this WAL write will handle all of the
so-far-queued commits.

6. More transactions F, G, H, ... arrive to be committed.  They will
likewise insert their COMMIT records into the buffer and block on
WALWriteLock.

7. When B finishes its write and releases WALWriteLock, it will have
set the "known flushed" pointer past E's commit record.  Therefore,
transactions C, D, E will fall through their tests without calling
XLogWrite at all.  When F gets the lock, it will conclude that it
should write the data queued up to that time, and so it will handle
the commit records for G, H, etc.  (The fact that lwlock.c will release
waiters in order of arrival is important here --- we want C, D, E to
get out of the queue before F decides it needs to write.)
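
The sequence above can be replayed with a small single-threaded model.
This is only a sketch, not code from xlog.c: SimLog, sim_insert_commit,
sim_flush_begin and sim_flush_finish are hypothetical names, WAL positions
are reduced to plain integers, and "holding WALWriteLock" is represented
simply by the interval between the begin and finish calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t LogPos;    /* hypothetical stand-in for a WAL position */

typedef struct {
    LogPos insert_end;      /* end of the last record inserted into the buffers */
    LogPos flushed;         /* the "known flushed" pointer */
    int    writes;          /* number of physical write+sync operations issued */
} SimLog;

/* Insert a commit record of the given length; return its end position. */
LogPos sim_insert_commit(SimLog *log, LogPos len)
{
    log->insert_end += len;
    return log->insert_end;
}

/*
 * Called once the backend holds the (simulated) write lock.
 * Returns false if an earlier writer already flushed past our record.
 * Otherwise sets *target to the insert end as of this instant, which is
 * the point of the proposed change: the target is read after the lock is
 * obtained, not before the wait began.
 */
bool sim_flush_begin(const SimLog *log, LogPos my_end, LogPos *target)
{
    if (my_end <= log->flushed)
        return false;
    *target = log->insert_end;
    assert(*target >= my_end);
    return true;
}

/* The I/O is done: advance the known-flushed pointer, release the lock. */
void sim_flush_finish(SimLog *log, LogPos target)
{
    log->flushed = target;
    log->writes++;
}
```

Replaying steps 1 through 7 against this model (A begins its flush, B
through E insert, A finishes, B begins, F through H insert, and so on)
gives three physical writes for eight commits, with C, D, E, G and H
returning without writing.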


It seems to me that this behavior will provide fairly effective
ganging of COMMIT flushes under load.  And it's self-tuning; no need
to fiddle with weird parameters like commit_siblings.  We automatically
gang as many COMMITs as arrive during the time it takes to write and
flush the previous gang of COMMITs.
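
To get a rough feel for the self-tuning effect, here is a toy
discrete-time model. It rests on assumptions that are mine, not
measurements: commits arrive at a fixed interval, and one write+sync takes
a fixed time regardless of how many commit records it covers. GangStats
and simulate_ganging are hypothetical names.

```c
#include <assert.h>

typedef struct {
    long commits;   /* commit records made durable */
    long flushes;   /* physical write+sync operations issued */
} GangStats;

/*
 * Commit i arrives at time i * arrival_us.  Each flush covers every commit
 * that has arrived by the time the flush starts, and then occupies the log
 * for flush_us (assumed constant).
 */
GangStats simulate_ganging(long n_commits, long arrival_us, long flush_us)
{
    GangStats st = {0, 0};
    long now = 0;     /* simulated clock, in microseconds */
    long next = 0;    /* index of the first commit not yet flushed */

    assert(arrival_us > 0);
    while (next < n_commits) {
        long arrive = next * arrival_us;
        if (now < arrive)
            now = arrive;                      /* log sits idle until a commit shows up */
        long queued_end = now / arrival_us + 1; /* commits arrived by now */
        if (queued_end > n_commits)
            queued_end = n_commits;
        st.commits += queued_end - next;
        st.flushes++;
        next = queued_end;
        now += flush_us;                       /* the write+sync takes this long */
    }
    return st;
}
```

Under these assumptions, simulate_ganging(1000, 1000, 8000) settles into
gangs of eight and needs 126 flushes for 1000 commits, while
simulate_ganging(10, 10000, 8000) issues one flush per commit because each
flush completes before the next commit arrives. The gang size follows the
load without any tunable.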

Comments?
        regards, tom lane

