Re: Analysis of ganged WAL writes - Mailing list pgsql-hackers

From Curtis Faith
Subject Re: Analysis of ganged WAL writes
Date
Msg-id DMEEJMCDOJAKPPFACMPMIEFDCEAA.curtis@galtair.com
In response to Re: Analysis of ganged WAL writes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Analysis of ganged WAL writes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Analysis of ganged WAL writes  (Hannu Krosing <hannu@tm.ee>)
List pgsql-hackers
Tom, first of all, excellent job improving the current algorithm. I'm glad
you looked at the WALCommitLock code.

> This must be so because the backends that are
> released at the end of any given disk revolution will not be able to
> participate in the next group commit, if there is already at least
> one backend ready to commit.

This is the major reason for my original suggestion about using aio_write.
The writes don't block each other and there is no need for a kernel level
exclusive locking call like fsync or fdatasync.

Even the theoretical limit you mention of one transaction per revolution
per committing process seems like a significant bottleneck.

Is committing 1 and then 4 transactions on alternating revolutions good?
It's certainly better than 1 per revolution.

However, what if we could have done 3 transactions per process in the time
it took for a single revolution?

Then we are looking at (1 + 4)/ 2 = 2.5 transactions per revolution versus
the theoretical maximum of (3 * 5) = 15 transactions per revolution if we
can figure out a way to do non-blocking writes that we can guarantee are on
the disk platter so we can return from commit.

Setting aside whether or not aio is viable, do you not agree that
eliminating the blocking could yield up to a 6X improvement for
the 5 process case?
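
To make the arithmetic above explicit, here is a quick check in Python. The
function name is made up for illustration, and the 3 commits per process per
revolution is the hypothetical figure from the paragraph above, not a
measurement.

```python
def transactions_per_revolution(processes, commits_per_process):
    """Upper bound if no process ever blocks on another's write."""
    return processes * commits_per_process


# Blocking case: 1 commit on one revolution, 4 on the next, alternating.
blocking_average = (1 + 4) / 2

# Hypothetical non-blocking case: 5 processes, 3 commits each per revolution.
nonblocking_max = transactions_per_revolution(processes=5, commits_per_process=3)

speedup = nonblocking_max / blocking_average
print(blocking_average, nonblocking_max, speedup)  # 2.5 15 6.0
```

The 6X figure is a ceiling: it assumes every process really can finish 3
transactions in the time of one revolution.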

>
> So this solution isn't perfect; it would still be nice to have a way to
> delay initiation of the WAL write until "just before" the disk is ready
> to accept it.  I dunno any good way to do that, though.

I still think that it would be much faster to just keep writing the WAL log
blocks when they fill up and have a separate process wake the committing
process when the write completes. This would eliminate WAL writing as a
bottleneck.
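
A minimal sketch of that shape, in Python rather than backend C, and with
loud assumptions: LogWriter, submit, and demo are hypothetical names, a
thread stands in for the separate process, and a plain write plus os.fsync
inside that thread stands in for aio_write completion. It only shows the
hand-off: committers queue a block and sleep, the writer wakes them when
the data is down.

```python
import os
import queue
import tempfile
import threading


class LogWriter:
    """Hypothetical background log writer; not PostgreSQL code."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, block):
        """Queue a block; the returned Event is set once it has been synced."""
        done = threading.Event()
        self._pending.put((block, done))
        return done

    def _run(self):
        stop = False
        while not stop:
            item = self._pending.get()
            if item is None:
                break
            batch = [item]
            # Gang together everything that queued up while we were busy.
            while True:
                try:
                    nxt = self._pending.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            for block, _ in batch:
                os.write(self._fd, block)
            os.fsync(self._fd)  # stand-in for "write completed" notification
            for _, done in batch:
                done.set()  # wake each waiting committer

    def close(self):
        self._pending.put(None)
        self._thread.join()
        os.close(self._fd)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        writer = LogWriter(path)
        # Five "backends" submit commit records without blocking each other.
        events = [writer.submit(f"commit {i}\n".encode()) for i in range(5)]
        ok = all(e.wait(timeout=5) for e in events)
        writer.close()
        with open(path, "rb") as f:
            return ok, f.read().count(b"\n")


print(demo())
```

The point of the design is that no committer ever holds a lock while the
disk is busy; each one waits only on its own completion event.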

I have yet to hear anyone say that this can't be done, only that we might
not want to do it because the code might not be clean.

I'm generally only happy when I can finally remove a bottleneck completely,
but speeding one up by 3X like you have done is pretty damn cool for a day
or two's work.

- Curtis