Re: Analysis of ganged WAL writes - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Analysis of ganged WAL writes
Msg-id: 24856.1034012573@sss.pgh.pa.us
In response to: Re: Analysis of ganged WAL writes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Analysis of ganged WAL writes
List: pgsql-hackers

I wrote:
> That says that the best possible throughput on this test scenario is 5
> transactions per disk rotation --- the CPU is just not capable of doing
> more.  I am actually getting about 4 xact/rotation for 10 or more
> clients (in fact it seems to reach that plateau at 8 clients, and be
> close to it at 7).

After further thought I understand why it takes 8 clients to reach full
throughput in this scenario.  Assume that we have enough CPU oomph so
that we can process four transactions, but not five, in the time needed
for one revolution of the WAL disk.  If we have five active clients
then the behavior will be like this:

1. Backend A becomes ready to commit.  It locks WALWriteLock and issues
a write/flush that will only cover its own commit record.  Assume that
it has to wait one full disk revolution for the write to complete (this
will be the steady-state situation).

2. While A is waiting, there is enough time for B, C, D, and E to run
their transactions and become ready to commit.  All eventually block on
WALWriteLock.

3. When A finishes its write and releases WALWriteLock, B will acquire
the lock and initiate a write that (with my patch) will cover C, D, and
E's commit records as well as its own.

4. While B is waiting for the disk to spin, A receives a new transaction
from its client, processes it, and becomes ready to commit.  It blocks
on WALWriteLock.

5. When B releases the lock, C, D, and E acquire it in turn and quickly
fall through, seeing that they need do no work.  Then A acquires the lock.
GOTO step 1.
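
The "fall through" in step 5 is what lets one write serve several backends: each backend takes the lock, and if the flushed position already covers its own commit record it leaves without writing; otherwise it writes everything inserted so far, not just its own record.  Below is a minimal Python sketch of that check, assuming plain integers stand in for WAL positions and a counter stands in for the real write/fsync.  The class and method names are hypothetical; this illustrates the idea and is not the 7.3 patch or PostgreSQL's XLogFlush() code.

```python
import threading


class WalFlusher:
    """Hypothetical model of a ganged WAL flush (not PostgreSQL's code).

    Positions are plain integers standing in for WAL byte offsets.
    """

    def __init__(self):
        self.write_lock = threading.Lock()   # plays the role of WALWriteLock
        self.insert_lock = threading.Lock()  # protects insert_pos
        self.insert_pos = 0                  # end of WAL inserted so far
        self.flushed_upto = 0                # end of WAL known to be on disk
        self.physical_writes = 0             # number of write/flush calls issued

    def insert_commit_record(self, size=1):
        """Append a commit record; return the position it must be flushed to."""
        with self.insert_lock:
            self.insert_pos += size
            return self.insert_pos

    def _write_and_fsync(self, upto):
        # Stand-in for the real write() + fsync(); caller holds write_lock.
        self.physical_writes += 1
        self.flushed_upto = upto

    def flush(self, my_pos):
        """Make WAL up to my_pos durable; return True if this call wrote."""
        with self.write_lock:
            if self.flushed_upto >= my_pos:
                # An earlier write already covered our record: fall through.
                return False
            with self.insert_lock:
                target = self.insert_pos  # gang in everyone else's records too
            self._write_and_fsync(target)
            return True
```

Replaying the scenario single-threaded: A's flush writes only its own record; B's flush then covers B through E in one write, and the calls made for C, D, and E return False without issuing another write.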

So with five active threads, we alternate between committing one
transaction and four transactions on odd and even disk revolutions.

It's pretty easy to see that with six or seven active threads, we
will alternate between committing two or three transactions and
committing four.  Only when we get to eight threads do we have enough
backends to ensure that four transactions are available to commit on
every disk revolution.  This must be so because the backends that are
released at the end of any given disk revolution will not be able to
participate in the next group commit, if there is already at least
one backend ready to commit.
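
The six/seven/eight-thread argument can be checked with a small recurrence.  This is a rough sketch, assuming the simplified model of this message: the CPU finishes at most four transactions per revolution, each write takes exactly one revolution, and backends released when a write completes cannot join the very next write.  Under those assumptions the next group size is min(4, N - g), where N is the number of clients and g is the size of the group currently being written.  The function name and its parameters are hypothetical.

```python
def group_sizes(n_clients, cpu_per_rev=4, revolutions=8, first_group=1):
    """Commit-group size per disk revolution under a simplified model.

    Assumptions (hypothetical, for illustration only):
    - the CPU can finish at most cpu_per_rev transactions per revolution;
    - each WAL write takes exactly one revolution;
    - backends blocked in the write for revolution k are released at its
      end, so they cannot join the write for revolution k + 1.
    All other backends (n_clients - g_k of them) run during revolution k,
    so the next group is min(cpu_per_rev, n_clients - g_k).
    """
    sizes = [first_group]
    for _ in range(revolutions - 1):
        sizes.append(min(cpu_per_rev, n_clients - sizes[-1]))
    return sizes


for n in range(5, 11):
    sizes = group_sizes(n)
    # Average over the last two revolutions, once the pattern has settled.
    steady = sum(sizes[-2:]) / 2
    print(f"{n:2d} clients: groups {sizes}  steady state {steady} xact/rev")
```

For five clients this reproduces the 1/4 alternation described above, 2/4 and 3/4 for six and seven clients, and a steady four per revolution from eight clients upward.  The model ignores partial work carried over between revolutions, so it only illustrates the shape of the argument.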

So this solution isn't perfect; it would still be nice to have a way to
delay initiation of the WAL write until "just before" the disk is ready
to accept it.  I dunno any good way to do that, though.

I went ahead and committed the patch for 7.3, since it's simple and does
offer some performance improvement.  But maybe we can think of something
better later on...
        regards, tom lane

