Thread: XLogFlush invoked about twice as much after 9.2 group commit enhancement

XLogFlush invoked about twice as much after 9.2 group commit enhancement

From
Amit Langote
Date:
Hello,

I have been trying to understand how group commit implementation works
the way it does after 9.2 group commit enhancement patch
(9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I admit
it's a pretty old commit though I seek some clarification as to how it
provides the performance gain as it does. Also, I have observed some
behavior in this regard that I could not understand.

Profiling results show that XLogFlush is called about twice as much
after this patch while for XLogWrite count remains about same as
before. Since, XLogFlush is the place where this patch offers the said
performance gain by alleviating the lock contention on WALWriteLock
using the new LWLockAcquireOrWait(). So far so good. I do not
understand why XLogFlush is invoked twice (as see from the profiling
results) as an effect of this patch.

I used "pgbench -c 32 -t 1000 pgbench" in both cases with TPS result
(after applying the patch) not being significantly different (as in
not twice as much on my system).

Using:

pgbench scale=10

shared_buffers=256MB
max_connections=1000
checkpoint_segments=15
synchronous_commit=on

Comments?
--

Amit Langote



On Tue, May 7, 2013 at 2:20 AM, Amit Langote <amitlangote09@gmail.com> wrote:
Hello,

I have been trying to understand how group commit implementation works
the way it does after 9.2 group commit enhancement patch
(9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I admit
it's a pretty old commit though I seek some clarification as to how it
provides the performance gain as it does. Also, I have observed some
behavior in this regard that I could not understand.

Profiling results show that XLogFlush is called about twice as much
after this patch while for XLogWrite count remains about same as
before.

Are you sure you properly cleared out the stats between profiling sessions?  Also, XLogFlush gets called by background processes like autovac, checkpointer and bgwriter, in addition to being called by committing processes.  If one profiled session contained a checkpoint and other did not, or one just idled a lot longer between when the benchmark finished and when you shutdown the server, perhaps that explains it.

Anyway, I don't see this behavior change when turning on wal_debug and looking in the logfiles for 'xlog flush request' messages.


I used "pgbench -c 32 -t 1000 pgbench" in both cases with TPS result
(after applying the patch) not being significantly different (as in
not twice as much on my system).

1000 is a very small number of transactions to run a benchmark for.  What was the duration?

What were the actual TPS numbers?  Does your hardware honor fsyncs?

Cheers,

Jeff
> Are you sure you properly cleared out the stats between profiling sessions?
> Also, XLogFlush gets called by background processes like autovac,
> checkpointer and bgwriter, in addition to being called by committing
> processes.  If one profiled session contained a checkpoint and other did
> not, or one just idled a lot longer between when the benchmark finished and
> when you shutdown the server, perhaps that explains it.
>
> Anyway, I don't see this behavior change when turning on wal_debug and
> looking in the logfiles for 'xlog flush request' messages.

Yes I did reset the stats between profiling sessions. Yes, probably
XLogFlush()'s done by other processes were not considered by me. Also,
after doing a few more runs (used pgbench -j 4 -c 32 -T 60) and
observing results, I think I am getting convinced there is no surprise
in getting XLogFlush() being called more often. Since, the patch
enables backends to return more quickly from XLogFlush() in more
number of cases than before, which in turn causes them to proceed to
further transactions which again call XLogFlush(). So, the increased
number of XLogFlush() should be accounted for by increase in
throughput that we see. In earlier mail, it might have been wrong of
me to conclude that the throughput rise and rise in #invocations of
XLogFlush() are not proportional. I think they are, though not
rigorously as far as I could measure. I am wondering if this line of
interpreting it is correct?


--

Amit Langote



Re: XLogFlush invoked about twice as much after 9.2 group commit enhancement

From
Peter Geoghegan
Date:
On Tue, May 7, 2013 at 12:48 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Anyway, I don't see this behavior change when turning on wal_debug and
> looking in the logfiles for 'xlog flush request' messages.

That could have everything to do with the hardware you're using. In
general, the higher the cost of an fsync, the more useful it is to
amortize that cost among concurrently committing transactions.


-- 
Peter Geoghegan



On Tue, May 7, 2013 at 9:23 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Tue, May 7, 2013 at 12:48 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Anyway, I don't see this behavior change when turning on wal_debug and
> looking in the logfiles for 'xlog flush request' messages.

That could have everything to do with the hardware you're using. In
general, the higher the cost of an fsync, the more useful it is to
amortize that cost among concurrently committing transactions.

True, but that is going to happen with a proportional increase in throughput, which he reported not seeing.  At least originally.  I'm not sure what to think now.

Cheers,

Jeff