Re: PATCH: pgbench - merging transaction logs - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: PATCH: pgbench - merging transaction logs
Date	March 28, 2015 10:29:24
Msg-id	alpine.DEB.2.10.1503281016080.17117@sto
In response to	Re: PATCH: pgbench - merging transaction logs (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: PATCH: pgbench - merging transaction logs
List	pgsql-hackers

Hello Tomas,

> I do agree that fprintf is not cheap, actually when profiling pgbench
> it's often the #1 item, but the impact on the measurements is actually
> quite small. For example with a small database (scale 10) and read-only
> 30-second runs (single client), I get this:
>
>   no logging: 18672 18792 18667 18518 18613 18547
> with logging: 18170 18093 18162 18273 18307 18234
>
> So on average, that's 18634 vs. 18206, i.e. less than 2.5% difference.
> And with more expensive transactions (larger scale, writes, ...) the
> difference will be much smaller.

I did some testing: plenty of 200-second runs of a scale 10, prepared, 
"SELECT only" benchmark, with local socket connections, on the largest 
host I have available:
  pgbench -P 10 -T 200 -S -M prepared -j $c -c $c ...

I think that this cpu-bound bench is kind of a worst possible case for the 
detailed per transaction log.

I also implemented a quick and dirty version for a merge log based on 
sharing a file handle (append mode + sprintf + fputs).
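
To make the idea concrete, here is a minimal sketch of that kind of shared-handle logging. It is not the patch itself: the names (log_transaction, run_merged_log, the record layout, the thread and transaction counts) are made up for illustration, and it assumes POSIX threads and POSIX stdio, where each call on a FILE behaves as if it took the stream lock.

```c
#include <pthread.h>
#include <stdio.h>

/* All names below are hypothetical; this is not pgbench's doLog. */
#define N_THREADS     4
#define TX_PER_THREAD 1000

static FILE *shared_log;    /* single handle shared by every thread */

/*
 * Format the whole record into a local buffer, then emit it with one
 * fputs call.  POSIX stdio locks the FILE around each call, so a record
 * written this way is not interleaved with another thread's record.
 */
static void
log_transaction(int thread_id, int tx_no, double latency_us)
{
    char line[128];

    snprintf(line, sizeof(line), "%d %d %.0f\n", thread_id, tx_no, latency_us);
    fputs(line, shared_log);
}

static void *
worker(void *arg)
{
    int id = *(int *) arg;

    for (int i = 0; i < TX_PER_THREAD; i++)
        log_transaction(id, i, 100.0 + i);    /* made-up latency value */
    return NULL;
}

/* Run all workers against one append-mode file; returns 0 on success. */
static int
run_merged_log(const char *path)
{
    pthread_t threads[N_THREADS];
    int       ids[N_THREADS];
    int       started = 0;
    int       rc = 0;

    shared_log = fopen(path, "a");
    if (shared_log == NULL)
        return -1;

    for (int i = 0; i < N_THREADS; i++)
    {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
        {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    if (fclose(shared_log) != 0)
        rc = -1;
    return rc;
}
```

The point of formatting first and writing with a single fputs is that the stream lock is held only for the copy into the stdio buffer, not for the formatting. The resulting file is merged but only roughly ordered in time, since threads take the lock in whatever order they reach it.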

The results are as follows:

 * 1 thread, 33 runs, median tps (average is consistent):
   - no logging:        22062
   - separate logging:  19360  (-12.2%)
   - merged logging:    19326  (-12.4%, not significantly different from previous)

Note that the impact of logging is much larger than with your tests.
The underlying fprintf comes from gnu libc 2.19.

The worst overhead I could trigger is with 12 threads:

 * 12 threads, 35 runs, median tps (average is consistent):
   - no logging:       155917
   - separate logging: 124695 (-20.0%)
   - merged logging:   119349 (-23.5%)

My conclusion from these figures is that although the direct merged 
logging approach adds some overhead, this overhead is small compared with 
that of detailed logging itself: with 12 threads, it adds 3.5 percentage 
points to a 20% logging overhead. Other tests, even with more threads, did 
not yield larger absolute or relative overheads. So the direct merge 
approach adds a small cost to a situation that is already quite bad, which 
suggests that using the detailed log on a cpu-bound pgbench run is a bad 
idea to begin with.
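
For anyone who wants to redo the arithmetic on their own runs, the percentages above are plain relative differences of median tps. A small sketch, where overhead_pct and report are names invented here and not anything in pgbench:

```c
#include <stdio.h>

/* Relative change of tps against a baseline, in percent (negative = slower). */
static double
overhead_pct(double baseline_tps, double tps)
{
    return 100.0 * (tps - baseline_tps) / baseline_tps;
}

/* Print the overhead figures for one set of median tps values. */
static void
report(const char *label, double none, double separate, double merged)
{
    printf("%s: separate %.1f%%, merged %.1f%%, merged vs separate %.1f%%\n",
           label,
           overhead_pct(none, separate),
           overhead_pct(none, merged),
           overhead_pct(separate, merged));
}
```

Calling report("12 threads", 155917, 124695, 119349) reproduces the -20.0% and -23.5% figures, and also shows that merged logging is about 4.3% slower than separate logging when the two are compared directly with each other.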

For a more realistic test, I ran "simple updates" which involve actual 
disk writes. It ran at around 840 tps with 24 threads. The logging 
overhead seems to be under 1%, and there is no significant difference 
between separate and merged on the 20 runs.

So my overall conclusion is:

(1) The simple thread-shared file approach would spare pgbench the heavy 
merge-sort post-processing code, for a reasonable cost.

(2) The feature would not be available for the thread-emulation with this 
approach, but I do not see this as a particular issue as I think that it 
is pretty much only dead code and a maintenance burden.

(3) Optimizing doLog from its current fprintf-based implementation may be 
a good thing.

-- 
Fabien.
