pgbench progress report improvements

From
Fabien
Date:

Here is a patch submission for reference to the next commitfest.



Improve pgbench measurements & progress report

 - Use progress option both under init & bench.

   Activate progress report by default, every 5 seconds.
   When initializing, --quiet reverts to the old every 100,000 insertions
   behavior...

 - Measure transaction latency instead of computing it.

   The previous computed figure does not make sense under throttling,
   as sleep throttling time was included in the figures.
   The latency and its standard deviation are printed out under progress
   and in the final report.

 - Take thread start time at the beginning of the thread.

   Otherwise it includes pretty slow thread/fork system start times in
   the measurements. May help with bug #8326.

 - Reduce and compensate throttling underestimation bias.

   This is a consequence of relying on an integer random generator.
   It was about 0.5% with 1000 distinct values. Multiplier added to
   compensate the 0.05% bias with 10000 distinct integer values.

 - Updated documentation & help message.


Sample output under initialization with --progress=1

  creating tables...
  223000 of 1000000 tuples (22%) done (elapsed 1.01 s, remaining 3.52 s).
  430200 of 1000000 tuples (43%) done (elapsed 2.03 s, remaining 2.68 s).
  507300 of 1000000 tuples (50%) done (elapsed 3.44 s, remaining 3.34 s).
  571400 of 1000000 tuples (57%) done (elapsed 4.05 s, remaining 3.03 s).
  786800 of 1000000 tuples (78%) done (elapsed 5.03 s, remaining 1.36 s).
  893200 of 1000000 tuples (89%) done (elapsed 6.02 s, remaining 0.72 s).
  1000000 of 1000000 tuples (100%) done (elapsed 6.93 s, remaining 0.00 s).
  ...


Sample output under benchmarking with --progress=1

  progress: 1.0 s, 116.7 tps, 8.528 +- 1.217 ms lat
  progress: 2.0 s, 117.7 tps, 8.493 +- 1.165 ms lat
  progress: 3.0 s, 115.5 tps, 8.654 +- 1.650 ms lat
  progress: 4.0 s, 116.8 tps, 8.559 +- 1.365 ms lat
  progress: 5.0 s, 111.7 tps, 8.947 +- 3.475 ms lat
  progress: 6.0 s, 116.7 tps, 8.563 +- 1.387 ms lat
  progress: 7.0 s, 116.6 tps, 8.572 +- 1.474 ms lat
  ...
  # oops, things happen:
  progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
  progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat
  ...

-- 
Fabien.

Re: pgbench progress report improvements

From
Peter Eisentraut
Date:
On Tue, 2013-08-06 at 10:47 +0200, Fabien wrote:
> Here is a patch submission for reference to the next commitfest.

Please replace the tabs in the SGML files with spaces.




Re: pgbench progress report improvements

From
Fabien COELHO
Date:
>> Here is a patch submission for reference to the next commitfest.
>
> Please replace the tabs in the SGML files with spaces.

Attached a v2 with tabs replaced by spaces.

-- 
Fabien.

Re: pgbench progress report improvements

From
Noah Misch
Date:
For others following along, Pavel Stehule reviewed this on a different thread:
http://www.postgresql.org/message-id/CAFj8pRAVYUhv8x3fEEwaSGGvDcijOgVLZSAVd7rmWn9Mw8RANA@mail.gmail.com

On Tue, Aug 06, 2013 at 10:47:14AM +0200, Fabien wrote:
> Improve pgbench measurements & progress report

These changes are loosely coupled; please separate them into several patch
files:

>  - Use progress option both under init & bench.
>
>    Activate progress report by default, every 5 seconds.
>    When initializing, --quiet reverts to the old every 100,000 insertions
>    behavior...
 Patch (1): Change the default from --progress=0 to --progress=5

This has a disadvantage of causing two extra gettimeofday() calls per
transaction.  That's not cheap on all platforms; a user comparing pgbench
results across versions will need to make a point of disabling this again to
make his results comparable.  Given that threat and uncertainty about which
default would be more popular, I think we should drop this part.
 Patch (2): Make --initialize mode respect --progress.

The definition you've chosen for --quiet makes that option contrary to its own
name: message-per-100k-tuples is typically more voluminous than your proposed
new default of message-per-5s.  In fact, since --quiet currently switches from
message-per-100k-tuples to message-per-5s, your patch gives it the exact
opposite of its present effect.

During the 9.3 development cycle, we _twice_ made changes pertaining to
--initialize message frequency:

http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
http://www.postgresql.org/message-id/flat/1346472039.18010.10.camel@vanquo.pezone.net

That gives me pause about working through yet another change; we keep burning
time on this minor issue.

>  - Measure transaction latency instead of computing it.
>
>    The previous computed figure does not make sense under throttling,
>    as sleep throttling time was included in the figures.
>    The latency and its standard deviation are printed out under progress
>    and in the final report.
 Patch (3): Report the standard deviation of latency.

Seems potentially useful; see inline comments below.

In my limited testing, the interesting latency cases involved stray clusters
of transactions showing 10x-100x mean latency.  If that's typical, I'd doubt
mean/stddev is a great analytical tool.  But I have little reason to believe
that my experience here was representative.
 Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.

If we do this, should we also augment the --log output to contain the figures
necessary to reconstruct this calculation?  I think that would mean adding
fields for the time when the first command of the transaction was sent.

>
>  - Take thread start time at the beginning of the thread.
>
>    Otherwise it includes pretty slow thread/fork system start times in
>    the measurements. May help with bug #8326.
 Patch (5)

That theory is sound, but I would like at least one report reproducing that
behavior and finding that this change improved it.

>  - Reduce and compensate throttling underestimation bias.
>
>    This is a consequence of relying  on an integer random generator.
>    It was about 0.5% with 1000 distinct values. Multiplier added to
>    compensate the 0.05% bias with 10000 distinct integer values.
 Patch (6)

See inline comment below.

>  - Updated documentation & help message.

Include any relevant documentation and --help updates with the patch
necessitating them.  If there are errors in the documentation today, put fixes
for those in a separate patch (7).


Additionally, this patch contains numerous whitespace-only changes.  Do not
mix those into a patch that makes a functional change.  Most of them would be
done or undone by the next pgindent run, so I suggest simply dropping them.
See the first entry here:
https://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned

Patches (5) and (6) are nicely small and free of UI questions.  At a minimum,
let's try to get those done in this CommitFest.  I propose dropping patches
(1) and (2).  Patches (3) and (4) have open questions, but I think they have
good value potential.

> Sample output under benchmarking with --progress=1

>   progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
>   progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat

"x +- y" does not convey that y is a standard deviation.  I suggest getting
the term "stddev" in there somehow, maybe like this:
 progress: 37.0 s, 115.2 tps, latency avg=8.678 ms  stddev=1.792

> --- a/contrib/pgbench/pgbench.c
> +++ b/contrib/pgbench/pgbench.c

> -           "  -R, --rate=SPEC          target rate in transactions per second\n"
> +           "  -R, --rate=NUM           target rate in transactions per second\n"

This would fall under patch (7) if you feel it's needed.

> @@ -928,14 +931,18 @@ top:
>           * Use inverse transform sampling to randomly generate a delay, such
>           * that the series of delays will approximate a Poisson distribution
>           * centered on the throttle_delay time.
> -                 *
> -                 * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier.
> +         *
> +         * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay multiplier,
> +         * and results in a 0.055 % target underestimation bias:
> +         *
> +         * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i;
> +         * = 1.000552717032611116335474
>           *
>           * If transactions are too slow or a given wait is shorter than
>           * a transaction, the next transaction will start right away.
>           */
> -        int64 wait = (int64)
> -            throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
> +        int64 wait = (int64) (throttle_delay *
> +            1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));

This quantizes the random number, then dequantizes it, then corrects for the
resolution lost in the round trip.  Shouldn't it just be:
    int64 wait = (int64) (throttle_delay *
        -log(1 - pg_erand48(thread->random_state)));

> -        if ((!use_quiet) && (j % 100000 == 0))
> +        if (!progress && (j % 100000 == 0))

Project style is "progress == 0" rather than "!progress".

> @@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients,
>           * transaction.  The measured lag may be caused by thread/client load,
>           * the database load, or the Poisson throttling process.
>           */
> -        printf("average rate limit schedule lag: %.3f ms (max %.3f ms)\n",
> -               0.001 * throttle_lag / normal_xacts, 0.001 * throttle_lag_max);
> +        double latency = 0.001 * total_latencies / normal_xacts;
> +        double sqlat = (double) total_sqlats / normal_xacts;
> +
> +        printf("average latency under rate limit: %.3f +- %.3f ms\n"
> +               "average rate limit lag: %.3f ms (max %.3f ms)\n",
> +               latency,
> +               0.001 * sqrt(sqlat - 1000000*latency*latency),
> +               0.001 * throttle_lag / normal_xacts,
> +               0.001 * throttle_lag_max);

This output only happens when using --rate.  That has made sense heretofore,
because in the absence of rate limiting, average latency can be trivially
calculated from tps.  Since we will now track stddev as well, let's print
something like this even when --rate is not in use.

> -                float tps, total_run, latency;
> +                double tps, total_run, latency, sqlat, stdev, lag;

Why these variable type changes?

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements

From
Fabien COELHO
Date:
Hello Noah,

Thanks a lot for all these comments!

I'm not planning to apply all of them directly, especially removing 
features that I think are really desirable. Please find below a defense 
against some of your suggestions. I wish to wait for some more feedback 
about these arguments before spending time on heavy changes.


>> Improve pgbench measurements & progress report
>
> These changes are loosely coupled; please separate them into several patch
> files:

I thought about this. I submitted a bunch of very small pgbench patches to 
the previous commit fest, and it resulted in a lot of duplicated tests by 
reviewers, which made it a waste of their precious time.

ISTM that what you suggest will result in more reviewer time, which is the 
critical resource of the commitfest.

Moreover, splitting the patch as you suggest, especially in 7 patches, 
would result in conflicting or dependent patches, which creates yet more 
issues for the reviews. There is *some* coupling, because the reports are 
about the measurements performed, and I change both to try to have 
something consistent in the end. I can separate the doc fix, but this is 
just one line...

Basically, I tried to optimize reviewer time by a combined patch focussed 
on pgbench measurements and their reporting.

But this does not prevent discussing features separately!

>>  - Use progress option both under init & bench.
>>
>>    Activate progress report by default, every 5 seconds.
>>    When initializing, --quiet reverts to the old every 100,000 insertions
>>    behavior...
>
>  Patch (1): Change the default from --progress=0 to --progress=5
>
> This has a disadvantage of causing two extra gettimeofday() calls per 
> transaction.  That's not cheap on all platforms; a user comparing 
> pgbench results across versions will need to make a point of disabling 
> this again to make his results comparable. Given that threat and 
> uncertainty about which default would be more popular, I think we should 
> drop this part.

The gettimeofday call time is very small compared to network and database 
(disk!) accesses involved in a pgbench run. On my small laptop, 
gettimeofday takes about 0.00000002 seconds (0.02 µs -- micro seconds) per 
call, which is typically under 1/1000 of the fastest local-socket 
read-only one-table cache-hit prepared trivial transaction.

This is a little different when initializing, but I put a guard on 
gettimeofday in that case.

If this is a blocker, I can put 0, but I really do not think it is 
necessary, because the performance impact is negligible.

About the default behavior : it should be what is more useful.

I found that running pgbench to get significant results requires long 
minutes or even hours because of warmup time. Running a command for 30 
minutes without any feedback is annoying. The second point is that the 
feedback would help users notice that the figures evolve slowly but 
significantly, and realise that their too-short runs are not significant 
at all. So it seems to me that it should be the desirable behavior, and 
--progress=0 is always possible anyway for the very performance 
(over-)conscious tester.

>  Patch (2): Make --initialize mode respect --progress.
>
> The definition you've chosen for --quiet makes that option contrary to its own
> name: message-per-100k-tuples is typically more voluminous than your proposed
> new default of message-per-5s.  In fact, since --quiet currently switches from
> message-per-100k-tuples to message-per-5s, your patch gives it the exact
> opposite of its present effect.
>
> During the 9.3 development cycle, we _twice_ made changes pertaining to
> --initialize message frequency:
>
> http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
> http://www.postgresql.org/message-id/flat/1346472039.18010.10.camel@vanquo.pezone.net
>
> That gives me pause about working through yet another change; we keep burning
> time on this minor issue.

I totally agree that this "quiet" is not convincing!

My actual opinion is that quiet should just mean quiet:-), i.e. no 
progress report.

I tried to preserve the row-counting behavior because I thought that 
someone would object otherwise, but I would be really in favor of dropping 
the row-counting report behavior altogether and only keep the time 
triggered report.

>>  - Measure transaction latency instead of computing it.
>
>  Patch (3): Report the standard deviation of latency.
>
> Seems potentially useful; see inline comments below.
>
> In my limited testing, the interesting latency cases involved stray 
> clusters of transactions showing 10x-100x mean latency. If that's 
> typical, I'd doubt mean/stddev is a great analytical tool. But I have 
> little reason to believe that my experience here was representative.

The stddev measurement is sensitive to the cache hit ratio, so this is a 
good indicator of whether the steady state is reached.

Indeed, mean is the most basic tool, and stddev the second most basic...
They are better than nothing, and the stddev does add value.


>  Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
>
> If we do this, should we also augment the --log output to contain the figures
> necessary to reconstruct this calculation?  I think that would mean adding
> fields for the time when the first command of the transaction was sent.

Yep, I thought about it, but decided against it for the following reason: 
the lag measure is only significant under throttling, which is not the 
typical use case for pgbench. When there is no throttling, there is no lag, 
so nothing to report. Also, having two formats would mean that existing 
tools that analyse these logs would have to deal with both.


> ...
>  Patch (5)
>
> That theory is sound, but I would like at least one report reproducing that
> behavior and finding that this change improved it.

It was really annoying when I was debugging pgbench, because I did not 
know whether flukes I was seeing were due to this issue or something else. 
Having the start time taken when the test really starts helps rule this 
out as a potential cause of strange figures on short runs.

The bug report I mentioned was a pretty short test with a *lot* of 
threads which took possibly several seconds to be started, but I cannot 
run that on my small laptop.


>>  - Updated documentation & help message.
>
> Include any relevant documentation and --help updates with the patch
> necessitating them.  If there are errors in the documentation today, put fixes
> for those in a separate patch (7).

I could do a separate patch for a one-liner doc update for homogeneity...

> Additionally, this patch contains numerous whitespace-only changes.

Hmmm. This is emacs fault:-) I'll (try) to remove them.

> Patches (5) and (6) are nicely small and free of UI questions.  At a minimum,
> let's try to get those done in this CommitFest.  I propose dropping patches
> (1) and (2).

I disagree: I think that having --progress and --quiet work consistently 
both under initialization and bench is valuable. The other point is 
whether the default should be 0, and I defended why it should really not 
be 0, see above.

> Patches (3) and (4) have open questions, but I think they have
> good value potential.

See, and evaluate, my arguments and answers.

>> Sample output under benchmarking with --progress=1
>
>>   progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
>>   progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat
>
> "x +- y" does not convey that y is a standard deviation.

I'm referring to the ± sign, but in ascii because I doubt that I can use a 
UTF character:-) From http://en.wikipedia.org/wiki/Plus-minus_sign:

  "In experimental sciences, the sign commonly indicates the confidence
  interval or error in a measurement, often the standard deviation or
  standard error."

ISTM that "pgbench" is experimental science.


> I suggest getting the term "stddev" in there somehow, maybe like this:
>
>  progress: 37.0 s, 115.2 tps, latency avg=8.678 ms  stddev=1.792

My issue is to try to keep the line width under control so as to avoid 
line breaks in the terminal. Under throttling, there is also the time lag 
appended to the line.

Moreover, using "xxx=figure" breaks simple "cut" pipelining to extract the 
figures, so I would prefer to stick to spaces.

Maybe:
  progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms

but I liked my "+-" approach:-)

>> -           "  -R, --rate=SPEC          target rate in transactions per second\n"
>> +           "  -R, --rate=NUM           target rate in transactions per second\n"
>
> This would fall under patch (7) if you feel it's needed.

This is really to be homogeneous with other options. "needed" is a strong 
word.

>> -        int64 wait = (int64)
>> -            throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
>> +        int64 wait = (int64) (throttle_delay *
>> +            1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));
>
> This quantizes the random number, then dequantizes it, then corrects for the
> resolution lost in the round trip.

Yep!

>  Shouldn't it just be:
>
>         int64 wait = (int64) (throttle_delay *
>             -log(1 - pg_erand48(thread->random_state)));

It could. However with this approach the inserted delay would be pretty 
much unbounded. The benefit of using 10000 is that the delay is bounded by 
about 10 times the expected average delay, so although it is throttled by a 
stochastic process, it stays quite active and steady. Basically using the 
quantization seemed a good compromise to me...

>> -        if ((!use_quiet) && (j % 100000 == 0))
>> +        if (!progress && (j % 100000 == 0))
>
> Project style is "progress == 0" rather than "!progress".

I could. I used "!progress" because "progress == 0" is interpreted as "no 
progress", and not "progress with value zero", which would mean a report 
at every possible time.

>> @@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients,
>> [...]
> This output only happens when using --rate.  That has made sense 
> heretofore, because in the absence of rate limiting, average latency can 
> be trivially calculated from tps. Since we will now track stddev as 
> well, let's print something like this even when --rate is not in use.

It is also printed without --rate. There is an "if" above because there is 
one report with "lag" (under --rate), and one without.

>> -                float tps, total_run, latency;
>> +                double tps, total_run, latency, sqlat, stdev, lag;
>
> Why these variable type changes?

Because C computes this stuff in double anyway (all math functions such as 
sqrt above operate on double), and it was an oversight on my part to have 
used float in the first place in the previous patch. Thus I fixed it.

-- 
Fabien.

Re: pgbench progress report improvements

From
Rod Taylor
Date:



On Sat, Sep 21, 2013 at 4:55 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

>> I suggest getting the term "stddev" in there somehow, maybe like this:
>>
>>  progress: 37.0 s, 115.2 tps, latency avg=8.678 ms  stddev=1.792
>
> My issue is to try to keep the line width under control so as to avoid
> line breaks in the terminal. Under throttling, there is also the time lag
> appended to the line.
>
> Moreover, using "xxx=figure" breaks simple "cut" pipelining to extract the
> figures, so I would prefer to stick to spaces.
>
> Maybe:
>
>   progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
>
> but I liked my "+-" approach:-)

100 +- 3 implies a range of 97 to 103 and no values are outside of that range.

Re: pgbench progress report improvements

From
Fabien COELHO
Date:
>> Moreover, using 'xxx=figure" breaks simple "cut" pipelining to extract the
>> figures, so I would prefer to stick to spaces.
>>
>> Maybe:
>>
>>   progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
>>
>> but I liked my "+-" approach:-)
>
> 100 +- 3 implies a range of 97 to 103 and no values are outside of that
> range.

This is one of the possible meanings, but ISTM that it is not exclusive. 
Anyway, here we do not have a symmetric distribution of run times.

Well, it seems I'm alone in liking my +-, so I'll backtrack on that.

-- 
Fabien.



Re: pgbench progress report improvements

From
Noah Misch
Date:
On Sat, Sep 21, 2013 at 10:55:54AM +0200, Fabien COELHO wrote:
>>> Improve pgbench measurements & progress report
>>
>> These changes are loosely coupled; please separate them into several patch
>> files:
>
> I thought about this. I submitted a bunch of very small pgbench patches 
> to the previous commit fest, and it resulted in a lot of duplicated tests 
> by reviewers, which made it a waste of their precious time.
>
> ISTM that what you suggest will result in more reviewer time, which is 
> the critical resource of the commitfest.
>
> Moreover, splitting the patch as you suggest, especially in 7 patches,  
> would result in conflicting or dependent patches, which creates yet more  
> issues for the reviews. There is *some* coupling, because the reports are 
> about the measurements performed, and I change both to try to have  
> something consistent in the end. I can separate the doc fix, but this is  
> just one line...

Post all the patch files to the same mailing list thread and have them
reviewed under a single CommitFest entry.  A reviewer or committer preferring
less granularity can squash patches together effortlessly.  The inverse task,
splitting a patch, is not so mechanical.  Consequently, err on the side of
more, smaller patches; it almost always saves reviewer and committer time.

> Basically, I tried to optimize reviewer time by a combined patch focussed 
> on pgbench measurements and their reporting.

Relatedness of changes is a continuum, and committers sometimes disagree on
how far to subdivide before commit.  This patch is not a borderline case;
removing a bias from --rate throttle delays does not belong in the same commit
as enabling progress reports by default.

> But this does not prevent discussing features separately!

Actually, it does hamper debate.  With smaller patches, a person interested in
just one issue can read just that patch.  With a monolithic patch, one must
take in the entire thing to fully comprehend any single issue.

>>  Patch (1): Change the default from --progress=0 to --progress=5
>>
>> This has a disadvantage of causing two extra gettimeofday() calls per  
>> transaction.  That's not cheap on all platforms; a user comparing  
>> pgbench results across versions will need to make a point of disabling  
>> this again to make his results comparable. Given that threat and  
>> uncertainty about which default would be more popular, I think we 
>> should drop this part.
>
> The gettimeofday call time is very small compared to network and database 
> (disk!) accesses involved in a pgbench run. On my small laptop,  
> gettimeofday takes about 0.00000002 seconds (0.02 µs -- micro seconds) 
> per call, which is typically under 1/1000 of the fastest local-socket  
> read-only one-table cache-hit prepared trivial transaction.

Many systems do have cheap gettimeofday().  Many don't.  We recently added
contrib/pg_test_timing in response to the variability in this area.

> I found that running pgbench to get significant results requires long  
> minutes or even hours because of warmup time. Running a command for 30  
> minutes without any feedback is annoying. The second point is that the  
> feedback would help user notice that the figures evolve slowly but  
> significantly, and realise that their too short runs are not significant  
> at all. So it seems to me that it should be the desirable behavior, and  
> --progress=0 is always possible anyway for the very performance  
> (over-)conscious tester.

Those benefits aren't compelling enough to counterbalance the risk of
gettimeofday() overhead affecting results.  (Other opinions welcome.)  For a
tool like pgbench that requires considerable skill to use effectively,
changing a default only helps slightly.  It doesn't take much of a risk to
make us better off leaving the default unchanged.

>>  Patch (2): Make --initialize mode respect --progress.
>>
>> The definition you've chosen for --quiet makes that option contrary to its own
>> name: message-per-100k-tuples is typically more voluminous than your proposed
>> new default of message-per-5s.  In fact, since --quiet currently switches from
>> message-per-100k-tuples to message-per-5s, your patch gives it the exact
>> opposite of its present effect.
>>
>> During the 9.3 development cycle, we _twice_ made changes pertaining to
>> --initialize message frequency:
>>
>> http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
>> http://www.postgresql.org/message-id/flat/1346472039.18010.10.camel@vanquo.pezone.net
>>
>> That gives me pause about working through yet another change; we keep burning
>> time on this minor issue.
>
> I totally agree that this "quiet" is not convincing!
>
> My actual opinion is that quiet should just mean quiet:-), i.e. no  
> progress report.
>
> I tried to preserve the row-counting behavior because I thought that  
> someone would object otherwise, but I would be really in favor of 
> dropping the row-counting report behavior altogether and only keep the 
> time triggered report.

I would be fine with dropping the row-counting behavior.  But why subject this
distant side issue to its third round of bikeshedding in 18 months?

>>  Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
>>
>> If we do this, should we also augment the --log output to contain the figures
>> necessary to reconstruct this calculation?  I think that would mean adding
>> fields for the time when the first command of the transaction was sent.
>
> Yep, I thought about it, but decided against for the following reason: 
> the lag measure is only significant under throttling, which is not the 
> typical use case for pgbench. When there is no throttling, there is no 
> lag, so nothing to report. Also, having two formats would mean that 
> existing tool that analyse these logs would have to deal with both.

Fair enough.

>>>  - Take thread start time at the beginning of the thread.
>>>
>>>    Otherwise it includes pretty slow thread/fork system start times in
>>>    the measurements. May help with bug #8326.
>>
>>  Patch (5)
>>
>> That theory is sound, but I would like at least one report reproducing that
>> behavior and finding that this change improved it.
>
> It was really annoying when I was debugging pgbench, because I did not  
> know whether flukes I was seeing were due to this issue or elsewhere.  
> Having the start time taken when the test really starts help rule out 
> this as a potential issue for strange figures on short runs.
>
> The bug report I mentionned was a pretty short test with a *lot* of  
> threads which took possibly several seconds to be started, but I cannot  
> run that on my small laptop.

I can reproduce the report on low-spec machines, including a 1-GiB VM with "-c
100 -j 100" instead of "-c 800 -j 100".  Your patch does not dramatically
change the spread between "including connections establishing" and "excluding
connections establishing".  Thinking about it, that's unsurprising.  The
thrust of this change is to not count client thread creation at all, as
opposed to counting it as part of connection establishment.  Establishing
database connections is more expensive than creating threads, so it will
remain the dominant cost.  Whether thread creation should be accounted
therewith seems arbitrary, so I'm inclined to leave it alone.

I'll post some further detail to the pgsql-bugs thread.

>>>  - Updated documentation & help message.
>>
>> Include any relevant documentation and --help updates with the patch
>> necessitating them.  If there are errors in the documentation today, put fixes
>> for those in a separate patch (7).
>
> I could do a separate patch for a one-liner doc update for homogeneity...

That's how we often do things.  Thanks.

>> Additionally, this patch contains numerous whitespace-only changes.
>
> Hmmm. This is emacs fault:-) I'll (try) to remove them.

Patch credit: "Emacs, with some tweaks by Fabien".

>>> Sample output under benchmarking with --progress=1
>>
>>>   progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
>>>   progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat
>>
>> "x +- y" does not convey that y is a standard deviation.
>
> I'm refering to the ± sign, but in ascii because I doubt that I can use a 
> UTF character:-) From http://en.wikipedia.org/wiki/Plus-minus_sign:
>
>   "In experimental sciences, the sign commonly indicates the confidence
>   interval or error in a measurement, often the standard deviation or
>   standard error."
>
> ISTM that "pgbench" is experimental science.

Use of +-y as an abbreviation for σ=y seems defensible when the distribution
has less skew.  The "91.864 +- 124.874" quoted above need not imply negative
latency measurements, yet a plain reading of +- would imply just that.  (I
rarely read work in experimental science fields, so this objection could be
out of touch with widespread practice.)

>> I suggest getting the term "stddev" in there somehow, maybe like this:
>>
>>  progress: 37.0 s, 115.2 tps, latency avg=8.678 ms  stddev=1.792
>
> My issue is to try to keep the line width under control so as to avoid  
> line breaks in the terminal. Under throttling, there is also the time lag 
> appended to the line.
>
> Moreover, using "xxx=figure" breaks simple "cut" pipelining to extract 
> the figures, so I would prefer to stick to spaces.
>
> Maybe:
>
>   progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms

Works for me.

>>> -        int64 wait = (int64)
>>> -            throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
>>> +        int64 wait = (int64) (throttle_delay *
>>> +            1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));
>>
>> This quantizes the random number, then dequantizes it, then corrects for the
>> resolution lost in the round trip.
>
> Yep!
>
>>  Shouldn't it just be:
>>
>>         int64 wait = (int64) (throttle_delay *
>>             -log(1 - pg_erand48(thread->random_state)));
>
> It could. However with this approach the inserted delay would be pretty  
> much unbounded. The benefit of using 10000 is that the delay is bounded 
> by about 10 times the expected average rate, so although it is throttled 
> by a stochastic process, it stays quite active and steady. Basically 
> using the quantization seemed a good compromise to me...

Ah; that makes sense.  Then I understand why you want to remove the bias, but
why do you also increase the upper bound?

>>> -        if ((!use_quiet) && (j % 100000 == 0))
>>> +        if (!progress && (j % 100000 == 0))
>>
>> Project style is "progress == 0" rather than "!progress".
>
> I could. I used "!progress" because "progress == 0" is interpreted as "no 
> progress", and not "progress with value zero", which would mean a report  
> at every possible time.

That's a good argument for doing it the way you did it.

>>> @@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients,
>>> [...]
>> This output only happens when using --rate.  That has made sense  
>> heretofore, because in the absence of rate limiting, average latency 
>> can be trivially calculated from tps. Since we will now track stddev as 
>> well, let's print something like this even when --rate is not in use.
>
> It is also printed without --rate. There is a "if" above because there is 
> one report with "lag" (under --rate), and one without.

The code I quoted is for the final report in printResults(), and that only
shows latency mean/stddev when using --rate.  The progress reporting in
threadRun() does have two variations as you describe.

>>> -                float tps, total_run, latency;
>>> +                double tps, total_run, latency, sqlat, stdev, lag;
>>
>> Why these variable type changes?
>
> Because C computes stuff in double anyway (all maths functions such as  
> sqrt above operate on double), and it was an oversight on my part to 
> have used float in the first place in the previous patch. Thus I fixed it.

Okay.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements

From
Fabien COELHO
Date:
Dear Noah,

Thanks for your answers and remarks,

[...]

I'll split some part of the patch where there is no coupling, but I do 
not want to submit conflicting patches.

> Those benefits aren't compelling enough to counterbalance the risk of
> gettimeofday() overhead affecting results.  (Other opinions welcome.)

Yep. If I really leave gettimeofday out, I cannot measure the stddev, so 
I'll end up with:
  --rate    => gettimeofday, mean (& stddev) measured, because they
               cannot be derived otherwise.
  no --rate => no (or fewer) gettimeofday calls, mean computed from the
               total time, and no stddev report because it cannot be
               computed.

That is annoying, because I do want the standard deviation, and this 
means extra "if" complexity here and there.

ISTM that when running under a target time, the (hopefully very small) 
overhead is only one gettimeofday call, because the other one is taken 
anyway to check whether the thread should stop.

Or I can add a yet another option, say --stddev, to ask for standard 
deviation, which will imply the additional gettimeofday call...

> For a tool like pgbench that requires considerable skill to use 
> effectively, changing a default only helps slightly.  It doesn't take 
> much of a risk to make us better off leaving the default unchanged.

I can put a 0 default... but even very experienced people will be bitten 
over and over. ISTM that the default should be the safe option, and 
experienced users can request --quiet if they want it.

>> [...]
>> I tried to preserve the row-counting behavior because I thought that
>> someone would object otherwise, but I would be really in favor of
>> dropping the row-counting report behavior altogether and only keep the
>> time triggered report.
>
> I would be fine with dropping the row-counting behavior.  But why subject this
> distant side issue to its third round of bikeshedding in 18 months?

I was not involved:-)

The 100000 behavior is the initial & old version, and only applies to 
initialization. Someone found it too verbose when scaling, and I agree, so 
I made a quick patch which preserves the old behavior (someone must have 
said: whatever, do not change the default!) but allows switching to a 
less noisy version, hence the --quiet which is not really quiet. This would 
be the defective result of a compromise:-)

If I follow your request not to change any default, I cannot merge cleanly 
the -i & bench behaviors, as currently -i does have a default progress 
report and its own --quiet, and benchmarking does not.

The current behavior is inconsistent. I would prefer something consistent: 
preferably always show a progress report, and --quiet hides it (fully); or, 
if you really insist, no progress report by default, --progress shows it, 
and the --quiet option is removed.

>>>>  - Take thread start time at the beginning of the thread.
>>>
>>> That theory is sound, but I would like at least one report reproducing that
>>> behavior and finding that this change improved it.
>
> [...] so I'm inclined to leave it alone.

I really spent *hours* debugging stupid measures on the previous round of 
pgbench changes, when adding the throttling stuff. Having the start time 
taken when the thread really starts is just sanity, and I needed that just 
to rule out that it was the source of the "strange" measures.

-j 800 vs -j 100 : ISTM that if you create more threads, the time delay 
incurred is cumulative, so the strangeness of the result should worsen.
800 threads ~ possibly 10 seconds of creation time, to be compared to a 
few seconds of run time.

>>>  Shouldn't it just be:
>>>
>>>         int64 wait = (int64) (throttle_delay *
>>>             -log(1 - pg_erand48(thread->random_state)));
>> [...]
> Ah; that makes sense.  Then I understand why you want to remove the bias, but
> why do you also increase the upper bound?

Because the bias was significantly larger for 1000 (about 0.5%), so this 
also contributed to reducing said bias, and 9.2 times the average target 
seems as reasonable a bound as 6.9.


>> It is also printed without --rate. There is a "if" above because there is
>> one report with "lag" (under --rate), and one without.
>
> The code I quoted is for the final report in printResults(), and that only
> shows latency mean/stddev when using --rate.  The progress reporting in
> threadRun() does have two variations as you describe.

Indeed, I took it for the progress report. I'll check. It must be 
consistent whether under --rate or not.

-- 
Fabien.



Re: pgbench progress report improvements

From
Fabien COELHO
Date:
>>> It is also printed without --rate. There is a "if" above because there is
>>> one report with "lag" (under --rate), and one without.
>> 
>> The code I quoted is for the final report in printResults(), and that only
>> shows latency mean/stddev when using --rate.  The progress reporting in
>> threadRun() does have two variations as you describe.
>
> Indeed, I took it for the progress report. I'll check. It must be consistent 
> whether under --rate or not.

I checked.

The current status is that stddev is currently only available under 
--progress or --rate, and displayed in the final report if --rate.

I can add stddev in the final report *if* progress was on, but not 
otherwise.

This means that under standard benchmarking (no --rate and without 
progress if it is not enabled by default) stddev cannot be shown.

This is consistent with your requirement that gettimeofday calls must be 
avoided, but results in this necessary inconsistency in reporting results.

The alternative is to always measure, but that would imply always calling 
gettimeofday, and I understood that it is a no-go for you. I think that 
your performance concerns are needless:

I confirm 50,000,000 gettimeofday calls/s on my laptop. On a desktop 
workstation I get 43 million calls/s. The slowest I have found on an old 
server is 3.3 million calls/s, that is 0.3 µs per call.

Even so, this is to be compared to the fastest possible transaction 
latency I got, which is about 140 µs (one local prepared in-memory table 
select on my laptop), so the overhead is 1/450 of the minimum possible 
transaction time, pretty negligible in my opinion. For transactions 
which involve disk accesses, the latency is in ms and the overhead well 
under 1/1000.

There are some measures about gettimeofday overhead reported in:
  http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html

The worst figure is for a Pentium 166 MHz (!), where the overhead is 86.6 µs. 
However typical values are around or under 1 µs, even for old Pentium II 
350 MHz PCs, and this is less than 1% of our minimum possible transaction 
time.

Note also that under option --report-latencies, the number of gettimeofday 
calls is pretty large, and nobody complained.

So I'm of the opinion that we should not worry at all about the number of 
gettimeofday calls in pgbench.

-- 
Fabien.

Re: pgbench progress report improvements - split 1

From
Fabien COELHO
Date:
Split 1 of the initial submission.

pgbench: minor update of documentation & help message.

Use NUM in help message for homogeneity with other options. The target 
*start* time of the transaction is set by the stochastic process which is 
doing the throttling (--rate), not the end time.

-- 
Fabien.

Re: pgbench progress report improvements - split 2

From
Fabien
Date:
Split 2 of the initial submission

pgbench: reduce and compensate throttling underestimation bias.

This is a consequence of relying on an integer random generator,
which allows ensuring that the inserted delays stay reasonably in
range of the target average delay.

The bias was about 0.5% with 1000 distinct values. A multiplier is added
to compensate the 0.055% bias with 10000 distinct integer values,
so there is no bias now.

-- 
Fabien.

Re: pgbench progress report improvements - split 3

From
Fabien
Date:
Split 3 of the initial submission, which actually deals with the data 
measured and reported on stderr under various options.

This version takes into account many comments by Noah Misch. In 
particular, the default "no report" behavior under benchmarking is not 
changed, although I really think that it should be. Also, the standard 
deviation is only shown when available, which is not the case under all 
settings, because of concerns expressed about the cost of additional calls 
to gettimeofday. ISTM that these concerns are misplaced in this 
particular case.


Improve pgbench measurements & progress report

These changes are coupled because measures are changed, and their 
reporting as well. Submitting separate patches for these interlinked 
features would result in conflicting or dependent patches, so I wish to 
avoid that. Also it would require more reviewer time.
 - Use same progress and quiet options both under init & bench.

   The reporting is every 5 seconds for initialization, and currently
   there is no report for benchmarking.

   I strongly suggest changing this default to 5 seconds for both, if
   people do not veto any change of the default behavior... The
   rationale is that benchmarking is error prone, as it must run a long
   time to be significant because of warmup effects (especially on HDD,
   less true on SSD). Seeing how the current performance evolves would
   help pgbench users realise that. See below for a sample output.

   The previous progress report under initialization, which printed a
   line every 100,000 rows, is removed, as it is much too verbose for
   most hardware, and pretty long for any significant scale.

 - Measure transaction latency instead of computing it, for --rate and
   --progress.

   The previously computed figure does not make sense under throttling:
   as sleep throttling time was included in the figures, the displayed
   results were plain false.

   The latency and its standard deviation are printed out under
   progress and in the final report when available.

   It could be made "always" available, but that would require
   accepting additional gettimeofday calls. I do not think that there
   is a performance issue here, but there is no consensus yet.

 - Take thread start time at the beginning of the thread (!)

   Otherwise it includes pretty slow thread/fork system start times in
   the measurements. May help with bug #8326. This also helps with
   throttling, as the start time is used to adjust the rate: if it is
   not the actual start time, there is a bit of a rush at the beginning
   in order to catch up. Taking the start time when the thread actually
   starts is the sane thing to do to avoid surprises at possibly
   strange measures on short runs.
 

Sample output under initialization with --progress=1
  creating tables...
  1126000 of 3000000 tuples (37%) done (elapsed 1.00 s, remaining 1.67 s).
  2106000 of 3000000 tuples (70%) done (elapsed 2.00 s, remaining 0.85 s).
  2824000 of 3000000 tuples (94%) done (elapsed 3.00 s, remaining 0.19 s).
  3000000 of 3000000 tuples (100%) done (elapsed 3.19 s, remaining 0.00 s).
  vacuum...
  set primary keys...
  done.

Sample output under benchmarking with --progress=1
  progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
  progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
  progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
  progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
  progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
  ...
  progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
  ...
  progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
  ...
  progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
  ...
 

The "warmup" is quite fast because the DB is on a SSD. In the beginning
the standard deviation is well over the average transaction time, but
when the steady state is reached (later) it is much below.

-- 
Fabien.

Re: pgbench progress report improvements - split 3

From
Josh Berkus
Date:
Fabien,

As long as you're hacking pgbench output, what about offering a JSON
option instead of the text output?  I'm working on automating pgbench
performance testing, and having the output in a proper delimited format
would be really helpful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pgbench progress report improvements - split 3

From
Fabien COELHO
Date:
Hello Josh,

> As long as you're hacking pgbench output, what about offering a JSON
> option instead of the text output?  I'm working on automating pgbench
> performance testing, and having the output in a proper delimited format
> would be really helpful.

I'm more of a "grep | cut | ..." person, but I do understand your point about 
automation. My favorite hack is to load benchmark data into sqlite and 
run queries to build data extracts for gnuplot figures.

So I can put that in the stack of things to do when I have some time.

Due to the possibly repetitive table structure of the data, maybe CSV 
would make sense as well. It is less extensible, but it is directly usable 
by sqlite or pg.

-- 
Fabien.



Re: pgbench progress report improvements - split 3

From
Peter Eisentraut
Date:
On 9/22/13 2:58 PM, Fabien wrote:
> Split 3 of the initial submission, which actually deal with data
> measured and reported on stderr under various options.

It seems this patch doesn't apply.  Does it need the first two applied
first?



Re: pgbench progress report improvements - split 3

From
Fabien COELHO
Date:
Dear Peter,

>> Split 3 of the initial submission, which actually deal with data
>> measured and reported on stderr under various options.
>
> It seems this patch doesn't apply.  Does it need the first two applied
> first?

Oops. Indeed. The patch is fully independent of the two others. It was 
generated with "git diff -b master", because I was told not to change 
spacing, but then it must be applied with "patch -l -p1".

Please find attached a new version without this requirement.
  postgresql> git branch test
  postgresql> git checkout test
  Switched to branch 'test'
  postgresql> patch -p1 < ../pgbench-measurements-v4.patch
  patching file contrib/pgbench/pgbench.c
  patching file doc/src/sgml/pgbench.sgml
 

-- 
Fabien.

Re: pgbench progress report improvements

From
Robert Haas
Date:
On Sat, Sep 21, 2013 at 4:55 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>>  - Use progress option both under init & bench.
>>>
>>>    Activate progress report by default, every 5 seconds.
>>>    When initializing, --quiet reverts to the old every 100,000 insertions
>>>    behavior...
>>
>>  Patch (1): Change the default from --progress=0 to --progress=5
>>
>> This has a disadvantage of causing two extra gettimeofday() calls per
>> transaction.  That's not cheap on all platforms; a user comparing pgbench
>> results across versions will need to make a point of disabling this again to
>> make his results comparable. Given that threat and uncertainty about which
>> default would be more popular, I think we should drop this part.
>
> The gettimeofday call time is very small compared to network and database
> (disk!) accesses involved in a pgbench run. On my small laptop, gettimeofday
> takes about 0.00000002 seconds (0.02 µs -- micro seconds) per call, which is
> typically under 1/1000 of the fastest local-socket read-only one-table
> cache-hit prepared trivial transaction.
>
> This is a little different when initializing, but I put a guard on
> gettimeofday in that case.
>
> If this is a blocker, I can put 0, but I really do not think it is necessary
> because of the performance impact.

I really don't think it's a good idea to change the default behavior.
We've had many discussions about how the overhead of gettimeofday()
can sometimes be surprisingly large; I don't think we should assume
it's guaranteed to be small in this case.  Also, changing established
defaults will annoy users who like the present defaults; I don't see
any reason to assume that your preferences will be universal.  In
doubtful cases we should favor leaving the behavior the way it's been
in previous releases, because backward-compatibility is very
desirable.

>>  Patch (2): Make --initialize mode respect --progress.
>>
>> The definition you've chosen for --quiet makes that option contrary to its
>> own
>> name: message-per-100k-tuples is typically more voluminous than your
>> proposed
>> new default of message-per-5s.  In fact, since --quiet currently switches
>> from
>> message-per-100k-tuples to message-per-5s, your patch gives it the exact
>> opposite of its present effect.
>>
>> During the 9.3 development cycle, we _twice_ made changes pertaining to
>> --initialize message frequency:
>>
>> http://www.postgresql.org/message-id/flat/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
>>
>> http://www.postgresql.org/message-id/flat/1346472039.18010.10.camel@vanquo.pezone.net
>>
>> That gives me pause about working through yet another change; we keep
>> burning
>> time on this minor issue.
>
> I totally agree that this "quiet" is not convincing!
>
> My actual opinion is that quiet should just mean quiet:-), i.e. no progress
> report.
>
> I tried to preserve the row-counting behavior because I thought that someone
> would object otherwise, but I would be really in favor of dropping the
> row-counting report behavior altogether and only keep the time triggered
> report.

-1 for changing this again.  Frankly, I might have come down in a
different place if I had understood exactly how this was going to end
up being committed; it ended up being quite different from what I was
expecting.  But I really think that relitigating this just so we can
break backward compatibility again one release later is not a good use
of anyone's time, or a good idea in general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench progress report improvements - split 1

From
Noah Misch
Date:
On Sun, Sep 22, 2013 at 08:44:08PM +0200, Fabien COELHO wrote:
> pgbench: minor update of documentation & help message.
>
> Use NUM in help message for homogeneity with other options. The target  
> *start* time of the transaction is set by the stochastic process which is 
> doing the throttling (--rate), not the end time.

Committed.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 2

From
Noah Misch
Date:
On Sun, Sep 22, 2013 at 08:46:55PM +0200, Fabien wrote:
> pgbench: reduce and compensate throttling underestimation bias.
>
> This is a consequence of relying on an integer random generator,
> which allows ensuring that the inserted delays stay reasonably in
> range of the target average delay.
>
> The bias was about 0.5% with 1000 distinct values. A multiplier is added
> to compensate the 0.055% bias with 10000 distinct integer values,
> so there is no bias now.

> --- a/contrib/pgbench/pgbench.c
> +++ b/contrib/pgbench/pgbench.c
> @@ -929,13 +929,17 @@ top:
>           * that the series of delays will approximate a Poisson distribution
>           * centered on the throttle_delay time.
>                   *
> -                 * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier.
> +         * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay multiplier,
> +         * and results in a 0.055 % target underestimation bias:
> +         *
> +         * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i;
> +         * = 1.000552717032611116335474
>           *
>           * If transactions are too slow or a given wait is shorter than
>           * a transaction, the next transaction will start right away.
>           */
> -        int64 wait = (int64)
> -            throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
> +        int64 wait = (int64) (throttle_delay *
> +            1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));
>  
>          thread->throttle_trigger += wait;
>  

On Sun, Sep 22, 2013 at 10:08:54AM +0200, Fabien COELHO wrote:
>> Then I understand why you want to remove the bias, but
>> why do you also increase the upper bound?
>
> Because the bias was significantly larger for 1000 (about 0.5%), so this  
> also contributed to reduce said bias, and 9.2 times the average target  
> seems as reasonnable a bound as 6.9.

The magnitude of the bias is unimportant if known exactly, as you do here.
Smaller quanta do give you more possible distinct sleep durations, which is a
nice bonus.  What I'm hearing is that you don't especially want to change the
upper bound, but changing it was an acceptable side effect.  However, we can
meet all those goals simultaneously with simpler code, can we not?
        int64 wait = (int64) (throttle_delay *
            Min(7.0, -log(1 - pg_erand48(thread->random_state))));

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 2

From
Fabien COELHO
Date:
Hello Noah,

> meet all those goals simultaneously with simpler code, can we not?
>
>         int64 wait = (int64) (throttle_delay *
>             Min(7.0, -log(1 - pg_erand48(thread->random_state))));

If you roughly truncate the multiplier, as is done here with "Min", you 
necessarily create a bias; my random guess would be a 0.5% 
underestimation, but maybe it is more... Thus you would have to compute 
the correcting factor as well. Its computation is a little bit less easy 
than with the quantized formula, where I just used a simple SUM, and you 
have to really do the maths here. So I would keep the simple solution, 
but it is fine with me if you do the maths!

-- 
Fabien.



Re: pgbench progress report improvements

From
Fabien COELHO
Date:
Dear Robert,

>> (1) ...
> I really don't think it's a good idea to change the default behavior. 
> We've had many discussions about how the overhead of gettimeofday() can 
> sometimes be surprisingly large; I don't think we should assume it's 
> guaranteed to be small in this case.
>
> Also, changing established defaults will annoy users who like to 
> present defaults; I don't see any reason to assume that your preferences 
> will be universal.  In doubtful cases we should favor leaving the 
> behavior the way it's been in previous releases, because 
> backward-compatibility is very desirable.

I argued precisely in another mail, with the worst figures found on the net, 
about the relative overhead of gettimeofday as used by pgbench, which is 
also handling network traffic and waiting for the database to perform 
actual transactions. I do not think that I'm "assuming", and I'm trying to 
argue with objective data.

Anyway, this particular behavior is preserved by the current version of 
the patch, so no worry. The additional gettimeofday is *not* performed under 
standard "silent" benchmarking, and the standard deviation of the latency 
is not measured, because it cannot be. It is however measured under --rate and 
--progress. It is necessary for --rate because otherwise the computed 
latency is false. It is done for --progress because if you are interested in 
knowing what is going on, I assume that it would make sense to provide this 
data.

I must admit that I think, un-universally, that people should care to know 
that their transaction latency can vary by several orders of magnitude, 
but this opinion is clearly not shared.

>> I tried to preserve the row-counting behavior because I thought that 
>> someone would object otherwise, but I would be really in favor of 
>> dropping the row-counting report behavior altogether and only keep the 
>> time triggered report.

> -1 for changing this again.
> Frankly, I might have come down in a different place if I had understood 
> exactly how this was going to end up being committed; it ended up being 
> quite different from what I was expecting.  But I really think that 
> relitigating this just so we can break backward compatibility again one 
> release later is not a good use of anyone's time, or a good idea in 
> general.

The current status on my small (SSD) laptop is that "pgbench -i" throws 
about 10 lines of 100,000-rows progress report per second. I must be a 
slow reader because I can't read that fast. So either other users can read 
much faster than me, or they have slower computers:-)

ISTM that it is no big deal to change this kind of thing in a minor 
contrib tool of postgresql if the result is reasonably better and useful, and I 
would be surprised to see protests about changing an unreadable flow into 
a readable one.

Anyway, let us keep this default behavior, as I hope there must be people 
who like it. Well, if anyone could tell me that he/she prefers having 
10 lines a second thrown on the screen to a regular progress report 
every few seconds, I would feel less stupid reinstating this behavior 
and re-submitting a new version of the patch.

-- 
Fabien.



Re: pgbench progress report improvements - split 3

From
Josh Berkus
Date:
On 09/22/2013 10:44 PM, Fabien COELHO wrote:
> Due to the possibly repetitive table structure of the data, maybe CSV
> would make sense as well. It is less extensible, but it is directly
> usable by sqlite or pg.

I'd be OK with CSV.  At least I wouldn't be regexing the output.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pgbench progress report improvements - split 3 v2

From
Fabien COELHO
Date:
> Split 3 of the initial submission, which actually deals with the data 
> measured and reported on stderr under various options.
>
> This version takes into account many comments by Noah Misch. In 
> particular, the default "no report" behavior under benchmarking is not 
> changed, although I really think that it should be. Also, the standard 
> deviation is only shown when available, which is not the case under all 
> settings, because of concerns expressed about the cost of additional 
> calls to gettimeofday. ISTM that these concerns are misplaced in this 
> particular case.


Yet another version, to fulfill the requirement by Robert Haas not to change 
current default behaviors. The initialization progress is reported every 
100k rows, and there is no progress report for benchmarking.



Improve pgbench measurements & progress report

These changes are coupled because measures are changed, and their
reporting as well. Submitting separate patches for these different
features would result in conflicting or dependent patches, so I
wish to avoid that if possible.
 - Use same progress and quiet options both under init & bench.

   However, the default reporting is NOT changed, as required by
   Robert Haas. It is currently every 100k rows when initializing,
   and nothing when benchmarking.

   I would suggest changing this default to a few seconds for both.
   The 100k-row report is very verbose and unreadable on good
   hardware. For benchmarking, the rationale is that it is error
   prone, as it must run a long time to be significant because of
   warmup effects (especially on HDD, less true on SSD). Seeing how
   the current performance evolves would help pgbench users realise
   that. See below for a sample output.

 - Measure transaction latency instead of computing it, for --rate and
   --progress.

   The previously computed figure does not make sense under throttling:
   as sleep throttling time was included in the figures, the displayed
   results were plain false.

   The latency and its standard deviation are printed out under
   progress and in the final report when available.

   It could be made "always" available, but that would require
   accepting additional gettimeofday calls. I do not think that there
   is a performance issue here, but there is significant opposition to
   the idea.

 - Take thread start time at the beginning of the thread (!)

   Otherwise it includes pretty slow thread/fork system start times in
   the measurements. May help with bug #8326. This also helps with
   throttling, as the start time is used to adjust the rate: if it is
   not the actual start time, there is a bit of a rush at the beginning
   in order to catch up. Taking the start time when the thread actually
   starts is the sane thing to do to avoid surprises at possibly
   strange measures on short runs.
 

Sample output under initialization with --progress=1
  creating tables...
  1126000 of 3000000 tuples (37%) done (elapsed 1.00 s, remaining 1.67 s).
  2106000 of 3000000 tuples (70%) done (elapsed 2.00 s, remaining 0.85 s).
  2824000 of 3000000 tuples (94%) done (elapsed 3.00 s, remaining 0.19 s).
  3000000 of 3000000 tuples (100%) done (elapsed 3.19 s, remaining 0.00 s).
  vacuum...
  set primary keys...
  done.

Sample output under benchmarking with --progress=1
  progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
  progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
  progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
  progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
  progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
  ...
  progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
  ...
  progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
  ...
  progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
  ...
 

The "warmup" is quite fast because the DB is on an SSD. In the beginning
the standard deviation is well over the average transaction time, but
once the steady state is reached (later) it is well below it.

-- 
Fabien.

Re: pgbench progress report improvements - split 2

From
Noah Misch
Date:
On Tue, Sep 24, 2013 at 08:42:15PM +0200, Fabien COELHO wrote:
>> meet all those goals simultaneously with simpler code, can we not?
>>
>>         int64 wait = (int64) (throttle_delay *
>>             Min(7.0, -log(1 - pg_erand48(thread->random_state))));
>
> If you roughly truncate the multiplier, as is done here with "Min", you 
> necessarily create a bias; my random guess would be a 0.5% under-
> estimation, but maybe it is more. Thus you would have to compute the 
> correcting factor as well. Its computation is a little less easy 
> than with the quantized formula, where I just used a simple SUM, and you 
> really have to do the maths here. So I would keep the simple solution, 
> but it is fine with me if you do the maths!

Ah, true; I guess each approach has different advantages.  I've committed your
latest version.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 3 v2

From
Noah Misch
Date:
On Tue, Sep 24, 2013 at 10:41:17PM +0200, Fabien COELHO wrote:
> These changes are coupled because measures are changed, and their
> reporting as well. Submitting separate patches for these different
> features would result in conflicting or dependent patches, so I
> wish to avoid that if possible.

My feelings on the patch split haven't changed; your three bullet points call
for four separate patches.  Conflicting patches are bad, but dependent patches
are okay; just disclose the dependency order.  How about this: as a next step,
please extract just this feature that I listed last Saturday:
 Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.

Once that's committed, we can move on to others.

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench - exclude pthread_create() from connection start timing

From
Noah Misch
Date:
Concerning one of the eventual would-be split patches...

On Tue, Sep 24, 2013 at 10:41:17PM +0200, Fabien COELHO wrote:
>  - Take thread start time at the beginning of the thread (!)
>
>    Otherwise it includes pretty slow thread/fork system start times in
>    the measurements. May help with bug #8326. This also helps with throttling,
>    as the start time is used to adjust the rate: if it is not the actual
>    start time, there is a bit of a rush at the beginning in order to
>    catch up. Taking the start time when the thread actually starts is
>    the sane thing to do to avoid surprises at possibly strange measures
>    on short runs.

On Sun, Sep 22, 2013 at 10:08:54AM +0200, Fabien COELHO wrote:
> I really spent *hours* debugging stupid measures on the previous round of 
> pgbench changes, when adding the throttling stuff. Having the start time  
> taken when the thread really starts is just sanity, and I needed that 
> just to rule out that it was the source of the "strange" measures.

I don't get it; why is taking the time just after pthread_create() more sane
than taking it just before pthread_create()?  We report separate TPS values
for "including connections establishing" and "excluding connections
establishing" because establishing the database connection is expensive,
typically far more expensive than pthread_create().  The patch for threaded
pgbench made the decision to account for pthread_create() as though it were
part of establishing the connection.  You're proposing to not account for it
all.  Both of those designs are reasonable to me, but I do not comprehend the
benefit you anticipate from switching from one to the other.

> -j 800 vs -j 100 : ISTM that if you create more threads, the time delay
> incurred is cumulative, so the strangeness of the result should worsen.

Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
calling pthread_create().  However, thread 0 is a special case; we set its
start time first and actually start it last.  Your observation of cumulative
delay fits those facts.  Initializing the thread-0 start time later, just
before calling its threadRun(), should clear this anomaly without changing
other aspects of the measurement.


While pondering this area of the code, it occurs to me -- shouldn't we
initialize the throttle rate trigger later, after establishing connections and
sending startup queries?  As it stands, we build up a schedule deficit during
those tasks.  Was that deliberate?

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 3 v2

From
Fabien COELHO
Date:
> My feelings on the patch split haven't changed; your three bullet points call
> for four separate patches.  Conflicting patches are bad, but dependent patches
> are okay;

Indeed, this is the only solution if you do not want one patch. Note that 
it will not be possible to revert any one of the patches except the last 
one without conflicts.

> just disclose the dependency order.  How about this: as a next step, 
> please extract just this feature that I listed last Saturday:
>
>  Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
>
> Once that's committed, we can move on to others.

Ok, I'll submit a first part, hopefully today, possibly the one you 
suggest, about fixing and extending latency measure under --rate and 
reporting it under progress.

-- 
Fabien.



Re: pgbench - exclude pthread_create() from connection start timing

From
Fabien COELHO
Date:
>> pgbench changes, when adding the throttling stuff. Having the start time
>> taken when the thread really starts is just sanity, and I needed that
>> just to rule out that it was the source of the "strange" measures.
>
> I don't get it; why is taking the time just after pthread_create() more sane
> than taking it just before pthread_create()?

Thread creation time seems to be expensive as well, maybe up to 0.1 seconds 
under some conditions (?). Under --rate, this creation delay means that 
throttling is lagging behind schedule by about that much, so all the first 
transactions are trying to catch up.

> typically far more expensive than pthread_create().  The patch for threaded
> pgbench made the decision to account for pthread_create() as though it were
> part of establishing the connection.  You're proposing to not account for it
> all.  Both of those designs are reasonable to me, but I do not comprehend the
> benefit you anticipate from switching from one to the other.
>
>> -j 800 vs -j 100 : ISTM that if you create more threads, the time delay
>> incurred is cumulative, so the strangeness of the result should worsen.
>
> Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
> calling pthread_create().  However, thread 0 is a special case; we set its
> start time first and actually start it last.  Your observation of cumulative
> delay fits those facts.

Yep, that must be thread 0 which has a very large delay. I think it is 
simpler that each thread records its start time when it has started, 
without exception.

>  Initializing the thread-0 start time later, just before calling its 
> threadRun(), should clear this anomaly without changing other aspects of 
> the measurement.

Always taking the thread start time when the thread is started does solve 
the issue as well, and it is homogeneous for all cases, so the solution I 
suggest seems reasonable and simple.

> While pondering this area of the code, it occurs to me -- shouldn't we 
> initialize the throttle rate trigger later, after establishing 
> connections and sending startup queries?  As it stands, we build up a 
> schedule deficit during those tasks.  Was that deliberate?

On the principle, I agree with you.

The connection creation time is another thing, but it depends on the 
options set. Under some options the connection is opened and closed for 
every transaction, so there is no point in avoiding it in the measure or 
in the scheduling, and I want to avoid having to distinguish those cases. 
Moreover, ISTM that one of the threads reuses the existing connection while 
the others recreate it. So I left it "as is".

-- 
Fabien.



Re: pgbench progress report improvements - split 3 v2 - A

From
Fabien COELHO
Date:
>  Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.

Here is a first partial patch, which focusses on measuring latency and 
reporting the measure under --progress.

**

Improve pgbench measurements & progress report

Measure transaction latency instead of computing it, under --rate and --progress.

The previously computed figure does not make sense under throttling:
as sleep throttling time was included in the figures, the displayed
results were plain false.

The latency and its standard deviation are printed out under progress and 
in the final report when available.

It could be made "always" available, but that would require accepting 
additional gettimeofday calls. I do not think that there is a performance 
issue here, but there is significant opposition to the idea.

Sample output under benchmarking with --progress=1
  progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
  progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
  progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
  progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
  progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
  ...
  progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
  ...
  progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
  ...
  progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
  ...
 

The "warmup" is quite fast because the DB is on an SSD. In the beginning
the standard deviation is well over the average transaction time, but
once the steady state is reached (later) it is well below it.

-- 
Fabien.

Re: pgbench - exclude pthread_create() from connection start timing

From
Noah Misch
Date:
On Thu, Sep 26, 2013 at 01:41:01PM +0200, Fabien COELHO wrote:
> >I don't get it; why is taking the time just after pthread_create() more sane
> >than taking it just before pthread_create()?
> 
> Thread creation time seems to be expensive as well, maybe up to 0.1
> seconds under some conditions (?). Under --rate, this creation delay
> means that throttling is lagging behind schedule by about that much,
> so all the first transactions are trying to catch up.

threadRun() already initializes throttle_trigger with a fresh timestamp.
Please detail how the problem remains despite that.

> >typically far more expensive than pthread_create().  The patch for threaded
> >pgbench made the decision to account for pthread_create() as though it were
> >part of establishing the connection.  You're proposing to not account for it
> >all.  Both of those designs are reasonable to me, but I do not comprehend the
> >benefit you anticipate from switching from one to the other.
> >
> >>-j 800 vs -j 100 : ISTM that if you create more threads, the time delay
> >>incurred is cumulative, so the strangeness of the result should worsen.
> >
> >Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before
> >calling pthread_create().  However, thread 0 is a special case; we set its
> >start time first and actually start it last.  Your observation of cumulative
> >delay fits those facts.
> 
> Yep, that must be thread 0 which has a very large delay. I think it
> is simpler that each threads record its start time when it has
> started, without exception.
> 
> > Initializing the thread-0 start time later, just before calling
> >its threadRun(), should clear this anomaly without changing other
> >aspects of the measurement.
> 
> Always taking the thread start time when the thread is started does
> solve the issue as well, and it is homogeneous for all cases, so the
> solution I suggest seems reasonable and simple.

To exercise the timing semantics before and after your proposed change, I
added a "sleep(1);" before the pthread_create() call.  Here are the results
with and without "-j", with and without pgbench-measurements-v5.patch:

$ echo 'select 1' >test.sql

# just the sleep(1) addition
$ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
tps = 6784.410104 (including connections establishing)
tps = 7094.701854 (excluding connections establishing)
0.03user 0.07system 0:00.60elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k

$ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
tps = 1224.327010 (including connections establishing)
tps = 2274.160899 (excluding connections establishing)
0.02user 0.03system 0:03.27elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k

# w/ pgbench-measurements-v5.patch
$ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
tps = 6792.393877 (including connections establishing)
tps = 7207.142278 (excluding connections establishing)
0.08user 0.06system 0:00.60elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k

$ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
tps = 1212.040409 (including connections establishing)
tps = 1214.728830 (excluding connections establishing)
0.09user 0.06system 0:03.31elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k


Rather than, as I supposed before, excluding the cost of thread start
entirely, pgbench-measurements-v5.patch has us count pthread_create() as part
of the main runtime.  I now see the cumulative delay you mentioned, but
pgbench-measurements-v5.patch does not fix it.  The problem is centered on the
fact that pgbench.c:main() calculates a single total_time and models each
thread as having run for that entire period.  If pthread_create() is slow,
reality diverges considerably from that model; some threads start notably
late, and other threads finish notably early.  The threadRun() runtime
intervals in the last test run above are actually something like this:

thread 1: 1.0s - 1.3s
thread 2: 2.0s - 2.3s
thread 3: 3.0s - 3.3s
thread 0: 3.0s - 3.3s

Current pgbench instead models every thread as having run 0.0s - 3.3s, hence
the numbers reported.  To make the numbers less surprising, we could axe the
global total_time=end_time-start_time and instead accumulate total_time on a
per-thread basis, just as we now accumulate conn_time on a per-thread basis.
That ceases charging threads for time spent not-yet-running or
already-finished, but it can add its own inaccuracy.  Performance during a
period in which some clients have yet to start is not interchangeable with
performance when all clients are running.  pthread_create() slowness would
actually make the workload seem to perform better.

An alternate strategy would be to synchronize the actual start of command
issuance across threads.  All threads would start and make their database
connections, then signal readiness.  Once the first thread notices that every
other thread is ready, it would direct them to actually start issuing queries.
This might minimize the result skew problem of the first strategy.

A third strategy is to just add a comment and write this off as one of the
several artifacts of short benchmark runs.

Opinions, other ideas?

> >While pondering this area of the code, it occurs to me --
> >shouldn't we initialize the throttle rate trigger later, after
> >establishing connections and sending startup queries?  As it
> >stands, we build up a schedule deficit during those tasks.  Was
> >that deliberate?
> 
> On the principle, I agree with you.
> 
> The connection creation time is another thing, but it depends on the
> options set. Under some options the connection is open and closed
> for every transaction, so there is no point in avoiding it in the
> measure or in the scheduling, and I want to avoid having to
> distinguish those cases.

That's fair enough.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench - exclude pthread_create() from connection start timing

From
Fabien COELHO
Date:
Hello Noah,

>> Thread creation time seems to be expensive as well, maybe up to 0.1
>> seconds under some conditions (?). Under --rate, this creation delay
>> means that throttling is lagging behind schedule by about that much,
>> so all the first transactions are trying to catch up.
>
> threadRun() already initializes throttle_trigger with a fresh timestamp.
> Please detail how the problem remains despite that.

Indeed, I did this kludge because I could not rely on the "before fork" 
start time, as it was (possibly) creating a "rush" at the beginning of the 
run under --rate.

The version I submitted takes the start time after the thread is created, 
and uses it directly for throttling, so the start time is taken once per 
thread and used, instead of retaking it because the first one cannot be 
relied on.

> [...]

Fine detailed analysis!

> Opinions, other ideas?

I do not think that there is a clean and simple way to take the start/stop 
periods into account when computing the global performance of a run. The 
TPC-C benchmark says to ignore the warmup/closure periods, whatever they 
are, and only take measures within the steady state. However, the full 
graph must be provided when the benchmark results are reported.

About better measures: if I could rely on having threads, I would simply 
synchronise the threads at the beginning so that they actually start after 
they are all created, and one thread would decide when to stop and set a 
shared volatile variable to stop all transactions more or less at once. In 
this case, the thread start time would be taken just after the 
synchronization, and maybe taking it only in thread 0 would be enough.

Note that this is pretty standard stuff with threads; ISTM that it would 
solve most of the issues, *but* it is not possible with the "thread fork 
emulation" implemented by pgbench, which really means no threads at all.

A possible compromise would be to do just that when actual threads are 
used, and let it more or less as it is when fork emulation is on...

-- 
Fabien.



Re: pgbench progress report improvements - split 3 v2 - A

From
Noah Misch
Date:
On Thu, Sep 26, 2013 at 08:50:15PM +0200, Fabien COELHO wrote:
> > Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
> 
> Here is a first partial patch, which focusses on measuring latency
> and reporting the measure under --progress.

This patch contains the features pertaining to both hypothetical patches (3)
and (4), not just (4) like I requested.

The sum of the squares of the latencies wraps after 2^63/(10^12 * avg_latency
* nclients) seconds.  That's unlikely to come up with the ordinary pgbench
script, but one can reach it in a few hours when benchmarking a command that
runs for many seconds.  If we care, we can track the figure as a double.  I
merely added a comment about it.

I restored applicable parts of your update to the --progress documentation
from pgbench-measurements-v5.patch.

The patch made output like this:
 progress: 7.2 s, 1.7 tps, 205.225 stddev 3.484 ms lat, 45.472 ms lag

I read that as five facts: 7.2 s 1.7 tps 205.225 stddev 3.484 ms lat 45.472 ms lag

That was a wrong reading; 205.225 is the latency average and 3.484 is the
latency standard deviation.  Let's be consistent about the placement of labels
relative to their figures.  Upthread, you had proposed this format:
 progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms

I switched to that, except that I removed the word "avg" to save horizontal
space and since lag is also an average though not labelled as such.

> +        printf("latency average: %.3f ms\n",
> +               1000.0 * duration / normal_xacts);

I incorporated the "nclients" factor needed here.

Committed with those changes.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench - exclude pthread_create() from connection start timing

From
Noah Misch
Date:
On Tue, Oct 01, 2013 at 02:50:29PM +0200, Fabien COELHO wrote:
> I do not think that there is a clean and simple way to take the
> start/stop period into account when computing global performances of
> a run. The TPC-C benchmark tells to ignore the warmup/closure
> period, whatever they are, and only perform measures within the
> steady state. However the full graph must be provided when the bench
> is provided.

That makes sense to me.  "pgbench --progress" and "pgbench --log
--aggregate-interval" are good tools for excluding non-steady-state periods.

> About better measures: If I could rely on having threads, I would
> simply synchronise the threads at the beginning so that they
> actually start after they are all created, and one thread would
> decide when to stop and set a shared volatile variable to stop all
> transactions more or less at once. In this case, the thread start
> time would be taken just after the synchronization, and maybe only
> by thread 0 would be enough.
>
> Note that this is pretty standard stuff with threads, ISTM that it
> would solve most of the issues, *but* this is not possible with the
> "thread fork emulation" implemented by pgbench, which really means
> no threads at all.

You could do those same things in the fork emulation mode using anonymous
shared memory, like we do in the server.  That would permit removing the
current "#ifdef PTHREAD_FORK_EMULATION" wart, too.

For the time being, I propose the attached comment patch.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Attachment

Re: pgbench progress report improvements - split 3 v2 - A

From
Fabien COELHO
Date:
Hello Noah,

>>> Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
>>
>> Here is a first partial patch, which focusses on measuring latency
>> and reporting the measure under --progress.
>
> This patch contains the features pertaining to both hypothetical patches (3)
> and (4), not just (4) like I requested.

Sorry, I misunderstood the expected scope of your request.

> The sum of the squares of the latencies wraps after 2^63/(10^12 * avg_latency
> * nclients) seconds.  That's unlikely to come up with the ordinary pgbench
> script, but one can reach it in a few hours when benchmarking a command that
> runs for many seconds.  If we care, we can track the figure as a double.  I
> merely added a comment about it.

Indeed, I thought about that. I chose int64 because the overflow seemed 
very unlikely: it would require about 1 billion pretty large 100 ms 
latency (2^16.6 µs) transactions to wrap around, which is a multi-year 
single-thread run. So I stayed homogeneous with the other accumulators and 
the surveyed data type. Also, the measure is exact with int64, but 
rounding or underflows could happen with double.

Adding a comment about it is a good idea.

> I restored applicable parts of your update to the --progress documentation
> from pgbench-measurements-v5.patch.
>
> The patch made output like this:
>
>  progress: 7.2 s, 1.7 tps, 205.225 stddev 3.484 ms lat, 45.472 ms lag
>
> [...]

Indeed, the postfix English-like version is not very clear.

>  progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
>
> I switched to that, except that I removed the word "avg" to save horizontal
> space and since lag is also an average though not labelled as such.

Yep, space is a concern. That was one of the reasons why I used "+-".

>> +        printf("latency average: %.3f ms\n",
>> +               1000.0 * duration / normal_xacts);
>
> I incorporated the "nclients" factor needed here.

Oops!?

Thanks a lot for the fixes and the improvements.

Which part do you want as a next step?

-- 
Fabien.

Re: pgbench - exclude pthread_create() from connection start timing

From
Fabien COELHO
Date:
>> Note that this is pretty standard stuff with threads, ISTM that it
>> would solve most of the issues, *but* this is not possible with the
>> "thread fork emulation" implemented by pgbench, which really means
>> no threads at all.
>
> You could do those same things in the fork emulation mode using anonymous
> shared memory, like we do in the server.  That would permit removing the
> current "#ifdef PTHREAD_FORK_EMULATION" wart, too.

Yep, but I'm not sure that would reduce pgbench code complexity. Also, I'm 
not aware of any system used with pgbench that does not have threads, so 
ISTM that the thread fork-emulation hack is more or less useless, and it 
is pretty masochistic to maintain when adding features.

> For the time being, I propose the attached comment patch.

The comment seems very clear to me. I do not understand why it starts with 
XXX, though.

-- 
Fabien.



Re: pgbench - exclude pthread_create() from connection start timing

From
Noah Misch
Date:
On Sun, Oct 06, 2013 at 09:48:04AM +0200, Fabien COELHO wrote:
> Also, I'm not aware of any system used with pgbench that does not have
> threads, so ISTM that the thread fork-emulation hack is more or less
> useless, and it is pretty masochistic to maintain when adding
> features.

Fair point.  When added, the code pertaining to fork-emulation was
well-isolated, and that remained the case as recently as 9.3.  Your recent
--progress patch was the first to suffer contortions for the benefit of that
mode.  (The per-statement latencies feature just declined to support it.)

> >For the time being, I propose the attached comment patch.
> 
> The comment seems very clear to me. I do not understand why it starts
> with XXX, though.

PostgreSQL code uses that notation regularly.  When I add it, I typically have
in mind "the following is not fully satisfying, but it's not bad enough to
make a point of improving".

I've committed the comment patch.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 3 v2 - A

From
Noah Misch
Date:
On Sun, Oct 06, 2013 at 09:40:40AM +0200, Fabien COELHO wrote:
> Which part do you want as a next step?

I think we're done; here are the distinct changes in your original patch,
along with their dispositions:

 Patch (1): Change the default from --progress=0 to --progress=5
Rejected
 Patch (2): Make --initialize mode respect --progress.
Rejected
 Patch (3): Report the standard deviation of latency.
Committed
 Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
Committed
 Patch (5): Take thread start time at the beginning of the thread.
Returned with Feedback
 Patch (6): Reduce and compensate throttling underestimation bias.
Committed
 Patch (7): Documentation and --help corrections
Committed
 Patch (8): Remove stray use of "float".
Committed

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 3 v2 - A

From
Fabien COELHO
Date:
Hello Noah,

>  Patch (1): Change the default from --progress=0 to --progress=5
> Rejected

Yep. Too bad, especially as the default is meaningless and does not scale.

>  Patch (2): Make --initialize mode respect --progress.
> Rejected

I missed this one...

This is just about having the same option (--progress) to set the progress 
report delay both under benchmarking and initialization. I do not 
understand where and why it would be rejected, especially as it does not 
imply to change the current default behavior.

>  Patch (5): Take thread start time at the beginning of the thread.
> Returned with Feedback

Hmmm. I fought back on the feedback :-) I thought my arguments were good 
enough to consider an accept.

-- 
Fabien.



Re: pgbench progress report improvements - split 3 v2 - A

From
Noah Misch
Date:
On Sun, Oct 06, 2013 at 05:04:56PM +0200, Fabien COELHO wrote:
> > Patch (2): Make --initialize mode respect --progress.
> >Rejected
> 
> I missed this one...

See the second half of this message, including quoted material:
http://www.postgresql.org/message-id/CA+TgmoZNXkm-EtszHX=KWq34H5Ni4CS8DG31no86cmDryAqZ_w@mail.gmail.com

> > Patch (5): Take thread start time at the beginning of the thread.
> >Returned with Feedback
> 
> Hmmm. I fought back on the feedback :-) I thought my arguments were
> good enough to consider an accept.

Here is the feedback in question:
http://www.postgresql.org/message-id/20130930223621.GA125986@tornado.leadboat.com

With or without the patch, reported performance figures are uninformative when
thread start time is substantial relative to benchmark duration.  A mere time
accounting change will not help much; improving this requires tighter
synchronization around the start of actual query activity across threads.  I
didn't read anything in your response as disagreement with that conclusion.

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 3 v2 - A

From
Fabien COELHO
Date:
>>> Patch (2): Make --initialize mode respect --progress.
>>> Rejected
>>
>> I missed this one...
>
> See the second half of this message, including quoted material:
> http://www.postgresql.org/message-id/CA+TgmoZNXkm-EtszHX=KWq34H5Ni4CS8DG31no86cmDryAqZ_w@mail.gmail.com

I did not read Robert Haas's comments as being about whether --progress 
should be used for both modes, although they are in the section about it.

All his argumentation is about *not changing any defaults*. I understood 
that his "-1" applied to dropping the row-counting-based reporting: 
Robert complains about "break[ing] backward compatibility again one 
release later" in that paragraph, not really about --progress becoming 
significant under -i, so this would be a veto against what you called 
"Patch (1)".

So what I've done in version 5 of the patch on this subject is that 
--progress specifies the interval of the progress report in any mode, and 
--quiet just sets it to five seconds for initialization, as is the current 
behavior. Version 5 does not change any current default behavior, as I 
understood that this was the requirement expressed by Robert Haas.

>>> Patch (5): Take thread start time at the beginning of the thread.
>>> Returned with Feedback
>>
>> Hmmm. I fought back on the feedback :-) I thought my arguments were
>> good enough to consider an accept.
>
> Here is the feedback in question:
> http://www.postgresql.org/message-id/20130930223621.GA125986@tornado.leadboat.com
>
> With or without the patch, reported performance figures are uninformative when
> thread start time is substantial relative to benchmark duration.  A mere time
> accounting change will not help much; improving this requires tighter
> synchronization around the start of actual query activity across threads.  I
> didn't read anything in your response as disagreement with that conclusion.

In my answer to the message you mention,

http://www.postgresql.org/message-id/alpine.DEB.2.02.1310011422130.324@sto

I explained why I had to re-take gettimeofday under --rate: because the 
start_time value cannot be relied on. The code I suggested simplifies the 
logic by taking the time only once, instead of ignoring the first 
measurement because it does not make sense. It seems to me that the 
logical answer to this argument could have been "ok", but it seems that 
you do not perceive my logic.

-- 
Fabien.



Re: pgbench progress report improvements - split 3 v2 - A

From
Noah Misch
Date:
On Sun, Oct 06, 2013 at 06:44:11PM +0200, Fabien COELHO wrote:
> >>>Patch (2): Make --initialize mode respect --progress.
> >>>Rejected
> >>
> >>I missed this one...
> >
> >See the second half of this message, including quoted material:
> >http://www.postgresql.org/message-id/CA+TgmoZNXkm-EtszHX=KWq34H5Ni4CS8DG31no86cmDryAqZ_w@mail.gmail.com
> 
> I did not read Robert Haas's comments as being about whether --progress
> should be used for both modes, although they are in the section about it.
> 
> All his argumentation is about *not changing any defaults*. I
> understood that his "-1" applied to dropping the row-counting-based
> reporting: Robert complains about "break[ing] backward
> compatibility again one release later" in that paragraph, not really
> about --progress becoming significant under -i, so this would be a
> veto against what you called "Patch (1)".
> 
> So what I've done in version 5 of the patch on this subject is that
> --progress specifies the interval of the progress report in any
> mode, and --quiet just sets it to five seconds for initialization, as
> is the current behavior. Version 5 does not change any current default
> behavior, as I understood that this was the requirement expressed by
> Robert Haas.

I don't know whether that accurately describes Robert's position, but I will
elaborate on my own position:

pgbench already offers two schedules of "pgbench --initialize" messaging,
message-per-100k-rows and message-per-5s.  A user too picky to find
satisfaction in either option can filter the messages through grep, sed et al.
We patched pgbench on two occasions during the 9.3 cycle to arrive at that
status quo.  Had I participated, I might well have voted for something like your
proposal over "pgbench --quiet".  Now that we've released --quiet, a proposal
for an additional message schedule option needs to depict a clear and
convincing step forward.  This proposal does not rise to that level.

> >>>Patch (5): Take thread start time at the beginning of the thread.
> >>>Returned with Feedback
> >>
> >>Hmmm. I fought back on the feedback:-) I thought my arguments were
> >>good enough to warrant an accept.
> >
> >Here is the feedback in question:
> >http://www.postgresql.org/message-id/20130930223621.GA125986@tornado.leadboat.com
> >
> >With or without the patch, reported performance figures are uninformative when
> >thread start time is substantial relative to benchmark duration.  A mere time
> >accounting change will not help much; improving this requires tighter
> >synchronization around the start of actual query activity across threads.  I
> >didn't read anything in your response as disagreement with that conclusion.
> 
> In my answer to the message you mention,
> 
> http://www.postgresql.org/message-id/alpine.DEB.2.02.1310011422130.324@sto
> 
> I explained why I had to re-take gettimeofday under --rate because
> the start_time value cannot be relied on. The code I suggested
> simplifies the logic by taking the time only once, instead of
> ignoring the first one because it does not make sense. It seems to
> me that the logical answer to this argument could have been "ok".
> But it seems that you do not perceive my logic.

Yeah, perhaps I don't understand.  Throttling behaves fine, because pgbench
already does take a fresh timestamp in threadRun().  Your patch makes
thread->start_time adequate in place of that fresh timestamp.  If that were
the patch's only effect, it would be okay as a minor code cleanup.  That is
not the patch's only effect.  Its other effect, explored thoroughly in my post
referenced above, changes one bad user-visible behavior to a different bad
user-visible behavior.  When a code cleanup has such a side effect, we don't
commit it.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: pgbench progress report improvements - split 3 v2 - A

From
Fabien COELHO
Date:
> pgbench already offers two schedules of "pgbench --initialize" messaging,
> message-per-100k-rows and message-per-5s.  A user too picky to find
> satisfaction in either option can filter the messages through grep, sed et al.
> We patched pgbench on two occasions during the 9.3 cycle to arrive at that
> status quo.  Had I participated, I may well have voted for something like your
> proposal over "pgbench --quiet".  Now that we've released --quiet, a proposal
> for an additional message schedule option needs to depict a clear and
> convincing step forward.  This proposal does not rise to that level.

The "step forward" is to have the same options apply to *both* modes, 
instead of one set of options for one mode (the 100k-row default or 
--quiet 5s) and a convenient --progress (adjustable seconds) for the 
other.

It is "convincing" to me because I hate commands with non-orthogonal 
options: I am bound not to notice that one option applies to one mode and 
another to the other mode, and to get caught by it.

But this is clearly a matter of design & taste.

-- 
Fabien.



Re: pgbench progress report improvements - split 3 v2 - A

From
Robert Haas
Date:
On Sat, Oct 5, 2013 at 6:10 PM, Noah Misch <noah@leadboat.com> wrote:
> The sum of the squares of the latencies wraps after 2^63/(10^12 * avg_latency
> * nclients) seconds.  That's unlikely to come up with the ordinary pgbench
> script, but one can reach it in a few hours when benchmarking a command that
> runs for many seconds.  If we care, we can track the figure as a double.  I
> merely added a comment about it.

Using a double seems wise to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench progress report improvements - split 3 v2 - A

From
Fabien COELHO
Date:
> On Sat, Oct 5, 2013 at 6:10 PM, Noah Misch <noah@leadboat.com> wrote:
>> The sum of the squares of the latencies wraps after 2^63/(10^12 * avg_latency
>> * nclients) seconds.  That's unlikely to come up with the ordinary pgbench
>> script, but one can reach it in a few hours when benchmarking a command that
>> runs for many seconds.  If we care, we can track the figure as a double.  I
>> merely added a comment about it.
>
> Using a double seems wise to me.

I think that both int64 & double are fine.

The likelihood of hitting underflows (double) or overflows/wraparounds 
(int64) seems pretty low for any practical run. I chose int64 because it 
is exact... up to the point where it wraps and becomes totally wrong. A 
double may silently provide underestimated results: it would fail more 
gradually.

So I'm okay if it is changed to double consistently.

-- 
Fabien.