Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement) - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Msg-id alpine.DEB.2.02.1306101010220.12980@localhost6.localdomain6
In response to Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
Hello Greg,

Thanks for this very detailed review and the suggestions!

I'll submit a new patch.

>> Question 1: should it report the maximum lag encountered?
>
> I haven't found the lag measurement to be very useful yet, outside of 
> debugging the feature itself.  Accordingly I don't see a reason to add even 
> more statistics about the number outside of testing the code.  I'm seeing 
> some weird lag problems that this will be useful for though right now, more 
> on that a few places below.

I'll explain below why this figure is really interesting to have, and why
it is not available as precisely anywhere else.

>> Question 2: the next step would be to have the current lag shown under
>> option --progress, but that would mean having a combined --throttle
>> --progress patch submission, or maybe dependencies between patches.
>
> This is getting too far ahead.

Ok!

> Let's get the throttle part nailed down before introducing even more 
> moving parts into this.  I've attached an updated patch that changes a 
> few things around already.  I'm not done with this yet and it needs some 
> more review before commit, but it's not too far away from being ready.

Ok. I'll submit a new version by the end of the week.

> This feature works quite well.  On a system that will run at 25K TPS without 
> any limit, I did a run with 25 clients and a rate of 400/second, aiming at 
> 10,000 TPS, and that's what I got:
>
> number of clients: 25
> number of threads: 1
> duration: 60 s
> number of transactions actually processed: 599620
> average transaction lag: 0.307 ms
> tps = 9954.779317 (including connections establishing)
> tps = 9964.947522 (excluding connections establishing)
>
> I never thought of implementing the throttle like this before,

Stochastic processes are a little bit magic:-)

> but it seems to work out well so far.  Check out tps.png to see the 
> smoothness of the TPS curve (the graphs came out of pgbench-tools).
> There's a little more play outside of the target than ideal for this 
> case.  Maybe it's worth tightening the Poisson curve a bit around its 
> center?

The point of a Poisson distribution is to model random events that arrive
somewhat irregularly, such as web requests or clients queuing at a taxi
stop. I cannot really change the formula, but if you want to argue with
Siméon Denis Poisson, his current address is the 19th section of the
"Père Lachaise" graveyard in Paris:-)

More seriously, the only parameter that can be changed is the "1000000.0"
which drives the granularity of the Poisson process. A smaller value would
mean a smaller potential multiplier, that is, a tighter bound on how far
from the average time the schedule can go. This may come under
"tightening", although it would depart from a "perfect" process and might
be a little less "smooth"... for a given definition of "tight", "perfect"
and "smooth":-)

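As a rough illustration of the granularity argument (a sketch, not the
patch's actual code): assume each delay is the mean delay multiplied by
-ln(u), with u drawn uniformly from {1/N, 2/N, ..., 1} and N playing the
role of the "1000000.0" constant. The function names and the exact
sampling formula are assumptions made for this example.

```python
import math
import random

def next_delay_us(mean_delay_us, granularity=1_000_000, rng=random):
    """Draw one exponentially distributed delay (assumed sampling scheme)."""
    # u is uniform over {1/N, ..., 1}; it is never 0, so log(u) is defined.
    u = rng.randint(1, granularity) / granularity
    return mean_delay_us * -math.log(u)

def max_multiplier(granularity):
    """Largest possible delay, as a multiple of the mean delay."""
    # The smallest u is 1/N, hence the largest -ln(u) is ln(N).
    return math.log(granularity)

if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(f"granularity {n}: delays capped at {max_multiplier(n):.1f}x the mean")
```

Under these assumptions, a granularity of 1000000 caps a single delay at
about 13.8 times the mean, while 1000 caps it at about 6.9 times the mean,
at the price of truncating the tail of the distribution.
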
> [...] What I did instead was think of this as a transaction rate target, 
> which makes the help a whole lot simpler:
>
>  -R SPEC, --rate SPEC
>               target rate per client in transactions per second

Ok, I'm fine with this name.

> Made the documentation easier to write too.  I'm not quite done with that 
> yet, the docs wording in this updated patch could still be better.

I'm not an English native speaker, any help is welcome here. I'll do my 
best.

> I personally would like this better if --rate specified a *total* rate across 
> all clients.

Ok, I can do that, with some reworking so that the stochastic process is
shared by all threads instead of being within each client. This means that
a lock between threads is needed to access some shared variables, which
should not impact the test much. Another option is to have a per-thread
stochastic process.

> However, there are examples of both types of settings in the 
> program already, so there's no one precedent for which is right here.  -t is 
> per-client and now -R is too; I'd prefer it to be like -T instead.  It's not 
> that important though, and the code is cleaner as it's written right now. 
> Maybe this is better; I'm not sure.

I like the idea of just one process instead of a per-client one. I did not 
try at the beginning because the implementation is less straightforward.

> On the topic of this weird latency spike issue, I did see that show up in 
> some of the results too.

Your example illustrates *exactly* why the lag measure was added.

The Poisson processes generate an ideal event line (that is, irregularly
scheduled transaction start times targeting the expected tps), which
induces a varying load that the database is trying to handle.

If a transaction cannot start right away, it is deferred with respect to
its scheduled start time. The measured lag reports exactly that: the
clients do not keep up with the load. There may be some catch-up later,
that is, the clients come back in line with the scheduled transactions.

I need to put this measure here because the "scheduled time" is only
known to pgbench and not available elsewhere. The max would really be more
interesting than the mean, so as to catch that some things were
temporarily amiss, even if things went back to nominal later.
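
To make the mean-versus-max point concrete, here is a minimal sketch of
the lag of a single client that runs its transactions back to back. The
function name, the fixed service time and the sample schedule are
hypothetical, chosen only for illustration; this is not how the patch
itself is written.

```python
def lag_stats(scheduled, service_time):
    """Return (mean_lag, max_lag) for one client processing events in order."""
    free_at = 0.0   # time at which the client finishes its current transaction
    lags = []
    for due in scheduled:
        start = max(due, free_at)     # not before it is due, not while busy
        lags.append(start - due)      # lag = actual start - scheduled start
        free_at = start + service_time
    return sum(lags) / len(lags), max(lags)

if __name__ == "__main__":
    # A burst of closely spaced events in an otherwise quiet schedule.
    mean_lag, max_lag = lag_stats([0.0, 1.0, 2.0, 2.1, 2.2, 10.0], 1.0)
    print(f"mean lag {mean_lag:.2f}, max lag {max_lag:.2f}")
```

In this toy schedule the burst around t=2 pushes one transaction almost two
time units late, yet the mean stays small because the client catches up
before the last event.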

> Here's one where I tried to specify a rate higher 
> than the system can actually handle, 80000 TPS total on a SELECT-only test
>
> $ pgbench -S -T 30 -c 8 -j 4 -R10000tps pgbench
> starting vacuum...end.
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 8
> number of threads: 4
> duration: 30 s
> number of transactions actually processed: 761779
> average transaction lag: 10298.380 ms

The interpretation is the following: as the database cannot handle the
load, transactions were processed on average 10 seconds behind their
scheduled start time. You had on average a 10 second latency to
answer "incoming" requests. Also some transactions were implicitly not
even scheduled, so the situation is worse than that...
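
A back-of-the-envelope check of that figure, under a deliberately crude
model: assume the n-th transaction is due at n/offered_tps but only starts
at n/achieved_tps, so the lag grows linearly over the run. The function
name and the model are assumptions for illustration; the inputs are the
figures quoted for this run (8 clients at 10000 tps each, about 25392 tps
achieved, 761779 transactions).

```python
def average_lag_s(offered_tps, achieved_tps, transactions):
    """Mean lag in seconds under a linear-backlog model (an assumption)."""
    # Each transaction slips a little further behind its scheduled time.
    per_txn_slip = 1.0 / achieved_tps - 1.0 / offered_tps
    # Lag rises linearly from 0 to transactions * per_txn_slip,
    # so its mean is half the final value.
    return 0.5 * transactions * per_txn_slip

if __name__ == "__main__":
    lag = average_lag_s(offered_tps=8 * 10000,
                        achieved_tps=25392.312544,
                        transactions=761779)
    print(f"estimated average lag: {lag:.1f} s")
```

The estimate comes out at roughly 10 seconds, the same order as the
10298 ms reported, which suggests the figure is plausible for a 30 second
run that is offered about three times what the hardware can process.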

> tps = 25392.312544 (including connections establishing)
> tps = 25397.294583 (excluding connections establishing)
>
> It was actually limited by the capabilities of the hardware, 25K TPS. 10298 
> ms of lag per transaction can't be right though.
>
> Some general patch submission suggestions for you as a new contributor:

Hmmm, I did a few things such as "pgxs" back in 2004, so maybe "not very 
active" is a better description than "new":-)

> -When re-submitting something with improvements, it's a good idea to add a 
> version number to the patch so reviewers can tell them apart easily. But 
> there is no reason to change the subject line of the e-mail each time.  I 
> followed that standard here.  If you updated this again I would name the file 
> pgbench-throttle-v9.patch but keep the same e-mail subject.

Ok.

> -There were some extra carriage return characters in your last submission. 
> Wasn't a problem this time, but if you can get rid of those that makes for a 
> better patch.

Ok.

-- 
Fabien.
