Thread: pgbench throttling latency limit

pgbench throttling latency limit

From
Fabien COELHO
Date:
Add --limit to limit latency under throttling

Under throttling, transactions are scheduled for execution at certain 
times. Transactions may fall far behind schedule, and the system may catch 
up with the load later. This option changes that behavior by skipping 
transactions which are too far behind schedule and counting them as 
skipped.

The idea is to help simulate a latency-constrained environment such as a 
database used by a web server.
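
As a rough illustration, here is a minimal sketch of the decision this
option adds, assuming illustrative names (now, txn_scheduled and limit_usec
are not necessarily the patch's actual variables; times are in microseconds):

    #include <stdbool.h>
    #include <stdint.h>

    /* Under throttling, a transaction whose scheduled start time is already
     * more than limit_usec microseconds in the past is skipped and counted,
     * instead of being executed late. */
    static bool
    transaction_is_skipped(int64_t now, int64_t txn_scheduled, int64_t limit_usec)
    {
        return limit_usec > 0 && (now - txn_scheduled) > limit_usec;
    }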

-- 
Fabien.

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> Add --limit to limit latency under throttling
>
> Under throttling, transactions are scheduled for execution at certain times. 
> Transactions may be far behind schedule and the system may catch up with the 
> load later. This option allows to change this behavior by skipping 
> transactions which are too far behind schedule, and count those as skipped.
>
> The idea is to help simulate a latency-constrained environment such as a 
> database used by a web server.

Find attached a new version:
 - fix dropped percent computation in the final report
 - simplify progress report code

-- 
Fabien.

Re: pgbench throttling latency limit

From
Rukh Meski
Date:
Hi Fabien,

On Sun, Aug 24, 2014 at 9:16 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Find attached a new version:
>  - fix dropped percent computation in the final report
>  - simplify progress report code

I have reviewed this patch.

Is the patch in a patch format which has context? Yes.
Does it apply cleanly to the current git master? Yes.
Does it include reasonable tests, necessary doc patches, etc? Yes.

Does the patch actually implement that? Yes.
Do we want that? I think we do, yes.
Do we already have it? No.
Are there dangers? None that I can see.

Does the feature work as advertised? Almost, see below.
Are there corner cases the author has failed to consider? None that I can see.
Are there any assertion failures or crashes? No.

I can't make the -L option work at all.  If I do this:
  ./pgbench -R 100 -L 1
I get:
  pgbench: invalid option -- L
This appears to be caused by the fact that the call to getopt_long()
has not been updated to reflect the new parameter.
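
For illustration, a minimal standalone sketch of the kind of fix needed (the
option table and short-option string here are abbreviated, not pgbench's
actual ones); the point is that the short-option string must also gain "L:":

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* abbreviated option table, not pgbench's actual one */
    static const struct option long_options[] = {
        {"rate",  required_argument, NULL, 'R'},
        {"limit", required_argument, NULL, 'L'},
        {NULL, 0, NULL, 0}
    };

    int
    main(int argc, char *argv[])
    {
        int c, optindex;

        /* "L:" must appear in the short-option string, or -L is rejected */
        while ((c = getopt_long(argc, argv, "R:L:",
                                long_options, &optindex)) != -1)
        {
            switch (c)
            {
                case 'R': printf("rate  = %s tps\n", optarg); break;
                case 'L': printf("limit = %s ms\n", optarg);  break;
                default:  exit(1);
            }
        }
        return 0;
    }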

Also this part:
+          "  -L, --limit=NUM          under throttling (--rate), skip
transactions that\n"
+          "                           far behind schedule in ms
(default: do not skip)\n"
I would suggest rewording this to something like "skip transactions
that are more than NUM milliseconds behind schedule (default: do not
skip)".

Marking Waiting for Author until these small issues have been fixed.


Thanks,

♜



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Rukh,

> I have reviewed this patch.

Thanks!

> [...] I get: pgbench: invalid option -- L
> Which appears to be caused by the fact that the call to getopt_long()
> has not been updated to reflect the new parameter.

Indeed, I only tested/used it with the --limit= syntax.

> Also this part:
> +          "  -L, --limit=NUM          under throttling (--rate), skip
> transactions that\n"
> +          "                           far behind schedule in ms
> (default: do not skip)\n"
> I would suggest rewording this to something like "skip transactions
> that are more than NUM milliseconds behind schedule (default: do not
> skip)".

Done, with milliseconds written as "ms" to keep it short.

> Marking Waiting for Author until these small issues have been fixed.

Please find attached a new version which fixes these two points.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> Marking Waiting for Author until these small issues have been fixed.

I've put it back to "Needs review". Feel free to set it to "Ready" if it 
is ok for you.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Rukh Meski
Date:
Hi Fabien,

On Tue, Aug 26, 2014 at 04:07 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Please find attached a new version which fixes these two points.

Indeed it does.  Marking the patch ready for a committer.


Thanks,

♜



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 08/27/2014 03:47 AM, Rukh Meski wrote:
> Hi Fabien,
>
> On Tue, Aug 26, 2014 at 04:07 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>
>> Please find attached a new version which fixes these two points.
>
> Indeed it does.  Marking the patch ready for a committer.

I find the definition of the latency limit a bit strange. It's a limit 
on how late a transaction can *start* compared to its scheduled 
starting time, not how long a query is allowed to last. How do you figure 
out what it should be set to?

That model might make some sense if you think e.g. of a web application, 
where the web server has a timeout for how long it waits to get a 
database connection from a pool, but once a query is started, the 
transaction is considered a success no matter how long it takes. The 
latency limit would be that timeout. But I think a more useful model is 
that when the user clicks a button, he waits at most X seconds for the 
result. If that deadline is exceeded, the web server will give a 404, or 
the user will simply get bored and go away, and the transaction is 
considered a failure.

So I think a more useful model is that new queries arrive at a given 
rate, and each query is expected to finish in X milliseconds from its 
arrival time (i.e. the time the query is scheduled to begin, not the time 
it was sent to the server) or it's counted as failed. If a transaction 
cannot even be started by that deadline, because the connection is still 
busy with the previous query, it's counted as failed without even 
sending it to the server.

With that definition, it makes sense to specify the latency limit even 
without --rate. In that case, it's simply a limit on how long each 
query's execution is allowed to last until it's considered as failed. 
IOW, each query's scheduled start time is when the previous query ends.
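
As a sketch of the two anchors being contrasted (illustrative names, times
in microseconds): under --rate the deadline would be tied to the scheduled
start, and without --rate to the end of the previous transaction:

    #include <stdbool.h>
    #include <stdint.h>

    /* Time by which the transaction must finish to be counted as a success
     * under a "latency limit" of limit_usec microseconds. */
    static int64_t
    latency_deadline(int64_t scheduled_start, int64_t prev_txn_end,
                     int64_t limit_usec, bool throttled)
    {
        int64_t anchor = throttled ? scheduled_start : prev_txn_end;

        return anchor + limit_usec;
    }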

- Heikki




Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> I find the definition of the latency limit a bit strange. It's a limit on how 
> late a transaction can *start* compared to its scheduled starting time, not 
> how long a query is allowed to last.

Yes. This is what can be done easily with pgbench under throttling. Note 
that if transactions take a long time this is recorded (average, stddev...), 
so it shows up elsewhere.

> How do you figure out what it should be set to?

It is really just a simple tool to measure unresponsiveness under 
throttling, which I'm testing and complaining about in another thread.

The underlying model I have in mind would be some timeout from an 
application, say a web server, or a pooling process which is handling a 
queue of requests...

Now, if I describe that as "lag limit" instead of "latency limit", maybe
it is clearer and better?

> That model might make some sense if you think e.g. of a web application,
> [...]

Yep, that is what I had in mind, but the primary objective is really to 
check whether pg is responsive or not.

> So I think a more useful model is that new queries arrive at a given 
> rate, and each query is expected to finish in X milliseconds from its 
> arrival time (i.e the time the query is scheduled to begin, not the time 
> it was sent to the server) or it's counted as failed. If a transaction 
> cannot even be started by that deadline, because the connection is still 
> busy with the previous query, it's counted as failed without even 
> sending it to the server. With that definition, it makes sense to 
> specify the latency limit even without --rate.

Yep. But that is not what I'm doing here. It would be interesting as well. 
It would be another patch.

> In that case, it's simply a limit on how long each query's 
> execution is allowed to last until it's considered as failed. IOW, each 
> query's scheduled start time is when the previous query ends.

Not under --rate... that is the point of throttling!  Under throttling, 
the latency should really be computed wrt the scheduled start time and 
not the actual start time, which may be 10 seconds afterwards when things 
are going bad... Also, there is the question of whether the "failed query" 
is executed or not. Here I'm not executing them, in effect they were 
aborted by the application. With your suggestion they would be executed 
anyway but considered failed.

So what you are suggesting is another (interesting) functionality, which 
could indeed be named "latency limit" (count queries slower than a 
threshold), while what I'm doing here is a "lag limit" (scheduled queries 
that could not start on time are skipped; this is really specific to --rate).

In the updated patch attached, I changed the explanations, documentation 
and name to "lag limit" instead of "latency limit" to clarify this point. 
It was really a misnomer.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 08/27/2014 12:41 PM, Fabien COELHO wrote:
>
> Hello Heikki,
>
>> I find the definition of the latency limit a bit strange. It's a limit on how
>> late a transaction can *start* compared to its scheduled starting time, not
>> how long a query is allowed to last.
>
> Yes. This is what can be done easily with pgbench under throttling. Note
> that if transactions take long it is recorded (average, stddev...) so it
> appears elsewhere.
>
>> How do you figure out what it should be set to?
>
> It is really just a simple tool to measure unresponsiveness under
> throttling, which I'm testing and complaining about in another thread.

Ok, but wouldn't the definition I gave be just as useful for that 
purpose, and more useful in general?

You didn't really answer my question: How do you figure out what to set 
it to? With a latency limit on when the query should finish, as opposed 
to how late it can start, it's a lot easier to give a number. For 
example, your requirements might state that a user must always get a 
response to a click on a web page in 200 ms, so you set the limit to 200 ms.

>> So I think a more useful model is that new queries arrive at a given
>> rate, and each query is expected to finish in X milliseconds from its
>> arrival time (i.e the time the query is scheduled to begin, not the time
>> it was sent to the server) or it's counted as failed. If a transaction
>> cannot even be started by that deadline, because the connection is still
>> busy with the previous query, it's counted as failed without even
>> sending it to the server. With that definition, it makes sense to
>> specify the latency limit even without --rate.
>
> Yep. But that is not what I'm doing here. It would be interesting as well.
> It would be another patch.

Why is your patch more interesting than what I described? I'm pretty 
sure we don't need both.

>> In that case, it's simply a limit on how long each query's
>> execution is allowed to last until it's considered as failed. IOW, each
>> query's scheduled start time is when the previous query ends.
>
> Not under --rate... that is the point of throttling!

Right, I got that. With "in that case", I meant when you're not throttling.

> Under throttling,
> the latency should really be computed wrt the scheduled start time and
> not the actual start time which may be 10 seconds afterwards when things
> are going bad... Also, there is the question of whether the "failed query"
> is executed or not. Here I'm not executing them, in effect they were
> aborted by the application. With your suggestion they would be executed
> anyway but considered failed.

I was thinking that if a query is already late when the connection 
becomes free to execute it, it would not be executed. It would be 
skipped, just as in your patch.

> So what you are suggesting is another (interesting) functionality, which
> could indeed be named "latency limit" (count queries slower than a
> threshold), while what I'm doing here is a "lag limit" (scheduled queries
> that could not start on time are skipped; this is really specific to --rate).

Ok, but *why* are you doing a "lag limit", and not a "latency limit"? 
Under what circumstances is the lag limit a more useful setting?

- Heikki



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> [...]
> With a latency limit on when the query should finish, as opposed to how 
> late it can start, it's a lot easier to give a number. For example, your 
> requirements might state that a user must always get a response to a click on 
> a web page in 200 ms, so you set the limit to 200 ms.

Yep. See below for the details.

> [...] Why is your patch more interesting than what I described?

It is more interesting because it exists, it is short and simple, it 
works, and it is useful right now to test pg responsiveness and also to 
model some timeout behavior on the client side.

> I'm pretty sure we don't need both.

Why not? Testing performance is tricky enough, the tool must be flexible.

I'm pretty sure that I'm interested in testing pg responsiveness right 
now, so I did the simpler one I need for that purpose. It somehow models 
an application/pooler queue management timeout, which would anyway proceed 
with what is already started.

> [...]
>
> I was thinking that if a query is already late when the connection becomes 
> free to execute it, it would not be executed. It would be skipped, just as in 
> your patch.

As for an actual "latency limit" under throttling, this is significantly 
more tricky and invasive to implement... ISTM that it would mean:
 - if the tx is not stated an the latency is already consummed, SKIP++.
 - if the tx is after its schedule start time but under latency, then   start it, and maybe inject a "SET TIMEOUT...".
 - if a tx is being processed but reaches its latency limit (after   schedule start time), abort it coldly, ROLLBACK++
(wellif the tx is   really started, there could also be shell commands and \set stuff in a   pgbench script, which mean
startedis not really started, so it would   be INTERRUPT++ if no BEGIN was sent).
 
 - if a tx is finished but the final commit returned after the latency   deadline, you cannot abort it anymore but it
islate nevertheless,   LATE++.
 
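
A sketch of that outcome taxonomy as an enum (purely illustrative, not part
of any patch):

    /* Possible outcomes under a hypothetical full "latency limit" mode. */
    typedef enum
    {
        TXN_SKIPPED,      /* never started: deadline already passed       */
        TXN_ON_TIME,      /* started and committed within the deadline    */
        TXN_INTERRUPTED,  /* stopped before any BEGIN reached the server  */
        TXN_ROLLED_BACK,  /* aborted mid-transaction at the deadline      */
        TXN_LATE          /* committed, but the commit returned too late  */
    } txn_outcome;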

This is doable but far beyond my current needs. Moreover, I'm not sure 
that such a patch would pass because of its invasiveness and complexity, so 
it could be a complete waste of time.

> Ok, but *why* are you doing a "lag limit", and not a "latency limit"?

Because it is much simpler (see above) and is enough for testing the pg 
responsiveness issue, which is my current objective, and models some
client timeout behavior.

> Under what circumstances is the lag limit a more useful setting?

It is not "more" useful" per se, it is what I'm using to test pg 
unresponsivness with a simple to define and interpret measure wrt 
throttling.

If I would do "latency limit" under throttling, it would be (1) more time 
to develop, more complex, more invasive in the code (see above, + also the 
implementation when not under throttling), (2) more complex to interpret, 
with at least 5 possible outcomes (skipped, interrupted, committed on 
time, committed but late, aborted), (3) this added information would not 
be useful to me.

I've submitted this "simple" lag limit version because being able to 
measure quickly and simply (un)responsiveness seems like a good idea, 
especially given the current state of things.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 08/27/2014 02:37 PM, Fabien COELHO wrote:
> As for an actual "latency limit" under throttling, this is significantly
> more tricky and invasive to implement... ISTM that it would mean:
>
>    - if the tx is not started and the latency is already consumed, SKIP++.
>
>    - if the tx is after its scheduled start time but under the latency
>      limit, then start it, and maybe inject a "SET TIMEOUT...".
>
>    - if a tx is being processed but reaches its latency limit (after its
>      scheduled start time), abort it coldly, ROLLBACK++ (well, if the tx is
>      really started; there could also be shell commands and \set stuff in a
>      pgbench script, which means "started" is not really started, so it would
>      be INTERRUPT++ if no BEGIN was sent).
>
>    - if a tx is finished but the final commit returned after the latency
>      deadline, you cannot abort it anymore but it is late nevertheless,
>      LATE++.

Yeah, something like that. I don't think it would be necessary to set 
statement_timeout, you can inject that in your script or postgresql.conf 
if you want. I don't think aborting a transaction that's already started 
is necessary either. You could count it as LATE, but let it finish first.

> This is doable but far beyond my current needs. Moreover, I'm not sure
> that such a patch would pass because of invasiveness and complexity, so it
> could be a total loss of time.
>
>> Ok, but *why* are you doing a "lag limit", and not a "latency limit"?
>
> Because it is much simpler (see above) and is enough for testing pg
> responsiveness issue, which is my current objective, and models some
> client timeout behavior.
>
>> Under what circumstances is the lag limit a more useful setting?
>
> It is not "more" useful" per se, it is what I'm using to test pg
> unresponsivness with a simple to define and interpret measure wrt
> throttling.
>
> If I would do "latency limit" under throttling, it would be (1) more time
> to develop, more complex, more invasive in the code (see above, + also the
> implementation when not under throttling), (2) more complex to interpret,
> with at least 5 possible outcomes (skipped, interrupted, committed on
> time, committed but late, aborted), (3) this added information would not
> be useful to me.
>
> I've submitted this "simple" lag limit version because being able to
> measure quickly and simply (un)responsiveness seems like a good idea,
> especially given the current state of things.

Ok, fair enough. I don't think doing a "latency limit" would be 
significantly harder, but I can't force you. I'll mark this as Returned 
with Feedback then.

- Heikki




Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
>> As for an actual "latency limit" under throttling, this is significantly
>> more tricky and invasive to implement... ISTM that it would mean:
>> [...] 
>
> Yeah, something like that. I don't think it would be necessary to set 
> statement_timeout, you can inject that in your script or postgresql.conf if 
> you want. I don't think aborting a transaction that's already started is 
> necessary either. You could count it as LATE, but let it finish first.

If you remove all difficult cases from the spec, it is obviously much 
simpler to implement:-) It seems that your simplified version of "latency 
limit" would be just to distinguish LATE from ONTIME among the committed 
ones, compared to the current version, and not to actually limit the 
latency, which is the tricky part.

>> I've submitted this "simple" lag limit version because being able to
>> measure quickly and simply (un)responsiveness seems like a good idea,
>> especially given the current state of things.
>
> Ok, fair enough. I don't think doing a "latency limit" would be significantly 
> harder, but I can't force you. I'll mark this as Returned with Feedback then.

Hmmm. I can distinguish just the two cases. Rather mark it as "waiting on 
author", I may give it a go.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 08/27/2014 06:08 PM, Fabien COELHO wrote:
>>> I've submitted this "simple" lag limit version because being able to
>>> measure quickly and simply (un)responsiveness seems like a good idea,
>>> especially given the current state of things.
>>
>> Ok, fair enough. I don't think doing a "latency limit" would be significantly
>> harder, but I can't force you. I'll mark this as Returned with Feedback then.
>
> Hmmm. I can distinguish just the two cases. Rather mark it as "waiting on
> author", I may give it a go.

Feel free to mark it as such if you think you can get a new version 
posted in the next few days.

- Heikki



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
>> [...]
>
> Yeah, something like that. I don't think it would be necessary to set 
> statement_timeout, you can inject that in your script or postgresql.conf if 
> you want. I don't think aborting a transaction that's already started is 
> necessary either. You could count it as LATE, but let it finish first.

I've implemented something along these simplified lines. The latency is 
not limited as such, but slow (over the limit) queries are counted and 
reported.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 08/27/2014 08:05 PM, Fabien COELHO wrote:
>
>>> [...]
>>
>> Yeah, something like that. I don't think it would be necessary to set
>> statement_timeout, you can inject that in your script or postgresql.conf if
>> you want. I don't think aborting a transaction that's already started is
>> necessary either. You could count it as LATE, but let it finish first.
>
> I've implemented something along these simplified lines. The latency is
> not limited as such, but slow (over the limit) queries are counted and
> reported.

Ok, thanks.

This now begs the question:

In --rate mode, shouldn't the reported transaction latency also be 
calculated from the *scheduled* start time, not the time the transaction 
actually started? Otherwise we're using two different definitions of 
"latency", one for the purpose of the limit, and another for reporting.

- Heikki




Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> This now begs the question:
>
> In --rate mode, shouldn't the reported transaction latency also be calculated 
> from the *scheduled* start time, not the time the transaction actually 
> started? Otherwise we're using two different definitions of "latency", one 
> for the purpose of the limit, and another for reporting.

It could. Find a small patch **on top of v5** which does that. I've tried 
to update the documentation accordingly as well.

Note that the information is already there, as the average lag time is 
reported; ISTM that:

    avg latency (from scheduled start) ~ avg lag + avg latency (from actual start)

so it is just a matter of choice, both are ok somehow. I would be fine 
with both.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> [...] I would be fine with both.

After giving it some thought, ISTM better to choose consistency over 
intuition, and have latency under throttling always defined wrt the 
scheduled start time and not the actual start time, even if having a 
latency of 10000 ms for an OLTP load might seem surprising to some.

The other one can be computed by subtracting the average lag time.

I attached a v6, which is a consolidated patch of v5 plus the small update 
to the latency definition.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Andres Freund
Date:
Hi,

I generally want to say that having a feature like this feels *very*
helpful to me. Lots of pg development hasn't really paid attention to
anything but the final pgbench results...

On 2014-08-29 19:48:43 +0200, Fabien COELHO wrote:
> +    if (latency_limit)
> +        printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "\n",
> +               latency_limit / 1000.0, latency_late);
> +

Any reason not to report a percentage here?

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
>> +    if (latency_limit)
>> +        printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "\n",
>> +               latency_limit / 1000.0, latency_late);
>> +
>
> Any reason not to report a percentage here?

Yes: I did not think of it.

Here is a v7, with a percent. I also added a paragraph in the documentation 
about how the latency is computed under throttling, and I tried to reorder 
the reported stuff so that it is more logical.
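
For illustration, a standalone sketch of such a report line (the names and
exact wording are assumptions, not necessarily what v7 prints):

    #include <stdint.h>
    #include <stdio.h>

    /* Print the count of transactions over the latency limit, plus the
     * percentage Andres asked for. */
    static void
    print_latency_limit_line(double latency_limit_ms, int64_t latency_late,
                             int64_t total)
    {
        if (latency_limit_ms > 0.0 && total > 0)
            printf("number of transactions above the %.1f ms latency limit: "
                   "%lld (%.3f %%)\n",
                   latency_limit_ms, (long long) latency_late,
                   100.0 * latency_late / total);
    }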

-- 
Fabien.

Re: pgbench throttling latency limit

From
Jan Wieck
Date:
On 08/27/2014 04:08 AM, Heikki Linnakangas wrote:
> That model might make some sense if you think e.g. of a web application,
> where the web server has a timeout for how long it waits to get a
> database connection from a pool, but once a query is started, the
> transaction is considered a succeess no matter how long it takes. The
> latency limit would be that timeout. But I think a more useful model is
> that when the user clicks a button, he waits at most X seconds for the
> result. If that deadline is exceeded, the web server will give a 404, or
> the user will simply get bored and go away, and the transaction is
> considered a failure.

Correct, the whole TPC-B model better fits an application where client 
requests enter a queue at the specified TPS rate and that queue is 
processed.

While we are at it,

Note that in the original TPC-B specification, the transaction duration 
measured is the time from receiving the client request (in current 
pgbench under throttling, that is when the transaction is scheduled) to 
when the request is answered. This is the client-visible response time, 
which has nothing to do with the database latency.

As per TPC-B, the entire test is only valid if 90% of all client 
response times are within 2 seconds.

It would be useful if pgbench would

A) measure and report that client response time in the per-transaction
   log files, and

B) report at the end what percentage of transactions finished within
   a specified response time constraint (default 2 seconds); a sketch of
   such a report follows below.
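
A minimal sketch of the report suggested in (B), assuming pgbench kept a
count of transactions whose client response time stayed within the
constraint (names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Report what fraction of transactions met the response time constraint
     * and whether the TPC-B style 90% criterion holds. */
    static void
    report_response_time_constraint(int64_t within_limit, int64_t total,
                                    double limit_ms)
    {
        double pct = (total > 0) ? 100.0 * within_limit / total : 0.0;

        printf("transactions within %.0f ms: %lld of %lld (%.1f%%)\n",
               limit_ms, (long long) within_limit, (long long) total, pct);
        printf("90%% criterion: %s\n", pct >= 90.0 ? "met" : "not met");
    }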


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
>> That model might make some sense if you think e.g. of a web application,
>> where the web server has a timeout for how long it waits to get a
>> database connection from a pool, but once a query is started, the
>> transaction is considered a succeess no matter how long it takes. The
>> latency limit would be that timeout. But I think a more useful model is
>> that when the user clicks a button, he waits at most X seconds for the
>> result. If that deadline is exceeded, the web server will give a 404, or
>> the user will simply get bored and go away, and the transaction is
>> considered a failure.
>
> Correct, the whole TPC-B model better fits an application where client 
> requests enter a queue at the specified TPS rate and that queue is processed.
>
> While we are at it,
>
> Note that in the original TPC-B specification, the transaction duration 
> measured is the time from receiving the client request (in current pgbench 
> under throttling, that is when the transaction is scheduled) to when the 
> request is answered. This is the client-visible response time, which has 
> nothing to do with the database latency.

Ok. This corresponds to the definition used in the current patch. However, 
ISTM that the TPC-B benchmark is "as fast as possible"; there is no underlying 
schedule as with the throttled pgbench.

> As per TPC-B, the entire test is only valid if 90% of all client response 
> times are within 2 seconds.
>
> It would be useful if pgbench would
>
> A) measure and report that client response time in the per transaction
>   log files and

I have never used the per-transaction log file. I think it may already be 
the case that it contains this information when not throttled; when under 
throttling, I do not know.

> B) report at the end what percentage of transactions finished within
>   a specified response time constraint (default 2 seconds).

What is currently reported is the complement (% of transactions completed 
over the time limit).

Note that despite pg's appalling latency performance, it may stay well over 
the 90% limit, or even 100%: when things are going well a lot of 
transactions run in about a millisecond, while when things are going bad 
transactions would take a long time (although possibly under or about 1s 
anyway), *but* very few transactions get through, so the throughput is very 
small. The fact that during 15 seconds only 30 transactions are processed 
is a detail that does not show up in the metric.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Jan Wieck
Date:
On 09/05/2014 10:12 AM, Fabien COELHO wrote:
> Note that despite pg's appalling latency performance, it may stay well over
> the 90% limit, or even 100%: when things are going well a lot of
> transactions run in about a millisecond, while when things are going bad
> transactions would take a long time (although possibly under or about 1s
> anyway), *but* very few transactions get through, so the throughput is very
> small. The fact that during 15 seconds only 30 transactions are processed
> is a detail that does not show up in the metric.

I haven't used the real pgbench for a long time. I will have to look at 
your patch and see what the current version actually does or does not do.

What I have been using is a Python version of pgbench that I wrote for 
myself when I started learning that language. That one does record both 
values, the DB transaction latency and the client response time (time 
from the request being entered into the Queue until transaction commit). 
When I look at those results it is possible to have an utterly failing 
run, with <60% of client response times being within 2 seconds, but all 
the DB transactions are still in milliseconds.

As said, I'll have to take a look at it. Since I am on vacation next 
week, getting ready for my first day at EnterpriseDB, this may actually 
happen.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/05/2014 06:38 PM, Jan Wieck wrote:
> On 09/05/2014 10:12 AM, Fabien COELHO wrote:
>> Note that despite pg's appalling latency performance, it may stay well over
>> the 90% limit, or even 100%: when things are going well a lot of
>> transactions run in about a millisecond, while when things are going bad
>> transactions would take a long time (although possibly under or about 1s
>> anyway), *but* very few transactions get through, so the throughput is very
>> small. The fact that during 15 seconds only 30 transactions are processed
>> is a detail that does not show up in the metric.

Yeah, it makes much more sense to measure the latency from the 
"scheduled" time than the actual time.

> I haven't used the real pgbench for a long time. I will have to look at
> your patch and see what the current version actually does or does not.
>
> What I have been using is a Python version of pgbench that I wrote for
> myself when I started learning that language. That one does record both
> values, the DB transaction latency and the client response time (time
> from the request being entered into the Queue until transaction commit).
> When I look at those results it is possible to have an utterly failing
> run, with <60% of client response times being within 2 seconds, but all
> the DB transactions are still in milliseconds.

I think we have to reconsider what we're reporting in 9.4, when --rate 
is enabled, even though it's already very late in the release cycle. 
It's a bad idea to change the definition of latency between 9.4 and 9.5, 
so let's get it right in 9.4.

> As said, I'll have to take a look at it. Since I am on vacation next
> week, getting ready for my first day at EnterpriseDB, this may actually
> happen.

Oh, congrats! :-)

- Heikki




Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/09/2014 01:49 PM, Heikki Linnakangas wrote:
> I think we have to reconsider what we're reporting in 9.4, when --rate
> is enabled, even though it's already very late in the release cycle.
> It's a bad idea to change the definition of latency between 9.4 and 9.5,
> so let's get it right in 9.4.

As per the attached patch. I think we should commit this to 9.4. Any
objections?

The text this patch adds to the documentation needs some rewording,
though. As does this existing paragraph:

> High rate limit schedule lag values, that is lag values that are
> large compared to the actual transaction latency, indicate that
> something is amiss in the throttling process. High schedule lag can
> highlight a subtle problem there even if the target rate limit is met
> in the end. One possible cause of schedule lag is insufficient
> pgbench threads to handle all of the clients. To improve that,
> consider reducing the number of clients, increasing the number of
> threads in pgbench, or running pgbench on a separate host. Another
> possibility is that the database is not keeping up with the load at
> some point. When that happens, you will have to reduce the expected
> transaction rate to lower schedule lag.

It took me ages to parse "high rate limit schedule lag values".

- Heikki


Attachment

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

>> I think we have to reconsider what we're reporting in 9.4, when --rate
>> is enabled, even though it's already very late in the release cycle.
>> It's a bad idea to change the definition of latency between 9.4 and 9.5,
>> so let's get it right in 9.4.

Indeed.

> As per the attached patch. I think we should commit this to 9.4. Any 
> objections?

Ok for me. Some more propositions about the doc below.

> The text this patch adds to the documentation needs some rewording, though.

Probably. Not sure how to improve.

> As does this existing paragraph:
>
>> High rate limit schedule lag values, that is lag values that are
>> large compared to the actual transaction latency, indicate that
>> something is amiss in the throttling process. High schedule lag can
>> highlight a subtle problem there even if the target rate limit is met
>> in the end.

>> One possible cause of schedule lag is insufficient
>> pgbench threads to handle all of the clients. To improve that,
>> consider reducing the number of clients, increasing the number of
>> threads in pgbench, or running pgbench on a separate host. Another
>> possibility is that the database is not keeping up with the load at
>> some point. When that happens, you will have to reduce the expected
>> transaction rate to lower schedule lag.
>
> It took me ages to parse "high rate limit schedule lag values".

Indeed, I'm not proud of that one... Moreover the first sentence becomes 
false with the new latency computation, as the lag time is included.

I would suggest:

"When under throttling, the reported lag time measures the delay between 
the scheduled start time for the transaction and its actual start time. A 
high value, where the lag time represents most of the transaction latency, 
may indicate that something is amiss in the throttling process itself, 
even if the target rate is met in the end. One possible cause ..."

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/09/2014 03:35 PM, Fabien COELHO wrote:
>
> Hello Heikki,
>
>>> I think we have to reconsider what we're reporting in 9.4, when --rate
>>> is enabled, even though it's already very late in the release cycle.
>>> It's a bad idea to change the definition of latency between 9.4 and 9.5,
>>> so let's get it right in 9.4.
>
> Indeed.
>
>> As per the attached patch. I think we should commit this to 9.4. Any
>> objections?
>
> Ok for me. Some more propositions about the doc below.

I looked closer at this, and per Jan's comments, realized that we
don't log the lag time in the per-transaction log file. I think that's a
serious omission; when --rate is used, the schedule lag time is
important information to make sense of the result. I think we have to
apply the attached patch, and backpatch to 9.4. (The documentation on
the log file format needs updating)

Also, this is bizarre:

>         /*
>          * Use inverse transform sampling to randomly generate a delay, such
>          * that the series of delays will approximate a Poisson distribution
>          * centered on the throttle_delay time.
>          *
>          * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay
>          * multiplier, and results in a 0.055 % target underestimation bias:
>          *
>          * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i;
>          * = 1.000552717032611116335474
>          *
>          * If transactions are too slow or a given wait is shorter than a
>          * transaction, the next transaction will start right away.
>          */
>         int64        wait = (int64) (throttle_delay *
>                   1.00055271703 * -log(getrand(thread, 1, 10000) / 10000.0));

We're using getrand() to generate a uniformly distributed random value
between 1 and 10000, and then converting it to a double between 0.0 and
1.0. But getrand() is implemented by taking a double between 0.0 and 1.0
and converting it to an integer, so we're just truncating the original
floating point number unnecessarily.  I think we should add a new
function, getPoissonRand(), that uses pg_erand48() directly. We already
have similar getGaussianRand() and getExponentialRand() functions.
Barring objections, I'll prepare another patch to do that, and backpatch
to 9.4.
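
A minimal sketch of what such a getPoissonRand() could look like (the TState
layout and random_state field are assumptions about pgbench's thread state,
not a quote of the committed code):

    #include <math.h>
    #include <stdint.h>

    typedef struct TState
    {
        unsigned short random_state[3];  /* per-thread erand48 state (assumed) */
    } TState;

    extern double pg_erand48(unsigned short xseed[3]);

    /* Inverse transform sampling: delays whose average is "center". */
    static int64_t
    getPoissonRand(TState *thread, int64_t center)
    {
        /* uniform is in [0, 1); use 1.0 - uniform so log() never sees zero */
        double uniform = pg_erand48(thread->random_state);

        return (int64_t) (-log(1.0 - uniform) * center + 0.5);
    }
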
- Heikki


Attachment

Re: pgbench throttling latency limit

From
Mitsumasa KONDO
Date:
Hi,

I found a typo in your patch. Please confirm.

@line 239
- agg->sum2_lag = 0;
+  agg->sum_lag = 0;

And a backpatch is welcome for me.

Best Regards,
--
Mitsumasa KONDO

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> I looked closer at this, and per Jan's comments, realized that we don't 
> log the lag time in the per-transaction log file. I think that's a serious 
> omission; when --rate is used, the schedule lag time is important information 
> to make sense of the result. I think we have to apply the attached patch, and 
> backpatch to 9.4. (The documentation on the log file format needs updating)

Indeed. I think that people do not like this to change. I remember that I 
suggested changing timestamps to "xxxx.yyyyyy" instead of the unreadable 
"xxxx yyy", and was told not to, because some people have tools which 
process the output, so the format MUST NOT CHANGE. So my behavior is now to 
avoid touching anything in this area.

I'm fine if you do it, though :-) However, I have no time to take a precise 
look at your patch to cross-check it before Friday.

> Also, this is bizarre:
>
>> int64 wait = (int64) (throttle_delay *
>>   1.00055271703 * -log(getrand(thread, 1, 10000) / 10000.0));
>
> We're using getrand() to generate a uniformly distributed random value 
> between 1 and 10000, and then convert it to a double between 0.0 and 1.0.

The reason for this conversion is to have randomness but still avoid 
going to extreme multiplier values. The idea is to avoid a very large 
multiplier which would generate (even if only rarely) 0 tps when 100 
tps is required. The 10000 granularity is basically arbitrary, but the 
multiplier stays bounded (max 9.2, so the minimum possible tps would be 
around 11 for a target of 100 tps, barring issues from the database in 
processing the transactions).
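
The bound described above, computed explicitly as a small standalone check
(not pgbench code):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* worst case of getrand(thread, 1, 10000) is 1 */
        double max_multiplier = -log(1.0 / 10000.0);    /* ~9.21 */

        printf("max multiplier: %.2f\n", max_multiplier);
        printf("minimum instantaneous tps for a 100 tps target: %.1f\n",
               100.0 / max_multiplier);                 /* ~10.9, i.e. around 11 */
        return 0;
    }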

So although this feature can be discussed and amended, it is fully 
intentional. I think that it makes sense, so I would prefer to keep it as is. 
Maybe the comments could be updated to be clearer.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/10/2014 05:57 PM, Fabien COELHO wrote:
>
> Hello Heikki,
>
>> I looked closer at the this, and per Jan's comments, realized that we don't
>> log the lag time in the per-transaction log file. I think that's a serious
>> omission; when --rate is used, the schedule lag time is important information
>> to make sense of the result. I think we have to apply the attached patch, and
>> backpatch to 9.4. (The documentation on the log file format needs updating)
>
> Indeed. I think that people do not like this to change. I remember that I
> suggested changing timestamps to "xxxx.yyyyyy" instead of the unreadable
> "xxxx yyy", and was told not to, because some people have tools which
> process the output, so the format MUST NOT CHANGE. So my behavior is now to
> avoid touching anything in this area.
>
> I'm fine if you do it, though :-) However, I have no time to take a precise
> look at your patch to cross-check it before Friday.

This is different from changing "xxx yyy" to "xxx.yyy" in two ways. 
First, this adds new information to the log that just isn't there 
otherwise, it's not just changing the format for the sake of it. Second, 
in this case it's easy to write a parser for the log format so that it 
works with the old and new formats. Many such programs would probably 
ignore any unexpected extra fields, as a matter of lazy programming, 
while changing the separator from space to a dot would surely require 
changing every parsing program.

We could leave out the lag fields, though, when --rate is not used. 
Without --rate, the lag is always zero anyway. That would keep the file 
format unchanged, when you don't use the new --rate feature. I'm not 
sure if that would be better or worse for programs that might want to 
parse the information.

>> Also, this is bizarre:
>>
>>> int64 wait = (int64) (throttle_delay *
>>>    1.00055271703 * -log(getrand(thread, 1, 10000) / 10000.0));
>>
>> We're using getrand() to generate a uniformly distributed random value
>> between 1 and 10000, and then convert it to a double between 0.0 and 1.0.
>
> The reason for this conversion is to have randomness but still avoid
> going to extreme multiplier values. The idea is to avoid a very large
> multiplier which would generate (even if only rarely) 0 tps when 100
> tps is required. The 10000 granularity is basically arbitrary, but the
> multiplier stays bounded (max 9.2, so the minimum possible tps would be
> around 11 for a target of 100 tps, barring issues from the database in
> processing the transactions).
>
> So although this feature can be discussed and amended, it is fully
> intentional. I think that it makes sense, so I would prefer to keep it as is.
> Maybe the comments could be updated to be clearer.

Ok, yeah, the comments indeed didn't mention anything about that. I 
don't think such clamping is necessary. With that 9.2x clamp on the 
multiplier, the probability that any given transaction hits it is about 
1/10000. And a delay 9.2 times the average is still quite reasonable, 
IMHO. The maximum delay on my laptop, when pg_erand48() returns DBL_MIN, 
seems to be about 700 x the average, which is still reasonable if you 
run a decent number of transactions. And of course, the probability of 
hitting such an extreme value is minuscule, so if you're just doing a 
few quick test runs with a small total number of transactions, you won't 
hit that.

- Heikki




Re: pgbench throttling latency limit

From
Jan Wieck
Date:
On 09/10/2014 11:28 AM, Heikki Linnakangas wrote:
> On 09/10/2014 05:57 PM, Fabien COELHO wrote:
>>
>> Hello Heikki,
>>
>>> I looked closer at the this, and per Jan's comments, realized that we don't
>>> log the lag time in the per-transaction log file. I think that's a serious
>>> omission; when --rate is used, the schedule lag time is important information
>>> to make sense of the result. I think we have to apply the attached patch, and
>>> backpatch to 9.4. (The documentation on the log file format needs updating)
>>
>> Indeed. I think that people do not like it to change. I remember that I
>> suggested to change timestamps to "xxxx.yyyyyy" instead of the unreadable
>> "xxxx yyy", and be told not to, because some people have tool which
>> process the output so the format MUST NOT CHANGE. So my behavior is not to
>> avoid touching anything in this area.
>>
>> I'm fine if you do it, though:-) However I have not time to have a precise
>> look at your patch to cross-check it before Friday.
>
> This is different from changing "xxx yyy" to "xxx.yyy" in two ways.
> First, this adds new information to the log that just isn't there
> otherwise, it's not just changing the format for the sake of it. Second,
> in this case it's easy to write a parser for the log format so that it
> works with the old and new formats. Many such programs would probably
> ignore any unexpected extra fields, as a matter of lazy programming,
> while changing the separator from space to a dot would surely require
> changing every parsing program.
>
> We could leave out the lag fields, though, when --rate is not used.
> Without --rate, the lag is always zero anyway. That would keep the file
> format unchanged, when you don't use the new --rate feature. I'm not
> sure if that would be better or worse for programs that might want to
> parse the information.

We could also leave the default output format as is and introduce 
another option with a % style format string.


Jan


>
>>> Also, this is bizarre:
>>>
>>>> int64 wait = (int64) (throttle_delay *
>>>>    1.00055271703 * -log(getrand(thread, 1, 10000) / 10000.0));
>>>
>>> We're using getrand() to generate a uniformly distributed random value
>>> between 1 and 10000, and then convert it to a double between 0.0 and 1.0.
>>
>> The reason for this conversion is to have randomness but to still avoid
>> going to extreme multiplier values. The idea is to avoid a very large
>> multiplier which would generate (even if it is not often) a 0 tps when 100
>> tps is required. The 10000 granularity is basically random but the
>> multiplier stays bounded (max 9.2, so the minimum possible tps would be
>> around 11 for a target of 100 tps, bar issues from the database for
>> processing the transactions).
>>
>> So although this feature can be discussed and amended, it is fully
>> voluntary. I think that it make sense so I would prefer to keep it as is.
>> Maybe the comments could be update to be clearer.
>
> Ok, yeah, the comments indeed didn't mention anything about that. I
> don't think such clamping is necessary. With that 9.2x clamp on the
> multiplier, the probability that any given transaction hits it is about
> 1/10000. And a delay 9.2 times the average is still quite reasonable,
> IMHO. The maximum delay on my laptop, when pg_erand48() returns DBL_MIN,
> seems to be about 700 x the average, which is still reasonable if you
> run a decent number of transactions. And of course, the probability of
> hitting such an extreme value is miniscule, so if you're just doing a
> few quick test runs with a small total number of transactions, you won't
> hit that.
>
> - Heikki
>


-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/10/2014 05:47 PM, Mitsumasa KONDO wrote:
> Hi,
>
> I found a typo in your patch. Please confirm.
>
> @line 239
> - agg->sum2_lag = 0;
> +  agg->sum_lag = 0;

Ah thanks, good catch!

> And back patch is welcome for me.

I've committed and backpatched this, as well as a patch to refactor the 
way the Poisson delay is computed.

I kept the log file format unchanged when --rate is not used, so it now 
has a different number of fields depending on whether --rate is used or not.

Please review the changes I made one more time, to double-check that I 
didn't mess up anything.

- Heikki




Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 08/30/2014 07:16 PM, Fabien COELHO wrote:
>
>>> +    if (latency_limit)
>>> +        printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "\n",
>>> +               latency_limit / 1000.0, latency_late);
>>> +
>>
>> Any reason not to report a percentage here?
>
> Yes: I did not think of it.
>
> Here is a v7, with a percent. I also added a paragraph in the documentation
> about how the latency is computed under throttling, and I tried to reorder
> the reported stuff so that it is more logical.

Now that I've finished the detour and committed and backpatched the 
changes to the way latency is calculated, we can get back to this patch. 
It needs to be rebased.

How should skipped transactions be taken into account in the log 
file output, with and without aggregation? I assume we'll want to have 
some trace of skipped transactions in the logs.

- Heikki




Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> Now that I've finished the detour and committed and backpatched the changes 
> to the way latency is calculated, we can get back to this patch. It needs to 
> be rebased.

Before rebasing, I think that there are a few small problems with the 
modification applied to switch from an integer range to double.

(1) ISTM that the + 0.5 which remains in the PoissonRand computation comes 
from the previous integer approach and is not needed here. If I'm not 
mistaken the formula should be plain:

     -log(uniform) * center

(2) I'm not sure of the name "center", I think that "lambda" or
    "mean" would be more appropriate.

(3) I wish that the maximum implied multiplier could be explicitly
    documented in the source code. From pg_rand48 source code, I think
    that it is 33.27106466687737.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> (3) I wish that the maximum implied multiplier could be explicitly
>    documented in the source code. From pg_rand48 source code, I think
>    that it is 33.27106466687737

Small possibly buggy code attached, to show how I computed the above 
figure.
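
For reference, a minimal sketch of that computation, assuming the smallest
nonzero value a 48-bit erand48-style generator can return is 2^-48:

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double min_uniform = ldexp(1.0, -48);       /* 2^-48 */

        /* maximum multiplier implied by -log(uniform) */
        printf("%.14f\n", -log(min_uniform));       /* 33.27106466687737 */
        return 0;
    }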

-- 
Fabien.

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki

> Now that I've finished the detour and committed and backpatched the changes 
> to the way latency is calculated, we can get back to this patch. It needs to 
> be rebased.

Here is the rebase, which seems ok.

See also the small issues raised about getPoissonRand in another email.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> How should skipped transactions be taken into account in the log file 
> output, with and without aggregation? I assume we'll want to have some trace 
> of skipped transactions in the logs.

The problem with this point is that how to report something "not done" is 
unclear, especially as the logic of the log is one line per performed 
transaction.

Obviously we can log something, but as the transactions are not performed 
the format would be different, which breaks the expectation of a simple and 
homogeneous log file format that people like to process directly.

So, barring any great idea, I would suggest not logging skipped transactions 
and waiting for someone who needs access to this detailed information and 
for whom the final report is not enough.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/11/2014 03:36 PM, Fabien COELHO wrote:
>
> Hello Heikki,
>
>> Now that I've finished the detour and committed and backpatched the changes
>> to the way latency is calculated, we can get back to this patch. It needs to
>> be rebased.
>
> Before rebasing, I think that there are a few small problems with the
> modification applied to switch from an integer range to double.
>
> (1) ISTM that the + 0.5 which remains in the PoissonRand computation comes
> from the previous integer approach and is not needed here. If I'm not
> mistaken the formula should be plain:
>
>        -log(uniform) * center

No. The +0.5 is to round the result to the nearest integer, instead of 
truncating it down.

> (2) I'm not sure of the name "center", I think that "lambda" or
>       "mean" would be more appropriate.

(shrug), I guess. The comment says that it's the value the average of a 
series of values is centered on, so "center" doesn't seem too bad. I guess 
the mathematically accurate term would be "expected value".

> (3) I wish that the maximum implied multiplier could be explicitly
>       documented in the source code. From pg_rand48 source code, I think
>       that it is 33.27106466687737

Oh, ok. That's an even smaller multiplier than I got just by feeding 
DBL_MIN to the formula. I don't think that's worth worrying about. That 
might change if the implementation of pg_erand48() is changed, so I'm a 
bit reluctant to state it explicitly.

- Heikki




Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/11/2014 05:16 PM, Fabien COELHO wrote:
>
>> How should skipped transactions should be taken into account in the log file
>> output, with and without aggregation? I assume we'll want to have some trace
>> of skipped transactions in the logs.
>
> The problem with this point is that how to report something "not done" is
> unclear, especially as the logic of the log is one line per performed
> transaction.
>
> Obviously we can log something, but as the transactions are not performed
> the format would be different, which breaks the expectation of a simple and
> homogeneous log file format that people like to process directly.
>
> So, barring any great idea, I would suggest not logging skipped transactions
> and waiting for someone who needs access to this detailed information and
> for whom the final report is not enough.

We have to come up with something. The point of the log file is that it 
contains all the information you need to build your own graphs, when 
pgbench's built-in reporting features are not enough. If it doesn't 
contain detailed information about skipped transactions, a report based 
on the log file would be inaccurate, or at least misleading.

How about printing a line in the log for every skipped transaction, with 
the string "skipped" in place of the latency. The "completion time" can 
show the time when the transaction was skipped, and the lag can show the 
difference between the scheduled time and the time it was skipped. Or 
put another way, print a line as if the transaction completed 
immediately, but with the "skipped" string in the latency field.

The "skipped" string will trip a program that doesn't expect that, but 
since this is a new feature that you have to enable manually, that's OK.

The output would look something like this (modified from the manual's 
example by hand, so the numbers don't add up):

0 199 2241 0 1175850568 995598 1020
0 200 2465 0 1175850568 998079 1010
0 201 skipped 1175850569 608 3011
0 202 skipped 1175850569 608 2400
0 203 skipped 1175850569 608 1000
0 204 2513 0 1175850569 608 500
0 205 2038 0 1175850569 2663 500

- Heikki




Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
>> (1) ISTM that the + 0.5 which remains in the PoissonRand computation comes
>> from the previous integer approach and is not needed here. If I'm not
>> mistaken the formula should be plain:
>>
>>        -log(uniform) * center
>
> No. The +0.5 is to round the result to the nearest integer, instead of 
> truncating it down.

Hmmm... probably ok. I'll have to think about it a bit.

In that case, it seems much clearer to do: "round(-log(uniform) * xxx)" 
instead of relying on the truncation of the cast.

>> (2) I'm not sure of the name "center", I think that "lambda" or
>>       "mean" would be more appropriate.
>
> (shrug), I guess. The comment says that it's the value the average of a 
> series of values is centered on, so "center" doesn't seem too bad. I guess the 
> mathematically accurate term would be "expected value".

The word "center" does not appear once of the wikipedia page about 
"Poisson distribution". Its "mean" is called "lambda" (or rather λ:-) all 
over the place. I find "expected value" rather too generic, but it is 
better than "center".

>> (3) I wish that the maximum implied multiplier could be explicitely
>>       documented in the source code. From pg_rand48 source code, I think
>>       that it is 33.27106466687737
>
> Oh, ok. That's an even smaller multiplier than I got just by feeding DBL_MIN 
> to the formula. I don't think that's worth worrying about. That might change 
> if the implementation of pg_erand48() is changed, so I'm a bit reluctant to 
> state it explicitly.

I think that it is important information, so it deserves to appear.

If the pg_erand48 implementation changes, it should not be called "48", 
because the max value and the above multiplier limit are completely linked 
to the "16*3=48" structure of the random construction.

If the code changes then the comments need to be updated; that is life.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> The output would look something like this (modified from the manual's example 
> by hand, so the numbers don't add up):
>
> 0 199 2241 0 1175850568 995598 1020
> 0 200 2465 0 1175850568 998079 1010
> 0 201 skipped 1175850569 608 3011
> 0 202 skipped 1175850569 608 2400
> 0 203 skipped 1175850569 608 1000
> 0 204 2513 0 1175850569 608 500
> 0 205 2038 0 1175850569 2663 500

My 0.02€: ISTM that the number of columns should stay the same whether the 
transaction is skipped or not, so the "file_no" should be kept. Keeping the 
field a number (-1) might make sense, or just a dash (-), which means "no 
value" to gnuplot for instance. Or "skipped".

Basically I would be fine with that, but as I do not use the log file 
feature I'm not sure that my opinion should count.

Note that there are also potential issues with the aggregate logging and 
the sampling stuff.

-- 
Fabien.

Re: pgbench throttling latency limit

From
Gregory Smith
Date:
On 9/10/14, 10:57 AM, Fabien COELHO wrote:
> Indeed. I think that people do not like it to change. I remember that 
> I suggested changing timestamps to "xxxx.yyyyyy" instead of the 
> unreadable "xxxx yyy", and was told not to, because some people have 
> tools which process the output so the format MUST NOT CHANGE. So my 
> behavior is now to avoid touching anything in this area.

That somewhat hysterical version of events isn't what I said. Heikki has 
the right idea for backpatching, so let me expand on that rationale, 
with an eye toward whether 9.5 is the right time to deal with this.

Not all software out there will process epoch timestamps with 
milliseconds added as a fraction at the end.  Being able to read an 
epoch time in seconds as an integer is a well defined standard; the 
fraction part is not.

Here's an example of the problem, from a Mac OS X system:

$ date -j -f "%a %b %d %T %Z %Y" "`date`" "+%s"
1410544903
$ date -r 1410544903
Fri Sep 12 14:01:43 EDT 2014
$ date -r 1410544903.532
usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]

The current file format allows any random shell script to use a tool 
like cut to pull out the second resolution timestamp column as an epoch 
integer field, then pass it through even a utility as simple as date to 
reformat that.  And for a lot of people, second resolution is perfectly 
fine anyway.

The change you propose will make that job harder for some people, in 
order to make the job you're interested in easier.  I picked the 
simplest possible example, but there are more.  Whether epoch timestamps 
can have millisecond parts depends on your time library in Java; in 
Python some behavior depends on whether you have 2.6 or earlier; I don't 
think gnuplot handles millisecond ones at all yet; the list goes on and 
on.  Some people will just have to apply a second split to the timestamp 
strings pgbench outputs, at the period, and use the left side, where right 
now they can just split the whole thing on a space.
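
To make the extra step concrete, a tiny sketch (hypothetical, not from 
any published script) that cuts a fractional timestamp at the period and 
keeps the integer left side:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    /* a fractional epoch timestamp as it might appear in the new format */
    char ts[] = "1410544903.532";
    char *dot = strchr(ts, '.');

    if (dot != NULL)
        *dot = '\0';                /* the extra split, at the period */

    /* the left side is still a plain integer epoch, usable with date -r */
    printf("%ld\n", atol(ts));      /* 1410544903 */
    return 0;
}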

What you want to do is actually fine with me--and as far as I know, I'm 
the producer of the most popular pgbench latency parsing script 
around--but it will be a new sort of headache.  I just wanted the 
benefit to outweigh that.  Breaking the existing scripts and burning 
compatibility with simple utilities like date was not worth the tiny 
improvement you wanted in your personal workflow.  That's just not how 
we do things in PostgreSQL.

If there's a good case that the whole format needs to be changed anyway, 
like adding a new field, then we might as well switch to fractional 
epoch timestamps too now though.  When I added timestamps to the latency 
log in 8.3, parsers that handled milliseconds were even more rare.  
Today it's still inconsistent, but the workarounds are good enough for me 
now.  There are a lot more people using things like Python instead of bash 
pipelines here in 2014 too.

-- 
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/



Re: pgbench throttling latency limit

From
Robert Haas
Date:
On Fri, Sep 12, 2014 at 2:27 PM, Gregory Smith <gregsmithpgsql@gmail.com> wrote:
> If there's a good case that the whole format needs to be changed anyway,
> like adding a new field, then we might as well switch to fractional epoch
> timestamps too now though.  When I added timestamps to the latency log in
> 8.3, parsers that handled milliseconds were even more rare.  Today it's
> still inconsistent, but the workarounds are good enough for me now.  There
> are a lot more people using things like Python instead of bash pipelines
> here in 2014 too.

+1.  s/\..*// is not an onerous requirement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/12/2014 08:59 PM, Fabien COELHO wrote:
>
>> The output would look something like this (modified from the manual's example
>> by hand, so the numbers don't add up):
>>
>> 0 199 2241 0 1175850568 995598 1020
>> 0 200 2465 0 1175850568 998079 1010
>> 0 201 skipped 1175850569 608 3011
>> 0 202 skipped 1175850569 608 2400
>> 0 203 skipped 1175850569 608 1000
>> 0 204 2513 0 1175850569 608 500
>> 0 205 2038 0 1175850569 2663 500
>
> My 0.02€: ISTM that the number of columns should stay the same whether the
> transaction is skipped or not, so the "file_no" should be kept.

Oh, sorry, I totally agree. I left file_no out by mistake.

> Keeping the field a number (-1) might make sense, or just a dash (-),
> which means "no value" to gnuplot for instance. Or "skipped".
>
> Basically I would be fine with that, but as I do not use the log file
> feature I'm not sure that my opinion should count.
>
> Note that there are also potential issues with the aggregate logging and
> the sampling stuff.

Yep.

- Heikki



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> [about logging...]

Here is an attempt at updating the log features, including the aggregate 
and sampling stuff, with skipped transactions under throttling.

I moved the logging stuff into a function which is called when a 
transaction is skipped or finished.

From a log file format perspective, I think that "-" would be better than 
"skipped".

-- 
Fabien.

Re: pgbench throttling latency limit

From
Robert Haas
Date:
On Sat, Sep 13, 2014 at 4:25 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> [about logging...]
>
> Here is an attempt at updating the log features, including the aggregate and
> sampling stuff, with skipped transactions under throttling.
>
> I moved the logging stuff into a function which is called when a transaction
> is skipped or finished.
>
> From a log file format perspective, I think that "-" would be better than
> "skipped".

I like "skipped".  That seems a lot clearer, and less likely to get
parsed as a numeric value by a careless regex like [+-]\d*.
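
To make that concrete, here is a minimal reader sketch (hypothetical, not 
pgbench-tools or any real script) that treats the latency column as a 
string and handles the proposed "skipped" marker explicitly:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    /* two per-transaction log lines in the format being discussed */
    const char *lines[] = {
        "0 82 6173 0 1412881037 914578 4304",
        "0 83 skipped 0 1412881037 914578 5217",
    };
    double  total_latency = 0.0;        /* latency column, in microseconds */
    long    done = 0, skipped = 0;

    for (size_t i = 0; i < sizeof(lines) / sizeof(lines[0]); i++)
    {
        int     client, xact;
        char    latency[32];

        if (sscanf(lines[i], "%d %d %31s", &client, &xact, latency) != 3)
            continue;

        if (strcmp(latency, "skipped") == 0)
            skipped++;                  /* never executed: nothing to add */
        else
        {
            total_latency += strtod(latency, NULL);
            done++;
        }
    }

    printf("%ld done (avg %.1f us), %ld skipped\n",
           done, done > 0 ? total_latency / done : 0.0, skipped);
    return 0;
}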

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/13/2014 11:25 AM, Fabien COELHO wrote:
>
>> [about logging...]
>
> Here is an attempt at updating the log features, including the aggregate
> and sampling stuff, with skipped transactions under throttling.
>
> I moved the logging stuff into a function which is called when a
> transaction is skipped or finished.

Makes sense.

I spent some time on this, and this is what I ended up with. Notable
changes:

* I split this into two patches for easier review. The first patch
contains the refactoring of the logging stuff, and the second patch
contains the new functionality.

* I renamed many of the variables, to be more consistent with existing
similar variables

* progress reporting was broken with !PTHREAD_FORK_EMULATION. Need to
collect the number of skipped xacts from all threads.

* I renamed the long option to --latency-limit. --limit is too generic.

Please have a look. I have not looked at the docs changes yet.

One thing that needs some thinking and changing is the progress
reporting. It currently looks like this:

progress: 1.0 s, 4863.0 tps, lat 3.491 ms stddev 2.487, lag 1.809 ms, 99 skipped
progress: 2.0 s, 5042.8 tps, lat 3.265 ms stddev 2.264, lag 1.584 ms, 16 skipped
progress: 3.0 s, 4926.1 tps, lat 2.731 ms stddev 2.371, lag 1.196 ms, 45 skipped
progress: 4.0 s, 4963.9 tps, lat 1.904 ms stddev 1.212, lag 0.429 ms, 0 skipped
progress: 5.0 s, 4971.2 tps, lat 2.592 ms stddev 1.722, lag 0.975 ms, 0 skipped

The absolute number of skipped transactions doesn't make much sense when
all the other numbers are averages, and tps is a 1/s value. If you don't
know the total number of transactions executed, the absolute number is
meaningless. (Although you can calculate the absolute number of
transactions executed by multiplying the TPS value by the interval.) I
think a percentage would be better here.
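
For example, taking the first line above: 4863.0 tps over the 1.0 s
interval is about 4863 transactions executed, so the 99 skipped ones work
out to roughly 99 / (4863 + 99), i.e. about 2 %.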

Should we also print the number of late transactions here? I think that
would be an even more important detail than the number of skipped
transactions. It might be better to print only the percentage of late
transactions, including skipped ones. Or both, but it's difficult to
cram everything on a single line. This needs some further thinking...

- Heikki


Attachment

Re: pgbench throttling latency limit

From
Robert Haas
Date:
On Mon, Sep 15, 2014 at 6:34 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Please have a look. I have not looked at the docs changes yet.
>
> One thing that needs some thinking and changing is the progress reporting.
> It currently looks like this:
>
> progress: 1.0 s, 4863.0 tps, lat 3.491 ms stddev 2.487, lag 1.809 ms, 99 skipped
> progress: 2.0 s, 5042.8 tps, lat 3.265 ms stddev 2.264, lag 1.584 ms, 16 skipped
> progress: 3.0 s, 4926.1 tps, lat 2.731 ms stddev 2.371, lag 1.196 ms, 45 skipped
> progress: 4.0 s, 4963.9 tps, lat 1.904 ms stddev 1.212, lag 0.429 ms, 0 skipped
> progress: 5.0 s, 4971.2 tps, lat 2.592 ms stddev 1.722, lag 0.975 ms, 0 skipped
>
> The absolute number of skipped transactions doesn't make much sense when all
> the other numbers are averages, and tps is a 1/s value. If you don't know
> the total number of transactions executed, the absolute number is
> meaningless. (Although you can calculate the absolute number of transactions
> executed by multiplying the TPS value with the interval). I think a
> percentage would be better here.
>
> Should we also print the number of late transactions here? I think that
> would be an even more important detail than the number of skipped
> transactions. It might be better to print only the percentage of late
> transactions, including skipped ones. Or both, but it's difficult to cram
> everything on a single line. This needs some further thinking..

I'm not sure I like the idea of printing a percentage.  It might be
unclear what the denominator was if somebody feels the urge to work
back to the actual number of skipped transactions.  I mean, I guess
it's probably just the value you passed to -R, so maybe that's easy
enough, but then why bother dividing in the first place?  The user can
do that easily enough if they want the data that way.

I agree with you that it would be good to get some statistics on
late/skipped transactions, but it's not obvious what people will want.
Late transactions, straight up?  Late by more than a threshold value?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> I'm not sure I like the idea of printing a percentage.  It might be
> unclear what the denominator was if somebody feels the urge to work
> back to the actual number of skipped transactions.  I mean, I guess
> it's probably just the value you passed to -R, so maybe that's easy
> enough, but then why bother dividing in the first place?  The user can
> do that easily enough if they want the data that way.

Indeed "skipped" and "late" per second may have an unclear denominator. If 
you divide by the time, the unit would be "tps", but 120 tps performance 
including 20 late tps, plus 10 skipped tps... I do not think it is that 
clear. Reporting "tps" for transactions *not* performed looks strange.

Maybe late transactions could be given as a percentage of all processed 
transactions in the interval. But for skipped, the percentage of what? The 
only number that would make sense is the total number of transactions 
scheduled in the interval, but that would mean that the denominator for 
late would be different from the denominator for skipped, which is 
basically incomprehensible.

> I agree with you that it would be good to get some statistics on
> late/skipped transactions, but it's not obvious what people will want.
> Late transactions, straight up?  Late by more than a threshold value?

Yes.

Under throttling, transactions are given a scheduled start time. When a 
transaction can actually be started:
  (1) if it is already further behind schedule (before even starting) than
      the latency limit (a threshold), it is *NOT* started, but is counted
      as "skipped";
  (2) otherwise it is started. When it finishes, it may be
      (2a) beyond the latency limit (scheduled time + limit)
           => it is counted as "late";
      (2b) within the latency limit
           => all is well.
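
In code terms, the decision is roughly the following (a sketch with 
made-up names, not the actual pgbench source):

#include <stdio.h>

typedef enum { XACT_OK, XACT_LATE, XACT_SKIPPED } xact_outcome;

/*
 * Classify one scheduled transaction.  All times are in microseconds.
 * "now" is when the client could actually start it; "finished" is when it
 * completed (ignored when the transaction is skipped, since it never runs).
 */
static xact_outcome
classify_xact(long scheduled, long now, long finished, long latency_limit)
{
    if (now - scheduled > latency_limit)
        return XACT_SKIPPED;        /* (1) too far behind: do not start it */
    if (finished - scheduled > latency_limit)
        return XACT_LATE;           /* (2a) ran, but past scheduled + limit */
    return XACT_OK;                 /* (2b) all is well */
}

int
main(void)
{
    long    limit = 100 * 1000;     /* e.g. --latency-limit of 100 ms */

    /* starts 150 ms behind schedule: skipped (prints 2) */
    printf("%d\n", classify_xact(0, 150 * 1000, 0, limit));
    /* starts on schedule but takes 120 ms: late (prints 1) */
    printf("%d\n", classify_xact(0, 0, 120 * 1000, limit));
    return 0;
}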
 

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 09/15/2014 08:46 PM, Fabien COELHO wrote:
>
>> I'm not sure I like the idea of printing a percentage.  It might be
>> unclear what the denominator was if somebody feels the urge to work
>> back to the actual number of skipped transactions.  I mean, I guess
>> it's probably just the value you passed to -R, so maybe that's easy
>> enough, but then why bother dividing in the first place?  The user can
>> do that easily enough if they want the data that way.
>
> Indeed "skipped" and "late" per second may have an unclear denominator. If
> you divide by the time, the unit would be "tps", but 120 tps performance
> including 20 late tps, plus 10 skipped tps... I do not think it is that
> clear. Reporting "tps" for transactions *not* performed looks strange.
>
> Maybe late transactions could be given as a percentage of all processed
> transactions in the interval. But for skipped, the percentage of what? The
> only number that would make sense is the total number of transactions
> scheduled in the interval, but that would mean that the denominator for
> late would be different from the denominator for skipped, which is
> basically incomprehensible.

Hmm. I guess the absolute number makes sense, if you expect that there
are normally zero skipped transactions, or at least a very small number.
It's more like a "good or no good" indicator. Ok, I'm fine with that.

The version I'm now working on prints output like this:

> $ ./pgbench -T10 -P1  --rate=1600 --latency-limit=10
> starting vacuum...end.
> progress: 1.0 s, 1579.0 tps, lat 2.973 ms stddev 2.493, lag 2.414 ms, 4 skipped
> progress: 2.0 s, 1570.0 tps, lat 2.140 ms stddev 1.783, lag 1.599 ms, 0 skipped
> progress: 3.0 s, 1663.0 tps, lat 2.372 ms stddev 1.742, lag 1.843 ms, 4 skipped
> progress: 4.0 s, 1603.2 tps, lat 2.435 ms stddev 2.247, lag 1.902 ms, 13 skipped
> progress: 5.0 s, 1540.9 tps, lat 1.845 ms stddev 1.270, lag 1.303 ms, 0 skipped
> progress: 6.0 s, 1588.0 tps, lat 1.630 ms stddev 1.003, lag 1.097 ms, 0 skipped
> progress: 7.0 s, 1577.0 tps, lat 2.071 ms stddev 1.445, lag 1.517 ms, 0 skipped
> progress: 8.0 s, 1669.9 tps, lat 2.375 ms stddev 1.917, lag 1.846 ms, 0 skipped
> progress: 9.0 s, 1636.0 tps, lat 2.801 ms stddev 2.354, lag 2.250 ms, 5 skipped
> progress: 10.0 s, 1606.1 tps, lat 2.751 ms stddev 2.117, lag 2.197 ms, 2 skipped
> transaction type: TPC-B (sort of)
> scaling factor: 5
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 16034
> number of transactions skipped: 28 (0.174 %)
> number of transactions above the 10.0 ms latency limit: 70 (0.436 %)
> latency average: 2.343 ms
> latency stddev: 1.940 ms
> rate limit schedule lag: avg 1.801 (max 9.994) ms
> tps = 1603.370819 (including connections establishing)
> tps = 1603.619536 (excluding connections establishing)

Those progress lines are 79 or 80 characters wide, so they *just* fit in
an 80-char terminal. Of course, if any of the printed numbers were
higher, it would not fit. I don't see how to usefully make it more
terse, though. I think we can live with this - these days it shouldn't
be a huge problem to enlarge the terminal to make the output fit.

Here are new patches, again the first one is just refactoring, and the
second one contains this feature. I'm planning to commit the first one
shortly, and the second one later after people have had a chance to look
at it.

Greg: As the author of pgbench-tools, what do you think of this patch?
The log file format, in particular.

- Heikki


Attachment

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Heikki,

> Here are new patches, again the first one is just refactoring, and the second 
> one contains this feature. I'm planning to commit the first one shortly, and 
> the second one later after people have had a chance to look at it.

I looked at it. It looks ok, but for a few spurious spacing changes here 
and there. No big deal.

I tested it, everything I tested behaved as expected, so it is ok for me.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 10/05/2014 10:43 AM, Fabien COELHO wrote:
>
> Hello Heikki,
>
>> Here are new patches, again the first one is just refactoring, and the second
>> one contains this feature. I'm planning to commit the first one shortly, and
>> the second one later after people have had a chance to look at it.
>
> I looked at it. It looks ok, but for a few spurious spacing changes here
> and there. No big deal.
>
> I tested it, everything I tested behaved as expected, so it is ok for me.

Thanks!

I committed the refactoring patch earlier, and just went through the
second patch again. I wordsmithed the documentation and comments, and
fixed the documentation on the log format. I also fixed the logging of
skipped transactions so that the schedule lag is reported correctly for
them.

One thing bothers me with the log format. Here's an example:

>  0 81 4621 0 1412881037 912698 3005
>  0 82 6173 0 1412881037 914578 4304
>  0 83 skipped 0 1412881037 914578 5217
>  0 83 skipped 0 1412881037 914578 5099
>  0 83 4722 0 1412881037 916203 3108
>  0 84 4142 0 1412881037 918023 2333
>  0 85 2465 0 1412881037 919759 740

Note how the transaction counter is not incremented for skipped
transactions. That's understandable, since we're not including skipped
transactions in the number of transactions executed, but it means that
the skipped transactions don't have a unique ID. That's annoying.

Here's a new version of the patch. I'll sleep over it before committing,
but I think it's fine now, except maybe for the unique ID thing.

- Heikki


Attachment

Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
> One thing bothers me with the log format. Here's an example:
>
>>  0 81 4621 0 1412881037 912698 3005
>>  0 82 6173 0 1412881037 914578 4304
>>  0 83 skipped 0 1412881037 914578 5217
>>  0 83 skipped 0 1412881037 914578 5099
>>  0 83 4722 0 1412881037 916203 3108
>>  0 84 4142 0 1412881037 918023 2333
>>  0 85 2465 0 1412881037 919759 740
>
> Note how the transaction counter is not incremented for skipped transactions.
> That's understandable, since we're not including skipped transactions in the 
> number of transactions executed, but it means that the skipped transactions 
> don't have a unique ID. That's annoying.

Indeed. As transactions were not done, it does not make much sense to 
identify them. Otherwise it should report "intended" transactions and 
"performed" transactions, which would not help clarify the matter much.

My idea of "skipped" transactions, which are not transactions as such, is 
just a health quality measurement for both the throttling process and the 
database latency, so I would really leave it as it is.

> Here's a new version of the patch. I'll sleep over it before committing, but 
> I think it's fine now, except maybe for the unique ID thing.

I saw a typo in a comment: "lateny" -> "latency". Otherwise it looks ok, 
and the documentation seems indeed clearer than before.

-- 
Fabien.



Re: pgbench throttling latency limit

From
Heikki Linnakangas
Date:
On 10/09/2014 10:39 PM, Fabien COELHO wrote:
>> One thing bothers me with the log format. Here's an example:
>>
>>>   0 81 4621 0 1412881037 912698 3005
>>>   0 82 6173 0 1412881037 914578 4304
>>>   0 83 skipped 0 1412881037 914578 5217
>>>   0 83 skipped 0 1412881037 914578 5099
>>>   0 83 4722 0 1412881037 916203 3108
>>>   0 84 4142 0 1412881037 918023 2333
>>>   0 85 2465 0 1412881037 919759 740
>>
>> Note how the transaction counter is not incremented for skipped transactions.
>> That's understandable, since we're not including skipped transactions in the
>> number of transactions executed, but it means that the skipped transactions
>> don't have a unique ID. That's annoying.
>
> Indeed. As transactions were not done, it does not make much sense to
> identify them. Otherwise it should report "intended" transactions and
> "performed" transactions, which would not help clarify the matter much.
>
> My idea of "skipped" transactions, which are not transactions as such, is
> just a health quality measurement for both the throttling process and the
> database latency, so I would really leave it as it is.

Hmm. I wonder if this is going to be a problem for programs that might 
try to load the log file into a database table: the transaction ID can no 
longer be used as a unique key. Then again, you'll have to somehow deal 
with "skipped" anyway.

>> Here's a new version of the patch. I'll sleep over it before committing, but
>> I think it's fine now, except maybe for the unique ID thing.
>
> I saw a typo in a comment: "lateny" -> "latency". Otherwise it looks ok,
> and the documentation seems indeed clearer than before.

Ok, committed after a few more typo-fixes.

Greg Smith, I'd still appreciate it if you could take a look at this, to 
check how this will work for pgbench-tools.

- Heikki




Re: pgbench throttling latency limit

From
Gregory Smith
Date:
On 10/13/14, 1:54 PM, Heikki Linnakangas wrote:
> Greg Smith, I'd still appreciate it if you could take a look at this, 
> to check how this will work for pgbench-tools.

I'll do a QA pass on the committed version looking for issues, and 
update the toolchain I publish to be compatible with it along the way too.

-- 
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/



Re: pgbench throttling latency limit

From
Andres Freund
Date:
On 2014-08-14 15:01:53 +0200, Fabien COELHO wrote:
> 
> Add --limit to limit latency under throttling
> 
> Under throttling, transactions are scheduled for execution at certain times.
> Transactions may be far behind schedule and the system may catch up with the
> load later. This option allows to change this behavior by skipping
> transactions which are too far behind schedule, and count those as skipped.
> 
> The idea is to help simulate a latency-constrained environment such as a
> database used by a web server.

I was just trying to run tests with this, but as far as I can see it
doesn't really work:

pgbench postgres -M prepared -c 72 -j 72 -P 5 -T 3600 -R40000 -L100
...
progress: 240.0 s, 40191.8 tps, lat 1.250 ms stddev 0.965, lag 0.501 ms, 0 skipped
progress: 245.0 s, 39722.1 tps, lat 1.128 ms stddev 0.946, lag 0.435 ms, 0 skipped
progress: 250.0 s, 40074.5 tps, lat 1.059 ms stddev 0.745, lag 0.391 ms, 0 skipped
progress: 255.0 s, 40001.4 tps, lat 1.038 ms stddev 0.680, lag 0.377 ms, 0 skipped
progress: 260.0 s, 40147.6 tps, lat 1.161 ms stddev 0.950, lag 0.448 ms, 0 skipped
progress: 265.0 s, 39980.1 tps, lat 1.186 ms stddev 0.862, lag 0.457 ms, 0 skipped
progress: 270.0 s, 40090.9 tps, lat 1.292 ms stddev 1.239, lag 0.544 ms, 0 skipped
progress: 275.0 s, 33847.8 tps, lat 26.617 ms stddev 41.681, lag 25.317 ms, 26698 skipped
progress: 280.0 s, 20237.9 tps, lat 96.041 ms stddev 11.983, lag 92.510 ms, 97745 skipped
progress: 285.0 s, 24385.0 tps, lat 94.490 ms stddev 10.865, lag 91.514 ms, 80944 skipped
progress: 290.0 s, 27349.6 tps, lat 92.755 ms stddev 10.905, lag 90.136 ms, 62268 skipped
progress: 295.0 s, 28382.1 tps, lat 92.752 ms stddev 10.238, lag 90.212 ms, 58253 skipped
progress: 300.0 s, 28798.3 tps, lat 92.673 ms stddev 10.506, lag 90.172 ms, 56741 skipped
progress: 305.0 s, 29346.6 tps, lat 91.659 ms stddev 10.982, lag 89.210 ms, 53163 skipped
progress: 310.0 s, 30072.9 tps, lat 91.190 ms stddev 11.071, lag 88.802 ms, 48370 skipped
progress: 315.0 s, 30733.2 tps, lat 90.893 ms stddev 11.312, lag 88.548 ms, 47020 skipped
progress: 320.0 s, 31170.9 tps, lat 89.498 ms stddev 12.132, lag 87.192 ms, 43403 skipped
progress: 325.0 s, 33399.0 tps, lat 85.795 ms stddev 15.196, lag 83.639 ms, 32923 skipped
progress: 330.0 s, 22969.8 tps, lat 91.929 ms stddev 14.762, lag 88.805 ms, 84780 skipped
progress: 335.0 s, 18913.3 tps, lat 95.236 ms stddev 14.523, lag 91.444 ms, 104960 skipped
progress: 340.0 s, 20061.2 tps, lat 95.258 ms stddev 13.284, lag 91.660 ms, 100396 skipped
progress: 345.0 s, 20405.3 tps, lat 94.781 ms stddev 13.794, lag 91.255 ms, 98510 skipped
progress: 350.0 s, 20596.0 tps, lat 94.661 ms stddev 13.345, lag 91.189 ms, 95426 skipped
progress: 355.0 s, 13635.7 tps, lat 96.998 ms stddev 38.039, lag 91.691 ms, 132598 skipped
progress: 360.0 s, 16648.0 tps, lat 95.138 ms stddev 26.329, lag 90.809 ms, 117129 skipped
progress: 365.0 s, 18392.1 tps, lat 94.857 ms stddev 23.917, lag 90.980 ms, 106244 skipped

100k skipped transactions at a rate limit of 40k? That doesn't seem right.

(that's master server/pgbench as of 5be94a9eb15a)

Regards,

Andres



Re: pgbench throttling latency limit

From
Andres Freund
Date:
On 2015-10-20 20:55:46 +0200, Andres Freund wrote:
> On 2014-08-14 15:01:53 +0200, Fabien COELHO wrote:
> > 
> > Add --limit to limit latency under throttling
> > 
> > Under throttling, transactions are scheduled for execution at certain times.
> > Transactions may be far behind schedule and the system may catch up with the
> > load later. This option allows to change this behavior by skipping
> > transactions which are too far behind schedule, and count those as skipped.
> > 
> > The idea is to help simulate a latency-constrained environment such as a
> > database used by a web server.
> 
> I was just trying to run tests with this, but as far as I can see it
> doesn't really work:
> 
> pgbench postgres -M prepared -c 72 -j 72 -P 5 -T 3600 -R40000 -L100

> progress: 365.0 s, 18392.1 tps, lat 94.857 ms stddev 23.917, lag 90.980 ms, 106244 skipped
> 
> 100k skipped transactions at a rate limit of 40k? That doesn't seem right.

Argh. It's just because I used -P5. It's a bit confusing that the other
options are per second, and this is per interval...
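
(For the 365 s line above, the numbers do add up: 106244 skipped over the
5 s reporting interval is about 21249 per second, which together with the
18392 tps actually executed comes to roughly the 40000 tps that were
scheduled.)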

Andres



Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
Hello Andres,

>> pgbench postgres -M prepared -c 72 -j 72 -P 5 -T 3600 -R40000 -L100
>
>> progress: 365.0 s, 18392.1 tps, lat 94.857 ms stddev 23.917, lag 90.980 ms, 106244 skipped
>>
>> 100k skipped transactions at a rate limit of 40k? That doesn't seem right.
>
> Argh. It's just because I used -P5. It's a bit confusing that the other
> options are per second, and this is per interval...

I agree, but I'm unsure of a fix, beyond what is already done which is to 
show units next to the figures...

ISTM that people expect "tps" for performance, even on several seconds. 
When it comes to skipped transactions, a count seemed more natural. I 
really just see this as an indicator that things are not going smoothly.

Maybe it could be shown as a percentage of scheduled transactions, 
possibly with an option?

A mitigation is to always run with -P 1 :-).

-- 
Fabien.



Re: pgbench throttling latency limit

From
Amit Langote
Date:
On 2015/10/22 18:20, Fabien COELHO wrote:
>>
>>> progress: 365.0 s, 18392.1 tps, lat 94.857 ms stddev 23.917, lag 90.980
>>> ms, 106244 skipped
>>>
>>> 100k skipped transactions at a rate limit of 40k? That doesn't seem right.
>>
>> Argh. It's just because I used -P5. It's a bit confusing that the other
>> options are per second, and this is per interval...
> 
> I agree, but I'm unsure of a fix, beyond what is already done which is to
> show units next to the figures...
> 
> ISTM that people expect "tps" for performance, even on several seconds.
> When it comes to skipped transactions, a count seemed more natural. I
> really just see this as an indicator that things are not going smoothly.
> 
> Maybe it could be shown as a percentage of scheduled transactions,
> possibly with an option?
> 
> A mitigation is to always run with -P 1 :-).

Wouldn't printing average (per second) over the interval work?

Thanks,
Amit




Re: pgbench throttling latency limit

From
Fabien COELHO
Date:
>>> Argh. It's just because I used -P5. It's a bit confusing that the other
>>> options are per second, and this is per interval...
>>
>> I agree, but I'm unsure of a fix, beyond what is already done which is to
>> show units next to the figures...
>>
>> ISTM that people expect "tps" for performance, even on several seconds.
>> When it comes to skipped transactions, a count seemed more natural. I
>> really just see this as an indicator that things are not going smoothly.
>>
>> Maybe it could be shown as a percentage of scheduled transactions,
>> possibly with an option?
>>
>> A mitigation is to always run with -P 1 :-).
>
> Wouldn't printing average (per second) over the interval work?

Yes, it would. That would be "skipped tps". Why not. The percentage also 
seems attractive to me, because it does not matter whether you get big 
figures or small figures, as it is relative.

-- 
Fabien.