pgbench rate limiting changes transaction latency computation

From
Andres Freund
Date:
Hi,

I noticed that pgbench's -R influences not just the computation of lag,
but also of latency. That doesn't look right to me, but maybe I'm just
missing something?

It's quite easy to demonstrate when running pgbench with/without
progress report at a transaction rate that's around the limit of what
the server can do:

andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S pgbench_10
progress: 1.0 s, 37416.3 tps, lat 0.027 ms stddev 0.013
progress: 2.0 s, 37345.1 tps, lat 0.027 ms stddev 0.011
progress: 3.0 s, 38787.8 tps, lat 0.026 ms stddev 0.009
...

andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S -R 37000 pgbench_10
progress: 1.0 s, 32792.8 tps, lat 81.795 ms stddev 35.552, lag 81.765 ms
progress: 2.0 s, 37770.6 tps, lat 113.194 ms stddev 4.651, lag 113.168 ms
progress: 3.0 s, 37006.3 tps, lat 113.905 ms stddev 5.007, lag 113.878 ms

That's obviously a very different result.

ISTM that's because processXactStats() computes latency as:

latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;

which is set differently when throttling is enabled:

                /*
                 * When not throttling, this is also the transaction's
                 * scheduled start time.
                 */
                if (!throttle_delay)
                    st->txn_scheduled = INSTR_TIME_GET_MICROSEC(now);


Replacing the latency computation with

latency = INSTR_TIME_GET_MICROSEC(*now) - INSTR_TIME_GET_MICROSEC(st->txn_begin);

immediately makes the results more sensible:

andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S -R 37000 pgbench_10
progress: 1.0 s, 37141.7 tps, lat 0.026 ms stddev 0.011, lag 1.895 ms
progress: 2.0 s, 36805.6 tps, lat 0.026 ms stddev 0.012, lag 0.670 ms
progress: 3.0 s, 37033.5 tps, lat 0.026 ms stddev 0.012, lag 1.067 ms

and you still get lag if the rate is too high:

andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S -R 80000 pgbench_10
progress: 1.0 s, 37628.8 tps, lat 0.026 ms stddev 0.016, lag 287.379 ms
progress: 2.0 s, 39651.8 tps, lat 0.025 ms stddev 0.008, lag 790.527 ms
progress: 3.0 s, 39254.8 tps, lat 0.025 ms stddev 0.009, lag 1290.833 ms
progress: 4.0 s, 38859.5 tps, lat 0.026 ms stddev 0.009, lag 1808.529 ms
progress: 5.0 s, 39699.0 tps, lat 0.025 ms stddev 0.008, lag 2307.732 ms
progress: 6.0 s, 39297.0 tps, lat 0.025 ms stddev 0.009, lag 2813.291 ms
progress: 7.0 s, 39880.6 tps, lat 0.025 ms stddev 0.008, lag 3315.430 ms

Fabien, is this a bug, or am I misunderstanding something?

Greetings,

Andres Freund



Re: pgbench rate limiting changes transaction latency computation

From
Andres Freund
Date:
Hi

On 2019-06-10 21:56:31 -0700, Andres Freund wrote:
> I noticed that pgbench's -R influences not just the computation of lag,
> but also of latency. That doesn't look right to me, but maybe I'm just
> missing something?

I apparently did:

> -P sec
> --progress=sec
> 
>     Show progress report every sec seconds. The report includes the time
>     since the beginning of the run, the TPS since the last report, and
>     the transaction latency average and standard deviation since the
>     last report. Under throttling (-R), the latency is computed with
>     respect to the transaction scheduled start time, not the actual
>     transaction beginning time, thus it also includes the average
>     schedule lag time.

But that makes very little sense to me. I see that was changed by Heikki
in

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=02e3bcc661598275a75a8649b48602dca7aeec79
>  Change the way latency is calculated with pgbench --rate option.
> 
> The reported latency values now include the "schedule lag" time, that is,
> the time between the transaction's scheduled start time and the time it
> actually started. This relates better to a model where requests arrive at a
> certain rate, and we are interested in the response time to the end user or
> application, rather than the response time of the database itself.
> 
> Also, when --rate is used, include the schedule lag time in the log output.
> 
> The --rate option is new in 9.4, so backpatch to 9.4. It seems better to
> make this change in 9.4, while we're still in the beta period, than ship a
> 9.4 version that calculates the values differently than 9.5.

I find that pretty unconvincing. Report a new field, sure. But what's
the point of changing an *established* field, just due to rate limiting?
At the very least that ought to be commented upon in the code as well.

Doesn't this mean that latency and lag are quite redundant, just more
obscure, due to this change?

        latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
        lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;

I guess I can just subtract lag from latency to get to the non-throttled
latency. But that is, uh, odd.

Greetings,

Andres Freund



Re: pgbench rate limiting changes transaction latency computation

From
Fabien COELHO
Date:
Hello Andres,

> I noticed that pgbench's -R influences not just the computation of lag,
> but also of latency. That doesn't look right to me, but maybe I'm just
> missing something?
>
> It's quite easy to demonstrate when running pgbench with/without
> progress report at a transaction rate that's around the limit of what
> the server can do:
>
> andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S pgbench_10
> progress: 1.0 s, 37416.3 tps, lat 0.027 ms stddev 0.013
> progress: 2.0 s, 37345.1 tps, lat 0.027 ms stddev 0.011
> progress: 3.0 s, 38787.8 tps, lat 0.026 ms stddev 0.009
> ...
>
> andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S -R 37000 pgbench_10
> progress: 1.0 s, 32792.8 tps, lat 81.795 ms stddev 35.552, lag 81.765 ms
> progress: 2.0 s, 37770.6 tps, lat 113.194 ms stddev 4.651, lag 113.168 ms
> progress: 3.0 s, 37006.3 tps, lat 113.905 ms stddev 5.007, lag 113.878 ms

[...]

> Fabien, is this a bug, or am I misunderstanding something?

This behavior under -R is fully intentional, and the results above just 
show that the database cannot really keep up with the load, which is simply 
the case, so for me it is okay to show bad figures.

The idea under throttling is to model a client which wants the result 
of a query at a certain point in time, say a query for a web page that is 
being generated: that is the scheduled time. It is when the client 
knows it wants an answer. If it is not processed immediately, that is bad 
for the client-perceived latency.

Whether this is due to lag (i.e. the server is loaded and cannot start to 
process the answer) or because the server is slow to answer is irrelevant, 
the client is waiting, the web page is not generated, the system is slow. 
So latency under -R is really "client latency", not only query latency, as 
it is documented.

You can offset the lag to get the query latency only, but from a client 
perspective the fact that the system does not keep up with the 
scheduled load is the main information, thus this is what is displayed. 
The bad figures reflect bad behavior wrt handling the load. For me this is 
what should be wanted under -R. Maybe it could be more clearly documented, 
but for me this is the right behavior, and it is what I wanted to measure 
with throttling.

Under this performance model, the client would give up its requests after 
some time, hence the available --latency-limit option.

-- 
Fabien.



Re: pgbench rate limiting changes transaction latency computation

From
Andres Freund
Date:
Hi,

On 2019-06-11 08:36:55 +0200, Fabien COELHO wrote:
> > I noticed that pgbench's -R influences not just the computation of lag,
> > but also of latency. That doesn't look right to me, but maybe I'm just
> > missing something?
> > 
> > It's quite easy to demonstrate when running pgbench with/without
> > progress report at a transaction rate that's around the limit of what
> > the server can do:
> > 
> > andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S pgbench_10
> > progress: 1.0 s, 37416.3 tps, lat 0.027 ms stddev 0.013
> > progress: 2.0 s, 37345.1 tps, lat 0.027 ms stddev 0.011
> > progress: 3.0 s, 38787.8 tps, lat 0.026 ms stddev 0.009
> > ...
> > 
> > andres@alap4:~/src/postgresql$ pgbench -n -M prepared -c 1 -j 1 -T 100000 -P1 -r -S -R 37000 pgbench_10
> > progress: 1.0 s, 32792.8 tps, lat 81.795 ms stddev 35.552, lag 81.765 ms
> > progress: 2.0 s, 37770.6 tps, lat 113.194 ms stddev 4.651, lag 113.168 ms
> > progress: 3.0 s, 37006.3 tps, lat 113.905 ms stddev 5.007, lag 113.878 ms
> 
> [...]
> 
> > Fabien, is this a bug, or am I misunderstanding something?
> 
> This behavior under -R is fully intentional, and the results above just show
> that the database cannot really keep up with the load, which is simply the
> case, so for me it is okay to show bad figures.

I mean, you just turned one named value into a different one, without
renaming it. And the new meaning under -R is basically the same as one
that's already there (lag). Also note that the database actually can keep
up in the above example.


> The idea under throttling is to model a client which wants the result
> of a query at a certain point in time, say a query for a web page that is
> being generated: that is the scheduled time. It is when the client
> knows it wants an answer. If it is not processed immediately, that is bad
> for the client-perceived latency.

> Whether this is due to lag (i.e. the server is loaded and cannot start to
> process the answer) or because the server is slow to answer is irrelevant,
> the client is waiting, the web page is not generated, the system is slow. So
> latency under -R is really "client latency", not only query latency, as it
> is documented.

What does that have to do with incorporating the same data into both lag
and latency? I just fail to see what the point is, except to make it
unnecessarily harder to compare postgres' behaviour under both throttled
and push-it-to-the-breaking-point loads.

How long individual transactions take, and how much variance there is in
that, is something *crucial* to optimize for. *Especially* when the
machine/load is provisioned in a way to not overload the machine.

How is e.g.
progress: 1.6 s, 0.0 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms
progress: 2.0 s, 103546.5 tps, lat 1584.161 ms stddev 35.589, lag 1582.043 ms
progress: 3.0 s, 108535.2 tps, lat 1347.619 ms stddev 101.782, lag 1346.170 ms
progress: 4.0 s, 108528.8 tps, lat 996.603 ms stddev 106.052, lag 995.159 ms
progress: 5.0 s, 109468.8 tps, lat 633.464 ms stddev 108.483, lag 632.030 ms
progress: 6.0 s, 110606.7 tps, lat 252.923 ms stddev 110.391, lag 251.505 ms
progress: 7.0 s, 84253.3 tps, lat 6.829 ms stddev 15.067, lag 6.423 ms
progress: 8.0 s, 80470.7 tps, lat 0.142 ms stddev 0.079, lag 0.017 ms
progress: 9.0 s, 80104.2 tps, lat 0.142 ms stddev 0.081, lag 0.017 ms
progress: 10.0 s, 80277.0 tps, lat 0.152 ms stddev 0.150, lag 0.017 ms

the lat column adds basically nothing over the lag column here.

more useful than:
progress: 1.3 s, 0.0 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms
progress: 2.0 s, 116315.6 tps, lat 1.425 ms stddev 1.440, lag 1087.076 ms
progress: 3.0 s, 113526.2 tps, lat 1.390 ms stddev 0.408, lag 709.908 ms
progress: 4.0 s, 111816.4 tps, lat 1.407 ms stddev 0.399, lag 302.866 ms
progress: 5.0 s, 88061.9 tps, lat 0.543 ms stddev 0.652, lag 16.526 ms
progress: 6.0 s, 80045.4 tps, lat 0.128 ms stddev 0.079, lag 0.017 ms
progress: 7.0 s, 79636.3 tps, lat 0.124 ms stddev 0.073, lag 0.016 ms
progress: 8.0 s, 80535.3 tps, lat 0.125 ms stddev 0.073, lag 0.016 ms

where I can see that the transactions are now actually fast enough.
Obviously this is a toy example, but this really makes -R close to
useless to me.  I often want to switch from an unthrottled load to a 90% load,
and improve the outlier behaviour - but that outlier behaviour is
hidden due to this redefinition of lat (as the issue is now reported
over a much longer period of time, since it includes lag).

I think we should just restore lat to a sane behaviour under -R, and if
you want to have lat + lag as a separate column in -R mode, then let's
do that.

Greetings,

Andres Freund



Re: pgbench rate limiting changes transaction latency computation

From
Heikki Linnakangas
Date:
On 11/06/2019 10:12, Andres Freund wrote:
> On 2019-06-11 08:36:55 +0200, Fabien COELHO wrote:
>> This behavior under -R is fully intentional, and the results above just show
>> that the database cannot really keep up with the load, which is simply the
>> case, so for me it is okay to show bad figures.
> 
> I mean, you just turned one named value into a different one, without
> renaming it. And the new meaning under -R is basically the same as one
> that's already there (lag). Also note that the database actually can keep
> up in the above example.

It's not fair to say that its meaning was changed. Before 9.4, there was 
no -R option. As Fabien said, the reported latency is the latency seen 
by the imaginary user of the system, and without -R, there's no lag so 
it's the same number. See also how it works with --latency-limit. The 
limit is on the reported latency, which includes the lag.

Yeah, I can see that the server-observed transaction latency would often 
be more useful than what's printed now. But changing the current meaning 
doesn't seem like a good idea.

>> The idea under throttling is to model a client which wants the result
>> of a query at a certain point in time, say a query for a web page that is
>> being generated: that is the scheduled time. It is when the client
>> knows it wants an answer. If it is not processed immediately, that is bad
>> for the client-perceived latency.
> 
>> Whether this is due to lag (i.e. the server is loaded and cannot start to
>> process the answer) or because the server is slow to answer is irrelevant,
>> the client is waiting, the web page is not generated, the system is slow. So
>> latency under -R is really "client latency", not only query latency, as it
>> is documented.
> 
> What does that have to do with incorporating the same data into both lag
> and latency? I just fail to see what the point is, except to make it
> unnecessarily harder to compare postgres' behaviour under both throttled
> and push-it-to-the-breaking-point loads.
> 
> How long individual transactions take, and how much variance there is in
> that, is something *crucial* to optimize for. *Especially* when the
> machine/load is provisioned in a way to not overload the machine.
> 
> How is e.g.
> progress: 1.6 s, 0.0 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms
> progress: 2.0 s, 103546.5 tps, lat 1584.161 ms stddev 35.589, lag 1582.043 ms
> progress: 3.0 s, 108535.2 tps, lat 1347.619 ms stddev 101.782, lag 1346.170 ms
> progress: 4.0 s, 108528.8 tps, lat 996.603 ms stddev 106.052, lag 995.159 ms
> progress: 5.0 s, 109468.8 tps, lat 633.464 ms stddev 108.483, lag 632.030 ms
> progress: 6.0 s, 110606.7 tps, lat 252.923 ms stddev 110.391, lag 251.505 ms
> progress: 7.0 s, 84253.3 tps, lat 6.829 ms stddev 15.067, lag 6.423 ms
> progress: 8.0 s, 80470.7 tps, lat 0.142 ms stddev 0.079, lag 0.017 ms
> progress: 9.0 s, 80104.2 tps, lat 0.142 ms stddev 0.081, lag 0.017 ms
> progress: 10.0 s, 80277.0 tps, lat 0.152 ms stddev 0.150, lag 0.017 ms
> 
> the lat column adds basically nothing over the lag column here.
> 
> more useful than:
> progress: 1.3 s, 0.0 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms
> progress: 2.0 s, 116315.6 tps, lat 1.425 ms stddev 1.440, lag 1087.076 ms
> progress: 3.0 s, 113526.2 tps, lat 1.390 ms stddev 0.408, lag 709.908 ms
> progress: 4.0 s, 111816.4 tps, lat 1.407 ms stddev 0.399, lag 302.866 ms
> progress: 5.0 s, 88061.9 tps, lat 0.543 ms stddev 0.652, lag 16.526 ms
> progress: 6.0 s, 80045.4 tps, lat 0.128 ms stddev 0.079, lag 0.017 ms
> progress: 7.0 s, 79636.3 tps, lat 0.124 ms stddev 0.073, lag 0.016 ms
> progress: 8.0 s, 80535.3 tps, lat 0.125 ms stddev 0.073, lag 0.016 ms
> 
> where I can see that the transactions are now actually fast enough.
> Obviously this is a toy example, but this really makes -R close to
> useless to me.  I often want to switch from an unthrottled load to a 90% load,
> and improve the outlier behaviour - but that outlier behaviour is
> hidden due to this redefinition of lat (as the issue is now reported
> over a much longer period of time, since it includes lag).

The outlier behavior seems very visible in both of the above. The system 
completely stalled for about 1-2 seconds. And then it takes a few 
seconds to process the backlog and catch up.

For testing the server under full load, like during that catch up 
period, testing without -R seems better. Or perhaps you'd want to use 
the --latency-limit option? You said that the transactions are now "fast 
enough", so that might be a better fit for what you're trying to model.

> I think we should just restore lat to a sane behaviour under -R, and if
> you want to have lat + lag as a separate column in -R mode, then let's
> do that.

It seems like a bad idea to change the meaning of the value after the 
fact. It would be good to at least rename it, to avoid confusion. Maybe 
that's not too important for the interactive -P reports, but we have got 
to be mindful about the numbers logged in the log file, at least.

If you change it, please also consider how it plays together with 
--latency-limit.

- Heikki



Re: pgbench rate limiting changes transaction latency computation

From
Andres Freund
Date:
Hi,

On 2019-06-11 11:31:15 +0300, Heikki Linnakangas wrote:
> It's not fair to say that its meaning was changed. Before 9.4, there was no
> -R option.

Well, my point is that -R changed the existing meaning of a field, and
that that's not nice.


> Yeah, I can see that the server-observed transaction latency would often be
> more useful than what's printed now. But changing the current meaning
> doesn't seem like a good idea.

Well, then a *new* column should have been added for that value under
-R. Although admittedly 'lat' is not a very good descriptor for the non
-R behaviour.

But anyway, to go forward, I think we should replace 'lat' with a
'txtime' (or similar) column that is not affected by -R. And then, under
-R only, add a new 'txlat' column, that shows the 'current' meaning of
lat under -R.  Not convinced the names are right, but you get the gist.


> As Fabien said, the reported latency is the latency seen by the
> imaginary user of the system, and without -R, there's no lag so it's the
> same number. See also how it works with --latency-limit. The limit is on the
> reported latency, which includes the lag.

Well, that's how it works in a lot of scenarios (e.g. interactive
scenarios, where users give up). But there's also a lot where the amount
of work doesn't decrease due to lateness, it just queues up (e.g. many
batch / queue processing workloads).


> The outlier behavior seems very visible in both of the above. The system
> completely stalled for about 1-2 seconds. And then it takes a few seconds to
> process the backlog and catch up.

But that was just because I was showing a simplistic
example. E.g. here's a log of a vacuum finishing, and then another
starting a few seconds later (both vacuums lasting a fair while):

progress: 139.0 s, 2438.4 tps, txtime 13.033 ms stddev 3.830, lat 17530.784 ms stddev 590.153, lag 17517.751 ms
progress: 140.0 s, 2489.0 tps, txtime 12.911 ms stddev 3.642, lat 17752.862 ms stddev 600.661, lag 17739.952 ms
progress: 141.0 s, 2270.0 tps, txtime 14.021 ms stddev 4.965, lat 17973.805 ms stddev 594.784, lag 17959.784 ms
progress: 142.0 s, 1408.0 tps, txtime 22.848 ms stddev 5.365, lat 18417.808 ms stddev 632.729, lag 18394.960 ms
progress: 143.0 s, 3001.0 tps, txtime 10.724 ms stddev 4.318, lat 18796.971 ms stddev 617.462, lag 18786.247 ms
progress: 144.0 s, 4678.0 tps, txtime 6.823 ms stddev 2.136, lat 18503.253 ms stddev 669.072, lag 18496.431 ms
progress: 145.0 s, 4577.0 tps, txtime 7.001 ms stddev 1.526, lat 18108.596 ms stddev 689.843, lag 18101.596 ms
progress: 146.0 s, 2596.0 tps, txtime 12.261 ms stddev 3.060, lat 17961.623 ms stddev 683.498, lag 17949.363 ms
progress: 147.0 s, 2654.0 tps, txtime 12.072 ms stddev 3.282, lat 18120.009 ms stddev 685.074, lag 18107.938 ms
progress: 148.0 s, 3471.0 tps, txtime 9.240 ms stddev 3.702, lat 18251.712 ms stddev 676.572, lag 18242.472 ms
progress: 149.0 s, 3056.0 tps, txtime 10.468 ms stddev 5.131, lat 18058.950 ms stddev 675.334, lag 18048.482 ms
progress: 150.0 s, 2319.0 tps, txtime 13.778 ms stddev 3.762, lat 18305.101 ms stddev 688.186, lag 18291.323 ms
progress: 151.0 s, 2355.0 tps, txtime 13.567 ms stddev 3.891, lat 18586.073 ms stddev 691.656, lag 18572.506 ms
progress: 152.0 s, 2321.0 tps, txtime 13.742 ms stddev 3.708, lat 18835.985 ms stddev 709.580, lag 18822.244 ms
progress: 153.0 s, 2360.0 tps, txtime 13.604 ms stddev 3.533, lat 19121.166 ms stddev 709.682, lag 19107.562 ms

The period in between, where no vacuum was running, is imo considerably
harder to spot looking at 'lat'. I guess you can argue that one can just
look at tps instead. But for other rate-limited cases that doesn't work
as well:

progress: 121.0 s, 961.0 tps, txtime 3.452 ms stddev 0.947, lat 3.765 ms stddev 1.268, lag 0.313 ms
progress: 122.0 s, 979.0 tps, txtime 5.388 ms stddev 8.737, lat 7.378 ms stddev 11.137, lag 1.990 ms
progress: 123.0 s, 1078.8 tps, txtime 3.679 ms stddev 1.278, lat 4.322 ms stddev 3.216, lag 0.643 ms
progress: 124.0 s, 1082.2 tps, txtime 5.575 ms stddev 9.790, lat 8.716 ms stddev 15.317, lag 3.141 ms
progress: 125.0 s, 990.0 tps, txtime 3.489 ms stddev 1.148, lat 3.817 ms stddev 1.456, lag 0.328 ms
progress: 126.0 s, 955.0 tps, txtime 9.284 ms stddev 15.362, lat 14.210 ms stddev 22.084, lag 4.926 ms
progress: 127.0 s, 960.0 tps, txtime 11.951 ms stddev 11.222, lat 17.732 ms stddev 21.066, lag 5.781 ms
progress: 128.0 s, 945.9 tps, txtime 11.702 ms stddev 17.590, lat 23.791 ms stddev 45.327, lag 12.089 ms
progress: 129.0 s, 1013.1 tps, txtime 19.871 ms stddev 19.407, lat 42.530 ms stddev 39.582, lag 22.659 ms
progress: 130.0 s, 1004.7 tps, txtime 12.748 ms stddev 7.864, lat 19.827 ms stddev 20.084, lag 7.079 ms
progress: 131.0 s, 1025.2 tps, txtime 9.005 ms stddev 16.524, lat 18.491 ms stddev 29.864, lag 9.485 ms
progress: 132.0 s, 1015.9 tps, txtime 3.366 ms stddev 0.885, lat 3.640 ms stddev 1.182, lag 0.274 ms
progress: 133.0 s, 1013.1 tps, txtime 4.749 ms stddev 8.485, lat 6.828 ms stddev 12.520, lag 2.079 ms
progress: 134.0 s, 1026.0 tps, txtime 4.362 ms stddev 2.556, lat 4.879 ms stddev 3.158, lag 0.517 ms

Here e.g. there was a noticeable slowdown, where looking at tps doesn't
help (because we're still able to meet the tps goal).


I still don't quite get when I would want to look at lat, when I have
lag. They're always going to be close by.


> For testing the server under full load, like during that catch up period,
> testing without -R seems better.

One area in which postgres is pretty weak, although less bad than we
used to be, is predictable latency. Most production servers
aren't run at the highest possible throughput, therefore optimizing
jitter under loaded but not breakneck speeds is important.

And to be able to localize where such latency is introduced, it's
important to see the precise moment things got slower / where
performance recovered.


> > I think we should just restore lat to a sane behaviour under -R, and if
> > you want to have lat + lag as a separate column in -R mode, then let's
> > do that.
> 
> It seems like a bad idea to change the meaning of the value after the fact.
> It would be good to at least rename it, to avoid confusion. Maybe that's not
> too important for the interactive -P reports, but we have got to be mindful
> about the numbers logged in the log file, at least.

Given that 'lat' is currently in the file, and I would bet the majority
of users of pgbench use it without -R, I'm not convinced that's
something to care about. Most are going to interpret it the way it's
computed without -R.

Greetings,

Andres Freund



Re: pgbench rate limiting changes transaction latency computation

From
Fabien COELHO
Date:
Hello Andres,

> On 2019-06-11 11:31:15 +0300, Heikki Linnakangas wrote:
>> It's not fair to say that its meaning was changed. Before 9.4, there was no
>> -R option.
>
> Well, my point is that -R changed the existing meaning of a field,

I do not think it does, because the client and transaction latencies are 
actually the same without -R, so nothing is changed.

When the two concepts started to differ, I chose the interpretation 
that I thought best, and for me the whole point of running pgbench with -R 
is to look at the client latency, i.e. to answer the question "is the 
database keeping up with the scheduled load", so this is the one 
displayed.

> and that that's not nice.

Well, that is clearly not what you would have expected.

>> Yeah, I can see that the server-observed transaction latency would often be
>> more useful than what's printed now. But changing the current meaning
>> doesn't seem like a good idea.
>
> Well, then a *new* column should have been added for that value under
> -R. Although admittedly 'lat' is not a very good descriptor for the non
> -R behaviour.
>
> But anyway, to go forward, I think we should replace 'lat' with a
> 'txtime' (or similar) column that is not affected by -R. And then, under
> -R only, add a new 'txlat' column, that shows the 'current' meaning of
> lat under -R.  Not convinced the names are right, but you get the gist.

I do not have a strong opinion. "lat" says latency in a short form 
constrained by a one-line output. What precise latency is displayed is 
explained in the doc.

>> As Fabien said, the reported latency is the latency seen by the
>> imaginary user of the system, and without -R, there's no lag so it's the
>> same number. See also how it works with --latency-limit. The limit is on the
>> reported latency, which includes the lag.
>
> Well, that's how it works in a lot of scenarios (e.g. interactive
> scenarios, where users give up). But there's also a lot where the amount
> of work doesn't decrease due to lateness, it just queues up (e.g. many
> batch / queue processing workloads).

Sure. In which case --latency-limit should not be used.

>> The outlier behavior seems very visible in both of the above. The system
>> completely stalled for about 1-2 seconds. And then it takes a few seconds to
>> process the backlog and catch up.
>
> But that was just because I was showing a simplistic example. E.g. 
> here's a log of a vacuum finishing, and then another starting a few 
> seconds later (both vacuums lasting a fair while):
>
> progress: 139.0 s, 2438.4 tps, txtime 13.033 ms stddev 3.830, lat 17530.784 ms stddev 590.153, lag 17517.751 ms
> progress: 140.0 s, 2489.0 tps, txtime 12.911 ms stddev 3.642, lat 17752.862 ms stddev 600.661, lag 17739.952 ms
> progress: 141.0 s, 2270.0 tps, txtime 14.021 ms stddev 4.965, lat 17973.805 ms stddev 594.784, lag 17959.784 ms
> progress: 142.0 s, 1408.0 tps, txtime 22.848 ms stddev 5.365, lat 18417.808 ms stddev 632.729, lag 18394.960 ms
> progress: 143.0 s, 3001.0 tps, txtime 10.724 ms stddev 4.318, lat 18796.971 ms stddev 617.462, lag 18786.247 ms
> progress: 144.0 s, 4678.0 tps, txtime 6.823 ms stddev 2.136, lat 18503.253 ms stddev 669.072, lag 18496.431 ms
> progress: 145.0 s, 4577.0 tps, txtime 7.001 ms stddev 1.526, lat 18108.596 ms stddev 689.843, lag 18101.596 ms
> progress: 146.0 s, 2596.0 tps, txtime 12.261 ms stddev 3.060, lat 17961.623 ms stddev 683.498, lag 17949.363 ms
> progress: 147.0 s, 2654.0 tps, txtime 12.072 ms stddev 3.282, lat 18120.009 ms stddev 685.074, lag 18107.938 ms
> progress: 148.0 s, 3471.0 tps, txtime 9.240 ms stddev 3.702, lat 18251.712 ms stddev 676.572, lag 18242.472 ms
> progress: 149.0 s, 3056.0 tps, txtime 10.468 ms stddev 5.131, lat 18058.950 ms stddev 675.334, lag 18048.482 ms
> progress: 150.0 s, 2319.0 tps, txtime 13.778 ms stddev 3.762, lat 18305.101 ms stddev 688.186, lag 18291.323 ms
> progress: 151.0 s, 2355.0 tps, txtime 13.567 ms stddev 3.891, lat 18586.073 ms stddev 691.656, lag 18572.506 ms
> progress: 152.0 s, 2321.0 tps, txtime 13.742 ms stddev 3.708, lat 18835.985 ms stddev 709.580, lag 18822.244 ms
> progress: 153.0 s, 2360.0 tps, txtime 13.604 ms stddev 3.533, lat 19121.166 ms stddev 709.682, lag 19107.562 ms
>
> The period in between, where no vacuum was running, is imo considerably
> harder to spot looking at 'lat'.

ISTM that the signal is pretty clear in whether the lag increases or 
decreases. Basically the database is 18 seconds behind its load, which is 
very bad if a user is waiting.

> I guess you can argue that one can just look at tps instead. But for 
> other rate-limited cases that doesn't work as well:
>
> progress: 121.0 s, 961.0 tps, txtime 3.452 ms stddev 0.947, lat 3.765 ms stddev 1.268, lag 0.313 ms
> progress: 122.0 s, 979.0 tps, txtime 5.388 ms stddev 8.737, lat 7.378 ms stddev 11.137, lag 1.990 ms
> progress: 123.0 s, 1078.8 tps, txtime 3.679 ms stddev 1.278, lat 4.322 ms stddev 3.216, lag 0.643 ms
> progress: 124.0 s, 1082.2 tps, txtime 5.575 ms stddev 9.790, lat 8.716 ms stddev 15.317, lag 3.141 ms
> progress: 125.0 s, 990.0 tps, txtime 3.489 ms stddev 1.148, lat 3.817 ms stddev 1.456, lag 0.328 ms
> progress: 126.0 s, 955.0 tps, txtime 9.284 ms stddev 15.362, lat 14.210 ms stddev 22.084, lag 4.926 ms
> progress: 127.0 s, 960.0 tps, txtime 11.951 ms stddev 11.222, lat 17.732 ms stddev 21.066, lag 5.781 ms
> progress: 128.0 s, 945.9 tps, txtime 11.702 ms stddev 17.590, lat 23.791 ms stddev 45.327, lag 12.089 ms
> progress: 129.0 s, 1013.1 tps, txtime 19.871 ms stddev 19.407, lat 42.530 ms stddev 39.582, lag 22.659 ms

Although the tps seems stable, something is clearly amiss because the lag 
and the stddev (of both lat & txtime) are increasing, and so is the txtime 
itself. ISTM that the information is there, but it needs to be interpreted, 
and that is never trivial.

> Here e.g. there was a noticeable slowdown, where looking at tps doesn't
> help (because we're still able to meet the tps goal).

Sure, that is why there are other informations on the line.

> I still don't quite get when I would want to look at lat, when I have
> lag. They're always going to be close by.

More or less. Under -R, "lag + tx = lat", so they are close if tx is small. 
The reason I chose to show lat is that I think it is the most 
important figure, and I added lag to show what part it represents in the 
overall performance. I do not think that showing all 3 is very useful 
because it makes the lines too long.

>> For testing the server under full load, like during that catch up period,
>> testing without -R seems better.
>
> One area in which postgres is pretty weak, although less bad than we
> used to be, is predictable latency. Most production servers
> aren't run at the highest possible throughput, therefore optimizing
> jitter under loaded but not breakneck speeds is important.

Yes, I completely agree. I spent a lot of time working on the checkpointer 
to improve its behavior and reduce large client side latency spikes that 
could occur even with a moderate load.

> And to be able to localize where such latency is introduced, it's
> important to see the precise moment things got slower / where
> performance recovered.

For me it is more complicated. The txtime could stay low while the system 
lags far behind, unable to catch up. In your first trace above, txtime 
seems low enough (~ 10-15 ms), but the system is lagging 18000 ms behind.

>>> I think we should just restore lat to a sane behaviour under -R, and if
>>> you want to have lat + lag as a separate column in -R mode, then let's
>>> do that.
>>
>> It seems like a bad idea to change the meaning of the value after the fact.
>> Would be good to at least rename it, to avoid confusion. Maybe that's not
>> too important for the interactive -P reports, but got to be mindful about
>> the numbers logged in the log file, at least.
>
> Given that 'lat' is currently in the file, and I would bet the majority
> of users of pgbench use it without -R, I'm not convinced that's
> something to care about. Most are going to interpret it the way it's
> computed without -R.

As said above, without -R we have "lat = txtime" and "lag = 0". This is more 
a matter of intuitive expectations than an issue with what is displayed under 
-R, which for me shows the most important figures from a 
benchmarking (client-oriented) perspective.

I do not think that there is an issue to fix, but you can do as you feel.

Maybe there could be an option to show the transaction time (which I would 
shorten as "tx") instead of "lat", for people who want to focus more on 
server performance than on client-perceived performance. I can add such an 
option, maybe for the next CF or the one afterwards, although I have 
already sent quite a few small pgbench patches.

-- 
Fabien.



Re: pgbench rate limiting changes transaction latency computation

From
Heikki Linnakangas
Date:
On 12/06/2019 02:24, Andres Freund wrote:
> But anyway, to go forward, I think we should replace 'lat' with a
> 'txtime' (or similar) column that is not affected by -R. And then, under
> -R only, add a new 'txlat' column, that shows the 'current' meaning of
> lat under -R.  Not convinced the names are right, but you get the gist.

I'm OK with that.

>> For testing the server under full load, like during that catch up period,
>> testing without -R seems better.
> 
> One area in which postgres is pretty weak, although less bad than we
> used to be, is predictable latency. Most production servers
> aren't run under the highest possible throughput, therefore optimizing
> jitter under loaded but not breakneck speeds is important.
> 
> And to be able to localize where such latency is introduced, it's
> important to see the precise moment things got slower / where
> performance recovered.

I agree with all that. I'm still not convinced the changes you're 
proposing will help much, but if you would find it useful, I can't argue 
with that.

- Heikki



Re: pgbench rate limiting changes transaction latency computation

From
Heikki Linnakangas
Date:
On 12/06/2019 09:23, Fabien COELHO wrote:
>> But that was just because I was showing a simplistic example. E.g.
>> here's a log of a vacuum finishing, and then another starting a few
>> seconds later (both vacuums lasting a fair while):
>>
>> progress: 139.0 s, 2438.4 tps, txtime 13.033 ms stddev 3.830, lat 17530.784 ms stddev 590.153, lag 17517.751 ms
>> progress: 140.0 s, 2489.0 tps, txtime 12.911 ms stddev 3.642, lat 17752.862 ms stddev 600.661, lag 17739.952 ms
>> progress: 141.0 s, 2270.0 tps, txtime 14.021 ms stddev 4.965, lat 17973.805 ms stddev 594.784, lag 17959.784 ms
>> progress: 142.0 s, 1408.0 tps, txtime 22.848 ms stddev 5.365, lat 18417.808 ms stddev 632.729, lag 18394.960 ms
>> progress: 143.0 s, 3001.0 tps, txtime 10.724 ms stddev 4.318, lat 18796.971 ms stddev 617.462, lag 18786.247 ms
>> progress: 144.0 s, 4678.0 tps, txtime 6.823 ms stddev 2.136, lat 18503.253 ms stddev 669.072, lag 18496.431 ms
>> progress: 145.0 s, 4577.0 tps, txtime 7.001 ms stddev 1.526, lat 18108.596 ms stddev 689.843, lag 18101.596 ms
>> progress: 146.0 s, 2596.0 tps, txtime 12.261 ms stddev 3.060, lat 17961.623 ms stddev 683.498, lag 17949.363 ms
>> progress: 147.0 s, 2654.0 tps, txtime 12.072 ms stddev 3.282, lat 18120.009 ms stddev 685.074, lag 18107.938 ms
>> progress: 148.0 s, 3471.0 tps, txtime 9.240 ms stddev 3.702, lat 18251.712 ms stddev 676.572, lag 18242.472 ms
>> progress: 149.0 s, 3056.0 tps, txtime 10.468 ms stddev 5.131, lat 18058.950 ms stddev 675.334, lag 18048.482 ms
>> progress: 150.0 s, 2319.0 tps, txtime 13.778 ms stddev 3.762, lat 18305.101 ms stddev 688.186, lag 18291.323 ms
>> progress: 151.0 s, 2355.0 tps, txtime 13.567 ms stddev 3.891, lat 18586.073 ms stddev 691.656, lag 18572.506 ms
>> progress: 152.0 s, 2321.0 tps, txtime 13.742 ms stddev 3.708, lat 18835.985 ms stddev 709.580, lag 18822.244 ms
>> progress: 153.0 s, 2360.0 tps, txtime 13.604 ms stddev 3.533, lat 19121.166 ms stddev 709.682, lag 19107.562 ms
>>
>> The period in between, where no vacuum was running, is imo considerably
>> harder to spot looking at 'lat'.
> 
> ISTM that the signal is pretty clear in whether the lag increases or
> decreases. Basically the database is 18 seconds behind its load, which is
> very bad if a user is waiting.

That was my thought too, when looking at this example. When there is 
already a long queue of transactions, the txtime of individual 
transactions doesn't matter much. The most important thing under that 
condition is how fast the system can dissolve the queue (or how fast it 
builds up even more). So the derivative of the lag or lat seems like the 
most important figure. We don't print exactly that, but it's roughly the 
same as the TPS. Jitter experienced by the user matters too, i.e. stddev 
of 'lat'.
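The "derivative of the lag" reading can be sanity-checked against the progress 
lines quoted above: the per-second change in lag goes negative exactly in the 
intervals where tps spikes (the 144 s and 145 s lines), and positive where tps 
dips. A rough sketch, not pgbench code:

```python
# Per-interval change in lag vs tps, taken from four of the progress
# lines quoted above. When the server catches up (high tps) the lag
# delta is negative; when it falls behind (low tps) the lag grows.
samples = [  # (second, tps, lag_ms)
    (142.0, 1408.0, 18394.960),
    (143.0, 3001.0, 18786.247),
    (144.0, 4678.0, 18496.431),
    (145.0, 4577.0, 18101.596),
]

lag_deltas = [
    (t2, tps2, lag2 - lag1)  # lag change over the interval ending at t2
    for (t1, tps1, lag1), (t2, tps2, lag2) in zip(samples, samples[1:])
]
# The two high-tps intervals (144 s, 145 s) are exactly where lag shrinks.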

To illustrate this, imagine that the server magically detected that 
there's a long queue of transactions. It would be beneficial to go into 
"batch mode", where it collects incoming transactions into larger 
batches. The effect of this imaginary batch mode is that the TPS rate 
increases by 50%, but the txtime also increases by 1000%, and becomes 
highly variable. Would that be a good tradeoff? I would say yes. The 
user is experiencing an 18 s delay anyway, and the increase in txtime 
would be insignificant compared to that, but the queue would be busted 
more quickly.

Of course, there is no such batch mode in PostgreSQL, and I wouldn't 
suggest trying to implement anything like that. In a different kind of 
application, you would rather maintain a steady txtime when the server 
is at full load, even if it means a lower overall TPS rate. And that 
feels like a more important goal than just TPS. I think we all agree on 
that. To simulate that kind of an application, though, you probably 
don't want to use -R, or you would use it with --latency-limit. Except 
clearly Andres is trying to do just that, which is why I'm still a bit 
confused :-).

- Heikki



Re: pgbench rate limiting changes transaction latency computation

From
Alvaro Herrera
Date:
On 2019-Jun-12, Heikki Linnakangas wrote:

> That was my thought too, when looking at this example. When there is already
> a long queue of transactions, the txtime of individual transactions doesn't
> matter much. The most important thing under that condition is how fast the
> system can dissolve the queue (or how fast it builds up even more). So the
> derivative of the lag or lat seems like the most important figure. We don't
> print exactly that, but it's roughly the same as the TPS. Jitter experienced
> by the user matters too, i.e. stddev of 'lat'.

It's funny that you mention taking the derivative of lat or lag, because
that suggests that these numbers should not be merely printed on the
screen but rather produced in a way that's easy for a database to
consume.  Then you can just write the raw numbers and provide a set of
pre-written queries that generate whatever numbers the user desires.
We already have that ... but we don't provide any help on actually using
those log files -- there aren't instructions on how to import that into
a table, or what queries could be useful.
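To sketch the kind of raw-log analysis this suggests (in Python rather than 
SQL, purely for a self-contained illustration; the field layout below follows 
the documented per-transaction log format, client_id transaction_no latency_us 
script_no epoch_sec usec [schedule_lag_us], but check the docs for your pgbench 
version, and the sample lines are made up):

```python
# Aggregate a pgbench per-transaction log per second, recomputing the
# sort of lat/lag summary that -P prints on the console.
from collections import defaultdict
from statistics import mean

def summarize(log_lines):
    """Return {epoch_second: (tx_count, mean_latency_us, mean_lag_us)}."""
    per_sec = defaultdict(list)
    for line in log_lines:
        f = line.split()
        if f[2] == "skipped":    # --latency-limit can skip transactions
            continue
        latency_us = int(f[2])
        lag_us = int(f[6]) if len(f) > 6 else 0  # lag field only under -R
        per_sec[int(f[4])].append((latency_us, lag_us))
    return {
        sec: (len(v), mean(x[0] for x in v), mean(x[1] for x in v))
        for sec, v in per_sec.items()
    }

log = [  # invented sample lines in the -R log layout
    "0 199 2241 0 1175850568 995598 1020",
    "0 200 2465 0 1175850568 998079 980",
    "0 201 2513 0 1175850569 608 1010",
]
stats = summarize(log)
```

The same aggregation is a one-line GROUP BY once the file is COPYed into a 
table, which is presumably the point: ship the canned queries, not more 
console columns.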

Maybe that's a useful direction to move towards?  I think the console
output is good for getting a gut-feeling of the test, but not for actual
data analysis.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services