Re: pgbench rate limiting changes transaction latency computation - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: pgbench rate limiting changes transaction latency computation |
Date | |
Msg-id | a411e48c-f5af-45a4-873a-770c5a2da863@iki.fi Whole thread Raw |
In response to | Re: pgbench rate limiting changes transaction latency computation (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses |
Re: pgbench rate limiting changes transaction latency computation
|
List | pgsql-hackers |
On 12/06/2019 09:23, Fabien COELHO wrote: >> But that was just because I was showing a simplistic example. E.g. >> here's a log of a vacuum finishing, and then another starting a few >> seconds later (both vacuums lasting a fair while): >> >> progress: 139.0 s, 2438.4 tps, txtime 13.033 ms stddev 3.830, lat 17530.784 ms stddev 590.153, lag 17517.751 ms >> progress: 140.0 s, 2489.0 tps, txtime 12.911 ms stddev 3.642, lat 17752.862 ms stddev 600.661, lag 17739.952 ms >> progress: 141.0 s, 2270.0 tps, txtime 14.021 ms stddev 4.965, lat 17973.805 ms stddev 594.784, lag 17959.784 ms >> progress: 142.0 s, 1408.0 tps, txtime 22.848 ms stddev 5.365, lat 18417.808 ms stddev 632.729, lag 18394.960 ms >> progress: 143.0 s, 3001.0 tps, txtime 10.724 ms stddev 4.318, lat 18796.971 ms stddev 617.462, lag 18786.247 ms >> progress: 144.0 s, 4678.0 tps, txtime 6.823 ms stddev 2.136, lat 18503.253 ms stddev 669.072, lag 18496.431 ms >> progress: 145.0 s, 4577.0 tps, txtime 7.001 ms stddev 1.526, lat 18108.596 ms stddev 689.843, lag 18101.596 ms >> progress: 146.0 s, 2596.0 tps, txtime 12.261 ms stddev 3.060, lat 17961.623 ms stddev 683.498, lag 17949.363 ms >> progress: 147.0 s, 2654.0 tps, txtime 12.072 ms stddev 3.282, lat 18120.009 ms stddev 685.074, lag 18107.938 ms >> progress: 148.0 s, 3471.0 tps, txtime 9.240 ms stddev 3.702, lat 18251.712 ms stddev 676.572, lag 18242.472 ms >> progress: 149.0 s, 3056.0 tps, txtime 10.468 ms stddev 5.131, lat 18058.950 ms stddev 675.334, lag 18048.482 ms >> progress: 150.0 s, 2319.0 tps, txtime 13.778 ms stddev 3.762, lat 18305.101 ms stddev 688.186, lag 18291.323 ms >> progress: 151.0 s, 2355.0 tps, txtime 13.567 ms stddev 3.891, lat 18586.073 ms stddev 691.656, lag 18572.506 ms >> progress: 152.0 s, 2321.0 tps, txtime 13.742 ms stddev 3.708, lat 18835.985 ms stddev 709.580, lag 18822.244 ms >> progress: 153.0 s, 2360.0 tps, txtime 13.604 ms stddev 3.533, lat 19121.166 ms stddev 709.682, lag 19107.562 ms >> >> The period inbetween where no vacuum was running is imo considerably >> harder to spot looking at 'lat'. > > ISTM that the signal is pretty clear in whether the lag increases or > decreases. Basically the database is 18 seconds behind its load, which is > very bad if a user is waiting. That was my thought too, when looking at this example. When there is already a long queue of transactions, the txtime of individual transactions doesn't matter much. The most important thing under that condition is how fast the system can dissolve the queue (or how fast it builds up even more). So the derivative of the lag or lat seems like the most important figure. We don't print exactly that, but it's roughly the same as the TPS. Jitter experienced by the user matters too, i.e. stddev of 'lat'. To illustrate this, imagine that the server magically detected that there's a long queue of transactions. It would be beneficial to go into "batch mode", where it collects incoming transactions into larger batches. The effect of this imaginary batch mode is that the TPS rate increases by 50%, but the txtime also increases by 1000%, and becomes highly variable. Would that be a good tradeoff? I would say yes. The user is experiencing an 18 s delay anyway, and the increase in txtime would be insignificant compared to that, but the queue would be busted more quickly. Of course, there is no such batch mode in PostgreSQL, and I wouldn't suggest trying to implement anything like that. In a different kind of application, you would rather maintain a steady txtime when the server is at full load, even if it means a lower overall TPS rate. And that feels like a more important goal than just TPS. I think we all agree on that. To simulate that kind of an application, though, you probably don't want to use -R, or you would use it with --latency-limit. Except clearly Andres is trying to do just that, which is why I'm still a bit confused :-). - Heikki
pgsql-hackers by date: