Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: checkpointer continuous flushing
Msg-id: alpine.DEB.2.10.1603172153220.28507@sto
In response to: Re: checkpointer continuous flushing (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
>> Is it possible to run tests with distinct table spaces on those many
>> disks?
>
> Nope, that'd require reconfiguring the system (and then back), and I
> don't have access to that system (just SSH).

Ok.

> Also, I don't quite see what would that tell us?

Currently the flushing context is shared between tablespaces, but I
think that it should be per tablespace. My tests did not manage to
convince Andres, so getting some more figures would be great. That will
be for another time!

>> I would have suggested using the --latency-limit option to filter out
>> very slow queries, otherwise if the system is stuck it may catch up
>> later, but then this is not representative of "sustainable"
>> performance.
>>
>> When pgbench is running under a target rate, in both runs the
>> transaction distribution is expected to be the same, around 5000 tps,
>> and the green run looks pretty ok with respect to that. The magenta
>> one shows that about 25% of the time, things are not good at all, and
>> the higher figures just show the catching up, which is not really
>> interesting if you asked for a web page and it is finally delivered
>> one minute later.
>
> Maybe. But that'd only increase the stress on the system, possibly
> causing more issues, no? And the magenta line is the old code, thus it
> would only increase the improvement of the new code.

Yes and no. I agree that it stresses the system a little more, but the
fact that you have 5000 tps in the end does not show that you can
really sustain 5000 tps with reasonable latency. I find this latter
information more interesting than knowing that you can get 5000 tps on
average, thanks to some catching up. Moreover, the non-throttled runs
already showed that the system could do 8000 tps, so the bandwidth is
already there.

> Notice the max latency is in microseconds (as logged by pgbench), so
> according to the "max latency" charts the latencies are below 10
> seconds (old) and 1 second (new) about 99% of the time.

AFAICS the max latency is aggregated per second, but that does not say
much about the distribution of individual latencies within the
interval, that is, whether they were all close to the max or not.
Having the same chart with the median or average might help. Also, the
percentiles on the stddev chart do not correspond to those on the
latency chart, so it may be that the latency is high while the stddev
is low, i.e. all transactions are equally bad on the interval, or not.
So I must admit that I am not at all clear on how to interpret the max
latency & stddev charts you provided.

> So I don't think this would make any measurable difference in
> practice.

I think that it may show that 25% of the time the system could not
match the target tps, even if it can handle much more on average, so
the tps achieved when discarding late transactions would be under
4000 tps.

--
Fabien.
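
P.S. To make the "discarding late transactions" point concrete, here is
a minimal sketch. Assumptions of mine: the 9.4+ per-transaction log
format "client_id xact_no latency_us script_no epoch_s epoch_us
[schedule_lag]", a 200 ms cutoff, and a log produced by something like
"pgbench -R 5000 -L 200 -l ...". It recomputes the tps while ignoring
late transactions:

    #!/usr/bin/env python3
    # Sketch: estimate "sustainable" tps from a pgbench per-transaction
    # log by discarding transactions slower than a cutoff, instead of
    # averaging everything including the catch-up phases.
    import sys

    LIMIT_US = 200_000  # 200 ms cutoff for "late"; pick to taste

    def main(path):
        kept = total = 0
        first = last = None
        with open(path) as log:
            for line in log:
                fields = line.split()
                # transactions skipped under --latency-limit are logged
                # with "skipped" in place of the latency
                if len(fields) < 6 or fields[2] == "skipped":
                    continue
                latency_us = int(fields[2])
                epoch = int(fields[4]) + int(fields[5]) / 1e6
                first = epoch if first is None else first
                last = epoch
                total += 1
                kept += latency_us <= LIMIT_US
        if first is None:
            sys.exit("no transactions found")
        span = (last - first) or 1.0
        print(f"raw tps: {total / span:.1f}")
        print(f"tps within {LIMIT_US // 1000} ms: {kept / span:.1f}")

    if __name__ == "__main__":
        main(sys.argv[1])

If the magenta run is stuck 25% of the time, the second figure should
drop well below the target rate even though the first one stays around
5000 tps.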
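
P.P.S. The median/average chart I am asking for could be derived from
the same per-transaction log rather than from the aggregated output,
along these lines (same assumed log format as above):

    #!/usr/bin/env python3
    # Sketch: per-second latency aggregates (median, mean, stddev, max)
    # from a pgbench per-transaction log, to show how latencies are
    # distributed within each interval, not just their max.
    import sys
    from collections import defaultdict
    from statistics import mean, median, pstdev

    def main(path):
        buckets = defaultdict(list)  # epoch second -> latencies in ms
        with open(path) as log:
            for line in log:
                fields = line.split()
                if len(fields) < 6 or fields[2] == "skipped":
                    continue
                buckets[int(fields[4])].append(int(fields[2]) / 1000.0)
        for sec in sorted(buckets):
            lat = buckets[sec]
            print(f"{sec} median={median(lat):.1f} mean={mean(lat):.1f} "
                  f"stddev={pstdev(lat):.1f} max={max(lat):.1f} (ms)")

    if __name__ == "__main__":
        main(sys.argv[1])

If the per-interval median stays close to the max, the whole interval
was bad; if it stays low while the max spikes, only a few transactions
were hit.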