Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: checkpointer continuous flushing
Msg-id: 34db4b9b-6bcb-8633-df87-064df76065e6@2ndquadrant.com
In response to: Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses: Re: checkpointer continuous flushing
List: pgsql-hackers
Hi,

On 03/17/2016 06:36 PM, Fabien COELHO wrote:
>
> Hello Tomas,
>
> Thanks for these great measures.
>
>> * 4 x CPU E5-4620 (2.2GHz)
>
> 4*8 = 32 cores / 64 threads.

Yep. I only used 32 clients though, to keep some of the CPU available
for the rest of the system (also, HT does not really double the number
of cores).

>
>> * 256GB of RAM
>
> Wow!
>
>> * 24x SSD on LSI 2208 controller (with 1GB BBWC)
>
> Wow! RAID configuration? The patch is designed to fix very big issues
> on HDD, but it is good to see that the impact is good on SSD as well.

Yep, RAID-10.

I agree that doing the test on an HDD-based system would be useful,
however (a) I don't have a comparable system at hand at the moment, and
(b) I was a bit worried that it would hurt performance on SSDs, but
thankfully that's not the case.

I will do the test on a much smaller system with HDDs in a few days.

>
> Is it possible to run tests with distinct table spaces on those many
> disks?

Nope, that'd require reconfiguring the system (and then back), and I
don't have access to that system (just SSH). Also, I don't quite see
what that would tell us.

>> * shared_buffers=64GB
>
> 1/4 of the available memory.
>
>> The pgbench was scale 60000, so ~750GB of data on disk,
>
> *3 available memory, mostly on disk.
>
>> or like this ("throttled"):
>>
>> pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench
>>
>> The reason for the throttling is that people generally don't run
>> production databases 100% saturated, so it'd be sad to improve the
>> 100% saturated case and hurt the common case by increasing latency.
>
> Sure.
>
>> The machine does ~8000 tps, so 5000 tps is ~60% of that.
>
> Ok.
>
> I would have suggested using the --latency-limit option to filter out
> very slow queries, otherwise if the system is stuck it may catch up
> later, but then this is not representative of "sustainable"
> performance.
>
> When pgbench is running under a target rate, in both runs the
> transaction distribution is expected to be the same, around 5000 tps,
> and the green run looks pretty ok with respect to that. The magenta
> one shows that about 25% of the time, things are not good at all, and
> the higher figures just show the catching up, which is not really
> interesting if you asked for a web page and it is finally delivered
> 1 minute later.

Maybe. But that'd only increase the stress on the system, possibly
causing more issues, no? And the magenta line is the old code, so it
would only increase the improvement of the new code.

Notice the max latency is in microseconds (as logged by pgbench), so
according to the "max latency" charts the latencies are below 10 seconds
(old) and 1 second (new) about 99% of the time. So I don't think this
would make any measurable difference in practice.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
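For reference, a throttled run with the --latency-limit filtering Fabien
mentions might look like the sketch below; the 100 ms limit is purely
illustrative and not a value agreed on in the thread:

    pgbench -c 32 -j 8 -T 86400 -R 5000 --latency-limit=100 \
        -l --aggregate-interval=1 pgbench

When --rate and --latency-limit are combined, transactions that lag
behind their scheduled start by more than the limit are skipped and
reported separately, so the resulting log reflects sustainable latency
rather than later catch-up bursts.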