Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From | Fabien COELHO
Subject | Re: checkpointer continuous flushing
Date |
Msg-id | alpine.DEB.2.10.1508310743250.30124@sto
In response to | Re: checkpointer continuous flushing (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: checkpointer continuous flushing
List | pgsql-hackers
Hello Amit,

> IBM POWER-8 24 cores, 192 hardware threads
> RAM = 492GB

Wow! Thanks for trying the patch on such high-end hardware!

About the disks: what kind of HDD (RAID? speed?)? Is there an HDD write cache? What are the OS and the file system?

> warmup=60

Quite short, but probably ok.

> scale=300

That means a base of about 4-4.5 GB.

> time=7200
> synchronous_commit=on
> shared_buffers=8GB

This is small with respect to the hardware, but given the scale used I think it should not matter much.

> max_wal_size=5GB

Hmmm... maybe quite small given the average performance?

> checkpoint_timeout=2min

This seems rather small. Are the checkpoints xlog- or time-triggered? You did not update checkpoint_completion_target, which means 0.5, so each checkpoint is scheduled to run in at most 1 minute, which suggests at least 130 MB/s of write performance is needed for the checkpoint.

> parallelism - 128 clients, 128 threads

Given 192 hw threads, I would have tried 128 clients & 64 threads, so that each pgbench client has its own dedicated postgres process on a hardware thread and the postgres processes are not competing with pgbench. Now, as pgbench is mostly sleeping, it probably does not matter much... I may also be totally wrong :-)

> Sort - off
> avg over 7200: 8256.382528 ± 6218.769282 [0.000000, 76.050000,
> 10975.500000, 13105.950000, 21729.000000]
> percent of values below 10.0: 19.5%

The max performance is consistent with 128 threads * 200 (random) writes per second.

> Sort - on
> avg over 7200: 8375.930639 ± 6148.747366 [0.000000, 84.000000,
> 10946.000000, 13084.000000, 20289.900000]
> percent of values below 10.0: 18.6%

This is really a small improvement, probably within the error interval of the measurement. I would not put much trust in a 1.5% tps or 0.9% availability improvement. I think we can conclude that on your (great) setup, with these configuration parameters, this patch does not harm performance. This is a good thing, even if I would have hoped to see better performance.

> Before going to conclusion, let me try to explain above data (I am
> explaining again even though Fabien has explained, to make it clear
> if someone has not read his mail)
>
> Let's try to understand with data for sorting - off option
>
> avg over 7200: 8256.382528 ± 6218.769282
>
> 8256.382528 - average tps for 7200s pgbench run
> 6218.769282 - standard deviation on per second figures
>
> [0.000000, 84.000000, 10946.000000, 13084.000000, 20289.900000]
>
> These 5 values can be read as minimum TPS, q1, median TPS, q3,
> maximum TPS over 7200s pgbench run. As far as I understand, q1
> and q3 are medians of subsets of the values, which I didn't focus on much.

q1 = 84 means that 25% of the time the performance was below 84 tps, about 1% of the average performance, which I would translate as "pg is pretty unresponsive 25% of the time". This is the kind of issue I really want to address; the eventual tps improvements are just a side effect.

> percent of values below 10.0: 19.5%
>
> Above means percent of time the result is below 10 tps.

Which means "postgres is really unresponsive 19.5% of the time". If you count zeros, you will get "postgres was totally unresponsive X% of the time".

> Now about test results, these tests are done for pgbench full speed runs
> and the above results indicate that there is approximately 1.5%
> improvement in avg. TPS and ~1% improvement in tps values which are
> below 10 with sorting on, and there is almost no improvement in median or
> maximum TPS values; instead they are slightly less when sorting is
> on, which could be due to run-to-run variation.

Yes, I agree.
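As an aside, the kind of summary reported above (average ± standard deviation, then [min, q1, median, q3, max], then the percentage of seconds below 10 tps) can be recomputed from per-second tps figures such as those printed by pgbench's --progress=1 option. A minimal sketch, assuming the per-second values have already been extracted into a hypothetical file tps.txt (one value per line) and using simple index-based quartiles, so the exact figures may differ slightly from the script actually used:

  # Minimal sketch, not the actual script behind the figures above.
  # tps.txt: one per-second tps value per line, e.g. extracted from
  # "pgbench --progress=1" output.
  sort -n tps.txt | awk '
    { v[NR] = $1; sum += $1; sumsq += $1 * $1; if ($1 < 10.0) low++ }
    END {
      avg = sum / NR
      sd  = sqrt(sumsq / NR - avg * avg)   # population standard deviation
      printf "avg over %d: %f ± %f\n", NR, avg, sd
      # min, q1, median, q3, max (index-based approximation on sorted values)
      printf "[%f, %f, %f, %f, %f]\n", v[1], v[int(NR/4)], v[int(NR/2)], v[int(3*NR/4)], v[NR]
      printf "percent of values below 10.0: %.1f%%\n", 100 * low / NR
    }'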
> I have done more tests as well by varying time and number of clients,
> keeping other configuration same as above, but the results are quite
> similar.

Given the hardware, I would suggest raising checkpoint_timeout, shared_buffers and max_wal_size, and using checkpoint_completion_target=0.8 (a rough sketch is given after the signature). I would expect that to improve performance both with and without sorting. It would also be interesting to have information from the checkpoint logs (especially how many buffers were written in how long, whether checkpoints are time- or xlog-triggered, ...).

> The results of sorting patch for the tests done indicate that the win is
> not big enough with just doing sorting during checkpoints,

ISTM that you generalize too much: the win is not big "under this configuration and hardware". I think that the patch may have very little influence under some conditions, but it should not degrade performance significantly, and on the other hand it should provide great improvements under some (other) conditions. So having no performance degradation is a good result, even if I would hope for better results.

It would be interesting to understand why random disk writes do not perform too poorly on this box: size of the I/O queue, kind of (expensive :-) disks, write caches, file system, RAID level...

> we should consider flush patch along with sorting.

I also think that would be interesting.

> I would like to perform some tests with both the patches together (sort
> + flush) unless somebody else thinks that sorting patch alone is
> beneficial and we should test some other kind of scenarios to see its
> benefit.

Yep. Is it a Linux box? If not, does it support posix_fadvise()?

>> The reason for the tablespace balancing is [...]
>
> What if tablespaces are not on separate disks

I would expect that it might degrade performance, but only very marginally.

> or not enough hardware support to make Writes parallel?

I'm not sure that balancing writes over tablespaces, or not, would change anything for an I/O bottleneck that is not disk write performance, so I would say "no impact" in that case.

> I think for such cases it might be better to do it sequentially.

Writing sequentially to different disks would be a bug, and it would degrade performance significantly on a setup with several disks, up to dividing the performance by the number of disks... so I do not think that a patch which predictably and significantly degrades performance on high-end hardware is a reasonable option.

If you want to be able to deactivate balancing, it could be done with a GUC, but I cannot see good reasons to want that: it would complicate the code, and it does not make much sense to use many tablespaces on one disk, while anyone who uses several tablespaces on several disks probably expects to see her expensive disks actually used in parallel.

--
Fabien.
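P.S. A rough sketch of what the suggested re-run configuration could look like. The concrete values below are only illustrative guesses, not tuned for this box; log_checkpoints is what provides the requested information about checkpoint triggers and buffer counts:

  # Illustrative values only; adjust to the actual hardware.
  psql -c "ALTER SYSTEM SET checkpoint_timeout = '15min'"
  psql -c "ALTER SYSTEM SET checkpoint_completion_target = 0.8"
  psql -c "ALTER SYSTEM SET max_wal_size = '20GB'"
  psql -c "ALTER SYSTEM SET shared_buffers = '32GB'"   # requires a restart
  psql -c "ALTER SYSTEM SET log_checkpoints = on"
  pg_ctl restart -D "$PGDATA"   # restart so that shared_buffers takes effect

With log_checkpoints on, the server log then shows for each checkpoint whether it started because of "time" or "xlog", how many buffers were written, and the write/sync/total durations.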