Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: checkpointer continuous flushing |
Date | |
Msg-id | alpine.DEB.2.10.1508190812290.5936@sto |
In response to | Re: checkpointer continuous flushing (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: checkpointer continuous flushing |
List | pgsql-hackers |
> Sure, I think what can help here is a testcase/'s (in form of script file
> or some other form, to test this behaviour of patch) which you can write
> and post here, so that others can use that to get the data and share it.

Sure... note that I already did that on this thread, without any echo...
but I can do it again...

Tests should be run on a dedicated host. If it has n cores, I suggest
sharing them between the postgres checkpointer & workers and the pgbench
threads, so as to avoid competition for cores. With 8 cores I used up to
2 threads & 4 clients, so that there are 2 cores left for the checkpointer
and other stuff (i.e. I also run iotop & htop in parallel...). Although it
may seem conservative to do so, I think that the point of the test is to
exercise checkpoints and not to test the process scheduler of the OS.

Here is the latest version of my test scripts:

(1) cp_test.sh <name> <test>

    Run "test" with setup "name". Currently it runs pgbench for 4000
    seconds with the 4 possible on/off combinations for sorting &
    flushing, after some warmup. The 4000 seconds are chosen so that
    there are a few checkpoint cycles. For larger checkpoint times, I
    suggest extending the run time so as to see at least 3 checkpoints
    during the run.

    More test settings can be added to the 2 "case"s.

    Postgres settings, especially shared_buffers, should be set to a
    pertinent value wrt the memory of the test host.

    The test runs with the postgres version found in the PATH, so ensure
    that the right version is found!

(2) cp_test_count.py one-test-output.log

    For rate limited runs, look at the final figures and compute the
    number of late & skipped transactions. This can also be done by hand.
(3) avg.py

    For full speed runs, compute stats about per second tps:

    sh> grep 'progress:' one-test-output.log | cut -d' ' -f4 | \
          ./avg.py --limit=10 --length=4000
    warning: 633 missingdata, extending with zeros
    avg over 4000: 199.290575 ± 512.114070 [0.000000, 0.000000, 4.000000, 5.000000, 2280.900000]
    percent of values below 10.0: 82.5%

    The figures I reported are the 199 (average tps), 512 (standard
    deviation on per second figures), 82.5% (percent of time below
    10 tps, aka postgres is basically unresponsive). In brackets, the
    min, q1, median, q3 and max tps seen in the run.

> Ofcourse, that is not mandatory to proceed with this patch, but still can
> help you to prove your point as you might not have access to different
> kind of systems to run the tests.

I agree that more tests would be useful to decide which default value for
the flushing option is better. For Linux, all tests so far suggest "on" is
the best choice, but for other systems that use posix_fadvise, it is
really an open question.

Another option would be to give me temporary access to some available
host, I'm used to running these tests...

--
Fabien.
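
For readers who want to reproduce the figures that avg.py reports without the script at hand, here is a minimal sketch of the same kind of summary. It is not the script from this thread: the function name `summarize`, its parameters, the use of the population standard deviation and the nearest-rank quartiles are all assumptions made for illustration. It pads missing per-second samples with zeros up to the requested length, as the warning in the output above suggests the original does.

```python
import statistics


def summarize(tps, length=None, limit=10.0):
    """Summarize per-second tps samples (hypothetical stand-in for avg.py).

    tps    -- iterable of per-second tps values
    length -- expected number of samples; shorter input is padded with zeros
    limit  -- threshold below which a second counts as "unresponsive"
    """
    values = [float(v) for v in tps]
    missing = 0
    if length is not None:
        if len(values) < length:
            # Seconds without a progress report are counted as 0 tps.
            missing = length - len(values)
            values.extend([0.0] * missing)
        else:
            values = values[:length]
    if not values:
        raise ValueError("no tps samples to summarize")

    ordered = sorted(values)
    n = len(ordered)
    # Assumption: simple nearest-rank positions for q1, median and q3.
    quartiles = [ordered[0], ordered[n // 4], ordered[n // 2],
                 ordered[(3 * n) // 4], ordered[-1]]
    below = sum(1 for v in values if v < limit)

    return {
        "missing": missing,
        "avg": statistics.fmean(values),
        # Assumption: population standard deviation over per-second figures.
        "stddev": statistics.pstdev(values),
        "quartiles": quartiles,
        "percent_below": 100.0 * below / n,
    }
```

Calling `summarize(samples, length=4000, limit=10.0)` on the values extracted by the grep/cut pipeline above would give the average, the standard deviation, the five bracketed figures and the percentage of seconds below 10 tps. The exact numbers may differ slightly from avg.py if it uses other conventions for the deviation or the quartiles.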