Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Msg-id alpine.DEB.2.10.1508190812290.5936@sto
In response to Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
> Sure, I think what can help here is a testcase/'s (in form of script file
> or some other form, to test this behaviour of patch) which you can write
> and post here, so that others can use that to get the data and share it.

Sure... note that I already did that on this thread, without any response... 
but I can do it again...

Tests should be run on a dedicated host. If it has n cores, I suggest 
sharing them between the postgres checkpointer & workers and the pgbench 
threads, so as to avoid competition for cores. With 8 cores I used up to 
2 threads & 4 clients, so that there are 2 cores left for the checkpointer 
and other stuff (i.e. I also run iotop & htop in parallel...). Although it 
may seem conservative, the point of the test is to exercise checkpoints, 
not the process scheduler of the OS.
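As a sketch, that split could be computed along these lines (the arithmetic and the pgbench invocation are illustrative only, not necessarily what my scripts do):

```shell
# Illustrative core budget for an 8-core host: reserve 2 cores for the
# checkpointer & monitoring, then give each client backend and each
# pgbench thread its own core, with 2 clients per thread.
NCORES=8
THREADS=$(( (NCORES - 2) / 3 ))   # -> 2 pgbench threads
CLIENTS=$(( 2 * THREADS ))        # -> 4 clients (4 backend cores)
echo "pgbench -j $THREADS -c $CLIENTS -T 4000 -P 1"
```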

Here are the latest version of my test scripts:
 (1) cp_test.sh <name> <test>

Runs "test" with setup "name". Currently it runs 4000-second pgbench runs 
with the 4 possible on/off combinations of sorting & flushing, after some 
warmup. The 4000-second duration is chosen so that there are a few 
checkpoint cycles. For longer checkpoint cycles, I suggest extending the 
run time so as to see at least 3 checkpoints during the run.
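The four combinations amount to a loop like this sketch (the GUC names checkpoint_sort and checkpoint_flush_to_disk are assumed from the patch, and the pgbench line is a placeholder, not the real cp_test.sh):

```shell
# Sketch of the 4-combination loop; GUC names assumed from the patch.
for sort in on off; do
  for flush in on off; do
    echo "run: checkpoint_sort=$sort checkpoint_flush_to_disk=$flush"
    # set the GUCs, restart postgres, then run something like:
    #   pgbench -j 2 -c 4 -T 4000 -P 1 > "out-sort_$sort-flush_$flush.log"
  done
done
```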

More test settings can be added to the 2 "case"s. Postgres settings,
especially shared_buffers, should be set to values appropriate for the 
memory of the test host.

The tests run with the postgres version found in the PATH, so ensure that 
the right version is found!
 (2) cp_test_count.py one-test-output.log

For rate-limited runs, it looks at the final figures and computes the 
number of late & skipped transactions. This can also be done by hand.
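By hand, it boils down to pulling the counts out of the pgbench summary, along these lines (the sample lines below mimic the summary of a rate-limited run with -R/-L; the exact wording varies across pgbench versions, so treat the patterns as examples):

```shell
# Fake summary lines standing in for a real rate-limited pgbench run.
cat > one-test-output.log <<'EOF'
number of transactions actually processed: 100000
number of transactions skipped: 1200 (1.186 %)
number of transactions above the 100.0 ms latency limit: 340 (0.340 %)
EOF
# Extract the skipped and late counts.
sed -n 's/.*skipped: \([0-9]*\).*/skipped \1/p' one-test-output.log
sed -n 's/.*latency limit: \([0-9]*\).*/late \1/p' one-test-output.log
# -> skipped 1200
# -> late 340
```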
 (3) avg.py

For full speed runs, compute stats about the per-second tps:
  sh> grep 'progress:' one-test-output.log | cut -d' ' -f4 | \
        ./avg.py --limit=10 --length=4000
  warning: 633 missing data, extending with zeros
  avg over 4000: 199.290575 ± 512.114070 [0.000000, 0.000000, 4.000000, 5.000000, 2280.900000]
  percent of values below 10.0: 82.5%
 

The figures I reported are the 199 (average tps), 512 (standard deviation 
of the per-second figures), 82.5% (percent of time below 10 tps, aka 
postgres is basically unresponsive). In brackets: the min, q1, median, q3 
and max tps seen in the run.
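For reference, the core of that computation fits in a line of awk (a standalone sketch, not avg.py itself; the sample lines mimic pgbench -P 1 progress output):

```shell
# Fake progress lines standing in for a real run log.
cat > one-test-output.log <<'EOF'
progress: 1.0 s, 200.0 tps, lat 4.0 ms stddev 1.0
progress: 2.0 s, 4.0 tps, lat 900.0 ms stddev 5.0
progress: 3.0 s, 0.0 tps, lat 0.0 ms stddev 0.0
EOF
# Average tps and percent of seconds below 10 tps, as avg.py reports.
grep 'progress:' one-test-output.log | cut -d' ' -f4 |
awk '{ sum += $1; if ($1 < 10) low++ }
     END { printf "avg %.1f, below 10 tps: %.1f%%\n", sum/NR, 100*low/NR }'
# -> avg 68.0, below 10 tps: 66.7%
```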

> Ofcourse, that is not mandatory to proceed with this patch, but still can
> help you to prove your point as you might not have access to different
> kind of systems to run the tests.

I agree that more tests would be useful to decide on the best default 
value for the flushing option. For Linux, all tests so far suggest that 
"on" is the best choice, but for other systems, which use posix_fadvise, 
it is really an open question.

Another option would be to give me temporary access to some available 
host, as I'm used to running these tests...

-- 
Fabien.
