Thread: help understanding pgbench results
Hi all,
I'm trying to understand some simple benchmarks but I need a hint.

=# select version();
                                                  version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit

shared_buffers = 1 GB
checkpoint_timeout = 5 min

I've created a pgbench database as follows (around 4.5 GB):
% pgbench -i -s 300 -F 100 --foreign-keys --unlogged-tables -h 127.0.0.1 -U luca pgbench

and I've tested three times (each time after a restart) with the following:
% pgbench -T 600 -j 4 -c 4 -h 127.0.0.1 -U luca -P 60 pgbench

Since the tables are unlogged, I was expecting not much difference when
setting checkpoint_completion_target, but I got (average results):
- checkpoint_completion_target = 0.1 ==> 755 tps
- checkpoint_completion_target = 0.5 ==> 767 tps
- checkpoint_completion_target = 0.9 ==> 681 tps

So while there is not a big difference between the first two cases, it
seems that throttling I/O reduces the tps, and I don't get why. Please
note that there is some small activity while benchmarking, and that's
why I ran at least three tests for each setting.

Am I looking at wrong (or meaningless) numbers?
Thanks for any hint.

Luca
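[Editorial note: the "around 4.5 GB" figure can be sanity-checked against the scale factor. A minimal Python sketch, using a ~15 MB-per-scale-unit rule of thumb derived from the numbers in this message (4.5 GB / 300); the actual on-disk size varies with PostgreSQL version and fillfactor:]

```python
# Ballpark estimate of a pgbench database size from -s <scale>.
# The 15 MB/scale-unit constant is an assumption derived from this
# thread (scale 300 ~ 4.5 GB), not an official pgbench figure.

MB_PER_SCALE_UNIT = 15

def estimated_size_gb(scale: int) -> float:
    """Return a rough on-disk size in GB for pgbench -i -s <scale>."""
    return scale * MB_PER_SCALE_UNIT / 1024

print(estimated_size_gb(300))  # roughly 4.4, consistent with "around 4.5 GB"
```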
On Fri, Jul 12, 2019 at 12:04 PM Luca Ferrari <fluca1978@gmail.com> wrote:
> Since the tables are unlogged, I was expecting not much difference when
> setting checkpoint_completion_target, but I got (average results):
> - checkpoint_completion_target = 0.1 ==> 755 tps
> - checkpoint_completion_target = 0.5 ==> 767 tps
> - checkpoint_completion_target = 0.9 ==> 681 tps

I've repeated the test with normal (logged) tables, same
configuration, and the results are:
- checkpoint_completion_target = 0.1 ==> 560 tps
- checkpoint_completion_target = 0.5 ==> 624 tps
- checkpoint_completion_target = 0.9 ==> 619 tps

So here forcing I/O on checkpoints reduces the tps, as I would expect.
However, I'm still unable to find an interpretation of the unlogged
table results. That also makes me think that these tests could have no
meaning at all.

Luca
Hi Luca,

(I tried to reproduce your tests, but I got similar results across the
different checkpoint_completion_target settings.)

The rest is inline below.

On 12/07/2019 12:04, Luca Ferrari wrote:
>
> shared_buffers = 1 GB
> checkpoint_timeout = 5 min
>
> I've created a pgbench database as follows (around 4.5 GB):
> % pgbench -i -s 300 -F 100 --foreign-keys --unlogged-tables -h
> 127.0.0.1 -U luca pgbench
>
> and I've tested three times (each time after a restart) with the following:
> % pgbench -T 600 -j 4 -c 4 -h 127.0.0.1 -U luca -P 60 pgbench
>
> Since the tables are unlogged, I was expecting not much difference when
> setting checkpoint_completion_target, but I got (average results):
> - checkpoint_completion_target = 0.1 ==> 755 tps
> - checkpoint_completion_target = 0.5 ==> 767 tps
> - checkpoint_completion_target = 0.9 ==> 681 tps

Unlogged tables are not written to WAL, therefore checkpoints do not
fit into the picture (unless something else is writing data..).

> So while there is not a big difference between the first two cases, it
> seems that throttling I/O reduces the tps, and I don't get why. Please
> note that there is some small activity while benchmarking, and that's
> why I ran at least three tests for each setting.

It is not a good idea to have anything running in the background.

Also, it is always a good idea to run tests multiple times, and I
think that 3 is the bare minimum. You want to make sure your tests are
as reliable as possible, meaning the runs have similar results between
each other; therefore you might post all the results, not only the
average, so people can give their own interpretation of the data.

Back to your question: your tests run for 10 minutes, and checkpoints
happen every 5, so we should expect to see 2 checkpoints per test,
which might influence your results.
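[Editorial note: the advice to post all runs rather than a bare average can be sketched in a few lines of Python. The sample values are the six tps results Luca reports later in the thread; a low standard deviation relative to the mean suggests the runs are consistent:]

```python
# Check run-to-run consistency of benchmark results with mean and
# standard deviation, instead of reporting only the average.
from statistics import mean, stdev

runs = [765, 807, 781, 799, 822, 796]  # tps of individual pgbench runs

avg = mean(runs)
sd = stdev(runs)
print(f"avg={avg:.0f} tps, stdev={sd:.0f} tps ({100 * sd / avg:.1f}% of mean)")
```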
How long a checkpoint is spread over time is given by
checkpoint_completion_target.

Assuming that the 'background activity' writes data, a value of
(checkpoint_completion_target) 0.9 means that when your test starts,
the system might still be busy writing data from the previous
checkpoint (which started before your pgbench test was launched). That
is less likely to happen with a value of 0.1.

Maybe looking at the graphs (CPU, disk) of your server might point to
something. Also the Postgres logs should be able to tell you more, eg:
when a checkpoint starts, finishes, and how much stuff it wrote.

I hope I gave you enough inputs to better understand what is going on.

regards,

fabio pardi
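[Editorial note: the spread described here is roughly checkpoint_timeout * checkpoint_completion_target. A small worked example in Python, assuming the 5-minute checkpoint_timeout from the first message:]

```python
# How checkpoint_completion_target spreads checkpoint writes:
# the checkpointer aims to finish writing within roughly
# checkpoint_timeout * checkpoint_completion_target seconds.

checkpoint_timeout = 300  # seconds (5 min, as in the thread)

for target in (0.1, 0.5, 0.9):
    window = checkpoint_timeout * target
    print(f"target={target}: writes spread over ~{window:.0f} s "
          f"of each {checkpoint_timeout} s cycle")
```

With target=0.9 the write window is ~270 s of every 300 s cycle, which is why a benchmark started mid-cycle can overlap an in-progress checkpoint.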
On Mon, Jul 15, 2019 at 1:35 PM Fabio Pardi <f.pardi@portavita.eu> wrote:
> Unlogged tables are not written to WAL, therefore checkpoints do not
> fit into the picture (unless something else is writing data..).

That's my thought too, and I was not expecting any big change in tps
due to checkpoint_completion_target on unlogged tables.

> It is not a good idea to have anything running in the background.

Yes, I know, but the activity in the database is a task importing data
on a per-schedule basis, always importing the same number of tuples
(and therefore the same data size). In other words, it is a very
constant and predictable workload.

> Also, it is always a good idea to run tests multiple times, and I
> think that 3 is the bare minimum. You want to make sure your tests
> are as reliable as possible, meaning the runs have similar results
> between each other; therefore you might post all the results, not
> only the average, so people can give their own interpretation of the
> data.

I'm trying to prepare a virtual machine to run more tests in a
completely isolated environment. But I was not trying to benchmark the
database, rather to guess what caused the different tps in such an
environment.

> Assuming that the 'background activity' writes data, a value of
> (checkpoint_completion_target) 0.9 means that when your test starts,
> the system might still be busy writing data from the previous
> checkpoint (which started before your pgbench test was launched).
> That is less likely to happen with a value of 0.1.

Uhm... but in the logged table tests a value of 0.9 increases the tps,
which as far as I understand is in contrast with what you are stating.

Anyway, I'll test more and report back some more results.

Thanks,
Luca
On 15/07/2019 15:14, Luca Ferrari wrote:
>> Assuming that the 'background activity' writes data, a value of
>> (checkpoint_completion_target) 0.9 means that when your test starts,
>> the system might still be busy writing data from the previous
>> checkpoint (which started before your pgbench test was launched).
>> That is less likely to happen with a value of 0.1.
>
> Uhm... but in the logged table tests a value of 0.9 increases the tps,
> which as far as I understand is in contrast with what you are stating.

What I stated is valid for unlogged tables (a background checkpoint
makes your pgbench results 'dirty').

When you talk about logged tables, you actually want to spread the
checkpoint over time: the more it is spread, the better the
performance. But here, probably, the checkpoint has a lot to write
compared to the data produced by the background job (and, maybe,
checkpoints are happening more frequently?).

> Anyway, I'll test more and report back some more results.

Good, let us know, and do not forget to provide the log lines produced
by the checkpoints too.

regards,

fabio pardi
I've done another set of tests, and effectively it seems that, with
unlogged tables, checkpoint_completion_target does not influence the
final results. I've increased the test duration in order to include
several checkpoints within each run.

First of all, initialization of the database:
% pgbench -i -s 300 -F 100 --foreign-keys --unlogged-tables -h 127.0.0.1 -U luca pgbench

Then the test I ran six times, with a restart between one batch and
the next:
% pgbench -T 720 -j 4 -c 4 -h 127.0.0.1 -U luca pgbench

The average tps is always around 795, so I believe that in the
previous batch of tests I was misled by a few outlying numbers that
made the average float up and down.

The following is a detail of the runs. Note that this has been run
with the same external database activity as in the previous bunch of
tests.

# First batch

             name             | setting
------------------------------+---------
 checkpoint_completion_target | 0.5
 checkpoint_timeout           | 300
 shared_buffers               | 131072

runs = 765, 807, 781, 799, 822, 796
avg  = 795 tps

Sample checkpoint log:
08:43:06 LOG:  checkpoint starting: time
08:43:07 LOG:  checkpoint complete: wrote 13 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.304 s, sync=0.032 s, total=1.368 s; sync files=17, longest=0.006 s, average=0.001 s; distance=9426 kB, estimate=9907 kB

# Second batch

             name             | setting
------------------------------+---------
 checkpoint_completion_target | 0.1
 checkpoint_timeout           | 300
 shared_buffers               | 131072

runs = 810, 777, 808, 774, 806, 798
avg  = 795 tps

Sample checkpoint log:
09:34:54 LOG:  checkpoint starting: time
09:34:55 LOG:  checkpoint complete: wrote 12 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=1.204 s, sync=0.105 s, total=1.363 s; sync files=13, longest=0.049 s, average=0.008 s; distance=9431 kB, estimate=9716 kB

# Third batch

             name             | setting
------------------------------+---------
 checkpoint_completion_target | 0.9
 checkpoint_timeout           | 300
 shared_buffers               | 131072

runs = 809, 801, 772, 797, 789, 795
avg  = 793 tps

Sample checkpoint log:
12:17:28 LOG:  checkpoint starting: time
12:17:32 LOG:  checkpoint complete: wrote 39 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3.914 s, sync=0.004 s, total=3.927 s; sync files=11, longest=0.001 s, average=0.000 s; distance=393 kB, estimate=7606 kB
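[Editorial note: for comparing batches, the quoted "checkpoint complete" lines can be parsed programmatically. A hedged sketch in Python matching the PostgreSQL 11 log format shown above; the regex covers only the fields used here and is illustrative, not an official parser:]

```python
# Extract buffers written and write/total durations from a
# "checkpoint complete" log line, as quoted in this thread.
import re

LINE = ("12:17:32 LOG:  checkpoint complete: wrote 39 buffers (0.0%); "
        "0 WAL file(s) added, 0 removed, 0 recycled; write=3.914 s, "
        "sync=0.004 s, total=3.927 s; sync files=11, longest=0.001 s, "
        "average=0.000 s; distance=393 kB, estimate=7606 kB")

PATTERN = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers.*?"
    r"write=(?P<write>[\d.]+) s.*?total=(?P<total>[\d.]+) s"
)

m = PATTERN.search(LINE)
if m:
    print(f"buffers={m['buffers']} write={m['write']}s total={m['total']}s")
```

Run over a whole log, this makes it easy to see that the third batch's checkpoint wrote for ~3.9 s (consistent with the larger completion target) while still touching only a few dozen buffers.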