Thread: help understanding pgbench results

help understanding pgbench results

From

Luca Ferrari

Date:

12 July 2019, 10:04:01

Hi all,
I'm trying to understand some simple benchmarks but I need an hint.

=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit


shared_buffers = 1 GB
checkpoint_timeout = 5 min

I've created a pgbench database as follows (around 4.5 GB):
% pgbench -i -s 300 -F 100 --foreign-keys --unlogged-tables -h
127.0.0.1 -U luca pgbench

and I've tested three times (each time after a restart) with the following:
% pgbench  -T 600  -j 4 -c 4  -h 127.0.0.1 -U luca -P 60  pgbench


Since tables are unlogged, I was expecting no much difference in
setting checkpoint_completion_target, but I got (average results):
- checkpoint_completion_target = 0.1  ==> 755 tps
- checkpoint_completation_target = 0.5 ==> 767 tps
- checkpoint_completion_target = 0.9 ==> 681 tps

so while there is not a big different in the first two cases, it seems
throttling I/O reduces the tps, and I don't get why. Please note that
there is some small activity while benchmarking, and that's why I ran
at least three tests for each setting.
Am I looking at wrong (or meaningless) numbers?

Thanks for any hint.
Luca

Re: help understanding pgbench results

From

Luca Ferrari

Date:

15 July 2019, 06:30:21

On Fri, Jul 12, 2019 at 12:04 PM Luca Ferrari <fluca1978@gmail.com> wrote:
> Since tables are unlogged, I was expecting no much difference in
> setting checkpoint_completion_target, but I got (average results):
> - checkpoint_completion_target = 0.1  ==> 755 tps
> - checkpoint_completation_target = 0.5 ==> 767 tps
> - checkpoint_completion_target = 0.9 ==> 681 tps
>

I've repeated the test with normal (logged) tables, same
configuration, and the results are:
- checkpoint_completion_target = 0.1 ==> 560 tps
- checkpoint_completion_target  = 0.5 ==> 624 tps
- checkpoint_completion_target = 0.9 ==> 619 tps

so here forcing I/O on checkpoints reduces the tps, as I would be
expecting. However I'm still unable to get an interpretation of the
unlogged table results.
That also makes me think it could these tests have no meaning at all.

Luca

Re: help understanding pgbench results

From

Fabio Pardi

Date:

15 July 2019, 11:35:29

Hi Luca

(I tried to reproduce your tests, but I got similar results over different checkpoint_completion_target)

The rest is in line here below:

On 12/07/2019 12:04, Luca Ferrari wrote:

> 
> shared_buffers = 1 GB
> checkpoint_timeout = 5 min
> 
> I've created a pgbench database as follows (around 4.5 GB):
> % pgbench -i -s 300 -F 100 --foreign-keys --unlogged-tables -h
> 127.0.0.1 -U luca pgbench
> 
> and I've tested three times (each time after a restart) with the following:
> % pgbench  -T 600  -j 4 -c 4  -h 127.0.0.1 -U luca -P 60  pgbench
> 
> 
> Since tables are unlogged, I was expecting no much difference in
> setting checkpoint_completion_target, but I got (average results):
> - checkpoint_completion_target = 0.1  ==> 755 tps
> - checkpoint_completation_target = 0.5 ==> 767 tps
> - checkpoint_completion_target = 0.9 ==> 681 tps

unlogged tables are not written to WAL, therefore checkpoints do not fit into the picture (unless something else is
writingdata..).

> 
> so while there is not a big different in the first two cases, it seems
> throttling I/O reduces the tps, and I don't get why. Please note that
> there is some small activity while benchmarking, and that's why I ran
> at least three tests for each setting.

It is not a good idea to have anything running in the background. 

Also is always a good idea to run tests multiple times, and I think that 3 is the bare minimum. 
You want to make sure your tests are as reliable as possible, means having similar results between each other,
thereforeyou might post all the results, not only the average, so people can give their interpretation of the data.

Back to your question, your tests run for 10 minutes, and checkpoints happen every 5, so we should expect to see 2
checkpointsper test, which might influence your results. How long is a checkpoint spread over time, is given by
checkpoint_completion_target

Assuming that the 'background activity' writes data, a value of (checkpoint_completion_target) 0.9 means that when your
teststarts, the system might be still busy in writing data from the previous checkpoint (which started before your
pgbenchtest was launched). That is less likely to happen with a value of 0.1

Maybe looking at the graphs (CPU, disk) of your server might point to something.
Also the postgres logs should be able to tell you more, eg: when a checkpoint starts, finishes, and how much stuff it
wrote.

I hope I gave you enough inputs to better understand what is going on.

regards,

fabio pardi

Re: help understanding pgbench results

From

Luca Ferrari

Date:

15 July 2019, 13:14:38

On Mon, Jul 15, 2019 at 1:35 PM Fabio Pardi <f.pardi@portavita.eu> wrote:
> unlogged tables are not written to WAL, therefore checkpoints do not fit into the picture (unless something else is
writingdata..). 

That's my thought, and I was not expecting any big change in tps due
to checkpoint_completion_target on unlogged tables.

> It is not a good idea to have anything running in the background.

Yes, I know, but the activity in the database is a task importing data
on a per-schedule basis, always importing the same number of tuples
(and therefore the same data size). In other words, it is a very
constant and predictable workload.

>
> Also is always a good idea to run tests multiple times, and I think that 3 is the bare minimum.
> You want to make sure your tests are as reliable as possible, means having similar results between each other,
thereforeyou might post all the results, not only the average, so people can give their interpretation of the data. 
>

I'm trying to prepare a virual machine to run more tests in a
completely isolated environment.
But I was not trying to benchmarking the database, rather guessing
what caused the different tps in such environment.

> Assuming that the 'background activity' writes data, a value of (checkpoint_completion_target) 0.9 means that when
yourtest starts, the system might be still busy in writing data from the previous checkpoint (which started before your
pgbenchtest was launched). That is less likely to happen with a value of 0.1 

Uhm...but in the logged table tests a value of 0.9 increases the tps,
that as far as I understand is in contrast with what you are stating.

Anyway, I'll test more and report back some more results.

Thanks,
Luca

Re: help understanding pgbench results

From

Fabio Pardi

Date:

15 July 2019, 13:25:05

On 15/07/2019 15:14, Luca Ferrari wrote:

>> Assuming that the 'background activity' writes data, a value of (checkpoint_completion_target) 0.9 means that when
yourtest starts, the system might be still busy in writing data from the previous checkpoint (which started before your
pgbenchtest was launched). That is less likely to happen with a value of 0.1

> 
> Uhm...but in the logged table tests a value of 0.9 increases the tps,
> that as far as I understand is in contrast with what you are stating.

What I stated is valid for unlogged tables. (a background checkpoint makes your pgbench results 'dirty')

When you talk about logged tables, you actually want to spread the checkpoint over time. The more it is spread, the
betterperformances. But here, probably, checkpoint has a lot to write compared to the data produced by background job
(and,maybe, checkpoints are happening more frequently?).

> 
> Anyway, I'll test more and report back some more results.

good, let us know and do not forget to provide the log lines produced by the checkpoints too.

regards,

fabio pardi

Re: help understanding pgbench results

From

Luca Ferrari

Date:

16 July 2019, 10:53:26

I've done another set of tests, and effectively it seems that, with
unlogged tables, the checkpoint_completion_target does not influence
the final results.
I've increased the test duration in order to include several
checkpoints within each run.

First of all, initialization of the database:
% pgbench -i -s 300 -F 100 --foreign-keys --unlogged-tables -h
127.0.0.1 -U luca pgbench

Then the test I ran, six time after a restart between a batch and the other:
% pgbench  -T 720  -j 4 -c 4  -h 127.0.0.1 -U luca  pgbench


The average tps results always around 795, so I believe in the
previous batch of tests I was misleaded by a few wrong numbers that
made the average floating up and down.
The following is a detail about the runs. Note that this has been run
with the same external database activity as in the previous bunch of
tests.

# First batch

             name             | setting
------------------------------+---------
 checkpoint_completion_target | 0.5
 checkpoint_timeout           | 300
 shared_buffers               | 131072


runs = 765, 807, 781, 799, 822, 796
avg  = 795 tps


Sample checkpoint log:

08:43:06 LOG:  checkpoint starting: time
08:43:07 LOG:  checkpoint complete: wrote 13 buffers (0.0%); 0 WAL
file(s) added, 0 removed, 0 recycled; write=1.304 s, sync=0.032 s,
total=1.368 s; sync files=17, longest=0.006 s, average=0.001 s;
distance=9426 kB, estimate=9907 kB


# Second batch

             name             | setting
------------------------------+---------
 checkpoint_completion_target | 0.1
 checkpoint_timeout           | 300
 shared_buffers               | 131072

runs = 810, 777, 808, 774, 806, 798
avg  = 795 tps

Sample checkpoint log:

09:34:54 LOG:  checkpoint starting: time
09:34:55 LOG:  checkpoint complete: wrote 12 buffers (0.0%); 0 WAL
file(s) added, 1 removed, 0 recycled; write=1.204 s, sync=0.105 s,
total=1.363 s; sync files=13, longest=0.049 s, average=0.008 s;
distance=9431 kB, estimate=9716 kB


# Third batch


             name             | setting
------------------------------+---------
 checkpoint_completion_target | 0.9
 checkpoint_timeout           | 300
 shared_buffers               | 131072

runs = 809, 801, 772, 797, 789, 795
avg  = 793 tps

Sample checkpoint log:

12:17:28 LOG:  checkpoint starting: time
12:17:32 LOG:  checkpoint complete: wrote 39 buffers (0.0%); 0 WAL
file(s) added, 0 removed, 0 recycled; write=3.914 s, sync=0.004 s,
total=3.927 s; sync files=11, longest=0.001 s, average=0.000 s;
distance=393 kB, estimate=7606 kB