Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: checkpointer continuous flushing
Msg-id: CAA4eK1+SUvYLgRvjfF8CPKAX9gPo8xuUrPQOat1AsYVmMuZOjQ@mail.gmail.com
In response to: Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
List: pgsql-hackers
On Sat, Sep 5, 2015 at 12:26 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> I would be curious whether flushing helps, though.
>>
>> Yes, me too. I think we should try to reach a consensus on the exact
>> scenarios and configurations where this patch (or patches) can give a
>> benefit, or where we want to verify that there is no regression, as I
>> have access to this machine for a very limited time. The machine might
>> get formatted soon for some other purpose.
>
> Yep, it would be great if you have time for a flush test before it
> disappears... I think it is advisable to disable the write cache, as it
> may also hide the impact of flushing.
>
> Still thinking... Depending on the results, it might be interesting to
> have these tests run with the write cache enabled as well, to check how
> much it interferes positively with performance.


I have done some tests with both patches (sort+flush); the results are
below.

Machine details
--------------------
IBM POWER-8, 24 cores, 192 hardware threads
RAM = 492GB

Test - 1 (Data Fits in shared_buffers)
--------------------------------------------------------
Non-default settings were used in the script provided by Fabien
upthread.

The pgbench options below were used for this test and for all of the
following tests as well (the full command line they expand to is shown
after the settings list).

fw)  ## full speed parallel write pgbench
run="FW"
opts="-M prepared -P 1 -T $time $para"
;;

warmup=1000
scale=300
max_connections=300
shared_buffers=32GB
checkpoint_timeout=10min
time=7200
synchronous_commit=on
max_wal_size=15GB

para="-j 64 -c 128"
checkpoint_completion_target=0.8

checkpoint_flush_to_disk="on off"
checkpoint_sort="on off"
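
For reference, substituting the variables above, the inner pgbench run
works out to roughly "pgbench -M prepared -P 1 -T 7200 -j 64 -c 128"
against a database initialized at scale factor 300 (my reconstruction
from the script fragment, not its literal output). The last two lines
give the test matrix: each of the four runs below pairs one value of
checkpoint_flush_to_disk with one value of checkpoint_sort.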

(In the results below, the bracketed values are, I believe, the min,
first quartile, median, third quartile, and max of the per-second TPS
as reported by Fabien's averaging script; "percent of values below
10.0" is the share of one-second intervals with TPS under 10.)

Flush - off and Sort - off
avg over 7203: 27480.350104 ± 12791.098857 [0.000000, 16009.400000, 32109.200000, 37629.000000, 51671.400000]
percent of values below 10.0: 2.8%

Flush - off and Sort - on
avg over 7200: 27482.501264 ± 12552.036065 [0.000000, 16587.250000, 31225.950000, 37516.450000, 51296.900000]
percent of values below 10.0: 2.8%

Flush - on and Sort - off
avg over 7200: 25214.757292 ± 11059.709509 [5268.000000, 14188.400000, 26472.450000, 35626.100000, 51479.000000]
percent of values below 10.0: 0.0%

Flush - on and Sort - on
avg over 7200: 26819.631722 ± 10589.745016 [5191.700000, 16825.450000, 29429.750000, 35707.950000, 51475.100000]
percent of values below 10.0: 0.0%

For this test run, the best results come when both sort and flush are
enabled: the lowest per-second TPS rises substantially without
sacrificing much in average or median TPS (though there is a ~9% dip in
the median). With only sorting enabled, there is neither a significant
gain nor a loss. With only flush enabled, there is significant
degradation in both the average and the median TPS, by ~8% and ~21%
respectively.


Test - 2 (Data doesn't fit in shared_buffers, but fits in RAM)
----------------------------------------------------------------------------------------
warmup=1000
scale=3000
max_connections=300
shared_buffers=32GB
checkpoint_timeout=10min
time=7200
synchronous_commit=on
max_wal_size=25GB

para="-j 64 -c 128"
checkpoint_completion_target=0.8

checkpoint_flush_to_disk="on off"
checkpoint_sort="on off"

Flush - off and Sort - off
avg over 7200: 5050.059444 ± 4884.528702 [0.000000, 98.100000, 4699.100000, 10125.950000, 13631.000000]
percent of values below 10.0: 7.7%

Flush - off and Sort - on
avg over 7200: 6194.150264 ± 4913.525651 [0.000000, 98.100000, 8982.000000, 10558.000000, 14035.200000]
percent of values below 10.0: 11.0%

Flush - on and Sort - off
avg over 7200: 2771.327472 ± 1860.963043 [287.900000, 2038.850000, 2375.500000, 2679.000000, 12862.000000]
percent of values below 10.0: 0.0%

Flush - on and Sort - on
avg over 7200: 6110.617722 ± 1939.381029 [1652.200000, 5215.100000, 5724.000000, 6196.550000, 13828.000000]
percent of values below 10.0: 0.0%


For this test run, the best results again come when both sort and flush
are enabled: the lowest per-second TPS rises substantially, and the
average and median TPS increase by ~21% and ~22% respectively. With
only sorting enabled, there is a significant gain in average and median
TPS, but the number of seconds in which TPS drops below 10 also
increases, which is bad. With only flush enabled, there is significant
degradation in both average and median TPS, by ~82% and ~97%
respectively. I am not sure whether such a big degradation is to be
expected in this case or whether it is just a problem with this run; I
have not repeated the test.


Test - 3 (Data doesn't fit in shared_buffers, but fits in RAM)
----------------------------------------------------------------------------------------
Same configuration and settings as above, but this time I forced the
flush to use posix_fadvise() rather than sync_file_range() (basically, I
changed the code to comment out the sync_file_range() call and enable
the posix_fadvise() one); a minimal sketch of the two primitives
follows.
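
For context, below is a minimal C sketch of the two flush primitives
involved; this is my own illustration, not the patch's actual code, and
the USE_SYNC_FILE_RANGE switch is hypothetical:

#define _GNU_SOURCE             /* for sync_file_range() on Linux */
#include <fcntl.h>

/*
 * Hint the kernel to write back a range of a file that has just been
 * written.  sync_file_range(..., SYNC_FILE_RANGE_WRITE) initiates
 * writeback without waiting and keeps the pages in the page cache,
 * whereas posix_fadvise(..., POSIX_FADV_DONTNEED) may also evict the
 * pages, so data that is re-read soon afterwards must come back from
 * disk.
 */
static void
flush_written_range(int fd, off_t offset, off_t nbytes)
{
#ifdef USE_SYNC_FILE_RANGE
    (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* the variant forced for Test 3 */
    (void) posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#endif
}

That eviction behavior is one plausible explanation for the degradation
seen below.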

Flush - on and Sort - on
avg over 7200: 3400.915069 ± 739.626478 [1642.100000, 2965.550000, 3271.900000, 3558.800000, 6763.000000]
percent of values below 10.0: 0.0%

With posix_fadvise(), the results for the best case (both flush and
sort on) show significant degradation in average and median TPS, by
~48% and ~43% respectively, which indicates that using posix_fadvise()
with the current options is probably not the best way to implement the
flush.


Overall, I think this patch (sort+flush) brings a lot of value to the
table in terms of stabilizing TPS during checkpoints. However, some
cases, such as the use of posix_fadvise() and the case where all data
fits in shared_buffers (where the median TPS regresses), could be
investigated to see what can be done to improve them. More tests could
be done to confirm the benefits and regressions of this patch, but for
now this is the best I can do.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
