Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: checkpointer continuous flushing
Msg-id: CAA4eK1+SUvYLgRvjfF8CPKAX9gPo8xuUrPQOat1AsYVmMuZOjQ@mail.gmail.com
In response to: Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
List: pgsql-hackers
On Sat, Sep 5, 2015 at 12:26 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> I would be curious whether flushing helps, though.
>>
>> Yes, me too. I think we should try to reach a consensus on the exact
>> scenarios and configurations where this patch (or patches) can give a
>> benefit, or where we want to verify that there is no regression, as I
>> have access to this machine for a very limited time. The machine might
>> get formatted soon for some other purpose.
>
> Yep, it would be great if you have time for a flush test before it
> disappears... I think it is advisable to disable the write cache, as it
> may also hide the impact of flushing.
>
> Still thinking... Depending on the results, it might be interesting to
> have these tests run with the write cache enabled as well, to check how
> much it interferes positively with performance.


I have done some tests with both patches (sort+flush); the results are
below.

Machine details
--------------------
IBM POWER-8, 24 cores, 192 hardware threads
RAM = 492GB

Test - 1 (Data Fits in shared_buffers)
--------------------------------------------------------
Non-default settings were used in the script provided by Fabien
upthread.

The pgbench options below were used for this test and for all of the
following tests as well (the full command line they expand to is shown
after the settings list).

fw)  ## full speed parallel write pgbench
run="FW"
opts="-M prepared -P 1 -T $time $para"
;;

warmup=1000
scale=300
max_connections=300
shared_buffers=32GB
checkpoint_timeout=10min
time=7200
synchronous_commit=on
max_wal_size=15GB

para="-j 64 -c 128"
checkpoint_completion_target=0.8

checkpoint_flush_to_disk="on off"
checkpoint_sort="on off"
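
For reference, substituting the variables above, the inner pgbench run
works out to roughly "pgbench -M prepared -P 1 -T 7200 -j 64 -c 128"
against a database initialized at scale factor 300 (my reconstruction
from the script fragment, not its literal output). The last two lines
give the test matrix: each of the four runs below pairs one value of
checkpoint_flush_to_disk with one value of checkpoint_sort.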

(In the results below, the bracketed values are, I believe, the min,
first quartile, median, third quartile, and max of the per-second TPS
as reported by Fabien's averaging script; "percent of values below
10.0" is the share of one-second intervals with TPS under 10.)

Flush - off and Sort - off
avg over 7203: 27480.350104 ± 12791.098857 [0.000000, 16009.400000, 32109.200000, 37629.000000, 51671.400000]
percent of values below 10.0: 2.8%

Flush - off and Sort - on
avg over 7200: 27482.501264 ± 12552.036065 [0.000000, 16587.250000, 31225.950000, 37516.450000, 51296.900000]
percent of values below 10.0: 2.8%

Flush - on and Sort - off
avg over 7200: 25214.757292 ± 11059.709509 [5268.000000, 14188.400000, 26472.450000, 35626.100000, 51479.000000]
percent of values below 10.0: 0.0%

Flush - on and Sort - on
avg over 7200: 26819.631722 ± 10589.745016 [5191.700000, 16825.450000, 29429.750000, 35707.950000, 51475.100000]
percent of values below 10.0: 0.0%

For this test run, the best results come when both sort and flush are
enabled: the lowest per-second TPS rises substantially without
sacrificing much in average or median TPS (though there is a ~9% dip in
the median). With only sorting enabled, there is neither a significant
gain nor a loss. With only flush enabled, there is significant
degradation in both the average and the median TPS, by ~8% and ~21%
respectively.


Test - 2 (Data doesn't fit in shared_buffers, but fits in RAM)
----------------------------------------------------------------------------------------
warmup=1000
scale=3000
max_connections=300
shared_buffers=32GB
checkpoint_timeout=10min
time=7200
synchronous_commit=on
max_wal_size=25GB

para="-j 64 -c 128"
checkpoint_completion_target=0.8

checkpoint_flush_to_disk="on off"
checkpoint_sort="on off"

Flush - off and Sort - off
avg over 7200: 5050.059444 ± 4884.528702 [0.000000, 98.100000, 4699.100000, 10125.950000, 13631.000000]
percent of values below 10.0: 7.7%

Flush - off and Sort - on
avg over 7200: 6194.150264 ± 4913.525651 [0.000000, 98.100000, 8982.000000, 10558.000000, 14035.200000]
percent of values below 10.0: 11.0%

Flush - on and Sort - off
avg over 7200: 2771.327472 ± 1860.963043 [287.900000, 2038.850000, 2375.500000, 2679.000000, 12862.000000]
percent of values below 10.0: 0.0%

Flush - on and Sort - on
avg over 7200: 6110.617722 ± 1939.381029 [1652.200000, 5215.100000, 5724.000000, 6196.550000, 13828.000000]
percent of values below 10.0: 0.0%


For this test run, the best results again come when both sort and flush
are enabled: the lowest per-second TPS rises substantially, and the
average and median TPS increase by ~21% and ~22% respectively. With
only sorting enabled, there is a significant gain in average and median
TPS, but the number of seconds in which TPS drops below 10 also
increases, which is bad. With only flush enabled, there is significant
degradation in both average and median TPS, by ~82% and ~97%
respectively. I am not sure whether such a big degradation is to be
expected in this case or whether it is just a problem with this run; I
have not repeated the test.


Test - 3 (Data doesn't fit in shared_buffers, but fits in RAM)
----------------------------------------------------------------------------------------
Same configuration and settings as above, but this time I forced the
flush to use posix_fadvise() rather than sync_file_range() (basically, I
changed the code to comment out the sync_file_range() call and enable
the posix_fadvise() one); a minimal sketch of the two primitives
follows.
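
For context, below is a minimal C sketch of the two flush primitives
involved; this is my own illustration, not the patch's actual code, and
the USE_SYNC_FILE_RANGE switch is hypothetical:

#define _GNU_SOURCE             /* for sync_file_range() on Linux */
#include <fcntl.h>

/*
 * Hint the kernel to write back a range of a file that has just been
 * written.  sync_file_range(..., SYNC_FILE_RANGE_WRITE) initiates
 * writeback without waiting and keeps the pages in the page cache,
 * whereas posix_fadvise(..., POSIX_FADV_DONTNEED) may also evict the
 * pages, so data that is re-read soon afterwards must come back from
 * disk.
 */
static void
flush_written_range(int fd, off_t offset, off_t nbytes)
{
#ifdef USE_SYNC_FILE_RANGE
    (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* the variant forced for Test 3 */
    (void) posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#endif
}

That eviction behavior is one plausible explanation for the degradation
seen below.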

Flush - on and Sort - on
avg over 7200: 3400.915069 ± 739.626478 [1642.100000, 2965.550000, 3271.900000, 3558.800000, 6763.000000]
percent of values below 10.0: 0.0%

With posix_fadvise(), the results for the best case (both flush and
sort on) show significant degradation in average and median TPS, by
~48% and ~43% respectively, which indicates that using posix_fadvise()
with the current options is probably not the best way to implement the
flush.


Overall, I think this patch (sort+flush) brings a lot of value to the
table in terms of stabilizing TPS during checkpoints. However, some
cases, such as the use of posix_fadvise() and the case where all data
fits in shared_buffers (where the median TPS regresses), could be
investigated to see what can be done to improve them. More tests could
be done to confirm the benefits and regressions of this patch, but for
now this is the best I can do.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
