Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From: Fabien COELHO
Subject: Re: checkpointer continuous flushing
Msg-id: alpine.DEB.2.10.1509081531300.25033@sto
In response to: Re: checkpointer continuous flushing (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hello Amit,

> I have done some tests with both the patches(sort+flush) and below
> are results:

Thanks a lot for these runs on this great hardware!

> Test - 1 (Data Fits in shared_buffers)

Rounded for easier comparison:
  flush/sort
  off off: 27480.4 ± 12791.1 [   0, 16009, 32109, 37629, 51671] (2.8%)
  off on : 27482.5 ± 12552.0 [   0, 16587, 31226, 37516, 51297] (2.8%)

The two cases above are pretty indistinguishable: sorting alone has no
impact. The 2.8% means more than 1 minute of unavailability per hour
(0.028 * 3600 s ≈ 100 s), not necessarily as one contiguous minute, it may
be distributed over the whole hour.
  on  off: 25214.8 ± 11059.7 [5268, 14188, 26472, 35626, 51479] (0.0%)
  on  on : 26819.6 ± 10589.7 [5192, 16825, 29430, 35708, 51475] (0.0%)

> For this test run, the best results are when both the sort and flush 
> options are enabled, the value of lowest TPS is increased substantially 
> without sacrificing much on average or median TPS values (though there 
> is ~9% dip in median TPS value).  When only sorting is enabled, there is 
> neither significant gain nor any loss.  When only flush is enabled, 
> there is significant degradation in both average and median value of TPS 
> ~8% and ~21% respectively.

I interpret the five numbers in brackets as an indicator of performance
stability: they would all be equal under perfect stability. Once they show
some stability, the next point for me is to focus on the average
performance. I do not see a median decrease as a big issue if the average
is reasonably good.
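For reference, here is a minimal stand-alone sketch of how such a
five-number summary can be computed; I am assuming the bracketed figures
are the min, first quartile, median, third quartile and max of the
per-second tps samples. This is not code from the patch or from the test
harness, just an illustration:

/*
 * Five-number summary (min, Q1, median, Q3, max) of per-second tps
 * samples.  Stand-alone illustration only, not part of the patch.
 */
#include <stdio.h>
#include <stdlib.h>

static int
cmp_double(const void *a, const void *b)
{
    double      x = *(const double *) a;
    double      y = *(const double *) b;

    return (x > y) - (x < y);
}

/* nearest-rank style percentile on an already sorted array */
static double
percentile(const double *sorted, int n, double p)
{
    int         idx = (int) (p * (n - 1) + 0.5);

    return sorted[idx];
}

int
main(void)
{
    /* hypothetical per-second tps samples from a benchmark run */
    double      tps[] = {0, 120, 5200, 5800, 6100, 6300, 13800};
    int         n = sizeof(tps) / sizeof(tps[0]);

    qsort(tps, n, sizeof(double), cmp_double);

    printf("[%.0f, %.0f, %.0f, %.0f, %.0f]\n",
           tps[0],                      /* min */
           percentile(tps, n, 0.25),    /* first quartile */
           percentile(tps, n, 0.50),    /* median */
           percentile(tps, n, 0.75),    /* third quartile */
           tps[n - 1]);                 /* max */
    return 0;
}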

Thus I essentially note the ~2.5% dip in average tps of on-on vs off-on. I
would say that it is probably significant, although it might be within the
measurement error margin. I am not sure whether the small stddev reduction
is really significant. Anyway the benefit is clear: 100% availability.

Flushing without sorting is a bad idea (tm), not a surprise.

> Test - 2 (Data doesn't fit in shared_buffers, but fits in RAM)
  flush/sort
  off off: 5050.1 ± 4884.5 [   0,   98,  4699, 10126, 13631] ( 7.7%)
  off on : 6194.2 ± 4913.5 [   0,   98,  8982, 10558, 14035] (11.0%)
  on  off: 2771.3 ± 1861.0 [ 288, 2039,  2375,  2679, 12862] ( 0.0%)
  on  on : 6110.6 ± 1939.3 [1652, 5215,  5724,  6196, 13828] ( 0.0%)

I'm not sure that the -1.3% average tps dip of on-on vs off-on is
significant, but it may be. With both flushing and sorting pg becomes fully
available, and the standard deviation is divided by more than 2, so the
benefit is clear.

> For this test run, again the best results are when both the sort and flush
> options are enabled, the value of lowest TPS is increased substantially
> and the average and median value of TPS has also increased to
> ~21% and ~22% respectively.  When only sorting is enabled, there is a
> significant gain in average and median TPS values, but then there is also
> an increase in number of times when TPS is below 10 which is bad.
> When only flush is enabled, there is significant degradation in both average
> and median value of TPS to ~82% and ~97% respectively, now I am not
> sure if such a big degradation could be expected for this case or it's just
> a problem in this run, I have not repeated this test.

Yes, I agree that it is strange that sorting without flushing both improves
performance (about +20% tps) and seems to degrade availability at the same
time. A rerun would have helped to check whether it is a fluke or whether
it is reproducible.

> Test - 3 (Data doesn't fit in shared_buffers, but fits in RAM)
> ----------------------------------------------------------------------------------------
> Same configuration and settings as above, but this time, I have enforced
> Flush to use posix_fadvise() rather than sync_file_range()  (basically
> changed code to comment out sync_file_range() and enable posix_fadvise()).
>
> On using posix_fadvise(), the results for best case (both flush and sort as
> on) shows significant degradation in average and median TPS values
> by ~48% and ~43% which indicates that probably using posix_fadvise()
> with the current options might not be the best way to achieve Flush.

Yes, indeed.

The way posix_fadvise is implemented on Linux is somewhere between no
effect and a bad effect (the buffer is evicted from the OS page cache). You
hit the latter quite strongly... As you are doing a "does not fit in
shared_buffers" test, it is essential that buffers are kept in RAM, but
posix_fadvise on Linux instructs the kernel to drop the buffer from memory
once it has been passed to the I/O subsystem, which, given the probably
large I/O device cache on your host, should happen pretty quickly. Later
reads must then be fetched back from the device (either its cache or the
disk), which means a drop in performance.

Note that the FreeBSD implementation seems more convincing, although less
good than the Linux sync_file_range function. I have no idea about other
systems.
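For readers who do not have the two calls in mind, here is a minimal
stand-alone sketch of the two hints being compared. It is not taken from
the patch: the flags (SYNC_FILE_RANGE_WRITE for the Linux path,
POSIX_FADV_DONTNEED for the portable fallback) and the file name are my
assumptions, chosen to match the "buffer is erased" behaviour described
above.

#define _GNU_SOURCE             /* for sync_file_range() on Linux */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    /* "some_relation_segment" is a placeholder file name */
    int         fd = open("some_relation_segment", O_RDWR);
    off_t       offset = 0;
    off_t       nbytes = 1024 * 1024;   /* hint about a 1 MB range */

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

#if defined(__linux__)
    /*
     * Linux: start asynchronous write-back of the dirty range, without
     * evicting the pages from the OS cache.
     */
    if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range");
#else
    /*
     * Portable fallback: tell the kernel the range will not be needed.
     * On Linux this also drops already-written pages from the cache,
     * hence the performance drop discussed above.
     */
    {
        int         rc = posix_fadvise(fd, offset, nbytes,
                                       POSIX_FADV_DONTNEED);

        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    }
#endif

    close(fd);
    return 0;
}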

> Overall, I think this patch (sort+flush) brings a lot of value on table 
> in terms of stablizing the TPS during checkpoint, however some of the 
> cases like use of posix_fadvise() and the case (all data fits in 
> shared_buffers) where the value of median TPS is regressed could be 
> investigated to see what can be done to improve them.  I think more 
> tests can be done to ensure the benefit or regression of this patch, but 
> for now this is what best I can do.

Thanks a lot, again, for these tests!

I think that we may conclude, from these runs:

(1) sorting seems not to harm performance, and may help a lot.

(2) Linux flushing with sync_file_range may degrade raw tps average a
    little in some cases, but definitely improves performance stability
    (always 100% availability when on!).

(3) posix_fadvise on Linux is a bad idea... the good news is that it is
    not needed there :-) How good or bad an idea it is on other systems
    is an open question...

These results are consistent with the current default values in the patch:
sorting is on by default, and flushing is on under Linux (sync_file_range)
and off elsewhere (where posix_fadvise would be used).

Also, as the effect on other systems is unclear, I think it is best to 
keep both settings as GUCs for now.
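
For illustration, the two knobs might then look like this in
postgresql.conf; the GUC names below are placeholders of mine, the real
names and defaults are whatever the patch defines:

# hypothetical GUC names, for illustration only; see the patch for the
# actual names and defaults
checkpoint_sort = on            # sort buffers by file and offset before writing
checkpoint_flush_to_disk = on   # ask the OS to flush written buffers early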

-- 
Fabien.
