From:           Fabien COELHO
Subject:        Re: checkpointer continuous flushing - V16
Msg-id:         alpine.DEB.2.10.1602191012160.1254@sto
In response to: Re: checkpointer continuous flushing - V16 (Andres Freund <andres@anarazel.de>)
List:           pgsql-hackers
Hello Andres,

> I don't want to post a full series right now, but my working state is
> available on
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
> git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush

Below are the results of a lot of tests with pgbench to exercise
checkpoints on the above version, as fetched.

Overall comments:
 - sorting & flushing is basically always a winner
 - benchmarking with short runs on large databases is a bad idea: the
   results are very different if a longer run is used (see andres00b
   vs andres00c)

# HOST/SOFT

 16 GB, 2 cpu, 8 cores
 200 GB RAID1 HDD, ext4 FS
 Ubuntu 12.04 LTS (precise)

# ABOUT THE REPORTED STATISTICS

 tps: the "excluding connections establishing" tps reported by pgbench,
   the higher the better.

 1-sec tps: average of the measured per-second tps.
   Note: it should be the same as the previous figure, but due to
   various hazards in the trace, especially when things go badly and pg
   gets stuck, it may differ. Such hazards also explain why non-integer
   tps are reported for some seconds.

 stddev: standard deviation of the per-second tps, the lower the better.

 The five figures in brackets give a feel of the distribution:
  - min: minimal per-second tps seen in the trace
  - q1:  first quartile per-second tps seen in the trace
  - med: median per-second tps seen in the trace
  - q3:  third quartile per-second tps seen in the trace
  - max: maximal per-second tps seen in the trace

 The last percentage, dubbed "<=10.0", is the fraction of seconds where
 performance is at or below 10 tps: it measures how unresponsive pg was
 during the run. A sketch that recomputes these statistics from the
 -P 1 trace follows the first result table below.

 In the tables, "opts" is the checkpoint_flush_after setting ("head"
 being unpatched master, "0" the patched version with sorting but no
 flushing), and "#" numbers the runs.

###### TINY2

pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4 with scale = 10 (~ 200 MB)

postgresql.conf:
 shared_buffers = 1GB
 max_wal_size = 1GB
 checkpoint_timeout = 300s
 checkpoint_completion_target = 0.8
 checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |    tps / 1-sec tps ± stddev [   min,     q1,    med,     q3,    max ] <=10.0
 head 0 | 2574.1 / 2574.3 ± 367.4 [ 229.0, 2570.1, 2721.9, 2746.1, 2857.2 ]  0.0%
      1 | 2575.0 / 2575.1 ± 359.3 [   1.0, 2595.9, 2712.0, 2732.0, 2847.0 ]  0.1%
      2 | 2602.6 / 2602.7 ± 359.5 [  54.0, 2607.1, 2735.1, 2768.1, 2908.0 ]  0.0%
    0 0 | 2583.2 / 2583.7 ± 296.4 [ 164.0, 2580.0, 2690.0, 2717.1, 2833.8 ]  0.0%
      1 | 2596.6 / 2596.9 ± 307.4 [ 296.0, 2590.5, 2707.9, 2738.0, 2847.8 ]  0.0%
      2 | 2604.8 / 2605.0 ± 300.5 [ 110.9, 2619.1, 2712.4, 2738.1, 2849.1 ]  0.0%
   32 0 | 2625.5 / 2625.5 ± 250.5 [   1.0, 2645.9, 2692.0, 2719.9, 2839.0 ]  0.1%
      1 | 2630.2 / 2630.2 ± 243.1 [ 301.8, 2654.9, 2697.2, 2726.0, 2837.4 ]  0.0%
      2 | 2648.3 / 2648.4 ± 236.7 [ 570.1, 2664.4, 2708.9, 2739.0, 2844.9 ]  0.0%
   64 0 | 2587.8 / 2587.9 ± 306.1 [  83.0, 2610.1, 2680.0, 2731.0, 2857.1 ]  0.0%
      1 | 2591.1 / 2591.1 ± 305.2 [ 455.9, 2608.9, 2680.2, 2734.1, 2859.0 ]  0.0%
      2 | 2047.8 / 2046.4 ± 925.8 [   0.0, 1486.2, 2592.6, 2691.1, 3001.0 ]  0.2% ?

Pretty small setup, all data fit in buffers. Good tps performance all
around (best with 32-page flushes), and flushing shows a noticeable
(360 -> 240) reduction in tps stddev.
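For reference, here is a small sketch of how the per-second statistics
above can be recomputed from a pgbench -P 1 trace. This is NOT the
tooling actually used for this report, just an illustration; it assumes
pgbench's usual progress-line format (written to stderr), e.g.
"progress: 12.0 s, 2574.1 tps, lat 1.553 ms stddev 0.429".

  #!/usr/bin/env python3
  # Hypothetical helper: recompute mean, stddev, quartiles and the
  # "<=10.0" unresponsiveness fraction from a pgbench -P 1 trace.
  import re
  import statistics
  import sys

  TPS_RE = re.compile(r"^progress: [\d.]+ s, ([\d.]+) tps")

  tps = [float(m.group(1)) for line in sys.stdin
         if (m := TPS_RE.match(line))]

  mean = statistics.mean(tps)
  stddev = statistics.pstdev(tps)
  q1, med, q3 = statistics.quantiles(tps, n=4)   # quartile cut points
  stuck = 100.0 * sum(1 for t in tps if t <= 10.0) / len(tps)

  print(f"1-sec tps {mean:.1f} ± {stddev:.1f} "
        f"[{min(tps):.1f}, {q1:.1f}, {med:.1f}, {q3:.1f}, {max(tps):.1f}] "
        f"<=10.0: {stuck:.1f}%")

Typical use would be something like
"pgbench -M prepared -N -P 1 -T 4000 ... 2> trace" followed by
"python3 stats.py < trace".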
###### SMALL

pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4 with scale = 120 (~ 2 GB)

postgresql.conf:
 shared_buffers = 2GB
 checkpoint_timeout = 300s
 checkpoint_completion_target = 0.8
 checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min,    q1,    med,     q3,    max ] <=10.0
 head 0 | 209.2 / 204.2 ± 516.5 [ 0.0,   0.0,   4.0,    5.0, 2251.0 ] 82.3%
      1 | 207.4 / 204.2 ± 518.7 [ 0.0,   0.0,   4.0,    5.0, 2245.1 ] 82.3%
      2 | 217.5 / 211.0 ± 530.3 [ 0.0,   0.0,   3.0,    5.0, 2255.0 ] 82.0%
      3 | 217.8 / 213.2 ± 531.7 [ 0.0,   0.0,   4.0,    6.0, 2261.9 ] 81.7%
      4 | 230.7 / 223.9 ± 542.7 [ 0.0,   0.0,   4.0,    7.0, 2282.0 ] 80.7%
    0 0 | 734.8 / 735.5 ± 879.9 [ 0.0,   1.0,  16.5, 1748.3, 2281.1 ] 47.0%
      1 | 694.9 / 693.0 ± 849.0 [ 0.0,   1.0,  29.5, 1545.7, 2428.0 ] 46.4%
      2 | 735.3 / 735.5 ± 888.4 [ 0.0,   0.0,  12.0, 1781.2, 2312.1 ] 47.9%
      3 | 736.0 / 737.5 ± 887.1 [ 0.0,   1.0,  16.0, 1794.3, 2317.0 ] 47.5%
      4 | 734.9 / 735.1 ± 885.1 [ 0.0,   1.0,  15.5, 1781.0, 2297.1 ] 47.2%
   32 0 | 738.1 / 737.9 ± 415.8 [ 0.0, 553.0, 679.0,  753.0, 2312.1 ]  0.2%
      1 | 730.5 / 730.7 ± 413.2 [ 0.0, 546.5, 671.0,  744.0, 2319.0 ]  0.1%
      2 | 741.9 / 741.9 ± 416.5 [ 0.0, 556.0, 682.0,  756.0, 2331.0 ]  0.2%
      3 | 744.1 / 744.1 ± 414.4 [ 0.0, 555.5, 685.2,  758.0, 2285.1 ]  0.1%
      4 | 746.9 / 746.9 ± 416.6 [ 0.0, 566.6, 685.0,  759.0, 2308.1 ]  0.1%
   64 0 | 743.0 / 743.1 ± 416.5 [ 1.0, 555.0, 683.0,  759.0, 2353.0 ]  0.1%
      1 | 742.5 / 742.5 ± 415.6 [ 0.0, 558.2, 680.0,  758.2, 2296.0 ]  0.1%
      2 | 742.5 / 742.5 ± 415.9 [ 0.0, 559.0, 681.1,  757.0, 2310.0 ]  0.1%
      3 | 529.0 / 526.6 ± 410.9 [ 0.0, 245.0, 444.0,  701.0, 2380.9 ]  1.5% ??
      4 | 734.8 / 735.0 ± 414.1 [ 0.0, 550.0, 673.0,  754.0, 2298.0 ]  0.1%

Sorting brings a *3.3 improvement in tps, and flushing significantly
reduces the tps stddev. Pg goes from 80% unresponsive to nearly always
responsive.

###### MEDIUM

pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4 with scale = 250 (~ 3.8 GB)

postgresql.conf:
 shared_buffers = 4GB
 max_wal_size = 4GB
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.8
 checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |    tps / 1-sec tps ± stddev [ min,    q1,     med,     q3,    max ] <=10.0
 head 0 |  214.8 /  211.8 ± 513.7 [ 0.0,   1.0,    4.0,    5.0, 2344.0 ] 82.4%
      1 |  219.2 /  215.0 ± 524.1 [ 0.0,   0.0,    4.0,    5.0, 2316.0 ] 82.2%
      2 |  240.9 /  234.6 ± 550.8 [ 0.0,   0.0,    4.0,    6.0, 2320.2 ] 81.0%
    0 0 | 1064.7 / 1065.3 ± 888.2 [ 0.0,  11.0, 1089.0, 2017.7, 2461.9 ] 24.7%
      1 | 1060.2 / 1061.2 ± 889.9 [ 0.0,  10.0, 1056.7, 2022.0, 2444.9 ] 25.1%
      2 | 1060.2 / 1061.4 ± 889.1 [ 0.0,   9.0, 1085.8, 2002.8, 2473.0 ] 25.6%
   32 0 | 1059.4 / 1059.4 ± 476.3 [ 3.0, 804.9,  980.0, 1123.0, 2448.1 ]  0.1%
      1 | 1062.5 / 1062.6 ± 475.6 [ 0.0, 807.0,  988.0, 1132.0, 2441.0 ]  0.1%
      2 | 1063.7 / 1063.7 ± 475.4 [ 0.0, 814.0,  987.0, 1131.2, 2432.1 ]  0.1%
   64 0 | 1052.6 / 1052.6 ± 475.3 [ 0.0, 793.0,  974.0, 1118.1, 2445.1 ]  0.1%
      1 | 1059.8 / 1059.8 ± 475.1 [ 0.0, 799.0,  987.5, 1131.0, 2457.1 ]  0.1%
      2 | 1058.5 / 1058.5 ± 472.8 [ 0.0, 807.0,  985.0, 1127.7, 2442.0 ]  0.1%

Sorting brings a *4.8 improvement in tps, and flushing significantly
reduces the tps stddev. Pg goes from 80%+ unresponsive to nearly always
responsive. Performance is significantly better than "small" above,
probably thanks to the longer checkpoint timeout; a back-of-the-envelope
on the checkpoint pacing follows below.
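To make the timeout effect concrete, here is a rough back-of-the-envelope
(my own illustration, not a measurement): with
checkpoint_completion_target = 0.8, a timed checkpoint's writes are
spread over 0.8 * checkpoint_timeout, so a longer timeout means both
fewer checkpoints per run and a gentler write rate for each.

  # Illustration only: timed checkpoints per 4000 s run and the window
  # each one's writes are spread over, ignoring checkpoints triggered
  # early by max_wal_size.
  run_seconds = 4000
  completion_target = 0.8

  for name, timeout in [("SMALL", 300), ("MEDIUM", 15 * 60)]:
      checkpoints = run_seconds / timeout
      write_window = completion_target * timeout
      print(f"{name}: ~{checkpoints:.1f} checkpoints, "
            f"each spread over ~{write_window:.0f} s")

  # SMALL:  ~13.3 checkpoints, each spread over ~240 s
  # MEDIUM:  ~4.4 checkpoints, each spread over ~720 s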
###### LARGE

pgbench -M prepared -N -P 1 -T 7500 -j 2 -c 4 with scale = 1000 (~ 15 GB)

postgresql.conf:
 shared_buffers = 4GB
 max_wal_size = 2GB
 checkpoint_timeout = 40min
 checkpoint_completion_target = 0.8
 checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min,   q1,   med,     q3,    max ] <=10.0
 head 0 |  68.7 /  65.3 ± 78.6 [ 0.0,  3.0,   6.0, 136.0, 291.0 ] 53.1%
      1 |  70.6 /  70.3 ± 80.1 [ 0.0,  4.0,  10.0, 151.0, 282.0 ] 50.1%
      2 |  74.3 /  75.8 ± 84.9 [ 0.0,  4.0,   9.0, 162.0, 311.2 ] 50.3%
    0 0 | 117.2 / 116.9 ± 83.8 [ 0.0, 14.0, 139.0, 193.0, 372.4 ] 24.0%
      1 | 117.3 / 117.8 ± 83.8 [ 0.0, 16.0, 140.0, 193.0, 279.0 ] 23.9%
      2 | 117.6 / 118.2 ± 84.1 [ 0.0, 16.0, 141.0, 194.0, 297.8 ] 23.7%
   32 0 | 114.2 / 114.2 ± 45.7 [ 0.0, 84.0, 100.0, 131.0, 613.6 ]  0.4%
      1 | 112.5 / 112.6 ± 44.0 [ 0.0, 83.0,  98.0, 130.0, 293.0 ]  0.2%
      2 | 108.0 / 108.0 ± 44.7 [ 0.0, 79.0,  94.0, 124.0, 303.6 ]  0.3%
   64 0 | 113.0 / 113.0 ± 45.5 [ 0.0, 83.0,  99.0, 131.0, 289.0 ]  0.4%
      1 |  80.0 /  80.3 ± 39.1 [ 0.0, 56.0,  72.0,  95.0, 281.0 ]  0.8% ??
      2 | 112.2 / 112.3 ± 44.5 [ 0.0, 82.0,  99.0, 129.0, 282.0 ]  0.3%

Data do not fit in the available memory, so there are plenty of read
accesses. Sorting still has some impact on tps performance (*1.6), and
flushing greatly improves responsiveness.

###### ANDRES00

pgbench -M prepared -N -P 1 -T 300 -c 16 -j 16 with scale = 800 (~ 13 GB)

postgresql.conf:
 shared_buffers = 2GB
 max_wal_size = 100GB
 wal_level = hot_standby
 maintenance_work_mem = 2GB
 checkpoint_timeout = 30s
 checkpoint_completion_target = 0.8
 synchronous_commit = off
 checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min,  q1,   med,     q3,    max ] <=10.0
 head 0 | 328.7 / 329.9 ± 716.9 [ 0.0, 0.0,   0.0,    0.0, 3221.2 ] 77.7%
      1 | 338.2 / 338.7 ± 728.6 [ 0.0, 0.0,   0.0,   17.0, 3296.3 ] 75.0%
      2 | 304.5 / 304.3 ± 705.5 [ 0.0, 0.0,   0.0,    0.0, 3463.4 ] 79.3%
    0 0 | 425.6 / 464.0 ± 724.0 [ 0.0, 0.0,   0.0, 1000.6, 3363.7 ] 61.0%
      1 | 461.5 / 463.1 ± 735.8 [ 0.0, 0.0,   0.0, 1011.2, 3490.9 ] 58.7%
      2 | 452.4 / 452.6 ± 744.3 [ 0.0, 0.0,   0.0, 1078.9, 3631.9 ] 63.3%
   32 0 | 514.4 / 515.8 ± 651.8 [ 0.0, 0.0, 337.4,  808.3, 2876.0 ] 40.7%
      1 | 512.0 / 514.6 ± 661.6 [ 0.0, 0.0, 317.6,  690.8, 3315.8 ] 35.0%
      2 | 529.5 / 530.3 ± 673.0 [ 0.0, 0.0, 321.1,  906.4, 3360.8 ] 40.3%
   64 0 | 529.6 / 530.9 ± 668.2 [ 0.0, 0.0, 322.1,  786.1, 3538.0 ] 33.3%
      1 | 496.4 / 498.0 ± 606.6 [ 0.0, 0.0, 321.4,  746.0, 2629.6 ] 36.3%
      2 | 521.0 / 521.7 ± 657.0 [ 0.0, 0.0, 328.4,  737.9, 3262.9 ] 34.3%

Data just about hold in memory, maybe. The run is very short and the
settings are low: this is not representative of a sane installation,
but is designed to test a lot of checkpoints in a difficult situation.
Sorting and flushing do bring significant benefits.
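As background on what "flushing" means mechanically: on Linux the patch
hints writeback to the kernel after every checkpoint_flush_after pages
instead of letting dirty data pile up until the final fsync. The patch
does this in C inside the checkpointer; the sketch below is only my own
illustration of the underlying syscall, sync_file_range(2) with
SYNC_FILE_RANGE_WRITE, assuming Linux/glibc.

  # Illustration only (NOT the patch's code): start writeback of a file
  # range without waiting for it to complete.
  import ctypes
  import os

  libc = ctypes.CDLL("libc.so.6", use_errno=True)
  libc.sync_file_range.argtypes = [ctypes.c_int, ctypes.c_longlong,
                                   ctypes.c_longlong, ctypes.c_uint]
  libc.sync_file_range.restype = ctypes.c_int

  SYNC_FILE_RANGE_WRITE = 0x2   # initiate writeback, do not wait
  BLCKSZ = 8192                 # postgres page size

  def flush_pages(fd, first_page, npages):
      """Hint the kernel to start writing npages pages back to disk."""
      rc = libc.sync_file_range(fd, first_page * BLCKSZ,
                                npages * BLCKSZ, SYNC_FILE_RANGE_WRITE)
      if rc != 0:
          err = ctypes.get_errno()
          raise OSError(err, os.strerror(err))

  # e.g. after writing 32 sorted pages (checkpoint_flush_after = 32):
  #   flush_pages(fd, start_page, 32)

Because the buffers are sorted by file and block before being written,
each such hint covers a contiguous range, which is what lets the kernel
write it out sequentially instead of seeking.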
###### ANDRES00b (same as ANDRES00 but scale 800 -> 1000)

pgbench -M prepared -N -P 1 -T 300 -c 16 -j 16 with scale = 1000 (~ 15 GB)

postgresql.conf:
 shared_buffers = 2GB
 max_wal_size = 100GB
 wal_level = hot_standby
 maintenance_work_mem = 2GB
 checkpoint_timeout = 30s
 checkpoint_completion_target = 0.8
 synchronous_commit = off
 checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min,    q1,    med,    q3,    max ] <=10.0
 head 0 | 150.2 / 150.3 ± 401.6 [ 0.0,   0.0,   0.0,   0.0, 2199.4 ] 75.1%
      1 | 139.2 / 139.2 ± 372.2 [ 0.0,   0.0,   0.0,   0.0, 2111.4 ] 78.3% ***
      2 | 127.3 / 127.1 ± 341.2 [ 0.0,   0.0,   0.0,  53.0, 2144.3 ] 74.7% ***
    0 0 | 199.0 / 209.2 ± 400.4 [ 0.0,   0.0,   0.0, 243.6, 1846.0 ] 65.7%
      1 | 220.4 / 226.7 ± 423.2 [ 0.0,   0.0,   0.0, 264.0, 1777.0 ] 63.5% *
      2 | 195.5 / 205.3 ± 337.9 [ 0.0,   0.0, 123.0, 212.0, 1721.9 ] 43.2%
   32 0 | 362.3 / 359.0 ± 308.4 [ 0.0, 200.0, 265.0, 416.4, 1816.6 ]  5.0%
      1 | 323.6 / 321.2 ± 327.1 [ 0.0, 142.9, 210.0, 353.4, 1907.0 ]  4.0%
      2 | 309.0 / 310.7 ± 381.3 [ 0.0, 122.0, 175.5, 298.0, 2090.4 ]  5.0%
   64 0 | 342.7 / 343.6 ± 331.1 [ 0.0, 143.0, 239.5, 409.9, 1623.6 ]  5.3%
      1 | 333.8 / 328.2 ± 356.3 [ 0.0, 132.9, 211.5, 358.1, 1629.1 ] 10.7% ??
      2 | 352.0 / 352.0 ± 332.3 [ 0.0, 163.5, 239.9, 400.1, 1643.4 ]  5.3%

A little larger than the previous setup, so the data do not really fit
in memory. The performance impact is significant compared to the
previous run. Sorting and flushing bring a *2 improvement in tps, and
unresponsiveness goes from 75% down to a much better 5%.

###### ANDRES00c (same as ANDRES00b but time 300 -> 4000)

 opts # |   tps / 1-sec tps ± stddev [ min,    q1,   med,    q3,    max ] <=10.0
 head 0 | 115.2 / 114.3 ± 256.4 [ 0.0,   0.0,  75.0, 131.1, 3389.0 ] 46.5%
      1 | 118.4 / 117.9 ± 248.3 [ 0.0,   0.0,  87.0, 151.0, 3603.6 ] 46.7%
      2 | 120.1 / 119.2 ± 254.4 [ 0.0,   0.0,  91.0, 143.0, 3307.8 ] 43.8%
    0 0 | 217.4 / 211.0 ± 237.1 [ 0.0, 139.0, 193.0, 239.0, 3115.4 ] 16.8%
      1 | 216.2 / 209.6 ± 244.9 [ 0.0, 138.9, 188.0, 231.0, 3331.3 ] 16.3%
      2 | 218.6 / 213.8 ± 246.7 [ 0.0, 137.0, 187.0, 232.0, 3229.6 ] 16.2%
   32 0 | 146.6 / 142.5 ± 234.5 [ 0.0,  59.0,  93.0, 151.1, 3294.7 ] 17.5%
      1 | 148.0 / 142.6 ± 239.2 [ 0.0,  64.0,  95.9, 144.0, 3361.8 ] 16.0%
      2 | 147.6 / 140.4 ± 233.2 [ 0.0,  59.4,  94.0, 148.0, 3108.4 ] 18.0%
   64 0 | 145.3 / 140.5 ± 233.6 [ 0.0,  61.0,  93.0, 147.7, 3212.6 ] 16.5%
      1 | 145.6 / 140.3 ± 233.3 [ 0.0,  58.0,  93.0, 146.0, 3351.8 ] 17.3%
      2 | 147.7 / 142.2 ± 233.2 [ 0.0,  61.0,  97.0, 148.4, 3616.3 ] 17.0%

The only difference between ANDRES00b and ANDRES00c is the duration,
from 5 minutes to 66 minutes. This shows that short runs can be wildly
misleading: in particular, the longer run shows less than half the tps
for some settings, and the relative comparison of head vs sort vs
sort+flush is different. A sketch of a cheap warm-up vs steady-state
check follows below.
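One cheap way to spot this effect (my own sketch, reusing the
per-second tps list extracted as in the statistics helper above, not
part of the report's tooling): compare the mean tps of the first few
minutes against the remainder of the run; if the two disagree widely,
a short run is dominated by the warm-up phase.

  # Illustrative check: warm-up phase vs the rest of a long run.
  from statistics import mean

  def warmup_vs_steady(tps, warmup=300):
      """Mean tps over the first `warmup` seconds vs the rest."""
      return mean(tps[:warmup]), mean(tps[warmup:])

  # If the first 300 s average, say, ~350 tps while the remaining hour
  # averages ~140 tps, a 5-minute run overstates steady-state
  # throughput by more than 2x.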
###### ANDRES00d (same as ANDRES00b but wal_level hot_standby -> minimal)

 opts # |   tps / 1-sec tps ± stddev [ min,    q1,    med,    q3,    max ] <=10.0
 head 0 | 191.6 / 195.1 ± 439.3 [ 0.0,   0.0,   0.0,   0.0, 2540.2 ] 76.3%
      1 | 211.3 / 213.6 ± 461.9 [ 0.0,   0.0,   0.0,  13.0, 3203.7 ] 75.0%
      2 | 152.4 / 154.9 ± 217.6 [ 0.0,   0.0,  58.0, 235.6,  995.9 ] 39.3% ???
    0 0 | 247.2 / 251.7 ± 454.0 [ 0.0,   0.0,   0.0, 375.3, 2592.4 ] 67.7%
      1 | 215.4 / 232.7 ± 446.5 [ 0.0,   0.0,   0.0, 103.0, 3046.7 ] 72.3%
      2 | 160.6 / 160.8 ± 222.1 [ 0.0,   0.0,  80.0, 209.6,  885.3 ] 42.0% ???
   32 0 | 399.9 / 397.0 ± 356.6 [ 0.0,  67.0, 348.0, 572.8, 2604.2 ] 21.0%
      1 | 391.8 / 392.5 ± 371.7 [ 0.0,  85.5, 314.4, 549.3, 2590.3 ] 20.7%
      2 | 406.1 / 404.8 ± 380.6 [ 0.0,  95.0, 348.5, 569.0, 3383.7 ] 21.3%
   64 0 | 395.9 / 396.1 ± 352.4 [ 0.0,  89.5, 342.5, 556.0, 2366.9 ] 17.7%
      1 | 355.1 / 351.9 ± 296.7 [ 0.0, 172.5, 306.1, 468.1, 1663.5 ] 16.0%
      2 | 403.6 / 401.8 ± 390.5 [ 0.0,   0.0, 337.0, 636.1, 2591.3 ] 26.7% ???

###### ANDRES00e (same as ANDRES00b but maintenance_work_mem 2GB -> 64MB)

 opts # |   tps / 1-sec tps ± stddev [ min,    q1,    med,    q3,    max ] <=10.0
 head 0 | 153.5 / 161.3 ± 401.3 [ 0.0,   0.0,   0.0,   0.0, 2546.0 ] 82.0%
      1 | 170.7 / 175.9 ± 399.9 [ 0.0,   0.0,   0.0,  14.0, 2537.4 ] 74.7%
      2 | 184.7 / 190.4 ± 389.2 [ 0.0,   0.0,   0.0, 158.5, 2544.6 ] 69.3%
    0 0 | 211.2 / 227.8 ± 418.8 [ 0.0,   0.0,   0.0, 334.6, 2589.3 ] 65.7%
      1 | 221.7 / 226.0 ± 415.7 [ 0.0,   0.0,   0.0, 276.8, 2588.2 ] 68.4%
      2 | 232.5 / 233.2 ± 403.5 [ 0.0,   0.0,   0.0, 377.0, 2260.2 ] 62.0%
   32 0 | 373.2 / 374.4 ± 309.2 [ 0.0, 180.6, 321.8, 475.2, 2596.5 ] 11.3%
      1 | 348.7 / 348.1 ± 328.4 [ 0.0, 127.0, 284.1, 451.9, 2595.1 ] 17.3%
      2 | 376.3 / 375.3 ± 315.5 [ 0.0, 186.5, 329.6, 487.1, 2365.4 ] 15.3%
   64 0 | 388.9 / 387.8 ± 348.7 [ 0.0, 164.0, 305.9, 546.5, 2587.2 ] 15.0%
      1 | 380.3 / 378.7 ± 338.8 [ 0.0, 171.1, 317.4, 524.8, 2592.4 ] 16.7%
      2 | 369.8 / 367.4 ± 340.5 [ 0.0,  77.4, 320.6, 525.5, 2484.7 ] 20.7%

Hmm, interesting: maintenance_work_mem seems to have some influence on
performance, although it is not very consistent across settings,
probably because when memory is used to its limit, performance becomes
quite sensitive to the amount available.

--
Fabien.