Thread: postgresql recommendation memory
Please help with advice!
Server
HP ProLiant BL460c G1
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
CPU socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Stepping: 6
CPU MHz: 3000.105
BogoMIPS: 6000.04
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 6144K
NUMA node0 CPU(s): 0-7
32GB RAM
[root@db3 ~]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32765 MB
node 0 free: 317 MB
node distances:
node 0
0: 10
RAID1 2x146GB 10k rpm
CentOS release 6.3 (Final)
Linux 2.6.32-279.11.1.el6.x86_64 #1 SMP x86_64 GNU/Linux
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
vm.swappiness = 30
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 536870912
PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
listen_addresses = '*'
port = 5433
max_connections = 350
shared_buffers = 8GB
temp_buffers = 64MB
max_prepared_transactions = 350
work_mem = 256MB
maintenance_work_mem = 1GB
max_stack_depth = 4MB
max_files_per_process = 5000
effective_io_concurrency = 2
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 64
checkpoint_timeout = 15min
checkpoint_completion_target = 0.75
max_wal_senders = 4
wal_sender_delay = 100ms
wal_keep_segments = 128
random_page_cost = 3.0
effective_cache_size = 18GB
autovacuum = on
autovacuum_max_workers = 5
autovacuum_vacuum_threshold = 900
autovacuum_analyze_threshold = 350
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
log_min_duration_statement = 500
deadlock_timeout = 1s
DB size is about 20GB. There is no high write activity on the DB. But periodically in the postgresql log I see, for example, "select 1" with a duration of about 500-1000 ms.
In this period of time the response time from the DB is terrible. This period of time is not bound to high traffic. There is no other app on the server, and there is no specific cron job on the server.
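For what it's worth, a minimal way to see what the backends are doing during such a stall (this uses only the standard 9.1 pg_stat_activity columns; run it in a loop while the freeze is happening):

psql -p 5433 -c "SELECT procpid, usename, waiting, now() - query_start AS runtime, current_query FROM pg_stat_activity WHERE current_query <> '<IDLE>' ORDER BY query_start;"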
Our app is written in Java and uses JDBC with internal pooling to connect to the DB. There are about 100 connections to the DB. This is sar output:
12:00:01 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
09:30:01 PM 73.17 302.72 134790.16 0.00 46809.73 0.00 0.00 0.00 0.00
09:35:01 PM 63.42 655.80 131740.74 0.00 46182.74 0.00 0.00 0.00 0.00
09:40:01 PM 76.87 400.62 122375.34 0.00 42096.27 0.00 0.00 0.00 0.00
09:45:01 PM 58.49 198.33 121922.86 0.00 42765.27 0.00 0.00 0.00 0.00
09:50:01 PM 52.21 485.45 136775.65 0.15 49098.65 0.00 0.00 0.00 0.00
09:55:01 PM 49.68 476.75 130159.24 0.00 45192.54 0.00 0.00 0.00 0.00
10:00:01 PM 41.35 295.34 118655.80 0.00 40786.52 0.00 0.00 0.00 0.00
10:05:01 PM 60.84 593.85 129890.83 0.00 44170.92 0.00 0.00 0.00 0.00
10:10:01 PM 52.08 471.36 132773.63 0.00 46019.13 0.00 2.41 2.41 100.00
10:15:01 PM 73.93 196.50 129384.21 0.33 45255.76 65.92 1.19 66.87 99.64
10:20:02 PM 70.35 473.16 121940.38 0.11 44061.52 81.95 37.79 119.42 99.73
10:25:01 PM 57.84 471.69 130583.33 0.01 46093.33 0.00 0.00 0.00 0.00
10:30:01 PM 52.91 321.62 119264.34 0.01 41748.19 0.00 0.00 0.00 0.00
10:35:01 PM 47.13 451.78 114625.62 0.02 40600.98 0.00 0.00 0.00 0.00
10:40:01 PM 48.96 472.41 102352.79 0.00 35402.17 0.00 0.00 0.00 0.00
10:45:01 PM 70.07 321.33 121423.02 0.00 43052.04 0.00 0.00 0.00 0.00
10:50:01 PM 46.78 479.95 128938.09 0.02 37864.07 116.64 48.97 165.07 99.67
10:55:02 PM 104.84 453.55 109189.98 0.00 37583.50 0.00 0.00 0.00 0.00
11:00:01 PM 46.23 248.75 107313.26 0.00 37278.10 0.00 0.00 0.00 0.00
11:05:01 PM 44.28 446.41 115598.61 0.01 40070.61 0.00 0.00 0.00 0.00
11:10:01 PM 38.86 457.32 100240.71 0.00 34407.29 0.00 0.00 0.00 0.00
11:15:01 PM 48.23 275.60 104780.84 0.00 36183.84 0.00 0.00 0.00 0.00
11:20:01 PM 92.74 432.49 114698.74 0.01 40413.14 0.00 0.00 0.00 0.00
11:25:01 PM 42.76 428.50 87769.28 0.00 29379.87 0.00 0.00 0.00 0.00
11:30:01 PM 36.83 260.34 85072.46 0.00 28234.50 0.00 0.00 0.00 0.00
11:35:01 PM 62.52 481.56 93150.67 0.00 31137.13 0.00 0.00 0.00 0.00
11:40:01 PM 43.50 459.10 90407.34 0.00 30241.70 0.00 0.00 0.00 0.00
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
09:30:01 PM 531792 32345400 98.38 475504 29583340 10211064 27.62
09:35:01 PM 512096 32365096 98.44 475896 29608660 10200916 27.59
09:40:01 PM 455584 32421608 98.61 476276 29638952 10211652 27.62
09:45:01 PM 425744 32451448 98.71 476604 29662384 10206044 27.60
09:50:01 PM 380960 32496232 98.84 477004 29684296 10243704 27.71
09:55:01 PM 385644 32491548 98.83 477312 29706940 10204776 27.60
10:00:01 PM 348604 32528588 98.94 477672 29725476 10228984 27.67
10:05:01 PM 279216 32597976 99.15 478104 29751016 10281748 27.81
10:10:01 PM 255168 32622024 99.22 478220 29769924 10247404 27.72
10:15:01 PM 321188 32556004 99.02 475124 29721912 10234500 27.68
10:20:02 PM 441660 32435532 98.66 472336 29610476 10246288 27.71
10:25:01 PM 440636 32436556 98.66 472636 29634960 10219940 27.64
10:30:01 PM 469872 32407320 98.57 473016 29651476 10208520 27.61
10:35:01 PM 414540 32462652 98.74 473424 29672728 10223964 27.65
10:40:01 PM 354632 32522560 98.92 473772 29693016 10247752 27.72
10:45:01 PM 333708 32543484 98.98 474092 29720256 10227204 27.66
10:50:01 PM 528004 32349188 98.39 469396 29549832 10219536 27.64
10:55:02 PM 499068 32378124 98.48 469692 29587140 10204836 27.60
11:00:01 PM 462980 32414212 98.59 470032 29606764 10235820 27.68
11:05:01 PM 449540 32427652 98.63 470368 29626136 10209788 27.61
11:10:01 PM 419984 32457208 98.72 470772 29644248 10214480 27.63
11:15:01 PM 429900 32447292 98.69 471104 29664292 10202344 27.59
11:20:01 PM 394852 32482340 98.80 471528 29698052 10207604 27.61
11:25:01 PM 345328 32531864 98.95 471904 29717264 10215632 27.63
11:30:01 PM 368224 32508968 98.88 472236 29733544 10206468 27.61
11:35:01 PM 321800 32555392 99.02 472528 29758548 10211820 27.62
11:40:01 PM 282520 32594672 99.14 472860 29776952 10243516 27.71
12:00:01 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
09:30:01 PM dev253-5 66.29 146.33 483.33 9.50 6.27 94.53 2.08 13.78
09:35:01 PM dev253-5 154.80 126.85 1192.96 8.53 28.57 184.59 1.45 22.43
09:40:01 PM dev253-5 92.21 153.75 686.75 9.11 11.53 125.00 1.87 17.21
09:45:01 PM dev253-5 39.66 116.99 279.32 9.99 0.42 10.66 2.61 10.36
09:50:01 PM dev253-5 106.73 95.58 820.70 8.58 16.77 157.12 1.68 17.88
09:55:01 PM dev253-5 107.90 99.36 831.46 8.63 16.05 148.76 1.71 18.42
10:00:01 PM dev253-5 62.48 82.70 471.28 8.87 5.91 94.52 2.10 13.11
10:05:01 PM dev253-5 137.84 121.69 1064.03 8.60 24.48 177.31 1.56 21.52
10:10:01 PM dev253-5 107.93 104.16 827.83 8.64 16.69 155.04 1.68 18.11
10:15:01 PM dev253-5 40.55 126.12 277.57 9.96 0.41 10.13 2.57 10.42
10:20:02 PM dev253-5 104.33 136.77 793.49 8.92 16.97 162.69 1.76 18.35
10:25:01 PM dev253-5 108.04 115.36 825.26 8.71 16.68 154.36 1.76 19.05
10:30:01 PM dev253-5 69.72 105.66 523.05 9.02 7.45 106.92 1.90 13.25
10:35:01 PM dev253-5 101.58 91.59 781.85 8.60 15.00 147.68 1.67 16.97
10:40:01 PM dev253-5 107.50 97.91 827.17 8.61 17.68 164.49 1.77 19.06
10:45:01 PM dev253-5 69.98 140.13 519.57 9.43 7.09 101.25 1.96 13.72
10:50:01 PM dev253-5 104.30 83.31 806.12 8.53 16.18 155.10 1.65 17.16
10:55:02 PM dev253-5 106.86 209.65 790.27 9.36 15.59 145.08 1.74 18.60
11:00:01 PM dev253-5 50.42 92.08 371.52 9.19 3.05 62.16 2.28 11.52
11:05:01 PM dev253-5 101.06 88.31 776.57 8.56 15.12 149.58 1.67 16.90
11:10:01 PM dev253-5 103.08 77.73 798.23 8.50 17.14 166.25 1.74 17.90
11:15:01 PM dev253-5 57.74 96.45 428.62 9.09 5.23 90.52 2.13 12.32
11:20:01 PM dev253-5 97.73 185.18 727.38 9.34 14.64 149.84 1.94 18.92
11:25:01 PM dev253-5 95.03 85.52 730.31 8.58 14.42 151.79 1.79 16.97
11:30:01 PM dev253-5 53.76 73.65 404.47 8.89 3.94 73.25 2.17 11.64
11:35:01 PM dev253-5 110.37 125.05 842.17 8.76 16.96 153.63 1.66 18.30
11:40:01 PM dev253-5 103.93 87.00 801.59 8.55 16.01 154.00 1.73 18.00
As you can see there is no high IO activity in this period of time, but the DB is frozen. My opinion is that I have incorrect kernel settings and/or a mistake in postgresql.conf, because there is no high activity on the DB. Load avg is about 1; when there is high traffic it is about 1.15. This is from the nagios monitoring system.
But sometimes the load is about 4, and this time matches sar %vmeff = 100% and an increase in database response time.
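For reference, sar derives %vmeff as pages stolen divided by pages scanned, so the 10:20:02 PM row above works out to:

%vmeff = pgsteal/s / (pgscank/s + pgscand/s) * 100
       = 119.42 / (81.95 + 37.79) * 100 ≈ 99.73

In other words, %vmeff near 100% only says that almost every page the kernel scanned was also reclaimed; it is the pgscank/pgscand bursts at those timestamps that line up with the freezes.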
--
С уважением Селявка Евгений
Hello desmodemone, I look again and again through my sar statistics and I don't think that my DB is swapping during the freeze time. For example:
sar -B
12:00:02 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
09:40:01 PM 66.13 352.43 195070.33 0.00 70627.21 0.00 0.00 0.00 0.00
09:45:01 PM 54.55 526.87 190893.02 0.00 67850.63 25.76 2.40 28.16 100.00
09:50:01 PM 74.97 509.70 200564.75 0.00 71807.80 0.00 0.00 0.00 0.00
09:55:01 PM 81.54 443.57 186335.43 0.05 68486.83 127.33 160.26 287.35 99.92
10:00:01 PM 62.03 528.46 169701.41 0.00 60487.37 0.00 15.62 15.62 100.00
10:05:01 PM 64.61 504.76 178725.60 0.00 66251.26 0.00 15.80 15.80 100.00
10:10:01 PM 80.06 336.47 172819.14 0.00 62379.45 0.00 0.00 0.00 0.00
10:15:01 PM 59.69 512.85 180228.56 0.00 64091.90 0.00 0.00 0.00 0.00
sar -S
12:00:02 AM kbswpfree kbswpused %swpused kbswpcad %swpcad
09:40:01 PM 4095420 572 0.01 252 44.06
09:45:01 PM 4095420 572 0.01 252 44.06
09:50:01 PM 4095420 572 0.01 252 44.06
09:55:01 PM 4095420 572 0.01 252 44.06
10:00:01 PM 4095420 572 0.01 252 44.06
10:05:01 PM 4095420 572 0.01 252 44.06
10:10:01 PM 4095420 572 0.01 252 44.06
10:15:01 PM 4095420 572 0.01 252 44.06
I set work_mem to 1/4 from available RAM. I have 32Gb RAM so i set shared_buffers to 8Gb.
Now I also set:
vm.dirty_bytes = 67108864 - this value equals my Smart Array E200i cache size.
vm.dirty_background_bytes = 16777216 - 1/4 of vm.dirty_bytes.
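A quick sketch of applying those values at runtime and making them survive a reboot (standard sysctl usage, values copied from the lines above):

sysctl -w vm.dirty_bytes=67108864
sysctl -w vm.dirty_background_bytes=16777216
echo 'vm.dirty_bytes = 67108864' >> /etc/sysctl.conf
echo 'vm.dirty_background_bytes = 16777216' >> /etc/sysctl.conf
sysctl -p    # re-read /etc/sysctl.conf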
Next step: try to set correct values for the background writer settings (the 9.1 defaults are listed below for reference):
bgwriter_delay
bgwriter_lru_maxpages
bgwriter_lru_multiplier
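For reference, the 9.1 defaults for these three settings, shown only as a starting point and not as recommended values for this workload:

bgwriter_delay = 200ms             # how often the background writer wakes up
bgwriter_lru_maxpages = 100        # max buffers written per round
bgwriter_lru_multiplier = 2.0      # how far ahead of predicted buffer demand to clean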
And one more fact: if I clean up the fs cache (sync && echo 1 > /proc/sys/vm/drop_caches), the OS releases about 10-12GB of memory, and the freeze time comes back once the fs cache is full again. For example, for one or two days there is no freeze on the DB.
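A simple way to watch the cache filling back up after dropping it (plain procfs, nothing PostgreSQL-specific):

watch -n 60 "grep -E 'MemFree|^Cached' /proc/meminfo"    # or simply: free -m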
2013/11/4 desmodemone <desmodemone@gmail.com>
Hello,
I see your request on the performance mailing list. I think your server is swapping, and because your swap is on the same RAID disk with everything (/, the database datastore, etc.) you encounter a freeze of the system. I think you have to analyze why you are swapping. Is there only postgresql inside? Is it possible you are using too much work_mem memory?
Have a nice day

2013/11/2 Евгений Селявка <evg.selyavka@gmail.com>
SNIP
--
С уважением Селявка Евгений
> PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6
> 20120305 (Red Hat 4.4.6-4), 64-bit

First, you should be using the latest update version. You are currently missing multiple patch updates.

> listen_addresses = '*'
> port = 5433
> max_connections = 350
> shared_buffers = 8GB

Try dropping shared_buffers to 2GB. We've seen some issues on certain systems with 8GB shared buffers.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
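For illustration, the corresponding change (postgresql.conf, needs a restart), plus a quick check of which 9.1.x minor release is actually running:

# postgresql.conf
shared_buffers = 2GB

psql -p 5433 -c "SELECT version();"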
On Tue, Nov 5, 2013 at 8:37 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> I set work_mem to 1/4 from available RAM. I have 32Gb RAM so i set
> shared_buffers to 8Gb.

I am sure you are mentioning shared_buffers here and not work_mem. work_mem is a per-operation parameter. So if you are using an operation involving work_mem more than 4 times simultaneously on different sessions you'll swap pretty quickly ;)
--
Michael
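A rough worst-case calculation with the settings quoted above (each sort or hash step can use up to work_mem, per backend):

  4 simultaneous sorts * 256MB ≈ 1GB
100 connections * 1 sort each * 256MB ≈ 25GB
350 connections * 1 sort each * 256MB ≈ 87.5GB   (far more than the 32GB of RAM in this server)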
On Sat, Nov 2, 2013 at 12:54 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
SNIP
> max_connections = 350
SNIP
> work_mem = 256MB

These two settings together are quite dangerous.

1: Look into a db pooler to get your connections needed down to no more than 2x # of cores in your machine. I recommend pgbouncer.

2: Your current settings mean that if you max out connections and each of those connections does a large sort at the same time, they'll try to allocate 256MB*350 or 89,600MB. If you run one job that can use that much work_mem, then set it by connection or user for that one connection or user only. Allowing any process to allocate 256MB is usually a bad idea, and doubly so if you allow 350 incoming connections. Dropping work_mem to 16MB means a ceiling of about 5G memory if you get swamped and each query is averaging 1 sort. Note that a single query CAN run > 1 sort, so it's not a hard limit and you could still swamp your machine, but it's less likely.

3: Turn off the OOM killer. On a production DB it is unwelcome and could cause some serious issues.

4: vm.swappiness = 0 is my normal setting. I also tend to just turn off swap on big memory linux boxes because linux virtual memory is often counterproductive on db servers. Some people might even say it is broken, I tend to agree. Better to have a process fail to allocate memory and report it in logs than have a machine slow to a crawl under load. But that's your choice. And 64G isn't that big, so you're in the in between zone for me on whether to just turn off swap.

5: Turn down shared_buffers to 1 or 2G.

6: Lower all your vm dirty ratio / size settings so that the machine never has to write a lot at one time.

Basically don't TRY to allocate all the memory, try to leave 75% or so free for the OS to allocate as buffers. After getting a baseline for performance under load then make bigger changes.
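As a sketch of the per-user work_mem idea from point 2 (the role names here are hypothetical):

ALTER ROLE webapp SET work_mem = '16MB';      -- the many small web queries
ALTER ROLE reporting SET work_mem = '256MB';  -- the few heavy jobs, ideally limited to a couple of connections
-- or raise it only inside the one session that really needs it:
SET work_mem = '256MB';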
Thank you for advice.
1)
First of all, we use a Java app with a JDBC driver which can pool connections; that's why I don't think it is a good decision to put one more pooler between the app and the DB. Maybe someone has experience with pgbouncer and JDBC and could give good advice on the advantages and disadvantages of this architecture.
2)
Yes, this is my error in configuration, and every day or two I decrease work_mem and monitor my system and the postgresql log to try to find records about temp files. I will decrease work_mem to 16MB or maybe 32MB. But really, I repeat that I have about 100 concurrent connections to my DB; I set this value with a big reserve. I can't change this setting quickly because the db is in production.
3)
I also read about disabling the OOM killer, but when I set vm.overcommit_memory=2 my DB worked for a couple of days and then pacemaker stopped it, because I set a wrong value for vm.overcommit_ratio (I set it to 90). When pacemaker tried to execute psql -c 'select 1', postmaster returned 'out of memory' and pacemaker stopped my production DB. I need to know the correct value for vm.overcommit_ratio, or how postmaster allocates memory when it forks - maybe a formula or something? If I get answers to this question I can pick a vm.overcommit_ratio.
4)
About vm.swappiness, totally agree; I turned it up for experimental purposes because I have the problem that my db freezes. I play with different kernel settings to try to pick the correct value. In the beginning I set it to 0 and all worked fine.
5)
I will be planning downtime to decrease max_connections and shared_buffers.
6)
vm.dirty_bytes = 67108864 - this value equals my Smart Array E200i cache size.
vm.dirty_background_bytes = 16777216 - 1/4 of vm.dirty_bytes.

<Basically don't TRY to allocate all the memory, try to leave 75% or so
<free for the OS to allocate as buffers. After getting a baseline for
<performance under load then make bigger changes

This means that I should set effective_cache_size to 75% of my RAM?
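For reference, 75% of this box's RAM works out to roughly:

0.75 * 32GB ≈ 24GB
effective_cache_size = 24GB    # illustrative only; this is a planner estimate, not an allocation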
2013/11/6 Scott Marlowe <scott.marlowe@gmail.com>
SNIP
--
С уважением Селявка Евгений
On Wed, Nov 6, 2013 at 1:53 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Thank you for advice.
>
> 1)
> First off all, we use java app with jdbc driver wich can pool connection,
> thats why i don't think that this is good decision to put one more pooler
> between app and DB. May be someone have an experience with pgbouncer and
> jdbc and could give a good advice with advantage and disadvantage of this
> architecture.

That's a mostly religious argument. I.e. you're going on feeling here that pooling in jdbc alone is better than either jdbc/pgbouncer or plain pgbouncer alone. My experience is that jdbc pooling is not in the same category as pgbouncer for configuration and performance. Either way, get that connection count down to something reasonable. If you've routinely got 200+ connections you've got too many. Again, 2x cores is about max on most machines for maximum throughput. 3x to 4x is the absolute max really. Unless you've got a machine with 40+ cores, you should be running a lot fewer connections. Basically pgbouncer is very lightweight and can take thousands of incoming connections and balance them into a few dozen connections to the database.

> 2) Yes this is my error in configuration and every day or two i decrease
> work_mem and monitor for my system and postgresql log try to find record
> about temp files. I will decrease work_mem to 16MB or may be 32MB. But
> really i repeat that i have about 100 concurrent connections to my DB. I set
> this value with big reserve. I can't change this setting because db in
> production.

If you've got one job that needs lots of mem and lots of jobs that don't, look at my recommendation to lower work_mem for all the low mem requiring jobs. If you can split those heavy lifting jobs out to another user, then you can use a pooler like pgbouncer to do admission control by limiting that heavy lifter to a few connections at a time. The rest will wait in line behind it.

> 3)
> I also read about disabling OOM killer but when i set
> vm.overcommit_memory=2. My DB work couple of a day and then pacemaker stop
> it because i set wrong value for vm.overcommit_ratio i set it to 90. And
> when pacemaker try to execute psql -c 'select 1' postmaster return 'out of
> memory' and pacemaker stop my production DB. I need to know what is the
> correct value for vm.overcommit_ratio or how postmaster allocate memory when
> fork may be formula or something? If i get answers on this question i can
> pick up vm.overcommit_ratio.

You are definitely running your server out of memory then. Can you throw say 256G into it? It's usually worth every penny to throw memory at the problem. Reducing usage will help a lot for now tho.

> 4)
> About vm.swappiness totally agree and i turn it on for experiment goals,
> because i have problem and my db freeze. I play with different kernel
> setting try to pick up correct value. In the beginning i set it to 0 and
> all works fine.

The linux kernel is crazy about swapping. Even with swappiness set to 0, it'll swap stuff out once it's gotten old. Suddenly shared_buffers are on disk not in ram etc. On big memory machines (we use 128G to 1TB memory at work) I just turn it off because the bigger the memory the dumber it seems to get.

> 6) I set to this values
> vm.dirty_bytes=67108864 this value equal my Smart Array E200i Cache Size.
> vm.dirty_background_bytes = 16777216 - 1/4 from vm.dirty_bytes

Ahh I usually set the ratio for dirty_ratio to something small like 5 to 10 and dirty_background_ratio to 1. The less bursty the dirty background stuff is the better in db land. Your numbers are fairly low assuming dirty bytes is in bytes and not kilobytes or something. :) I never use it so I'm not sure one way or the other.

> <Basically don't TRY to allocate all the memory, try to leave 75% or so
> <free for the OS to allocate as buffers. After getting a baseline for
> <performance under load then make bigger changes
>
> This means that i should set effective_cache_size to 75% of my RAM?

That is a reasonable number, especially once you get the machine to stop using so much memory for sorts and shared_buffers. The idea is that when you look at free, after the db's been up for a day or two, you should see 75% or so of your RAM allocated to cache / buffers.

Good luck, keep us informed on your progress.
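A sketch of the ratio-based variant Scott describes (the kernel honours whichever of the *_ratio / *_bytes pair was set last, so these would replace the *_bytes values used earlier):

# /etc/sysctl.conf, then sysctl -p
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1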
As a followup to my previous message, here's a response curve on a 48 core server I used at my last job.

https://picasaweb.google.com/lh/photo/aPYHPWPivPsS79fG3AKtZNMTjNZETYmyPJy0liipFm0?feat=directlink

Note the peak at around 38 to 48 cores. This is the sweetspot on this server for connections. If I allow this server to get to 68 to 70 hard working connections my throughput drops by half. It's far better to have inbound connections sit in a queue in a connection pooler and wait their turn than to have them all clobber this server at one time.

Of course the issue here is active connections, not idle ones. So if you have say 150 idle and 50 active connections on this server you'd be doing fine. Until load started to climb. Then as the number of active connections went past 50, it would get slower in a non-linear fashion. Since the throughput is half, at 70 or so connections each query would now be 1/4th or so as fast as they had been when we had 35 or so. So it's a good idea to get some idea of where that sweet spot is on your server and stay under it with a good pooler.
Also also, the definitive page for postgres and dirty pages etc is here: http://www.westnet.com/~gsmith/content/linux-pdflush.htm Not sure if it's out of date with more modern kernels. Maybe Greg will chime in.
On Sat, Nov 2, 2013 at 1:54 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Please help with advice!
>
> Server
> HP ProLiant BL460c G1
SNIP
> But sometimes load is about 4 and this time matches with sar %vmeff = 100%
> and database response time increase.

Need to see: iowait, system load. Also consider installing perf and grabbing a profile while issue is happening. Probably this problem will go away with 2GB shared buffers, but before doing that we'd like to diagnose this if possible.

merlin
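For the iowait and load numbers requested above, something like this captured during a freeze would do (standard sysstat tools, as already used elsewhere in this thread):

sar -q 5 5      # run-queue length and load average
sar -u 5 5      # CPU breakdown, including %iowait
iostat -x 5 5   # per-device await and %util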
On Wed, Nov 6, 2013 at 8:35 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> That's a mostly religious argument. I.e. you're going on feeling here
> that pooling in jdbc alone is better than either jdbc/pgbouncer or
> plain pgbouncer alone. My experience is that jdbc pooling is not in
> the same category as pgbouncer for configuration and performance.
> Either way, get that connection count down to something reasonable.
>
> Basically pgbouncer is very lightweight and can take thousands of
> incoming connections and balance them into a few dozen connections to
> the database.

I've had a look at the pgbouncer docs, and it appears that there are 3 modes: session, transaction and statement. Session pooling appears to be the most conservative, but unless I am missing something, I don't see how it will reduce the number of actual database connections when used behind a JDBC connection pool? If using transaction pooling, it appears that you need to disable prepared statements in JDBC - and the FAQ says you need to apply a patch to the JDBC driver to get it (admittedly, that FAQ appears that it might be out of date given the JDBC version it references).

-Dave
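For reference, a minimal pgbouncer.ini sketch for this kind of setup (database name, pool sizes and listen port are made up; with transaction pooling, pgJDBC also needs prepareThreshold=0 so it does not rely on named server-side prepared statements):

[databases]
mydb = host=127.0.0.1 port=5433 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; session pooling is the safe default with JDBC; transaction pooling gives the real reduction in server connections
pool_mode = session
max_client_conn = 350
default_pool_size = 16
server_reset_query = DISCARD ALL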
Scott thank you for advice.
> If you've got one job that needs lots of mem and lot of jobs that
> don't, look at my recommendation to lower work_mem for all the low mem
> requiring jobs. If you can split those heavy lifting jobs out to
> another user, then you can use a pooler like pgbouncer to do admission
> control by limiting that heavy lifter to a few connections at a time.
> The rest will wait in line behind it.
I will decrease this parameter to 32MB, because this DB cluster is for a WEB application, so there is nothing that runs heavyweight queries.
> You are definitely running your server out of memory then. Can you
> throw say 256G into it? It's usually worth every penny to throw memory
> at the problem. Reducing usage will help a lot for now tho.
Unfortunately no; all I can grow my memory to is 72GB. If I add another 32GB to the server, what shared_buffers should I use: 8GB, 2GB, or 18GB (1/4 of 72GB)?
For this period of time I can set vm.overcommit_ratio=500 or 700, but this is very dangerous I think, because all processes can allocate (RAM+SWAP)*vm.overcommit_ratio/100, as I understand?
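For reference, with vm.overcommit_memory=2 the kernel documents the limit (Documentation/vm/overcommit-accounting) as:

CommitLimit = Swap + RAM * overcommit_ratio / 100
            ≈ 4GB + 32GB * 90/100  ≈ 32.8GB   (this box with ratio 90)
            ≈ 4GB + 32GB * 500/100 ≈ 164GB    (ratio 500 effectively removes the limit)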
Once again thank you very much for the link; I have read about it and the graph. About max_connections I will reply later - now I am calculating it.
2013/11/6 Scott Marlowe <scott.marlowe@gmail.com>
Also also, the definitive page for postgres and dirty pages etc is here:
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
Not sure if it's out of date with more modern kernels. Maybe Greg will chime in.
--
С уважением Селявка Евгений
All my sar statistics
sar -r
11:40:02 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
01:15:01 PM 269108 32608084 99.18 367144 29707240 10289444 27.83
01:20:01 PM 293560 32583632 99.11 367428 29674272 10287136 27.82
01:25:01 PM 417640 32459552 98.73 366148 29563220 10289220 27.83
01:30:01 PM 399448 32477744 98.79 366812 29596520 10298876 27.85
01:35:01 PM 432332 32444860 98.69 367412 29616524 10277484 27.80
sar -d -p
11:40:02 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
01:15:01 PM vg_root-lv_pgsql 73.10 116.59 540.15 8.98 6.98 95.44 1.61 11.78
01:20:01 PM vg_root-lv_pgsql 71.39 170.21 508.21 9.50 5.44 76.23 1.72 12.31
01:25:01 PM vg_root-lv_pgsql 54.32 136.21 381.53 9.53 3.58 65.98 1.81 9.85
01:30:01 PM vg_root-lv_pgsql 81.35 167.98 585.25 9.26 8.15 100.13 1.63 13.25
01:35:01 PM vg_root-lv_pgsql 66.75 126.02 482.72 9.12 5.59 83.73 1.78 11.90
sar -u ALL
11:40:02 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle
01:15:01 PM all 8.57 0.00 1.52 1.46 0.00 0.00 0.05 0.00 88.40
01:20:01 PM all 8.50 0.00 1.53 1.61 0.00 0.00 0.05 0.00 88.31
01:25:01 PM all 9.00 0.00 1.78 1.27 0.00 0.00 0.06 0.00 87.89
01:30:01 PM all 9.58 0.00 1.63 1.71 0.00 0.00 0.06 0.00 87.01
01:35:01 PM all 8.75 0.00 1.47 1.57 0.00 0.00 0.06 0.00 88.15
As you say, I installed perf and got statistics with the command:
perf record -g -f -u postgres -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync
But I really don't understand the perf report: what values do I need to look at? Could you help me with advice on how to read the perf report? What events from perf list should I trace, and what are good and bad values in this report, depending on my hardware?
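A couple of standard starting points for reading what was recorded (nothing here is specific to this particular trace):

perf report --stdio --sort comm,dso   # which processes and binaries the events hit
perf report --stdio -g                # same, with the call graphs captured by -g expanded
perf script | head -50                # the raw event stream, useful for the block_rq_* tracepoints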
2013/11/7 Merlin Moncure <mmoncure@gmail.com>
Need to see: iowait, system load.On Sat, Nov 2, 2013 at 1:54 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Please help with advice!
>
> Server
> HP ProLiant BL460c G1
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 8
> On-line CPU(s) list: 0-7
> Thread(s) per core: 1
> Core(s) per socket: 4
> CPU socket(s): 2
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 23
> Stepping: 6
> CPU MHz: 3000.105
> BogoMIPS: 6000.04
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 6144K
> NUMA node0 CPU(s): 0-7
>
> 32GB RAM
> [root@db3 ~]# numactl --hardware
> available: 1 nodes (0)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 32765 MB
> node 0 free: 317 MB
> node distances:
> node 0
> 0: 10
>
>
> RAID1 2x146GB 10k rpm
>
> CentOS release 6.3 (Final)
> Linux 2.6.32-279.11.1.el6.x86_64 #1 SMP x86_64 GNU/Linux
>
>
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> vm.swappiness = 30
> vm.dirty_background_bytes = 67108864
> vm.dirty_bytes = 536870912
>
>
> PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6
> 20120305 (Red Hat 4.4.6-4), 64-bit
>
> listen_addresses = '*'
> port = 5433
> max_connections = 350
> shared_buffers = 8GB
> temp_buffers = 64MB
> max_prepared_transactions = 350
> work_mem = 256MB
> maintenance_work_mem = 1GB
> max_stack_depth = 4MB
> max_files_per_process = 5000
> effective_io_concurrency = 2
> wal_level = hot_standby
> synchronous_commit = off
> checkpoint_segments = 64
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.75
> max_wal_senders = 4
> wal_sender_delay = 100ms
> wal_keep_segments = 128
> random_page_cost = 3.0
> effective_cache_size = 18GB
> autovacuum = on
> autovacuum_max_workers = 5
> autovacuum_vacuum_threshold = 900
> autovacuum_analyze_threshold = 350
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_analyze_scale_factor = 0.05
> log_min_duration_statement = 500
> deadlock_timeout = 1s
>
>
> DB size is about 20GB. There is no high write activity on DB. But
> periodically in postgresql log i see for example: "select 1" duration is
> about 500-1000 ms.
>
> In this period of time response time from db terribly. This period of time
> not bound with high traffic. It is not other app on the server. There is not
> specific cron job on server.
>
> Our app written on java and use jdbc to connect to DB and internal pooling.
> There is about 100 connection to DB. This is sar output:
>
> 12:00:01 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
> 09:30:01 PM 73.17 302.72 134790.16 0.00 46809.73 0.00 0.00 0.00 0.00
> 09:35:01 PM 63.42 655.80 131740.74 0.00 46182.74 0.00 0.00 0.00 0.00
> 09:40:01 PM 76.87 400.62 122375.34 0.00 42096.27 0.00 0.00 0.00 0.00
> 09:45:01 PM 58.49 198.33 121922.86 0.00 42765.27 0.00 0.00 0.00 0.00
> 09:50:01 PM 52.21 485.45 136775.65 0.15 49098.65 0.00 0.00 0.00 0.00
> 09:55:01 PM 49.68 476.75 130159.24 0.00 45192.54 0.00 0.00 0.00 0.00
> 10:00:01 PM 41.35 295.34 118655.80 0.00 40786.52 0.00 0.00 0.00 0.00
> 10:05:01 PM 60.84 593.85 129890.83 0.00 44170.92 0.00 0.00 0.00 0.00
> 10:10:01 PM 52.08 471.36 132773.63 0.00 46019.13 0.00 2.41 2.41 100.00
> 10:15:01 PM 73.93 196.50 129384.21 0.33 45255.76 65.92 1.19 66.87 99.64
> 10:20:02 PM 70.35 473.16 121940.38 0.11 44061.52 81.95 37.79 119.42 99.73
> 10:25:01 PM 57.84 471.69 130583.33 0.01 46093.33 0.00 0.00 0.00 0.00
> 10:30:01 PM 52.91 321.62 119264.34 0.01 41748.19 0.00 0.00 0.00 0.00
> 10:35:01 PM 47.13 451.78 114625.62 0.02 40600.98 0.00 0.00 0.00 0.00
> 10:40:01 PM 48.96 472.41 102352.79 0.00 35402.17 0.00 0.00 0.00 0.00
> 10:45:01 PM 70.07 321.33 121423.02 0.00 43052.04 0.00 0.00 0.00 0.00
> 10:50:01 PM 46.78 479.95 128938.09 0.02 37864.07 116.64 48.97 165.07 99.67
> 10:55:02 PM 104.84 453.55 109189.98 0.00 37583.50 0.00 0.00 0.00 0.00
> 11:00:01 PM 46.23 248.75 107313.26 0.00 37278.10 0.00 0.00 0.00 0.00
> 11:05:01 PM 44.28 446.41 115598.61 0.01 40070.61 0.00 0.00 0.00 0.00
> 11:10:01 PM 38.86 457.32 100240.71 0.00 34407.29 0.00 0.00 0.00 0.00
> 11:15:01 PM 48.23 275.60 104780.84 0.00 36183.84 0.00 0.00 0.00 0.00
> 11:20:01 PM 92.74 432.49 114698.74 0.01 40413.14 0.00 0.00 0.00 0.00
> 11:25:01 PM 42.76 428.50 87769.28 0.00 29379.87 0.00 0.00 0.00 0.00
> 11:30:01 PM 36.83 260.34 85072.46 0.00 28234.50 0.00 0.00 0.00 0.00
> 11:35:01 PM 62.52 481.56 93150.67 0.00 31137.13 0.00 0.00 0.00 0.00
> 11:40:01 PM 43.50 459.10 90407.34 0.00 30241.70 0.00 0.00 0.00 0.00
>
> 12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
> 09:30:01 PM 531792 32345400 98.38 475504 29583340 10211064 27.62
> 09:35:01 PM 512096 32365096 98.44 475896 29608660 10200916 27.59
> 09:40:01 PM 455584 32421608 98.61 476276 29638952 10211652 27.62
> 09:45:01 PM 425744 32451448 98.71 476604 29662384 10206044 27.60
> 09:50:01 PM 380960 32496232 98.84 477004 29684296 10243704 27.71
> 09:55:01 PM 385644 32491548 98.83 477312 29706940 10204776 27.60
> 10:00:01 PM 348604 32528588 98.94 477672 29725476 10228984 27.67
> 10:05:01 PM 279216 32597976 99.15 478104 29751016 10281748 27.81
> 10:10:01 PM 255168 32622024 99.22 478220 29769924 10247404 27.72
> 10:15:01 PM 321188 32556004 99.02 475124 29721912 10234500 27.68
> 10:20:02 PM 441660 32435532 98.66 472336 29610476 10246288 27.71
> 10:25:01 PM 440636 32436556 98.66 472636 29634960 10219940 27.64
> 10:30:01 PM 469872 32407320 98.57 473016 29651476 10208520 27.61
> 10:35:01 PM 414540 32462652 98.74 473424 29672728 10223964 27.65
> 10:40:01 PM 354632 32522560 98.92 473772 29693016 10247752 27.72
> 10:45:01 PM 333708 32543484 98.98 474092 29720256 10227204 27.66
> 10:50:01 PM 528004 32349188 98.39 469396 29549832 10219536 27.64
> 10:55:02 PM 499068 32378124 98.48 469692 29587140 10204836 27.60
> 11:00:01 PM 462980 32414212 98.59 470032 29606764 10235820 27.68
> 11:05:01 PM 449540 32427652 98.63 470368 29626136 10209788 27.61
> 11:10:01 PM 419984 32457208 98.72 470772 29644248 10214480 27.63
> 11:15:01 PM 429900 32447292 98.69 471104 29664292 10202344 27.59
> 11:20:01 PM 394852 32482340 98.80 471528 29698052 10207604 27.61
> 11:25:01 PM 345328 32531864 98.95 471904 29717264 10215632 27.63
> 11:30:01 PM 368224 32508968 98.88 472236 29733544 10206468 27.61
> 11:35:01 PM 321800 32555392 99.02 472528 29758548 10211820 27.62
> 11:40:01 PM 282520 32594672 99.14 472860 29776952 10243516 27.71
>
> 12:00:01 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
> 09:30:01 PM dev253-5 66.29 146.33 483.33 9.50 6.27 94.53 2.08 13.78
> 09:35:01 PM dev253-5 154.80 126.85 1192.96 8.53 28.57 184.59 1.45 22.43
> 09:40:01 PM dev253-5 92.21 153.75 686.75 9.11 11.53 125.00 1.87 17.21
> 09:45:01 PM dev253-5 39.66 116.99 279.32 9.99 0.42 10.66 2.61 10.36
> 09:50:01 PM dev253-5 106.73 95.58 820.70 8.58 16.77 157.12 1.68 17.88
> 09:55:01 PM dev253-5 107.90 99.36 831.46 8.63 16.05 148.76 1.71 18.42
> 10:00:01 PM dev253-5 62.48 82.70 471.28 8.87 5.91 94.52 2.10 13.11
> 10:05:01 PM dev253-5 137.84 121.69 1064.03 8.60 24.48 177.31 1.56 21.52
> 10:10:01 PM dev253-5 107.93 104.16 827.83 8.64 16.69 155.04 1.68 18.11
> 10:15:01 PM dev253-5 40.55 126.12 277.57 9.96 0.41 10.13 2.57 10.42
> 10:20:02 PM dev253-5 104.33 136.77 793.49 8.92 16.97 162.69 1.76 18.35
> 10:25:01 PM dev253-5 108.04 115.36 825.26 8.71 16.68 154.36 1.76 19.05
> 10:30:01 PM dev253-5 69.72 105.66 523.05 9.02 7.45 106.92 1.90 13.25
> 10:35:01 PM dev253-5 101.58 91.59 781.85 8.60 15.00 147.68 1.67 16.97
> 10:40:01 PM dev253-5 107.50 97.91 827.17 8.61 17.68 164.49 1.77 19.06
> 10:45:01 PM dev253-5 69.98 140.13 519.57 9.43 7.09 101.25 1.96 13.72
> 10:50:01 PM dev253-5 104.30 83.31 806.12 8.53 16.18 155.10 1.65 17.16
> 10:55:02 PM dev253-5 106.86 209.65 790.27 9.36 15.59 145.08 1.74 18.60
> 11:00:01 PM dev253-5 50.42 92.08 371.52 9.19 3.05 62.16 2.28 11.52
> 11:05:01 PM dev253-5 101.06 88.31 776.57 8.56 15.12 149.58 1.67 16.90
> 11:10:01 PM dev253-5 103.08 77.73 798.23 8.50 17.14 166.25 1.74 17.90
> 11:15:01 PM dev253-5 57.74 96.45 428.62 9.09 5.23 90.52 2.13 12.32
> 11:20:01 PM dev253-5 97.73 185.18 727.38 9.34 14.64 149.84 1.94 18.92
> 11:25:01 PM dev253-5 95.03 85.52 730.31 8.58 14.42 151.79 1.79 16.97
> 11:30:01 PM dev253-5 53.76 73.65 404.47 8.89 3.94 73.25 2.17 11.64
> 11:35:01 PM dev253-5 110.37 125.05 842.17 8.76 16.96 153.63 1.66 18.30
> 11:40:01 PM dev253-5 103.93 87.00 801.59 8.55 16.01 154.00 1.73 18.00
>
> As you can see, there is no high I/O activity in this period of time, but the DB is
> frozen. My opinion is that I have incorrect kernel settings and/or a mistake
> in postgresql.conf, because there is no high activity on the DB. Load
> avg is about 1; when there is high traffic it is about 1.15. This is from the
> Nagios monitoring system.
>
> But sometimes the load is about 4, and this time matches with sar %vmeff = 100%
> and database response time increases.
Also consider installing perf and grabbing a profile while the issue is happening.
Probably this problem will go away with 2GB shared_buffers, but before
doing that we'd like to diagnose it if possible.
merlin
--
Best regards, Селявка Евгений
On Sat, Nov 2, 2013 at 11:54 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> DB size is about 20GB. There is no high write activity on DB. But
> periodically in postgresql log i see for example: "select 1" duration is
> about 500-1000 ms.
>
> In this period of time response time from db terribly. This period of time
> not bound with high traffic. It is not other app on the server. There is not
> specific cron job on server.

Have you shown all the modified kernel settings? Don't you use huge pages
accidentally? It might be a transparent huge pages defragmentation issue,
the symptoms look similar.

Another thing that might cause it is network. Try to monitor it at the time
of these stalls.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
Hi Scott, I calculated all of my JDBC pool sizes. The maximum is 300 connections from the components that use JDBC. I don't think using pgbouncer is a good idea, because our application uses the Spring framework, which uses JDBC and prepared statements. I tried to talk to our developers about disabling prepared statements in this framework; they don't want to do that. That's why I will try to upgrade the hardware and buy CPUs with more cores, as you suggest, based on the 3-4x cores formula. But most of these connections are idle. This is a web-based app, not a data warehouse, which is why all these connections are lightweight.
About my DB freeze, I set these kernel parameters:
echo 1048576 > /proc/sys/vm/min_free_kbytes
echo 80 > /proc/sys/vm/vfs_cache_pressure
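If those two values turn out to help, a sketch of making them survive a reboot, assuming the stock /etc/sysctl.conf layout on CentOS 6:

# /etc/sysctl.conf -- same values as the echo commands above
vm.min_free_kbytes = 1048576
vm.vfs_cache_pressure = 80

# apply without rebooting
sysctl -p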
2013/11/6 Scott Marlowe <scott.marlowe@gmail.com>
Also, the definitive page for Postgres and dirty pages etc. is here:
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
Not sure if it's out of date with more modern kernels. Maybe Greg will chime in.
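For reference, a quick way to see which writeback limits are actually in effect and how much dirty data is outstanding at a given moment (standard sysctl/procfs interfaces, nothing specific to this server):

sysctl vm.dirty_background_bytes vm.dirty_bytes vm.dirty_background_ratio vm.dirty_ratio
grep -E 'Dirty|Writeback' /proc/meminfo    # current dirty / in-flight writeback, in kB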
--
Best regards, Селявка Евгений
On Mon, Nov 11, 2013 at 1:09 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Scott hi, i calculate all of my jdbc pool size. Maximum is 300 connections
> from components wich use jdbc. I don't think that this is a good idea use
> pgbouncer, because our application using spring framework which using jdbc
> and prepared statement. I try to talk with our developer about disabling
> prepared statement in this framework, they don't want do this. Thats why i
> will try to upgrade HW and buy CPU with more core as you say based on
> formula 3-4xcore. But most of this connection is idle. This is a web based
> app not a datawarehouse, thats why all this connection is lightwear.
>
> About my db freeze i set this kernel parameter:
> echo 1048576 > /proc/sys/vm/min_free_kbytes
> echo 80 > /proc/sys/vm/vfs_cache_pressure
>
> And my freeze intervals is steel smaller. I try to dig deeper.

Well, you can hopefully reduce connections from jdbc pooling then. The
fact that the connections are idle is good.

The problem you run into is what happens when things go into
"overload", i.e. when the db server starts to slow down, more of those
idle connections become not idle. If all 300 are then waiting on the
db server, it will slow to a crawl and eventually fall over.
On Mon, Nov 11, 2013 at 09:14:43AM -0700, Scott Marlowe wrote:
> On Mon, Nov 11, 2013 at 1:09 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> > Scott hi, i calculate all of my jdbc pool size. Maximum is 300 connections
> > from components wich use jdbc. I don't think that this is a good idea use
> > pgbouncer, because our application using spring framework which using jdbc
> > and prepared statement. I try to talk with our developer about disabling
> > prepared statement in this framework, they don't want do this. Thats why i
> > will try to upgrade HW and buy CPU with more core as you say based on
> > formula 3-4xcore. But most of this connection is idle. This is a web based
> > app not a datawarehouse, thats why all this connection is lightwear.
> >
> > About my db freeze i set this kernel parameter:
> > echo 1048576 > /proc/sys/vm/min_free_kbytes
> > echo 80 > /proc/sys/vm/vfs_cache_pressure
> >
> > And my freeze intervals is steel smaller. I try to dig deeper.
>
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.

+1

I would definitely encourage the use of pgbouncer to map the 300 connections
to a saner number that your DB can actually handle. We had a similar problem,
and very occasionally the server would "lock up". Once we put the resource
management pooler in place, performance has been the same best-case and much,
much better worst-case, with NO lockups.

Regards,
Ken
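A minimal pgbouncer sketch of that kind of mapping; every name, path and number below is an assumption for illustration, not a tested config for this server:

; pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5433 dbname=appdb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction    ; actually shares backends between the 300 client connections
default_pool_size = 30     ; roughly 3-4x the 8 cores, per the advice earlier in the thread
max_client_conn = 400

The usual caveat: transaction pooling does not work with server-side prepared statements, which was the objection raised earlier; with the PostgreSQL JDBC driver that normally means adding prepareThreshold=0 to the connection URL. Session pooling avoids that problem, but it cannot reduce the backend count while the application holds all of its connections open.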
On Sun, Nov 10, 2013 at 11:48 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Sergey, yes this is all of my kernel setting. I don't use THP intentionally.
> I think that i need a special library to use THP with postgresql like this
> http://code.google.com/p/pgcookbook/wiki/Database_Server_Configuration.
> This is my values for this kernel settings:

Then it is definitely not THP.

PS. BTW, pgcookbook was moved to GitHub several weeks ago:
https://github.com/grayhemp/pgcookbook.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
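For completeness, a way to double-check THP directly from sysfs; on RHEL/CentOS 6 the backported feature lives under a Red Hat specific path, and the mainline path is shown as well:

# RHEL/CentOS 6 (2.6.32 backport)
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
# mainline kernels
cat /sys/kernel/mm/transparent_hugepage/enabled

A value with [never] highlighted means THP (and its defragmentation) is switched off.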
On Mon, Nov 11, 2013 at 8:14 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.

+1. Try to monitor your connections, for example like this

while true; do
    echo -n "$(date): "
    psql -XAt -c "select count(1) from pg_stat_activity"
    sleep 1
done > activity.log

and its correlation with slowdowns.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
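On 9.1 a slightly extended variant could also separate idle from active and waiting backends; this is only a sketch using the column names as they exist in 9.1 (idle sessions report '<IDLE>' in current_query), with port 5433 taken from the posted config and run as a superuser so current_query is visible:

while true; do
    echo -n "$(date): "
    psql -p 5433 -XAt -F' ' -c "select count(*) as total,
        sum(case when current_query <> '<IDLE>' then 1 else 0 end) as active,
        sum(case when waiting then 1 else 0 end) as waiting
      from pg_stat_activity"
    sleep 1
done > activity.log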
On Thu, Nov 7, 2013 at 2:13 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
All my sar statistics
...
sar -u ALL
11:40:02 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %idle
01:15:01 PM all 8.57 0.00 1.52 1.46 0.00 0.00 0.05 0.00 88.40
01:20:01 PM all 8.50 0.00 1.53 1.61 0.00 0.00 0.05 0.00 88.31
01:25:01 PM all 9.00 0.00 1.78 1.27 0.00 0.00 0.06 0.00 87.89
01:30:01 PM all 9.58 0.00 1.63 1.71 0.00 0.00 0.06 0.00 87.01
01:35:01 PM all 8.75 0.00 1.47 1.57 0.00 0.00 0.06 0.00 88.15
Did a freeze-up occur in there someplace? Otherwise, that is not so useful.
You should try to decrease the sar interval to 1 min if you can. The extra overhead is negligible and the extra information can be very valuable. I'd also have something like "vmstat 5" running and capture that, although perhaps one of the options to sar other than -u captures the same information; I know little of the other sar options.
Cheers,
Jeff
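Both suggestions sketched out; the cron file path and its default contents are the CentOS 6 ones and may differ on other setups:

# continuous vmstat with timestamps (GNU awk assumed for strftime)
vmstat 5 | awk '{ print strftime("%F %T"), $0; fflush() }' > vmstat.log &

# /etc/cron.d/sysstat -- change the default 10-minute sar sampling to 1 minute
*/1 * * * * root /usr/lib64/sa/sa1 1 1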
Sergey, I will try to monitor my pgsql activity for several days.

Scott, about connection pooling: yesterday I started reading about the Spring implementation of JDBC. Our app uses the DBCP implementation: http://commons.apache.org/proper/commons-dbcp/configuration.html

<property name="maxActive" value="15" />
<property name="maxIdle" value="1" />
<property name="maxWait" value="10000" />
<property name="validationQuery" value="SELECT 1" />
<property name="removeAbandoned" value="true" />
<property name="removeAbandonedTimeout" value="60" />
I think I should configure my DBCP pool config XML file correctly.
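A hedged sketch of what a right-sized DBCP pool entry might look like; the parameter names are standard commons-dbcp ones, and the numbers are nothing more than the 3-4x-cores rule of thumb from this thread, not tested values:

<property name="maxActive" value="30" />            <!-- roughly 3-4x the 8 cores -->
<property name="maxIdle" value="10" />
<property name="minIdle" value="2" />
<property name="maxWait" value="10000" />
<property name="validationQuery" value="SELECT 1" />
<property name="testOnBorrow" value="true" />
<property name="removeAbandoned" value="true" />
<property name="removeAbandonedTimeout" value="60" />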
2013/11/12 Jeff Janes <jeff.janes@gmail.com>
--
Best regards, Селявка Евгений