Thread: Re: pg_stat_bgwriter
On Sun, Oct 13, 2019 at 06:27:35PM -0700, dangal wrote:
>Dear all, I would like to share the statistics of pg_stat_bgwriter with
>you to see what you think:
>
>postgres=# select * from pg_stat_bgwriter;
> checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc |          stats_reset
>-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
>               338 |               6 |             247061792 |                89418 |            2939561 |      19872289 |            54876 |         6015787 |                     0 |     710682240 | 2019-10-06 19:25:30.688186-03
>(1 row)
>
>postgres=# show bgwriter_delay;
> bgwriter_delay
>----------------
> 200ms
>(1 row)
>
>postgres=# show bgwriter_lru_maxpages;
> bgwriter_lru_maxpages
>-----------------------
> 100
>(1 row)
>
>postgres=# show bgwriter_lru_multiplier;
> bgwriter_lru_multiplier
>-------------------------
> 2
>(1 row)
>
>Do you think I should increase bgwriter_lru_maxpages because of the value
>of maxwritten_clean?
>Do you think I should increase bgwriter_lru_maxpages and
>bgwriter_lru_multiplier, and decrease bgwriter_delay, because of the value
>of buffers_backend compared to buffers_alloc?
>Do you think a modification is necessary?
>What values would you recommend?

buffers_alloc does not really matter here, IMO. You need to compare
buffers_checkpoint, buffers_backend and buffers_clean, and ideally you'd
have (checkpoints > clean > backend). In your case it's already

  buffers_checkpoint | buffers_clean | buffers_backend
             2939561 |      19872289 |         6015787

You could make the bgwriter even more aggressive, but that's unlikely to
be a huge improvement. You should investigate why buffers_checkpoint is
so low. This is usually a sign of shared_buffers being too small for the
active set, so perhaps you need to increase shared_buffers, or see which
queries are causing this and optimize them.

Note: FWIW, a single snapshot of pg_stats* may be misleading, because
it's cumulative, so it's not clear how accurately it reflects current
state. Next time take two snapshots and subtract them.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
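
(If useful, a minimal sketch of how such snapshots can be collected; the
table name pg_stat_bgwriter_snapshot and the schedule are illustrative
assumptions only, not something prescribed in the thread:)

-- illustrative snapshot table: same columns as pg_stat_bgwriter plus a timestamp
CREATE TABLE pg_stat_bgwriter_snapshot AS
SELECT now() AS now, * FROM pg_stat_bgwriter WHERE false;

-- run periodically (e.g. from cron) to record one timestamped snapshot
INSERT INTO pg_stat_bgwriter_snapshot
SELECT now(), * FROM pg_stat_bgwriter;

Deltas between any two snapshots can then be computed by subtracting the
corresponding rows.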
Thanks a lot, always helping.
I attach a snapshot that I take every 12 hours of pg_stat_bgwriter:

select now, buffers_checkpoint, buffers_clean, buffers_backend
  from pg_stat_bgwriter_snapshot;

              now              | buffers_checkpoint | buffers_clean | buffers_backend
-------------------------------+--------------------+---------------+-----------------
 2019-10-07 12:00:01.312067-03 |             288343 |       1182944 |          520101
 2019-10-08 00:00:02.034129-03 |             475323 |       3890772 |          975454
 2019-10-08 12:00:01.500756-03 |             616154 |       4774924 |         1205261
 2019-10-09 00:00:01.520329-03 |             784840 |       7377771 |         1601278
 2019-10-09 12:00:01.388113-03 |            1149560 |       8395288 |         2456249
 2019-10-10 00:00:01.841054-03 |            1335747 |      11023014 |         2824740
 2019-10-10 12:00:01.354555-03 |            1486963 |      11919462 |         2995211
 2019-10-11 00:00:01.519538-03 |            1649066 |      14400593 |         3360700
 2019-10-11 12:00:01.468203-03 |            1979781 |      15332086 |         4167663
 2019-10-12 00:00:01.343714-03 |            2161116 |      17791871 |         4525957
 2019-10-12 12:00:01.991429-03 |            2323194 |      18324723 |         5139418
 2019-10-13 00:00:01.251191-03 |            2453939 |      19059149 |         5306894
 2019-10-13 12:00:01.677379-03 |            2782606 |      19391676 |         5878981
 2019-10-14 00:00:01.824249-03 |            2966021 |      19915346 |         6040316
 2019-10-14 12:00:01.869126-03 |            3117659 |      20675018 |         6184214

I tell you that we have a server with 24 GB of RAM and 6 GB of
shared_buffers.
When you say that maybe shared_buffers is too small, the query I run to
see what is happening is the following:

The first 10 are inserts, updates and an autovacuum:

select calls, shared_blks_hit, shared_blks_read, shared_blks_dirtied
  from pg_stat_statements
 where shared_blks_dirtied > 0
 order by shared_blks_dirtied desc
 limit 10;

   calls   | shared_blks_hit | shared_blks_read | shared_blks_dirtied
-----------+-----------------+------------------+---------------------
  41526844 |      1524091324 |         74477743 |            40568348
  22707516 |      1317743612 |         33153916 |            28106071
    517309 |       539285911 |         24583841 |            24408950
        23 |        23135504 |        187638126 |            15301103
  11287105 |       383864219 |         18369813 |            13879956
   2247661 |       275357344 |          9252598 |             6084363
  13070036 |       244904154 |          5557321 |             5871613
  54158879 |       324425993 |          5054200 |             4676472
  24955177 |       125421833 |          5775788 |             4517367
 142807488 |     14401507751 |         81965894 |             2661358
(10 rows)

Another query:

SELECT pg_size_pretty(count(*) * 8192) as buffered,
       round(100.0 * count(*) /
             (SELECT setting FROM pg_settings WHERE name = 'shared_buffers')::integer,
             1) AS buffers_percent,
       round(100.0 * count(*) * 8192 / pg_table_size(c.oid), 1) AS percent_of_relation
  FROM pg_class c
       INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
       INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
 GROUP BY c.oid, c.relname
 ORDER BY 3 DESC LIMIT 10;

 buffered | buffers_percent | percent_of_relation
----------+-----------------+---------------------
 3938 MB  |            64.1 |                53.2
 479 MB   |             7.8 |                21.3
 261 MB   |             4.3 |                99.3
 163 MB   |             2.6 |                 0.1
 153 MB   |             2.5 |                 6.7
 87 MB    |             1.4 |                 1.2
 82 MB    |             1.3 |                81.6
 65 MB    |             1.1 |               100.0
 64 MB    |             1.0 |                 0.1
 53 MB    |             0.9 |                73.5
On Mon, Oct 14, 2019 at 08:18:47PM +0200, Tomas Vondra wrote:
> Note: FWIW, a single snapshot of pg_stats* may be misleading, because
> it's cumulative, so it's not clear how accurately it reflects current
> state. Next time take two snapshots and subtract them.

For bonus points, capture it with a timestamp and make RRD graphs.

It took me a while to get around to following this advice, but now I
have 12+ months of history at 5-minute granularity across all our
customers, and I've used my own implementation to track down inefficient
queries being run periodically from cron, and to notice other radical
changes in writes/reads.

I recall seeing that the pgCluu project does this.
http://pgcluu.darold.net/

Justin
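
(As a concrete illustration of that kind of periodic capture, assuming
the snapshot table sketched earlier; the database name "mydb" and the
five-minute schedule are assumptions, not part of the thread:)

*/5 * * * *  psql -d mydb -c "INSERT INTO pg_stat_bgwriter_snapshot SELECT now(), * FROM pg_stat_bgwriter;"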
On Mon, Oct 14, 2019 at 01:12:43PM -0700, dangal wrote:
>Thanks a lot, always helping.
>I attach a snapshot that I take every 12 hours of pg_stat_bgwriter:
>
>select now, buffers_checkpoint, buffers_clean, buffers_backend
>  from pg_stat_bgwriter_snapshot;

Please show us the deltas, i.e. subtract the preceding value (using a
window function, or something). FWIW 12 hours may be a bit too coarse,
but it's better than nothing.

>              now              | buffers_checkpoint | buffers_clean | buffers_backend
>-------------------------------+--------------------+---------------+-----------------
> 2019-10-07 12:00:01.312067-03 |             288343 |       1182944 |          520101
> 2019-10-08 00:00:02.034129-03 |             475323 |       3890772 |          975454
> 2019-10-08 12:00:01.500756-03 |             616154 |       4774924 |         1205261
> 2019-10-09 00:00:01.520329-03 |             784840 |       7377771 |         1601278
> 2019-10-09 12:00:01.388113-03 |            1149560 |       8395288 |         2456249
> 2019-10-10 00:00:01.841054-03 |            1335747 |      11023014 |         2824740
> 2019-10-10 12:00:01.354555-03 |            1486963 |      11919462 |         2995211
> 2019-10-11 00:00:01.519538-03 |            1649066 |      14400593 |         3360700
> 2019-10-11 12:00:01.468203-03 |            1979781 |      15332086 |         4167663
> 2019-10-12 00:00:01.343714-03 |            2161116 |      17791871 |         4525957
> 2019-10-12 12:00:01.991429-03 |            2323194 |      18324723 |         5139418
> 2019-10-13 00:00:01.251191-03 |            2453939 |      19059149 |         5306894
> 2019-10-13 12:00:01.677379-03 |            2782606 |      19391676 |         5878981
> 2019-10-14 00:00:01.824249-03 |            2966021 |      19915346 |         6040316
> 2019-10-14 12:00:01.869126-03 |            3117659 |      20675018 |         6184214
>
>I tell you that we have a server with 24 GB of RAM and 6 GB of
>shared_buffers.
>When you say that maybe shared_buffers is too small, the query I run to
>see what is happening is the following:

The question is how that compares to the database size, and to the size
of the active set (the fraction of the database accessed by the
application / queries). I suggest you also track & compute the
shared_buffers cache hit ratio.

>The first 10 are inserts, updates and an autovacuum:
>
>select calls, shared_blks_hit, shared_blks_read, shared_blks_dirtied
>  from pg_stat_statements
> where shared_blks_dirtied > 0
> order by shared_blks_dirtied desc
> limit 10;
>
>   calls   | shared_blks_hit | shared_blks_read | shared_blks_dirtied
>-----------+-----------------+------------------+---------------------
>  41526844 |      1524091324 |         74477743 |            40568348
>  22707516 |      1317743612 |         33153916 |            28106071
>    517309 |       539285911 |         24583841 |            24408950
>        23 |        23135504 |        187638126 |            15301103
>  11287105 |       383864219 |         18369813 |            13879956
>   2247661 |       275357344 |          9252598 |             6084363
>  13070036 |       244904154 |          5557321 |             5871613
>  54158879 |       324425993 |          5054200 |             4676472
>  24955177 |       125421833 |          5775788 |             4517367
> 142807488 |     14401507751 |         81965894 |             2661358
>(10 rows)

Unfortunately, this has the same issue as the data you shared in the
first message - it's a snapshot with data accumulated since the database
was created. It's unclear whether the workload changed over time etc.

But I guess you can use this to identify the queries producing the most
dirty buffers and maybe see if you can optimize them somehow (e.g. by
removing unnecessary indexes or something).
>Another query:
>
>SELECT pg_size_pretty(count(*) * 8192) as buffered,
>       round(100.0 * count(*) /
>             (SELECT setting FROM pg_settings WHERE name = 'shared_buffers')::integer,
>             1) AS buffers_percent,
>       round(100.0 * count(*) * 8192 / pg_table_size(c.oid), 1) AS percent_of_relation
>  FROM pg_class c
>       INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
>       INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
> GROUP BY c.oid, c.relname
> ORDER BY 3 DESC LIMIT 10;
>
> buffered | buffers_percent | percent_of_relation
>----------+-----------------+---------------------
> 3938 MB  |            64.1 |                53.2
> 479 MB   |             7.8 |                21.3
> 261 MB   |             4.3 |                99.3
> 163 MB   |             2.6 |                 0.1
> 153 MB   |             2.5 |                 6.7
> 87 MB    |             1.4 |                 1.2
> 82 MB    |             1.3 |                81.6
> 65 MB    |             1.1 |               100.0
> 64 MB    |             1.0 |                 0.1
> 53 MB    |             0.9 |                73.5

It's generally a good idea to explain what a query is supposed to do,
instead of just leaving the users to figure that out. In any case, this
is a snapshot at a particular moment in time, and it's unclear how it
correlates to the activity. The fact that you've removed the names of
tables and even queries does not really help either.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
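
(For reference, a minimal sketch of the delta query suggested above,
using lag() as the window function; it assumes the
pg_stat_bgwriter_snapshot table shown in the earlier messages, and the
first row of the result will naturally have NULL deltas:)

SELECT now,
       buffers_checkpoint - lag(buffers_checkpoint) OVER (ORDER BY now) AS checkpoint_delta,
       buffers_clean      - lag(buffers_clean)      OVER (ORDER BY now) AS clean_delta,
       buffers_backend    - lag(buffers_backend)    OVER (ORDER BY now) AS backend_delta
  FROM pg_stat_bgwriter_snapshot
 ORDER BY now;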
Hi Tomas, I reset the statistics and took hourly samples for 24 hours to
see if you can help me. The server has 24 GB of memory and 6 GB of
shared_buffers.

select now,
       pg_size_pretty(buffers_checkpoint*8192) AS buffers_checkpoint,
       pg_size_pretty(buffers_clean*8192) AS buffers_clean,
       pg_size_pretty(buffers_backend*8192) AS buffers_backend,
       (buffers_checkpoint*100)/(buffers_checkpoint+buffers_clean+buffers_backend) AS buffers_checkpoint_pct,
       (buffers_clean*100)/(buffers_checkpoint+buffers_clean+buffers_backend) AS buffers_clean_pct,
       (buffers_backend*100)/(buffers_checkpoint+buffers_clean+buffers_backend) AS buffers_backend_pct,
       pg_size_pretty(buffers_checkpoint * 8192 / (checkpoints_timed + checkpoints_req)) AS avg_checkpoint_write,
       pg_size_pretty(8192 * (buffers_checkpoint + buffers_clean + buffers_backend)) AS total_write
  from pg_stat_bgwriter_snapshot;

              now              | buffers_checkpoint | buffers_clean | buffers_backend | buffers_checkpoint_pct | buffers_clean_pct | buffers_backend_pct | avg_checkpoint_write | total_write
-------------------------------+--------------------+---------------+-----------------+------------------------+-------------------+---------------------+----------------------+-------------
 2019-10-15 15:00:02.070105-03 | 33 MB              | 1190 MB       | 144 MB          |                      2 |                87 |                  10 | 33 MB                | 1367 MB
 2019-10-15 16:00:01.477785-03 | 109 MB             | 3543 MB       | 393 MB          |                      2 |                87 |                   9 | 36 MB                | 4045 MB
 2019-10-15 17:00:01.960162-03 | 179 MB             | 6031 MB       | 703 MB          |                      2 |                87 |                  10 | 36 MB                | 6913 MB
 2019-10-15 18:00:01.558404-03 | 252 MB             | 8363 MB       | 1000 MB         |                      2 |                86 |                  10 | 36 MB                | 9615 MB
 2019-10-15 19:00:01.170866-03 | 327 MB             | 10019 MB      | 1232 MB         |                      2 |                86 |                  10 | 36 MB                | 11 GB
 2019-10-15 20:00:01.397473-03 | 417 MB             | 11 GB         | 1407 MB         |                      3 |                85 |                  10 | 38 MB                | 13 GB
 2019-10-15 21:00:01.211047-03 | 522 MB             | 12 GB         | 1528 MB         |                      3 |                85 |                  11 | 40 MB                | 14 GB
 2019-10-15 22:00:01.164853-03 | 658 MB             | 12 GB         | 1691 MB         |                      4 |                83 |                  11 | 44 MB                | 14 GB
 2019-10-15 23:00:01.116564-03 | 782 MB             | 13 GB         | 1797 MB         |                      5 |                83 |                  11 | 46 MB                | 15 GB
 2019-10-16 00:00:01.19203-03  | 887 MB             | 13 GB         | 2016 MB         |                      5 |                82 |                  12 | 47 MB                | 16 GB
 2019-10-16 01:00:01.329851-03 | 1003 MB            | 14 GB         | 2104 MB         |                      5 |                81 |                  12 | 48 MB                | 17 GB
 2019-10-16 02:00:01.518606-03 | 1114 MB            | 14 GB         | 2222 MB         |                      6 |                81 |                  12 | 48 MB                | 17 GB
 2019-10-16 03:00:01.673498-03 | 1227 MB            | 14 GB         | 2314 MB         |                      6 |                80 |                  12 | 49 MB                | 18 GB
 2019-10-16 04:00:01.936604-03 | 1354 MB            | 15 GB         | 2468 MB         |                      7 |                79 |                  12 | 50 MB                | 19 GB
 2019-10-16 05:00:01.854888-03 | 1465 MB            | 15 GB         | 2518 MB         |                      7 |                79 |                  13 | 51 MB                | 19 GB
 2019-10-16 06:00:01.804182-03 | 1585 MB            | 15 GB         | 2581 MB         |                      8 |                78 |                  13 | 51 MB                | 19 GB
 2019-10-16 07:00:01.889345-03 | 1677 MB            | 15 GB         | 2649 MB         |                      8 |                78 |                  13 | 51 MB                | 20 GB
 2019-10-16 08:00:01.248247-03 | 1756 MB            | 16 GB         | 2707 MB         |                      8 |                78 |                  13 | 50 MB                | 20 GB
 2019-10-16 09:00:01.258408-03 | 1826 MB            | 16 GB         | 2763 MB         |                      8 |                78 |                  13 | 49 MB                | 21 GB
 2019-10-16 10:00:01.418323-03 | 1881 MB            | 17 GB         | 2872 MB         |                      8 |                78 |                  13 | 48 MB                | 21 GB
 2019-10-16 11:00:02.077084-03 | 1951 MB            | 18 GB         | 3140 MB         |                      8 |                78 |                  13 | 48 MB                | 23 GB
 2019-10-16 12:00:01.83188-03  | 2026 MB            | 20 GB         | 3322 MB         |                      7 |                79 |                  12 | 47 MB                | 25 GB
 2019-10-16 13:00:01.628877-03 | 2109 MB            | 22 GB         | 3638 MB         |                      7 |                79 |                  12 | 47 MB                | 28 GB
 2019-10-16 14:00:02.351529-03 | 2179 MB            | 24 GB         | 3934 MB         |                      6 |                80 |                  12 | 46 MB                | 30 GB
(24 rows)

SELECT sum(heap_blks_read) as heap_read,
       sum(heap_blks_hit) as heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
  FROM pg_statio_user_tables;
  heap_read  |   heap_hit    |         ratio
-------------+---------------+------------------------
 80203672248 | 4689023850651 | 0.98318308953328194824
(1 row)

SELECT sum(idx_blks_read) as idx_read,
       sum(idx_blks_hit) as idx_hit,
       (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
  FROM pg_statio_user_indexes;

  idx_read  |   idx_hit    |         ratio
------------+--------------+------------------------
 3307622770 | 653969845259 | 0.99494223962468783241
(1 row)

-- perform a "select pg_stat_reset();" when you want to reset counter statistics
with all_tables as
(
  SELECT *
  FROM (
    SELECT 'all'::text as table_name,
           sum( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) + coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk,
           sum( (coalesce(heap_blks_hit,0) + coalesce(idx_blks_hit,0) + coalesce(toast_blks_hit,0) + coalesce(tidx_blks_hit,0)) ) as from_cache
    FROM pg_statio_all_tables --> change to pg_statio_USER_tables if you want to check only user tables (excluding postgres's own tables)
  ) a
  WHERE (from_disk + from_cache) > 0 -- discard tables without hits
),
tables as
(
  SELECT *
  FROM (
    SELECT relname as table_name,
           ( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) + coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk,
           ( (coalesce(heap_blks_hit,0) + coalesce(idx_blks_hit,0) + coalesce(toast_blks_hit,0) + coalesce(tidx_blks_hit,0)) ) as from_cache
    FROM pg_statio_all_tables --> change to pg_statio_USER_tables if you want to check only user tables (excluding postgres's own tables)
  ) a
  WHERE (from_disk + from_cache) > 0 -- discard tables without hits
)
SELECT table_name as "table name",
       from_disk as "disk hits",
       round((from_disk::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% disk hits",
       round((from_cache::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% cache hits",
       (from_disk + from_cache) as "total hits"
  FROM (SELECT * FROM all_tables UNION ALL SELECT * FROM tables) a
 ORDER BY (case when table_name = 'all' then 0 else 1 end), from_disk desc;

   table name    |  disk hits  | % disk hits | % cache hits |  total hits
-----------------+-------------+-------------+--------------+---------------
 all             | 88000266877 |        1.60 |        98.40 | 5489558628019
 b_e_i           | 38269990257 |        2.88 |        97.12 | 1329542407426
 n_c_r_o         | 32839222402 |        1.44 |        98.56 | 2278801314997
 b_e_i_a         |  6372214550 |        4.76 |        95.24 |  133916822424
 d_d             |  2101245550 |        6.58 |        93.42 |   31936220932
 pg_toast_550140 |  2055940284 |       32.63 |        67.37 |    6300424824
 p_i             |  1421254520 |        0.36 |        99.64 |  393348432350
 n_c_e_s         |  1164509701 |       27.85 |        72.15 |    4180714300
 s_b_c_a         |  1116814156 |        0.19 |        99.81 |  595617511928
 b_e_i_l         |   624945696 |       41.13 |        58.87 |    1519594743
 p_e_i           |   525580057 |        5.27 |        94.73 |    9968414493

select s.relname,
       pg_size_pretty(pg_relation_size(relid)),
       coalesce(n_tup_ins,0) + 2 * coalesce(n_tup_upd,0) -
         coalesce(n_tup_hot_upd,0) + coalesce(n_tup_del,0) AS total_writes,
       (coalesce(n_tup_hot_upd,0)::float * 100 / (case when n_tup_upd > 0
         then n_tup_upd else 1 end)::float)::numeric(10,2) AS hot_rate,
       (select v[1] FROM regexp_matches(reloptions::text, E'fillfactor=(\\d+)') as
         r(v) limit 1) AS fillfactor
  from pg_stat_all_tables s
  join pg_class c ON c.oid = relid
 order by total_writes desc limit 50;

     relname     | pg_size_pretty | total_writes | hot_rate | fillfactor
-----------------+----------------+--------------+----------+------------
 pg_toast_550140 | 1637 GB        |    820414234 |     0.00 |
 b_e_i_a         | 168 GB         |    454229502 |     0.00 |
 s_b_c_a         | 26 MB          |    419253909 |    96.94 |
 b_e_i_a_l       | 71 GB          |    305584644 |     0.00 |
 s_b_c_a_l       | 965 MB         |    203361185 |     0.00 |
 b_e_i           | 7452 MB        |    194861425 |    62.88 |
 b_e_i_l         | 57 GB          |    144929408 |     0.00 |
 o_i_n           | 3344 kB        |     98435081 |    99.38 |
 r_h             | 1140 MB        |     33209351 |     0.11 |
 b_e             | 5808 kB        |     29608085 |    99.65 |

select calls, shared_blks_hit, shared_blks_read, shared_blks_dirtied, query --, shared_blks_dirtied
  from pg_stat_statements
 where shared_blks_dirtied > 0
 order by shared_blks_dirtied desc
 limit 10;

   calls   | shared_blks_hit | shared_blks_read | shared_blks_dirtied |             query
-----------+-----------------+------------------+---------------------+-------------------------------
  43374691 |      1592513886 |         77060029 |            42096885 | INSERT INTO b_e_i_a
  23762881 |      1367338973 |         34351016 |            29131240 | UPDATE b_e_i
    541120 |       564550710 |         25726748 |            25551138 | INSERT INTO d_d
        23 |        23135504 |        187638126 |            15301103 | VACUUM ANALYZE VERBOSE b_e_i;
  11804481 |       401558460 |         19124307 |            14492182 | UPDATE b_e_i_a
   2352159 |       287732134 |          9462460 |             6250734 | INSERT INTO b_e_i
  13701688 |       256215340 |          5803881 |             6142119 | INSERT into I_C_M
  56582737 |       338943996 |          5272879 |             4882863 | INSERT INTO b_e_i_a_l
  26115040 |       131274217 |          6016404 |             4712060 | INSERT INTO b_e_i_l

SELECT oid::REGCLASS::TEXT AS table_name,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind = 'r'
   AND relpages > 0
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 20;

 table_name | total_size
------------+------------
 d_d        | 1656 GB
 b_e_i_a    | 547 GB
 b_e_i_a_l  | 107 GB
 b_e_i_l    | 71 GB
 b_e_i      | 66 GB
 n_c_e_s    | 28 GB
 p_e_i      | 7807 MB
 n_c_s      | 7344 MB
 e_i_n      | 5971 MB
 p_e_d_i    | 3695 MB
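
(For reference, the "reset the statistics" step mentioned above can be
done with the standard reset functions; this is only a sketch, and since
resetting discards the accumulated history it should only be run at the
start of a deliberate sampling window:)

-- reset the cluster-wide counters in pg_stat_bgwriter
SELECT pg_stat_reset_shared('bgwriter');

-- reset the per-database statistics (pg_stat_*, pg_statio_*)
SELECT pg_stat_reset();

-- reset pg_stat_statements counters (requires the pg_stat_statements extension)
SELECT pg_stat_statements_reset();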
Excuse me, can you tell me how I can achieve this?

"The question is how that compares to the database size, and to the size
of the active set (the fraction of the database accessed by the
application / queries)."
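
(One rough way to look at this, as an illustrative sketch only: compare
the total database size against shared_buffers, and use the
pg_buffercache usage counts as a crude proxy for the active set; the
usagecount threshold of 3 below is an arbitrary assumption:)

-- database size vs. shared_buffers
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size,
       current_setting('shared_buffers') AS shared_buffers;

-- crude "active set" estimate: buffers that keep being reused
SELECT pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS frequently_used_buffers
  FROM pg_buffercache
 WHERE usagecount >= 3;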
Thank you very much Justin, I am looking into installing the product you
recommended!