Thread: Tuning Checkpoints
My PostgreSQL 9.4.5 server runs on Amazon RDS. At some times of the day we have a lot of checkpoints very close together (less than 1 minute apart; see logs below), and we are trying to tune the DB to minimize the impact of each checkpoint or reduce the number of checkpoints.
Server Stats
• Instance type: db.r3.4xlarge
• 16 vCPUs, 122GB of RAM
• PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
Some PG Stats
• Shared Buffers = 31427608kB (~30GB)
• Checkpoint Segments = 64
• Checkpoint completion target = 0.9
• Rest of the configuration is below
Things we are doing
• We have a huge table where each row is over 1kB and it's very busy. We are splitting it into multiple tables, in particular moving out the one JSON field that makes the rows large.
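As an illustration of that kind of split (the table and column names here are hypothetical, not our real schema), the large JSON column moves into a side table keyed by the parent row's id:

    -- hypothetical schema: "events" is the busy table, "payload" is the
    -- large json column being moved out
    CREATE TABLE events_payload (
        event_id bigint PRIMARY KEY REFERENCES events (id),
        payload  json NOT NULL
    );

    INSERT INTO events_payload (event_id, payload)
    SELECT id, payload FROM events;

    ALTER TABLE events DROP COLUMN payload;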
Questions
• Each checkpoint logs a line like the following: checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s
• What does "buffers" mean? How do I find out how much RAM that is equivalent to? (See the sketch after this list.)
• Based on my RDS stats I don't think more IOPS will help, because I don't see any flat lines (plateaus) on my write operations / second graph. Is this a good assumption?
• What else can we tune to spread out checkpoints?
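For the buffer question above, a minimal sketch of the conversion in psql (the 166481 figure is taken from the checkpoint log line quoted above; block_size is 8kB on a stock build, but the query reads it from the server rather than assuming it):

    -- one checkpoint's dirty buffers, expressed as bytes of RAM
    SELECT pg_size_pretty(
        166481::bigint * current_setting('block_size')::bigint
    ) AS checkpoint_write_volume;   -- roughly 1.3GB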
Here is the Postgres configuration
application_name | psql
archive_mode | on
archive_timeout | 5min
autovacuum_analyze_scale_factor | 0.05
autovacuum_naptime | 30s
autovacuum_vacuum_scale_factor | 0.1
checkpoint_completion_target | 0.9
checkpoint_segments | 64
client_encoding | UTF8
effective_cache_size | 62855216kB
fsync | on
full_page_writes | on
hot_standby | off
listen_addresses | *
lo_compat_privileges | off
log_checkpoints | on
log_error_verbosity | default
log_file_mode | 0644
log_hostname | on
log_line_prefix | %t:%r:%u@%d:[%p]:
log_rotation_age | 1h
log_timezone | UTC
log_truncate_on_rotation | off
logging_collector | on
maintenance_work_mem | 250MB
max_connections | 4092
max_locks_per_transaction | 64
max_prepared_transactions | 0
max_replication_slots | 5
max_stack_depth | 6MB
max_wal_senders | 10
port | 5432
rds.extensions | btree_gin,btree_gist,chkpass,citext,cube,dblink,dict_int,dict_xsyn,earthdistance,fuzzystrmatch,hstore,intagg,intarray,ip4r,isn,ltree,pgcrypto,pgrowlocks,pgstattuple,pg_buffercache,pg_prewarm,pg_stat_statements,pg_trgm,plcoffee,plls,plperl,plpgsql,pltcl,plv8,postgis,postgis_tiger_geocoder,postgis_topology,postgres_fdw,sslinfo,tablefunc,test_parser,tsearch2,unaccent,uuid-ossp
rds.internal_databases | rdsadmin,template0
rds.rds_superuser_reserved_connections | 0
rds.superuser_variables | session_replication_role
shared_buffers | 31427608kB
ssl | on
ssl_renegotiation_limit | 0
superuser_reserved_connections | 3
synchronous_commit | off
TimeZone | UTC
unix_socket_group | rdsdb
unix_socket_permissions | 0700
wal_keep_segments | 32
wal_level | hot_standby
wal_receiver_timeout | 30s
wal_sender_timeout | 30s
work_mem | 8MB
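For reference, a listing like the one above can be pulled from a psql session with a query along these lines:

    -- show settings that differ from their built-in defaults
    SELECT name, setting, unit
    FROM pg_settings
    WHERE source <> 'default'
    ORDER BY name;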
Here are some checkpoint logs from a busy period
2016-10-26 18:54:32 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:10 UTC::@:[5594]:LOG: checkpoint complete: wrote 168357 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=37.650 s, sync=0.036 s, total=37.879 s; sync files=266, longest=0.027 s, average=0.000 s
2016-10-26 18:55:15 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:53 UTC::@:[5594]:LOG: checkpoint complete: wrote 171087 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=38.147 s, sync=0.116 s, total=38.298 s; sync files=276, longest=0.098 s, average=0.000 s
2016-10-26 18:55:58 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:56:34 UTC::@:[5594]:LOG: checkpoint complete: wrote 171131 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.080 s, sync=0.142 s, total=35.323 s; sync files=274, longest=0.103 s, average=0.000 s
2016-10-26 18:56:39 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:57:14 UTC::@:[5594]:LOG: checkpoint complete: wrote 158818 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.545 s, sync=0.072 s, total=35.771 s; sync files=277, longest=0.036 s, average=0.000 s
2016-10-26 18:57:19 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:57:56 UTC::@:[5594]:LOG: checkpoint complete: wrote 179405 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=36.289 s, sync=0.069 s, total=36.405 s; sync files=295, longest=0.030 s, average=0.000 s
2016-10-26 18:58:01 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:58:37 UTC::@:[5594]:LOG: checkpoint complete: wrote 169711 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.248 s, sync=0.199 s, total=35.511 s; sync files=295, longest=0.146 s, average=0.000 s
2016-10-26 18:58:42 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:59:17 UTC::@:[5594]:LOG: checkpoint complete: wrote 174124 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=34.666 s, sync=0.059 s, total=34.790 s; sync files=279, longest=0.051 s, average=0.000 s
2016-10-26 18:59:23 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:59:55 UTC::@:[5594]:LOG: checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s
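For scale (assuming the default 16MB WAL segment size): each cycle above recycles 64 segments in roughly 40 seconds, i.e. about 1GB of WAL per cycle:

    -- 64 recycled segments x 16MB over a ~40 s checkpoint cycle
    SELECT pg_size_pretty((64 * 16 * 1024 * 1024 / 40)::bigint)
           AS wal_written_per_second;   -- ~25MB/s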
On 10/31/2016 08:19 PM, Andre Henry wrote:
> My PG 9.4.5 server runs on Amazon RDS. [...] We are trying to tune the
> DB to minimize the impact of each checkpoint or reduce the number of
> checkpoints.
> [...]
> • Each checkpoint logs a line like the following: checkpoint complete:
> wrote 166481 buffers (4.2%); [...]

OK, each checkpoint has to write out all data that was dirtied in shared buffers since the previous checkpoint. You have ~170k buffers worth of dirty data, i.e. ~1.3GB.

> • What does "buffers" mean? How do I find out how much RAM that is
> equivalent to?

A buffer holds 8kB of data; that is the block size, i.e. the "chunk" the data files are divided into.

> • Based on my RDS stats I don't think more IOPS will help, because I
> don't see any flat lines on my write operations / second graph. Is this
> a good assumption?

Not sure what you mean by this. Also, maybe you should talk to AWS if you're on RDS.

> • What else can we tune to spread out checkpoints?

Based on the logs, your checkpoints are triggered by filling WAL. I see your checkpoints happen every 30-40 seconds, and you only have 64 segments. So to get checkpoints triggered by timeout (which I assume is 5 minutes, because you have not mentioned checkpoint_timeout), you need to increase checkpoint_segments enough to hold 5 minutes' worth of WAL. That means 300/30 * 64, i.e. roughly 640 segments (that's likely an overestimate, due to full-page writes, but well).
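To illustrate the arithmetic, something along these lines (the ~30 s cycle and 64 segments come from the logs above; checkpoint_timeout = 5min is an assumption, and on RDS the parameter is changed through the DB parameter group):

    -- segments needed to hold checkpoint_timeout (300 s) of WAL at the
    -- current fill rate of 64 segments per ~30 s
    SELECT ceil(300.0 / 30 * 64) AS suggested_checkpoint_segments;  -- 640

    -- after the change, timed checkpoints should dominate requested ones
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services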