Observing pg_stat_bgwriter on replicas, I've found that the checkpoints_req counter is incremented much faster than restartpoints actually happen according to the logs.
Example:
select * from pg_stat_bgwriter ; \watch 60
Fri 12 Apr 2019 01:16:26 PM UTC (every 60s)
checkpoints_timed | checkpoints_req
-------------------+-----------------
2200 | 255224
(1 row)
Fri 12 Apr 2019 01:17:26 PM UTC (every 60s)
checkpoints_timed | checkpoints_req
-------------------+-----------------
2200 | 255240
(1 row)
Fri 12 Apr 2019 01:18:26 PM UTC (every 60s)
checkpoints_timed | checkpoints_req
-------------------+-----------------
2200 | 255291
(1 row)
Fri 12 Apr 2019 01:19:26 PM UTC (every 60s)
checkpoints_timed | checkpoints_req
-------------------+-----------------
2200 | 255323
(1 row)
The counter is increasing by roughly 15-50 per minute (16, 51 and 32 between consecutive samples).
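For anyone who wants to double-check the arithmetic, here is a small Python sketch that computes the per-minute increments from the four checkpoints_req values shown above. The values are copied from the samples; the variable names are just for illustration.

```python
# checkpoints_req values copied from the four \watch samples above,
# taken 60 seconds apart (13:16:26 .. 13:19:26 UTC)
samples = [255224, 255240, 255291, 255323]

# Difference between consecutive samples = increments per 60-second interval
deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]

# Total growth over the whole three-minute window
total = samples[-1] - samples[0]

print(deltas)  # [16, 51, 32]
print(total)   # 99
```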
At the same time, the logs on the same server show that restartpoints happen only 1-2 times per minute:
$ sudo journalctl --since '2019-04-12 13:15' | grep "restart point" | awk -F'[: ]' '{print $1" "$2" "$3":"$4}' | uniq -c
1 Apr 12 13:16
2 Apr 12 13:18
1 Apr 12 13:19
1 Apr 12 13:20
Checking the source code:
Reading this code, I suspect we have the same problem with checkpoints_req on the master as well, when counting full-fledged checkpoints: if a checkpoint attempt fails, the counter has already been incremented. So it looks like checkpoint attempts are being counted, not completed checkpoints. However, the documentation defines checkpoints_req as "Number of requested checkpoints that have been performed" (https://www.postgresql.org/docs/9.6/monitoring-stats.html).
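To make the distinction between "attempts" and "performed" concrete, here is a toy model in Python. It is not PostgreSQL code and does not reproduce the checkpointer logic; the class, the function, and the workload (10 requests, of which only every fifth completes) are all made up for illustration. It only shows how a counter bumped at request time drifts away from a counter bumped on completion.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    # Hypothetical counters, named for this illustration only
    requested_attempts: int = 0  # bumped as soon as a request is seen
    completed: int = 0           # bumped only when the work actually finishes


def handle_request(stats: Stats, can_complete: bool) -> None:
    """Count the request first, then (maybe) do the work."""
    stats.requested_attempts += 1
    if not can_complete:
        return  # attempt skipped or failed; the first counter already moved
    stats.completed += 1


stats = Stats()
# Made-up workload: 10 requests, only every fifth one can complete
for i in range(10):
    handle_request(stats, can_complete=(i % 5 == 0))

print(stats.requested_attempts, stats.completed)  # 10 2
```

If checkpoints_req behaves like requested_attempts here, it would match what I see on the replica, while the documentation describes something closer to completed.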
The code in the master branch looks similar, so this problem is probably not limited to 9.6, but just in case:
# select version();
version
-------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.11 on x86_64-pc-linux-gnu (Ubuntu 9.6.11-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)
Thanks,
Nik