Publish checkpoint timing and sync files summary data to pg_stat_bgwriter - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Publish checkpoint timing and sync files summary data to pg_stat_bgwriter |
Date | |
Msg-id | 4F13C38D.2080905@2ndQuadrant.com Whole thread Raw |
Responses |
Re: Publish checkpoint timing and sync files summary data
to pg_stat_bgwriter
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter |
List | pgsql-hackers |
One of the most useful bits of feedback on how well checkpoint I/O is going is the amount of time taken to sync files to disk. Right now the only way to get that is to parse the logs. The attached patch publishes the most useful three bits of data you could only get from log_checkpoints before out to pg_stat_bgwriter. Easiest to just show an example: $ createdb pgbench $ psql -c "select pg_stat_reset_shared('bgwriter')" $ pgbench -i -s 10 pgbench $ psql -c "checkpoint" $ tail $PGLOG LOG: checkpoint complete: wrote 590 buffers (14.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.117 s, total=1.746 s; sync files=31, longest=0.104 s, average=0.003 s $ psql -c "select write_ms,sync_ms,sync_files from pg_stat_bgwriter" write_ms | sync_ms | sync_files ----------+---------+------------ 2 | 117 | 31 Like the other checkpoint statistics, these are updated in one lump when the checkpoint finishes. This includes rough documentation and the code isn't very complicated; a cat version bump is required. Some open concerns I still have about this: -The total amount of time in the sync phase is not the same as the total of all sync wait times. I have published the total elapsed sync time. The total of the times spent waiting on individual sync calls is also available at the point when the above data is collected, and it could be published as well. I don't think that's necessary yet, since they are almost identical right now. But any feature that staggers sync calls around might make that distinction important. -The total checkpoint time is larger than the total of write+sync. You can see that here: 2ms write + 117ms sync = 119ms; total checkpoint time is actually 1746ms though. While I don't normally find the total useful from a tuning perspective, it may be sensible to publish it just to further reduce any perceived need for log scraping here. -None of the debug1 level information about individual sync calls can be published this way usefully. I'd like that to come out somewhere eventually, but really what's needed there is a performance event logging history collector, not pg_stats. I'm content that this patch gets the big picture data out; it's sufficient to let you rule out checkpoint problems if they aren't happening. If you only have to drill into the logs when long times start showing up in this data, that's not so bad to live with. -I can't tell for sure if this is working properly when log_checkpoints is off. This now collects checkpoint end time data in all cases, whereas before it ignored that work if log_checkpoints was off. I think I will have to fake the logging in this area, make it log some things even when log_checkpoints is off, to confirm it gives the right results. I haven't done that yet. I would like to be able to tell people they need never turn on log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets a good precedent for what direction the database is going. It would be nice for pg_stat_bgwriter's format to settle down for a long period too. In addition to my small list of concerns here, there seem to be an increasing number of write path feature patches floating around here lately. That all works against this being stable. The main consumer of this data I use regularly is the pg_stat_bgwriter graphs that Munin produces, and my expectation is that myself and/or Magnus will make sure it's compatible with any changes made here for 9.2. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Attachment
pgsql-hackers by date: