Re: New statistics for tuning WAL buffer size - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: New statistics for tuning WAL buffer size |
Date | |
Msg-id | b2a6ceacd15eff5bfe48ba34f9193ef5@oss.nttdata.com Whole thread Raw |
In response to | Re: New statistics for tuning WAL buffer size (Fujii Masao <masao.fujii@oss.nttdata.com>) |
List | pgsql-hackers |
On 2020-10-15 19:49, Fujii Masao wrote: > On 2020/10/13 11:57, Masahiro Ikeda wrote: >> On 2020-10-06 15:57, Masahiro Ikeda wrote: >>> 2. Number of when new WAL file is created and zero-filled. >>> >>> As Fujii-san already commented, I think it's good for tuning. >>> >>>> Just idea; it may be worth exposing the number of when new WAL file >>>> is created and zero-filled. This initialization may have impact on >>>> the performance of write-heavy workload generating lots of WAL. If >>>> this number is reported high, to reduce the number of this >>>> initialization, we can tune WAL-related parameters so that more >>>> "recycled" WAL files can be hold. > > +1 > > But it might be better to track the number of when new WAL file is > created whether it's zero-filled or not, if file creation and sync > itself > takes time. OK, I changed to track the number of when new WAL file is created. >>> >>> >>> 3. Number of when to switch the WAL logfile segment. >>> >>> This is similar to 2, but this counts the number of when WAL file is >>> recylcled too. >>> I think it's useful for tuning "wal_segment_size" >>> if the number is high relative to the startup time, >>> "wal_segment_size" >>> must be bigger. > > You're thinking to count all the WAL file switch? That number is equal > to the number of WAL files generated since the last reset of > pg_stat_wal? Yes. I think it might be better to count it because I think the ratio in which a new WAL file is created is important. To calculate it, we need the count all the WAL file switch. >>> 4. Number of when WAL is flushed >>> >>> I think it's useful for tuning "synchronous_commit" and >>> "commit_delay" >>> for query executions. >>> If the number of WAL is flushed is high, users can know >>> "synchronous_commit" is useful for the workload. >>> >>> Also, it's useful for tuning "wal_writer_delay" and >>> "wal_writer_flush_after" for wal writer. >>> If the number is high, users can change the parameter for >>> performance. >>> >>> I think it's better to separate this for backends and wal writer. > > +1 Thanks, I separated the statistics for backends and wal writer. When checkpointer process flushes the WAL, the statistics for backends are counted now. Although I think its impact is not big, is it better to make statistics for checkpointer? >>> 5. Wait time when WAL is flushed. >>> >>> This is the accumulated time when wal is flushed. >>> If the time becomes much higher, users can detect the possibility of >>> disk failure. > > This should be tracked, e.g., only when track_io_timing is enabled? > Otherwise, tracking that may cause performance overhead. OK, I changed the implementation. >>> Since users can see how much flash time occupies of the query >>> execution time, >>> it may lead to query tuning and so on. >>> >>> Since there is the above reason, I think it's better to separate this >>> for backends and wal writer. > > > I'm afraid that this counter for a backend may be a bit confusing. > Because > when the counter indicates small time, we may think that walwriter > almost > write WAL data and a backend doesn't take time to write WAL. But a > backend > may be just waiting for walwriter to write WAL. Thanks for your comments. I agreed. Now, the following is the view implemented in the attached patch. If you have any other comments, please let me know. ``` postgres=# SELECT * FROM pg_stat_wal; -[ RECORD 1 ]-------+------------------------------ wal_records | 1000128 # Total number of WAL records generated wal_fpi | 1 # Total number of WAL full page images generated wal_bytes | 124013682 #Total amount of WAL bytes generated wal_buffers_full | 7952 #Total number of WAL data written to the disk because WAL buffers got full wal_file | 14 #Total number of WAL file segment created or opened a pre-existing one wal_init_file | 7 #Total number of WAL file segment created wal_write_backend | 7956 #Total number of WAL data written to the disk by backends wal_write_walwriter | 27 #Total number of WAL data written to the disk by walwriter wal_write_time | 40 # Total amount of time that has been spent in the portion of WAL data was written to disk by backend and walwriter, in milliseconds wal_sync_backend | 1 # Total number of WAL data synced to the disk by backends wal_sync_walwriter | 6 #Total number of WAL data synced to the disk by walwriter wal_sync_time | 0 # Total amount of time that has been spent in the portion of WAL data was synced to disk by backend and walwriter, in milliseconds stats_reset | 2020-10-16 19:41:01.892272+09 ``` Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: