Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Date | |
Msg-id | 451365b7-9e22-211e-428f-fc90ebc715bc@oss.nttdata.com Whole thread Raw |
In response to | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add statistics to pg_stat_wal view for wal related parameter tuning
|
List | pgsql-hackers |
On 2020/12/01 14:01, Fujii Masao wrote: > > > On 2020/11/26 16:07, Masahiro Ikeda wrote: >> On 2020-11-25 20:19, Fujii Masao wrote: >>> On 2020/11/19 16:31, Masahiro Ikeda wrote: >>>> On 2020-11-17 11:46, Fujii Masao wrote: >>>>> On 2020/11/16 16:35, Masahiro Ikeda wrote: >>>>>> On 2020-11-12 14:58, Fujii Masao wrote: >>>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote: >>>>>>>> On 2020-10-30 11:50, Fujii Masao wrote: >>>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Thanks for your comments and advice. I updated the patch. >>>>>>>>>> >>>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote: >>>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda >>>>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in >>>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote: >>>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some >>>>>>>>>>>> > of which is in performance-critical path especially in >>>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might >>>>>>>>>>>> > be that it is all fine as it is very important to collect these stats >>>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at >>>>>>>>>>>> > statement-level to help customers but I don't see a very strong case >>>>>>>>>>>> > for that in your proposal. >>>>>>>>>>> >>>>>>>>>>> We should avoid that duplication as possible even if the both number >>>>>>>>>>> are important. >>>>>>>>>>> >>>>>>>>>>>> Also about performance, I thought there are few impacts because it >>>>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's >>>>>>>>>>>> value which already collects these stats, there is no impact in >>>>>>>>>>>> XLogInsertRecord. >>>>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated >>>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's >>>>>>>>>>>> value? >>>>>>>>>>> >>>>>>>>>>> I don't think that works, but it would work that pgstat_send_wal() >>>>>>>>>>> takes the difference of that values between two successive calls. >>>>>>>>>>> >>>>>>>>>>> WalUsage prevWalUsage; >>>>>>>>>>> ... >>>>>>>>>>> pgstat_send_wal() >>>>>>>>>>> { >>>>>>>>>>> .. >>>>>>>>>>> /* fill in some values using pgWalUsage */ >>>>>>>>>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes; >>>>>>>>>>> WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records; >>>>>>>>>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi; >>>>>>>>>>> ... >>>>>>>>>>> pgstat_send(&WalStats, sizeof(WalStats)); >>>>>>>>>>> >>>>>>>>>>> /* remember the current numbers */ >>>>>>>>>>> prevWalUsage = pgWalUsage; >>>>>>>>>> >>>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage >>>>>>>>>> which is already defined and eliminates the extra overhead. >>>>>>>>> >>>>>>>>> + /* fill in some values using pgWalUsage */ >>>>>>>>> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes; >>>>>>>>> + WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records; >>>>>>>>> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi; >>>>>>>>> >>>>>>>>> It's better to use WalUsageAccumDiff() here? >>>>>>>> >>>>>>>> Yes, thanks. I fixed it. >>>>>>>> >>>>>>>>> prevWalUsage needs to be initialized with pgWalUsage? >>>>>>>>> >>>>>>>>> + if (AmWalWriterProcess()){ >>>>>>>>> + WalStats.m_wal_write_walwriter++; >>>>>>>>> + } >>>>>>>>> + else >>>>>>>>> + { >>>>>>>>> + WalStats.m_wal_write_backend++; >>>>>>>>> + } >>>>>>>>> >>>>>>>>> I think that it's better not to separate m_wal_write_xxx into two for >>>>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx >>>>>>>>> counter and make pgstat_send_wal() send also the process type to >>>>>>>>> the stats collector. Then the stats collector can accumulate the counters >>>>>>>>> per process type if necessary. If we adopt this approach, we can easily >>>>>>>>> extend pg_stat_wal so that any fields can be reported per process type. >>>>>>>> >>>>>>>> I'll remove the above source code because these counters are not useful. >>>>>>>> >>>>>>>> >>>>>>>> On 2020-10-30 12:00, Fujii Masao wrote: >>>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I think we need to add some statistics to pg_stat_wal view. >>>>>>>>>> >>>>>>>>>> Although there are some parameter related WAL, >>>>>>>>>> there are few statistics for tuning them. >>>>>>>>>> >>>>>>>>>> I think it's better to provide the following statistics. >>>>>>>>>> Please let me know your comments. >>>>>>>>>> >>>>>>>>>> ``` >>>>>>>>>> postgres=# SELECT * from pg_stat_wal; >>>>>>>>>> -[ RECORD 1 ]-------+------------------------------ >>>>>>>>>> wal_records | 2000224 >>>>>>>>>> wal_fpi | 47 >>>>>>>>>> wal_bytes | 248216337 >>>>>>>>>> wal_buffers_full | 20954 >>>>>>>>>> wal_init_file | 8 >>>>>>>>>> wal_write_backend | 20960 >>>>>>>>>> wal_write_walwriter | 46 >>>>>>>>>> wal_write_time | 51 >>>>>>>>>> wal_sync_backend | 7 >>>>>>>>>> wal_sync_walwriter | 8 >>>>>>>>>> wal_sync_time | 0 >>>>>>>>>> stats_reset | 2020-10-20 11:04:51.307771+09 >>>>>>>>>> ``` >>>>>>>>>> >>>>>>>>>> 1. Basic statistics of WAL activity >>>>>>>>>> >>>>>>>>>> - wal_records: Total number of WAL records generated >>>>>>>>>> - wal_fpi: Total number of WAL full page images generated >>>>>>>>>> - wal_bytes: Total amount of WAL bytes generated >>>>>>>>>> >>>>>>>>>> To understand DB's performance, first, we will check the performance >>>>>>>>>> trends for the entire database instance. >>>>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune >>>>>>>>>> "wal_compression", "checkpoint_timeout" and so on. >>>>>>>>>> >>>>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain, >>>>>>>>>> autovacuum and pg_stat_statements now, >>>>>>>>>> if users want to see the performance trends for the entire database, >>>>>>>>>> they must recalculate the statistics. >>>>>>>>>> >>>>>>>>>> I think it is useful to add the sum of the basic statistics. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2. WAL segment file creation >>>>>>>>>> >>>>>>>>>> - wal_init_file: Total number of WAL segment files created. >>>>>>>>>> >>>>>>>>>> To create a new WAL file may have an impact on the performance of >>>>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high, >>>>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters >>>>>>>>>> so that more "recycled" WAL files can be held. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 3. Number of when WAL is flushed >>>>>>>>>> >>>>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends >>>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter >>>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends >>>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite >>>>>>>>>> >>>>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions. >>>>>>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload. >>>>>>>>> >>>>>>>>> I just wonder how useful these counters are. Even without these counters, >>>>>>>>> we already know synchronous_commit=off is likely to cause the better >>>>>>>>> performance (but has the risk of data loss). So ISTM that these counters are >>>>>>>>> not so useful when tuning synchronous_commit. >>>>>>>> >>>>>>>> Thanks, my understanding was wrong. >>>>>>>> I agreed that your comments. >>>>>>>> >>>>>>>> I merged the statistics of *_backend and *_walwriter. >>>>>>>> I think the sum of them is useful to calculate the average per write/sync time. >>>>>>>> For example, per write time is equals wal_write_time / wal_write. >>>>>>> >>>>>>> Understood. >>>>>>> >>>>>>> Thanks for updating the patch! >>>>>> >>>>>> Thanks for your comments. >>>>>> >>>>>>> patching file src/include/catalog/pg_proc.dat >>>>>>> Hunk #1 FAILED at 5491. >>>>>>> 1 out of 1 hunk FAILED -- saving rejects to file >>>>>>> src/include/catalog/pg_proc.dat.rej >>>>>>> >>>>>>> I got this failure when applying the patch. Could you update the patch? >>>>>> >>>>>> Thanks, I updated the patch. >>>>>> >>>>>>> - Number of times WAL data was written to the disk because WAL >>>>>>> buffers got full >>>>>>> + Total number of times WAL data written to the disk because WAL >>>>>>> buffers got full >>>>>>> >>>>>>> Isn't "was" necessary between "data" and "written"? >>>>>> >>>>>> Yes, I fixed it. >>>>>> >>>>>>> + <entry role="catalog_table_entry"><para role="column_definition"> >>>>>>> + <structfield>wal_bytes</structfield> <type>bigint</type> >>>>>>> >>>>>>> Shouldn't the type of wal_bytes be numeric because the total number of >>>>>>> WAL bytes can exceed the range of bigint? I think that the type of >>>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason. >>>>>> >>>>>> Thanks, I fixed it. >>>>>> >>>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64, >>>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too. >>>>>> >>>>>>> + <entry role="catalog_table_entry"><para role="column_definition"> >>>>>>> + <structfield>wal_write_time</structfield> <type>bigint</type> >>>>>>> >>>>>>> Shouldn't the type of wal_xxx_time be double precision, >>>>>>> like pg_stat_database.blk_write_time? >>>>>> >>>>>> Thanks, I changed it. >>>>>> >>>>>>> Even when fsync is set to off or wal_sync_method is set to open_sync, >>>>>>> wal_sync is incremented. Isn't this behavior confusing? >>>>> >>>>> What do you think about this comment? >>>> >>>> Sorry, I'll change to increment wal_sync and wal_sync_time only >>>> if a specific fsync method is called. >>>> >>>>> I found that we discussed track-WAL-IO-timing feature at the past discussion >>>>> about the similar feature [1]. But the feature was droppped from the proposal >>>>> patch because there was the performance concern. So probably we need to >>>>> revisit the past discussion and benchmark the performance. Thought? >>>>> >>>>> If track-WAL-IO-timing feature may cause performance regression, >>>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts >>>>> from the patch and commit it at first. >>>>> >>>>> [1] >>>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com >>>> >>>> Thanks, I'll check the thread. >>>> I agree to add basic statistics at first and I attached the patch. >>> >>> Thanks! >>> >>> + /* Send WAL statistics */ >>> + pgstat_send_wal(); >>> >>> This is not necessary because walwriter generates no WAL data? >> >> No, it's not necessary. >> Thanks. I fixed it. > > Thanks for updating the patch! I applied cosmetic changes to it. > For example, I added more comments. Patch attached. > Barring no objection, I will commit this patch. Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
pgsql-hackers by date: