Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Date | |
Msg-id | 7eb4ebd487b53200bde98ba2f0a7e789@oss.nttdata.com Whole thread Raw |
In response to | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add statistics to pg_stat_wal view for wal related parameter tuning
|
List | pgsql-hackers |
On 2020-11-25 20:19, Fujii Masao wrote: > On 2020/11/19 16:31, Masahiro Ikeda wrote: >> On 2020-11-17 11:46, Fujii Masao wrote: >>> On 2020/11/16 16:35, Masahiro Ikeda wrote: >>>> On 2020-11-12 14:58, Fujii Masao wrote: >>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote: >>>>>> On 2020-10-30 11:50, Fujii Masao wrote: >>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Thanks for your comments and advice. I updated the patch. >>>>>>>> >>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote: >>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda >>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in >>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote: >>>>>>>>>> > I see that we also need to add extra code to capture these stats (some >>>>>>>>>> > of which is in performance-critical path especially in >>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might >>>>>>>>>> > be that it is all fine as it is very important to collect these stats >>>>>>>>>> > at cluster-level in spite that the same information can be gathered at >>>>>>>>>> > statement-level to help customers but I don't see a very strong case >>>>>>>>>> > for that in your proposal. >>>>>>>>> >>>>>>>>> We should avoid that duplication as possible even if the both >>>>>>>>> number >>>>>>>>> are important. >>>>>>>>> >>>>>>>>>> Also about performance, I thought there are few impacts >>>>>>>>>> because it >>>>>>>>>> increments stats in memory. If I can implement to reuse >>>>>>>>>> pgWalUsage's >>>>>>>>>> value which already collects these stats, there is no impact >>>>>>>>>> in >>>>>>>>>> XLogInsertRecord. >>>>>>>>>> For example, how about pg_stat_wal() calculates the >>>>>>>>>> accumulated >>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use >>>>>>>>>> pgWalUsage's >>>>>>>>>> value? >>>>>>>>> >>>>>>>>> I don't think that works, but it would work that >>>>>>>>> pgstat_send_wal() >>>>>>>>> takes the difference of that values between two successive >>>>>>>>> calls. >>>>>>>>> >>>>>>>>> WalUsage prevWalUsage; >>>>>>>>> ... >>>>>>>>> pgstat_send_wal() >>>>>>>>> { >>>>>>>>> .. >>>>>>>>> /* fill in some values using pgWalUsage */ >>>>>>>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>>>>>>> prevWalUsage.wal_bytes; >>>>>>>>> WalStats.m_wal_records = pgWalUsage.wal_records - >>>>>>>>> prevWalUsage.wal_records; >>>>>>>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - >>>>>>>>> prevWalUsage.wal_fpi; >>>>>>>>> ... >>>>>>>>> pgstat_send(&WalStats, sizeof(WalStats)); >>>>>>>>> >>>>>>>>> /* remember the current numbers */ >>>>>>>>> prevWalUsage = pgWalUsage; >>>>>>>> >>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage >>>>>>>> which is already defined and eliminates the extra overhead. >>>>>>> >>>>>>> + /* fill in some values using pgWalUsage */ >>>>>>> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>>>>> prevWalUsage.wal_bytes; >>>>>>> + WalStats.m_wal_records = pgWalUsage.wal_records - >>>>>>> prevWalUsage.wal_records; >>>>>>> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - >>>>>>> prevWalUsage.wal_fpi; >>>>>>> >>>>>>> It's better to use WalUsageAccumDiff() here? >>>>>> >>>>>> Yes, thanks. I fixed it. >>>>>> >>>>>>> prevWalUsage needs to be initialized with pgWalUsage? >>>>>>> >>>>>>> + if (AmWalWriterProcess()){ >>>>>>> + WalStats.m_wal_write_walwriter++; >>>>>>> + } >>>>>>> + else >>>>>>> + { >>>>>>> + WalStats.m_wal_write_backend++; >>>>>>> + } >>>>>>> >>>>>>> I think that it's better not to separate m_wal_write_xxx into two >>>>>>> for >>>>>>> walwriter and other processes. Instead, we can use one >>>>>>> m_wal_write_xxx >>>>>>> counter and make pgstat_send_wal() send also the process type to >>>>>>> the stats collector. Then the stats collector can accumulate the >>>>>>> counters >>>>>>> per process type if necessary. If we adopt this approach, we can >>>>>>> easily >>>>>>> extend pg_stat_wal so that any fields can be reported per process >>>>>>> type. >>>>>> >>>>>> I'll remove the above source code because these counters are not >>>>>> useful. >>>>>> >>>>>> >>>>>> On 2020-10-30 12:00, Fujii Masao wrote: >>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I think we need to add some statistics to pg_stat_wal view. >>>>>>>> >>>>>>>> Although there are some parameter related WAL, >>>>>>>> there are few statistics for tuning them. >>>>>>>> >>>>>>>> I think it's better to provide the following statistics. >>>>>>>> Please let me know your comments. >>>>>>>> >>>>>>>> ``` >>>>>>>> postgres=# SELECT * from pg_stat_wal; >>>>>>>> -[ RECORD 1 ]-------+------------------------------ >>>>>>>> wal_records | 2000224 >>>>>>>> wal_fpi | 47 >>>>>>>> wal_bytes | 248216337 >>>>>>>> wal_buffers_full | 20954 >>>>>>>> wal_init_file | 8 >>>>>>>> wal_write_backend | 20960 >>>>>>>> wal_write_walwriter | 46 >>>>>>>> wal_write_time | 51 >>>>>>>> wal_sync_backend | 7 >>>>>>>> wal_sync_walwriter | 8 >>>>>>>> wal_sync_time | 0 >>>>>>>> stats_reset | 2020-10-20 11:04:51.307771+09 >>>>>>>> ``` >>>>>>>> >>>>>>>> 1. Basic statistics of WAL activity >>>>>>>> >>>>>>>> - wal_records: Total number of WAL records generated >>>>>>>> - wal_fpi: Total number of WAL full page images generated >>>>>>>> - wal_bytes: Total amount of WAL bytes generated >>>>>>>> >>>>>>>> To understand DB's performance, first, we will check the >>>>>>>> performance >>>>>>>> trends for the entire database instance. >>>>>>>> For example, if the number of wal_fpi becomes higher, users may >>>>>>>> tune >>>>>>>> "wal_compression", "checkpoint_timeout" and so on. >>>>>>>> >>>>>>>> Although users can check the above statistics via EXPLAIN, >>>>>>>> auto_explain, >>>>>>>> autovacuum and pg_stat_statements now, >>>>>>>> if users want to see the performance trends for the entire >>>>>>>> database, >>>>>>>> they must recalculate the statistics. >>>>>>>> >>>>>>>> I think it is useful to add the sum of the basic statistics. >>>>>>>> >>>>>>>> >>>>>>>> 2. WAL segment file creation >>>>>>>> >>>>>>>> - wal_init_file: Total number of WAL segment files created. >>>>>>>> >>>>>>>> To create a new WAL file may have an impact on the performance >>>>>>>> of >>>>>>>> a write-heavy workload generating lots of WAL. If this number is >>>>>>>> reported high, >>>>>>>> to reduce the number of this initialization, we can tune >>>>>>>> WAL-related parameters >>>>>>>> so that more "recycled" WAL files can be held. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 3. Number of when WAL is flushed >>>>>>>> >>>>>>>> - wal_write_backend : Total number of WAL data written to the >>>>>>>> disk by backends >>>>>>>> - wal_write_walwriter : Total number of WAL data written to the >>>>>>>> disk by walwriter >>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk >>>>>>>> by backends >>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the >>>>>>>> disk by walwrite >>>>>>>> >>>>>>>> I think it's useful for tuning "synchronous_commit" and >>>>>>>> "commit_delay" for query executions. >>>>>>>> If the number of WAL is flushed is high, users can know >>>>>>>> "synchronous_commit" is useful for the workload. >>>>>>> >>>>>>> I just wonder how useful these counters are. Even without these >>>>>>> counters, >>>>>>> we already know synchronous_commit=off is likely to cause the >>>>>>> better >>>>>>> performance (but has the risk of data loss). So ISTM that these >>>>>>> counters are >>>>>>> not so useful when tuning synchronous_commit. >>>>>> >>>>>> Thanks, my understanding was wrong. >>>>>> I agreed that your comments. >>>>>> >>>>>> I merged the statistics of *_backend and *_walwriter. >>>>>> I think the sum of them is useful to calculate the average per >>>>>> write/sync time. >>>>>> For example, per write time is equals wal_write_time / wal_write. >>>>> >>>>> Understood. >>>>> >>>>> Thanks for updating the patch! >>>> >>>> Thanks for your comments. >>>> >>>>> patching file src/include/catalog/pg_proc.dat >>>>> Hunk #1 FAILED at 5491. >>>>> 1 out of 1 hunk FAILED -- saving rejects to file >>>>> src/include/catalog/pg_proc.dat.rej >>>>> >>>>> I got this failure when applying the patch. Could you update the >>>>> patch? >>>> >>>> Thanks, I updated the patch. >>>> >>>>> - Number of times WAL data was written to the disk because >>>>> WAL >>>>> buffers got full >>>>> + Total number of times WAL data written to the disk because >>>>> WAL >>>>> buffers got full >>>>> >>>>> Isn't "was" necessary between "data" and "written"? >>>> >>>> Yes, I fixed it. >>>> >>>>> + <entry role="catalog_table_entry"><para >>>>> role="column_definition"> >>>>> + <structfield>wal_bytes</structfield> <type>bigint</type> >>>>> >>>>> Shouldn't the type of wal_bytes be numeric because the total number >>>>> of >>>>> WAL bytes can exceed the range of bigint? I think that the type of >>>>> pg_stat_statements.wal_bytes is also numeric for the same reason. >>>> >>>> Thanks, I fixed it. >>>> >>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64, >>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too. >>>> >>>>> + <entry role="catalog_table_entry"><para >>>>> role="column_definition"> >>>>> + <structfield>wal_write_time</structfield> >>>>> <type>bigint</type> >>>>> >>>>> Shouldn't the type of wal_xxx_time be double precision, >>>>> like pg_stat_database.blk_write_time? >>>> >>>> Thanks, I changed it. >>>> >>>>> Even when fsync is set to off or wal_sync_method is set to >>>>> open_sync, >>>>> wal_sync is incremented. Isn't this behavior confusing? >>> >>> What do you think about this comment? >> >> Sorry, I'll change to increment wal_sync and wal_sync_time only >> if a specific fsync method is called. >> >>> I found that we discussed track-WAL-IO-timing feature at the past >>> discussion >>> about the similar feature [1]. But the feature was droppped from the >>> proposal >>> patch because there was the performance concern. So probably we need >>> to >>> revisit the past discussion and benchmark the performance. Thought? >>> >>> If track-WAL-IO-timing feature may cause performance regression, >>> it might be an idea to extract wal_records, wal_fpi and wal_bytes >>> parts >>> from the patch and commit it at first. >>> >>> [1] >>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com >> >> Thanks, I'll check the thread. >> I agree to add basic statistics at first and I attached the patch. > > Thanks! > > + /* Send WAL statistics */ > + pgstat_send_wal(); > > This is not necessary because walwriter generates no WAL data? No, it's not necessary. Thanks. I fixed it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: