Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Date | |
Msg-id | fff4b32ce3c4f33aeca5787c81304cfc@oss.nttdata.com Whole thread Raw |
In response to | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Fujii Masao <masao.fujii@oss.nttdata.com>) |
List | pgsql-hackers |
On 2020-11-12 16:27, Fujii Masao wrote: > On 2020/11/12 14:58, Fujii Masao wrote: >> >> >> On 2020/11/06 10:25, Masahiro Ikeda wrote: >>> On 2020-10-30 11:50, Fujii Masao wrote: >>>> On 2020/10/29 17:03, Masahiro Ikeda wrote: >>>>> Hi, >>>>> >>>>> Thanks for your comments and advice. I updated the patch. >>>>> >>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote: >>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda >>>>>> <ikedamsh@oss.nttdata.com> wrote in >>>>>>> On 2020-10-20 12:46, Amit Kapila wrote: >>>>>>> > I see that we also need to add extra code to capture these stats (some >>>>>>> > of which is in performance-critical path especially in >>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might >>>>>>> > be that it is all fine as it is very important to collect these stats >>>>>>> > at cluster-level in spite that the same information can be gathered at >>>>>>> > statement-level to help customers but I don't see a very strong case >>>>>>> > for that in your proposal. >>>>>> >>>>>> We should avoid that duplication as possible even if the both >>>>>> number >>>>>> are important. >>>>>> >>>>>>> Also about performance, I thought there are few impacts because >>>>>>> it >>>>>>> increments stats in memory. If I can implement to reuse >>>>>>> pgWalUsage's >>>>>>> value which already collects these stats, there is no impact in >>>>>>> XLogInsertRecord. >>>>>>> For example, how about pg_stat_wal() calculates the accumulated >>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's >>>>>>> value? >>>>>> >>>>>> I don't think that works, but it would work that pgstat_send_wal() >>>>>> takes the difference of that values between two successive calls. >>>>>> >>>>>> WalUsage prevWalUsage; >>>>>> ... >>>>>> pgstat_send_wal() >>>>>> { >>>>>> .. >>>>>> /* fill in some values using pgWalUsage */ >>>>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>>>> prevWalUsage.wal_bytes; >>>>>> WalStats.m_wal_records = pgWalUsage.wal_records - >>>>>> prevWalUsage.wal_records; >>>>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - >>>>>> prevWalUsage.wal_fpi; >>>>>> ... >>>>>> pgstat_send(&WalStats, sizeof(WalStats)); >>>>>> >>>>>> /* remember the current numbers */ >>>>>> prevWalUsage = pgWalUsage; >>>>> >>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage >>>>> which is already defined and eliminates the extra overhead. >>>> >>>> + /* fill in some values using pgWalUsage */ >>>> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>> prevWalUsage.wal_bytes; >>>> + WalStats.m_wal_records = pgWalUsage.wal_records - >>>> prevWalUsage.wal_records; >>>> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi; >>>> >>>> It's better to use WalUsageAccumDiff() here? >>> >>> Yes, thanks. I fixed it. >>> >>>> prevWalUsage needs to be initialized with pgWalUsage? >>>> >>>> + if (AmWalWriterProcess()){ >>>> + WalStats.m_wal_write_walwriter++; >>>> + } >>>> + else >>>> + { >>>> + WalStats.m_wal_write_backend++; >>>> + } >>>> >>>> I think that it's better not to separate m_wal_write_xxx into two >>>> for >>>> walwriter and other processes. Instead, we can use one >>>> m_wal_write_xxx >>>> counter and make pgstat_send_wal() send also the process type to >>>> the stats collector. Then the stats collector can accumulate the >>>> counters >>>> per process type if necessary. If we adopt this approach, we can >>>> easily >>>> extend pg_stat_wal so that any fields can be reported per process >>>> type. >>> >>> I'll remove the above source code because these counters are not >>> useful. >>> >>> >>> On 2020-10-30 12:00, Fujii Masao wrote: >>>> On 2020/10/20 11:31, Masahiro Ikeda wrote: >>>>> Hi, >>>>> >>>>> I think we need to add some statistics to pg_stat_wal view. >>>>> >>>>> Although there are some parameter related WAL, >>>>> there are few statistics for tuning them. >>>>> >>>>> I think it's better to provide the following statistics. >>>>> Please let me know your comments. >>>>> >>>>> ``` >>>>> postgres=# SELECT * from pg_stat_wal; >>>>> -[ RECORD 1 ]-------+------------------------------ >>>>> wal_records | 2000224 >>>>> wal_fpi | 47 >>>>> wal_bytes | 248216337 >>>>> wal_buffers_full | 20954 >>>>> wal_init_file | 8 >>>>> wal_write_backend | 20960 >>>>> wal_write_walwriter | 46 >>>>> wal_write_time | 51 >>>>> wal_sync_backend | 7 >>>>> wal_sync_walwriter | 8 >>>>> wal_sync_time | 0 >>>>> stats_reset | 2020-10-20 11:04:51.307771+09 >>>>> ``` >>>>> >>>>> 1. Basic statistics of WAL activity >>>>> >>>>> - wal_records: Total number of WAL records generated >>>>> - wal_fpi: Total number of WAL full page images generated >>>>> - wal_bytes: Total amount of WAL bytes generated >>>>> >>>>> To understand DB's performance, first, we will check the >>>>> performance >>>>> trends for the entire database instance. >>>>> For example, if the number of wal_fpi becomes higher, users may >>>>> tune >>>>> "wal_compression", "checkpoint_timeout" and so on. >>>>> >>>>> Although users can check the above statistics via EXPLAIN, >>>>> auto_explain, >>>>> autovacuum and pg_stat_statements now, >>>>> if users want to see the performance trends for the entire >>>>> database, >>>>> they must recalculate the statistics. >>>>> >>>>> I think it is useful to add the sum of the basic statistics. >>>>> >>>>> >>>>> 2. WAL segment file creation >>>>> >>>>> - wal_init_file: Total number of WAL segment files created. >>>>> >>>>> To create a new WAL file may have an impact on the performance of >>>>> a write-heavy workload generating lots of WAL. If this number is >>>>> reported high, >>>>> to reduce the number of this initialization, we can tune >>>>> WAL-related parameters >>>>> so that more "recycled" WAL files can be held. >>>>> >>>>> >>>>> >>>>> 3. Number of when WAL is flushed >>>>> >>>>> - wal_write_backend : Total number of WAL data written to the disk >>>>> by backends >>>>> - wal_write_walwriter : Total number of WAL data written to the >>>>> disk by walwriter >>>>> - wal_sync_backend : Total number of WAL data synced to the disk by >>>>> backends >>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk >>>>> by walwrite >>>>> >>>>> I think it's useful for tuning "synchronous_commit" and >>>>> "commit_delay" for query executions. >>>>> If the number of WAL is flushed is high, users can know >>>>> "synchronous_commit" is useful for the workload. >>>> >>>> I just wonder how useful these counters are. Even without these >>>> counters, >>>> we already know synchronous_commit=off is likely to cause the better >>>> performance (but has the risk of data loss). So ISTM that these >>>> counters are >>>> not so useful when tuning synchronous_commit. >>> >>> Thanks, my understanding was wrong. >>> I agreed that your comments. >>> >>> I merged the statistics of *_backend and *_walwriter. >>> I think the sum of them is useful to calculate the average per >>> write/sync time. >>> For example, per write time is equals wal_write_time / wal_write. >> >> Understood. >> >> Thanks for updating the patch! >> >> patching file src/include/catalog/pg_proc.dat >> Hunk #1 FAILED at 5491. >> 1 out of 1 hunk FAILED -- saving rejects to file >> src/include/catalog/pg_proc.dat.rej >> >> I got this failure when applying the patch. Could you update the >> patch? >> >> >> - Number of times WAL data was written to the disk because WAL >> buffers got full >> + Total number of times WAL data written to the disk because WAL >> buffers got full >> >> Isn't "was" necessary between "data" and "written"? >> >> >> + <entry role="catalog_table_entry"><para >> role="column_definition"> >> + <structfield>wal_bytes</structfield> <type>bigint</type> >> >> Shouldn't the type of wal_bytes be numeric because the total number of >> WAL bytes can exceed the range of bigint? I think that the type of >> pg_stat_statements.wal_bytes is also numeric for the same reason. >> >> >> + <entry role="catalog_table_entry"><para >> role="column_definition"> >> + <structfield>wal_write_time</structfield> <type>bigint</type> >> >> Shouldn't the type of wal_xxx_time be double precision, >> like pg_stat_database.blk_write_time? >> >> >> Even when fsync is set to off or wal_sync_method is set to open_sync, >> wal_sync is incremented. Isn't this behavior confusing? >> >> >> + Total amount of time that has been spent in the portion of >> + WAL data was written to disk by backend and walwriter, in >> milliseconds >> + (if <xref linkend="guc-track-io-timing"/> is enabled, >> otherwise zero) >> >> With the patch, track_io_timing controls both IO for data files and >> WAL files. But we may want to track only either of them. So it's >> better >> to extend track_io_timing so that we can specify the tracking target >> in the parameter? For example, we can make track_io_timing accept >> data, wal and all. Or we should introduce new GUC for WAL, e.g., >> track_wal_io_timing? Thought? >> >> I'm afraid that "by backend and walwriter" part can make us thinkg >> incorrectly that WAL writes by other processes like autovacuum >> are not tracked. > > pgstat_send_wal(void) > { > + /* fill in some values using pgWalUsage */ > + WalUsage walusage; > + memset(&walusage, 0, sizeof(WalUsage)); > + WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); > > At the first call to pgstat_send_wal(), prevWalUsage has not been set > to > the previous value of pgWalUsage. So the calculation result of > WalUsageAccumDiff() can be incorrect. To address this issue, > prevWalUsage should be set to pgWalUsage or both should be initialized > with 0 at the beginning of the process, for example? I forgot to handle it, thanks. Although I initialized it in pgstat_initialize(), if there is better way please let me know. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
pgsql-hackers by date: