Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Date | |
Msg-id | 11e81f0b7d90dd69077d3d964a9a4da9@oss.nttdata.com Whole thread Raw |
In response to | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add statistics to pg_stat_wal view for wal related parameter tuning
|
List | pgsql-hackers |
On 2020-11-12 14:58, Fujii Masao wrote: > On 2020/11/06 10:25, Masahiro Ikeda wrote: >> On 2020-10-30 11:50, Fujii Masao wrote: >>> On 2020/10/29 17:03, Masahiro Ikeda wrote: >>>> Hi, >>>> >>>> Thanks for your comments and advice. I updated the patch. >>>> >>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote: >>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda >>>>> <ikedamsh@oss.nttdata.com> wrote in >>>>>> On 2020-10-20 12:46, Amit Kapila wrote: >>>>>> > I see that we also need to add extra code to capture these stats (some >>>>>> > of which is in performance-critical path especially in >>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might >>>>>> > be that it is all fine as it is very important to collect these stats >>>>>> > at cluster-level in spite that the same information can be gathered at >>>>>> > statement-level to help customers but I don't see a very strong case >>>>>> > for that in your proposal. >>>>> >>>>> We should avoid that duplication as possible even if the both >>>>> number >>>>> are important. >>>>> >>>>>> Also about performance, I thought there are few impacts because it >>>>>> increments stats in memory. If I can implement to reuse >>>>>> pgWalUsage's >>>>>> value which already collects these stats, there is no impact in >>>>>> XLogInsertRecord. >>>>>> For example, how about pg_stat_wal() calculates the accumulated >>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's >>>>>> value? >>>>> >>>>> I don't think that works, but it would work that pgstat_send_wal() >>>>> takes the difference of that values between two successive calls. >>>>> >>>>> WalUsage prevWalUsage; >>>>> ... >>>>> pgstat_send_wal() >>>>> { >>>>> .. >>>>> /* fill in some values using pgWalUsage */ >>>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>>> prevWalUsage.wal_bytes; >>>>> WalStats.m_wal_records = pgWalUsage.wal_records - >>>>> prevWalUsage.wal_records; >>>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - >>>>> prevWalUsage.wal_fpi; >>>>> ... >>>>> pgstat_send(&WalStats, sizeof(WalStats)); >>>>> >>>>> /* remember the current numbers */ >>>>> prevWalUsage = pgWalUsage; >>>> >>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage >>>> which is already defined and eliminates the extra overhead. >>> >>> + /* fill in some values using pgWalUsage */ >>> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>> prevWalUsage.wal_bytes; >>> + WalStats.m_wal_records = pgWalUsage.wal_records - >>> prevWalUsage.wal_records; >>> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi; >>> >>> It's better to use WalUsageAccumDiff() here? >> >> Yes, thanks. I fixed it. >> >>> prevWalUsage needs to be initialized with pgWalUsage? >>> >>> + if (AmWalWriterProcess()){ >>> + WalStats.m_wal_write_walwriter++; >>> + } >>> + else >>> + { >>> + WalStats.m_wal_write_backend++; >>> + } >>> >>> I think that it's better not to separate m_wal_write_xxx into two for >>> walwriter and other processes. Instead, we can use one >>> m_wal_write_xxx >>> counter and make pgstat_send_wal() send also the process type to >>> the stats collector. Then the stats collector can accumulate the >>> counters >>> per process type if necessary. If we adopt this approach, we can >>> easily >>> extend pg_stat_wal so that any fields can be reported per process >>> type. >> >> I'll remove the above source code because these counters are not >> useful. >> >> >> On 2020-10-30 12:00, Fujii Masao wrote: >>> On 2020/10/20 11:31, Masahiro Ikeda wrote: >>>> Hi, >>>> >>>> I think we need to add some statistics to pg_stat_wal view. >>>> >>>> Although there are some parameter related WAL, >>>> there are few statistics for tuning them. >>>> >>>> I think it's better to provide the following statistics. >>>> Please let me know your comments. >>>> >>>> ``` >>>> postgres=# SELECT * from pg_stat_wal; >>>> -[ RECORD 1 ]-------+------------------------------ >>>> wal_records | 2000224 >>>> wal_fpi | 47 >>>> wal_bytes | 248216337 >>>> wal_buffers_full | 20954 >>>> wal_init_file | 8 >>>> wal_write_backend | 20960 >>>> wal_write_walwriter | 46 >>>> wal_write_time | 51 >>>> wal_sync_backend | 7 >>>> wal_sync_walwriter | 8 >>>> wal_sync_time | 0 >>>> stats_reset | 2020-10-20 11:04:51.307771+09 >>>> ``` >>>> >>>> 1. Basic statistics of WAL activity >>>> >>>> - wal_records: Total number of WAL records generated >>>> - wal_fpi: Total number of WAL full page images generated >>>> - wal_bytes: Total amount of WAL bytes generated >>>> >>>> To understand DB's performance, first, we will check the performance >>>> trends for the entire database instance. >>>> For example, if the number of wal_fpi becomes higher, users may tune >>>> "wal_compression", "checkpoint_timeout" and so on. >>>> >>>> Although users can check the above statistics via EXPLAIN, >>>> auto_explain, >>>> autovacuum and pg_stat_statements now, >>>> if users want to see the performance trends for the entire >>>> database, >>>> they must recalculate the statistics. >>>> >>>> I think it is useful to add the sum of the basic statistics. >>>> >>>> >>>> 2. WAL segment file creation >>>> >>>> - wal_init_file: Total number of WAL segment files created. >>>> >>>> To create a new WAL file may have an impact on the performance of >>>> a write-heavy workload generating lots of WAL. If this number is >>>> reported high, >>>> to reduce the number of this initialization, we can tune WAL-related >>>> parameters >>>> so that more "recycled" WAL files can be held. >>>> >>>> >>>> >>>> 3. Number of when WAL is flushed >>>> >>>> - wal_write_backend : Total number of WAL data written to the disk >>>> by backends >>>> - wal_write_walwriter : Total number of WAL data written to the disk >>>> by walwriter >>>> - wal_sync_backend : Total number of WAL data synced to the disk by >>>> backends >>>> - wal_sync_walwriter : Total number of WAL data synced to the disk >>>> by walwrite >>>> >>>> I think it's useful for tuning "synchronous_commit" and >>>> "commit_delay" for query executions. >>>> If the number of WAL is flushed is high, users can know >>>> "synchronous_commit" is useful for the workload. >>> >>> I just wonder how useful these counters are. Even without these >>> counters, >>> we already know synchronous_commit=off is likely to cause the better >>> performance (but has the risk of data loss). So ISTM that these >>> counters are >>> not so useful when tuning synchronous_commit. >> >> Thanks, my understanding was wrong. >> I agreed that your comments. >> >> I merged the statistics of *_backend and *_walwriter. >> I think the sum of them is useful to calculate the average per >> write/sync time. >> For example, per write time is equals wal_write_time / wal_write. > > Understood. > > Thanks for updating the patch! Thanks for your comments. > patching file src/include/catalog/pg_proc.dat > Hunk #1 FAILED at 5491. > 1 out of 1 hunk FAILED -- saving rejects to file > src/include/catalog/pg_proc.dat.rej > > I got this failure when applying the patch. Could you update the patch? Thanks, I updated the patch. > - Number of times WAL data was written to the disk because WAL > buffers got full > + Total number of times WAL data written to the disk because WAL > buffers got full > > Isn't "was" necessary between "data" and "written"? Yes, I fixed it. > + <entry role="catalog_table_entry"><para > role="column_definition"> > + <structfield>wal_bytes</structfield> <type>bigint</type> > > Shouldn't the type of wal_bytes be numeric because the total number of > WAL bytes can exceed the range of bigint? I think that the type of > pg_stat_statements.wal_bytes is also numeric for the same reason. Thanks, I fixed it. Since I cast the type of wal_bytes from PgStat_Counter to uint64, I changed the type of PgStat_MsgWal and PgStat_WalStats too. > + <entry role="catalog_table_entry"><para > role="column_definition"> > + <structfield>wal_write_time</structfield> <type>bigint</type> > > Shouldn't the type of wal_xxx_time be double precision, > like pg_stat_database.blk_write_time? Thanks, I changed it. > Even when fsync is set to off or wal_sync_method is set to open_sync, > wal_sync is incremented. Isn't this behavior confusing? > > > + Total amount of time that has been spent in the portion of > + WAL data was written to disk by backend and walwriter, in > milliseconds > + (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise > zero) > > With the patch, track_io_timing controls both IO for data files and > WAL files. But we may want to track only either of them. So it's better > to extend track_io_timing so that we can specify the tracking target > in the parameter? For example, we can make track_io_timing accept > data, wal and all. Or we should introduce new GUC for WAL, e.g., > track_wal_io_timing? Thought? OK, I introduced the new GUC "track_wal_io_timing". > I'm afraid that "by backend and walwriter" part can make us thinkg > incorrectly that WAL writes by other processes like autovacuum > are not tracked. Sorry, I removed "by backend and walwriter". Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: