Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date
Msg-id fff4b32ce3c4f33aeca5787c81304cfc@oss.nttdata.com
Whole thread Raw
In response to Re: Add statistics to pg_stat_wal view for wal related parameter tuning  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 2020-11-12 16:27, Fujii Masao wrote:
> On 2020/11/12 14:58, Fujii Masao wrote:
>> 
>> 
>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>> Hi,
>>>>> 
>>>>> Thanks for your comments and advice. I updated the patch.
>>>>> 
>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>> > of which is in performance-critical path especially in
>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>> > for that in your proposal.
>>>>>> 
>>>>>> We should avoid that duplication as possible even if the both 
>>>>>> number
>>>>>> are important.
>>>>>> 
>>>>>>> Also about performance, I thought there are few impacts because 
>>>>>>> it
>>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>>> pgWalUsage's
>>>>>>> value which already collects these stats, there is no impact in
>>>>>>> XLogInsertRecord.
>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>> value?
>>>>>> 
>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>> takes the difference of that values between two successive calls.
>>>>>> 
>>>>>> WalUsage prevWalUsage;
>>>>>> ...
>>>>>> pgstat_send_wal()
>>>>>> {
>>>>>> ..
>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>>> prevWalUsage.wal_bytes;
>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>> prevWalUsage.wal_records;
>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>>> prevWalUsage.wal_fpi;
>>>>>> ...
>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>> 
>>>>>>    /* remember the current numbers */
>>>>>>    prevWalUsage = pgWalUsage;
>>>>> 
>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>> which is already defined and eliminates the extra overhead.
>>>> 
>>>> +    /* fill in some values using pgWalUsage */
>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>>> prevWalUsage.wal_bytes;
>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>> prevWalUsage.wal_records;
>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>> 
>>>> It's better to use WalUsageAccumDiff() here?
>>> 
>>> Yes, thanks. I fixed it.
>>> 
>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>> 
>>>> +                if (AmWalWriterProcess()){
>>>> +                    WalStats.m_wal_write_walwriter++;
>>>> +                }
>>>> +                else
>>>> +                {
>>>> +                    WalStats.m_wal_write_backend++;
>>>> +                }
>>>> 
>>>> I think that it's better not to separate m_wal_write_xxx into two 
>>>> for
>>>> walwriter and other processes. Instead, we can use one 
>>>> m_wal_write_xxx
>>>> counter and make pgstat_send_wal() send also the process type to
>>>> the stats collector. Then the stats collector can accumulate the 
>>>> counters
>>>> per process type if necessary. If we adopt this approach, we can 
>>>> easily
>>>> extend pg_stat_wal so that any fields can be reported per process 
>>>> type.
>>> 
>>> I'll remove the above source code because these counters are not 
>>> useful.
>>> 
>>> 
>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>> Hi,
>>>>> 
>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>> 
>>>>> Although there are some parameter related WAL,
>>>>> there are few statistics for tuning them.
>>>>> 
>>>>> I think it's better to provide the following statistics.
>>>>> Please let me know your comments.
>>>>> 
>>>>> ```
>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>> wal_records         | 2000224
>>>>> wal_fpi             | 47
>>>>> wal_bytes           | 248216337
>>>>> wal_buffers_full    | 20954
>>>>> wal_init_file       | 8
>>>>> wal_write_backend   | 20960
>>>>> wal_write_walwriter | 46
>>>>> wal_write_time      | 51
>>>>> wal_sync_backend    | 7
>>>>> wal_sync_walwriter  | 8
>>>>> wal_sync_time       | 0
>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>> ```
>>>>> 
>>>>> 1. Basic statistics of WAL activity
>>>>> 
>>>>> - wal_records: Total number of WAL records generated
>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>> 
>>>>> To understand DB's performance, first, we will check the 
>>>>> performance
>>>>> trends for the entire database instance.
>>>>> For example, if the number of wal_fpi becomes higher, users may 
>>>>> tune
>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>> 
>>>>> Although users can check the above statistics via EXPLAIN, 
>>>>> auto_explain,
>>>>> autovacuum and pg_stat_statements now,
>>>>> if users want to see the performance trends  for the entire 
>>>>> database,
>>>>> they must recalculate the statistics.
>>>>> 
>>>>> I think it is useful to add the sum of the basic statistics.
>>>>> 
>>>>> 
>>>>> 2.  WAL segment file creation
>>>>> 
>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>> 
>>>>> To create a new WAL file may have an impact on the performance of
>>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>>> reported high,
>>>>> to reduce the number of this initialization, we can tune 
>>>>> WAL-related parameters
>>>>> so that more "recycled" WAL files can be held.
>>>>> 
>>>>> 
>>>>> 
>>>>> 3. Number of when WAL is flushed
>>>>> 
>>>>> - wal_write_backend : Total number of WAL data written to the disk 
>>>>> by backends
>>>>> - wal_write_walwriter : Total number of WAL data written to the 
>>>>> disk by walwriter
>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by 
>>>>> backends
>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk 
>>>>> by walwrite
>>>>> 
>>>>> I think it's useful for tuning "synchronous_commit" and 
>>>>> "commit_delay" for query executions.
>>>>> If the number of WAL is flushed is high, users can know 
>>>>> "synchronous_commit" is useful for the workload.
>>>> 
>>>> I just wonder how useful these counters are. Even without these 
>>>> counters,
>>>> we already know synchronous_commit=off is likely to cause the better
>>>> performance (but has the risk of data loss). So ISTM that these 
>>>> counters are
>>>> not so useful when tuning synchronous_commit.
>>> 
>>> Thanks, my understanding was wrong.
>>> I agreed that your comments.
>>> 
>>> I merged the statistics of *_backend and *_walwriter.
>>> I think the sum of them is useful to calculate the average per 
>>> write/sync time.
>>> For example, per write time is equals wal_write_time / wal_write.
>> 
>> Understood.
>> 
>> Thanks for updating the patch!
>> 
>> patching file src/include/catalog/pg_proc.dat
>> Hunk #1 FAILED at 5491.
>> 1 out of 1 hunk FAILED -- saving rejects to file 
>> src/include/catalog/pg_proc.dat.rej
>> 
>> I got this failure when applying the patch. Could you update the 
>> patch?
>> 
>> 
>> -       Number of times WAL data was written to the disk because WAL 
>> buffers got full
>> +       Total number of times WAL data written to the disk because WAL 
>> buffers got full
>> 
>> Isn't "was" necessary between "data" and "written"?
>> 
>> 
>> +      <entry role="catalog_table_entry"><para 
>> role="column_definition">
>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>> 
>> Shouldn't the type of wal_bytes be numeric because the total number of
>> WAL bytes can exceed the range of bigint? I think that the type of
>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>> 
>> 
>> +      <entry role="catalog_table_entry"><para 
>> role="column_definition">
>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>> 
>> Shouldn't the type of wal_xxx_time be double precision,
>> like pg_stat_database.blk_write_time?
>> 
>> 
>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>> wal_sync is incremented. Isn't this behavior confusing?
>> 
>> 
>> +       Total amount of time that has been spent in the portion of
>> +       WAL data was written to disk by backend and walwriter, in 
>> milliseconds
>> +       (if <xref linkend="guc-track-io-timing"/> is enabled, 
>> otherwise zero)
>> 
>> With the patch, track_io_timing controls both IO for data files and
>> WAL files. But we may want to track only either of them. So it's 
>> better
>> to extend track_io_timing so that we can specify the tracking target
>> in the parameter? For example, we can make track_io_timing accept
>> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
>> track_wal_io_timing? Thought?
>> 
>> I'm afraid that "by backend and walwriter" part can make us thinkg
>> incorrectly that WAL writes by other processes like autovacuum
>> are not tracked.
> 
>  pgstat_send_wal(void)
>  {
> +    /* fill in some values using pgWalUsage */
> +    WalUsage walusage;
> +    memset(&walusage, 0, sizeof(WalUsage));
> +    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
> 
> At the first call to pgstat_send_wal(), prevWalUsage has not been set 
> to
> the previous value of pgWalUsage. So the calculation result of
> WalUsageAccumDiff() can be incorrect. To address this issue,
> prevWalUsage should be set to pgWalUsage or both should be initialized
> with 0 at the beginning of the process, for example?

I forgot to handle it, thanks.
Although I initialized it in pgstat_initialize(),
if there is better way please let me know.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Masahiro Ikeda
Date:
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning