Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date
Msg-id 7eb4ebd487b53200bde98ba2f0a7e789@oss.nttdata.com
Whole thread Raw
In response to Re: Add statistics to pg_stat_wal view for wal related parameter tuning  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add statistics to pg_stat_wal view for wal related parameter tuning
List pgsql-hackers
On 2020-11-25 20:19, Fujii Masao wrote:
> On 2020/11/19 16:31, Masahiro Ikeda wrote:
>> On 2020-11-17 11:46, Fujii Masao wrote:
>>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>> 
>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>>> > for that in your proposal.
>>>>>>>>> 
>>>>>>>>> We should avoid that duplication as possible even if the both 
>>>>>>>>> number
>>>>>>>>> are important.
>>>>>>>>> 
>>>>>>>>>> Also about performance, I thought there are few impacts 
>>>>>>>>>> because it
>>>>>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>>>>>> pgWalUsage's
>>>>>>>>>> value which already collects these stats, there is no impact 
>>>>>>>>>> in
>>>>>>>>>> XLogInsertRecord.
>>>>>>>>>> For example, how about pg_stat_wal() calculates the 
>>>>>>>>>> accumulated
>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use 
>>>>>>>>>> pgWalUsage's
>>>>>>>>>> value?
>>>>>>>>> 
>>>>>>>>> I don't think that works, but it would work that 
>>>>>>>>> pgstat_send_wal()
>>>>>>>>> takes the difference of that values between two successive 
>>>>>>>>> calls.
>>>>>>>>> 
>>>>>>>>> WalUsage prevWalUsage;
>>>>>>>>> ...
>>>>>>>>> pgstat_send_wal()
>>>>>>>>> {
>>>>>>>>> ..
>>>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>>>>>> prevWalUsage.wal_bytes;
>>>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>>>>> prevWalUsage.wal_records;
>>>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>>>>>> prevWalUsage.wal_fpi;
>>>>>>>>> ...
>>>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>> 
>>>>>>>>>    /* remember the current numbers */
>>>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>>>> 
>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>> 
>>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>>>>>> prevWalUsage.wal_bytes;
>>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>>> prevWalUsage.wal_records;
>>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - 
>>>>>>> prevWalUsage.wal_fpi;
>>>>>>> 
>>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>> 
>>>>>> Yes, thanks. I fixed it.
>>>>>> 
>>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>> 
>>>>>>> +                if (AmWalWriterProcess()){
>>>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>>>> +                }
>>>>>>> +                else
>>>>>>> +                {
>>>>>>> +                    WalStats.m_wal_write_backend++;
>>>>>>> +                }
>>>>>>> 
>>>>>>> I think that it's better not to separate m_wal_write_xxx into two 
>>>>>>> for
>>>>>>> walwriter and other processes. Instead, we can use one 
>>>>>>> m_wal_write_xxx
>>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>>> the stats collector. Then the stats collector can accumulate the 
>>>>>>> counters
>>>>>>> per process type if necessary. If we adopt this approach, we can 
>>>>>>> easily
>>>>>>> extend pg_stat_wal so that any fields can be reported per process 
>>>>>>> type.
>>>>>> 
>>>>>> I'll remove the above source code because these counters are not 
>>>>>> useful.
>>>>>> 
>>>>>> 
>>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>> 
>>>>>>>> Although there are some parameter related WAL,
>>>>>>>> there are few statistics for tuning them.
>>>>>>>> 
>>>>>>>> I think it's better to provide the following statistics.
>>>>>>>> Please let me know your comments.
>>>>>>>> 
>>>>>>>> ```
>>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>>> wal_records         | 2000224
>>>>>>>> wal_fpi             | 47
>>>>>>>> wal_bytes           | 248216337
>>>>>>>> wal_buffers_full    | 20954
>>>>>>>> wal_init_file       | 8
>>>>>>>> wal_write_backend   | 20960
>>>>>>>> wal_write_walwriter | 46
>>>>>>>> wal_write_time      | 51
>>>>>>>> wal_sync_backend    | 7
>>>>>>>> wal_sync_walwriter  | 8
>>>>>>>> wal_sync_time       | 0
>>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>>> ```
>>>>>>>> 
>>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>> 
>>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>> 
>>>>>>>> To understand DB's performance, first, we will check the 
>>>>>>>> performance
>>>>>>>> trends for the entire database instance.
>>>>>>>> For example, if the number of wal_fpi becomes higher, users may 
>>>>>>>> tune
>>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>> 
>>>>>>>> Although users can check the above statistics via EXPLAIN, 
>>>>>>>> auto_explain,
>>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>>> if users want to see the performance trends  for the entire 
>>>>>>>> database,
>>>>>>>> they must recalculate the statistics.
>>>>>>>> 
>>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2.  WAL segment file creation
>>>>>>>> 
>>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>> 
>>>>>>>> To create a new WAL file may have an impact on the performance 
>>>>>>>> of
>>>>>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>>>>>> reported high,
>>>>>>>> to reduce the number of this initialization, we can tune 
>>>>>>>> WAL-related parameters
>>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 3. Number of when WAL is flushed
>>>>>>>> 
>>>>>>>> - wal_write_backend : Total number of WAL data written to the 
>>>>>>>> disk by backends
>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the 
>>>>>>>> disk by walwriter
>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk 
>>>>>>>> by backends
>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the 
>>>>>>>> disk by walwrite
>>>>>>>> 
>>>>>>>> I think it's useful for tuning "synchronous_commit" and 
>>>>>>>> "commit_delay" for query executions.
>>>>>>>> If the number of WAL is flushed is high, users can know 
>>>>>>>> "synchronous_commit" is useful for the workload.
>>>>>>> 
>>>>>>> I just wonder how useful these counters are. Even without these 
>>>>>>> counters,
>>>>>>> we already know synchronous_commit=off is likely to cause the 
>>>>>>> better
>>>>>>> performance (but has the risk of data loss). So ISTM that these 
>>>>>>> counters are
>>>>>>> not so useful when tuning synchronous_commit.
>>>>>> 
>>>>>> Thanks, my understanding was wrong.
>>>>>> I agreed that your comments.
>>>>>> 
>>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>>> I think the sum of them is useful to calculate the average per 
>>>>>> write/sync time.
>>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>> 
>>>>> Understood.
>>>>> 
>>>>> Thanks for updating the patch!
>>>> 
>>>> Thanks for your comments.
>>>> 
>>>>> patching file src/include/catalog/pg_proc.dat
>>>>> Hunk #1 FAILED at 5491.
>>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>>> src/include/catalog/pg_proc.dat.rej
>>>>> 
>>>>> I got this failure when applying the patch. Could you update the 
>>>>> patch?
>>>> 
>>>> Thanks, I updated the patch.
>>>> 
>>>>> -       Number of times WAL data was written to the disk because 
>>>>> WAL
>>>>> buffers got full
>>>>> +       Total number of times WAL data written to the disk because 
>>>>> WAL
>>>>> buffers got full
>>>>> 
>>>>> Isn't "was" necessary between "data" and "written"?
>>>> 
>>>> Yes, I fixed it.
>>>> 
>>>>> +      <entry role="catalog_table_entry"><para 
>>>>> role="column_definition">
>>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>> 
>>>>> Shouldn't the type of wal_bytes be numeric because the total number 
>>>>> of
>>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>> 
>>>> Thanks, I fixed it.
>>>> 
>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>> 
>>>>> +      <entry role="catalog_table_entry"><para 
>>>>> role="column_definition">
>>>>> +       <structfield>wal_write_time</structfield> 
>>>>> <type>bigint</type>
>>>>> 
>>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>>> like pg_stat_database.blk_write_time?
>>>> 
>>>> Thanks, I changed it.
>>>> 
>>>>> Even when fsync is set to off or wal_sync_method is set to 
>>>>> open_sync,
>>>>> wal_sync is incremented. Isn't this behavior confusing?
>>> 
>>> What do you think about this comment?
>> 
>> Sorry, I'll change to increment wal_sync and wal_sync_time only
>> if a specific fsync method is called.
>> 
>>> I found that we discussed track-WAL-IO-timing feature at the past 
>>> discussion
>>> about the similar feature [1]. But the feature was droppped from the 
>>> proposal
>>> patch because there was the performance concern. So probably we need 
>>> to
>>> revisit the past discussion and benchmark the performance. Thought?
>>> 
>>> If track-WAL-IO-timing feature may cause performance regression,
>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes 
>>> parts
>>> from the patch and commit it at first.
>>> 
>>> [1]
>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>> 
>> Thanks, I'll check the thread.
>> I agree to add basic statistics at first and I attached the patch.
> 
> Thanks!
> 
> +        /* Send WAL statistics */
> +        pgstat_send_wal();
> 
> This is not necessary because walwriter generates no WAL data?

No, it's not necessary.
Thanks. I fixed it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

pgsql-hackers by date:

Previous
From: Luc Vlaming
Date:
Subject: Re: Multi Inserts in CREATE TABLE AS - revived patch
Next
From: Julien Rouhaud
Date:
Subject: Have collation versioning to ignore hash and similar AM