Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date
Msg-id 89de5056627ebdd144967959bd789bd6@oss.nttdata.com
Whole thread Raw
In response to Re: Add statistics to pg_stat_wal view for wal related parameter tuning  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add statistics to pg_stat_wal view for wal related parameter tuning  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 2020-11-17 11:46, Fujii Masao wrote:
> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>> On 2020-11-12 14:58, Fujii Masao wrote:
>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>> 
>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>> > for that in your proposal.
>>>>>>> 
>>>>>>> We should avoid that duplication as possible even if the both 
>>>>>>> number
>>>>>>> are important.
>>>>>>> 
>>>>>>>> Also about performance, I thought there are few impacts because 
>>>>>>>> it
>>>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>>>> pgWalUsage's
>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>> XLogInsertRecord.
>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>> value?
>>>>>>> 
>>>>>>> I don't think that works, but it would work that 
>>>>>>> pgstat_send_wal()
>>>>>>> takes the difference of that values between two successive calls.
>>>>>>> 
>>>>>>> WalUsage prevWalUsage;
>>>>>>> ...
>>>>>>> pgstat_send_wal()
>>>>>>> {
>>>>>>> ..
>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>>>> prevWalUsage.wal_bytes;
>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>>> prevWalUsage.wal_records;
>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>>>> prevWalUsage.wal_fpi;
>>>>>>> ...
>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>> 
>>>>>>>    /* remember the current numbers */
>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>> 
>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>> which is already defined and eliminates the extra overhead.
>>>>> 
>>>>> +    /* fill in some values using pgWalUsage */
>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>>>> prevWalUsage.wal_bytes;
>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>> prevWalUsage.wal_records;
>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - 
>>>>> prevWalUsage.wal_fpi;
>>>>> 
>>>>> It's better to use WalUsageAccumDiff() here?
>>>> 
>>>> Yes, thanks. I fixed it.
>>>> 
>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>> 
>>>>> +                if (AmWalWriterProcess()){
>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>> +                }
>>>>> +                else
>>>>> +                {
>>>>> +                    WalStats.m_wal_write_backend++;
>>>>> +                }
>>>>> 
>>>>> I think that it's better not to separate m_wal_write_xxx into two 
>>>>> for
>>>>> walwriter and other processes. Instead, we can use one 
>>>>> m_wal_write_xxx
>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>> the stats collector. Then the stats collector can accumulate the 
>>>>> counters
>>>>> per process type if necessary. If we adopt this approach, we can 
>>>>> easily
>>>>> extend pg_stat_wal so that any fields can be reported per process 
>>>>> type.
>>>> 
>>>> I'll remove the above source code because these counters are not 
>>>> useful.
>>>> 
>>>> 
>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>> 
>>>>>> Although there are some parameter related WAL,
>>>>>> there are few statistics for tuning them.
>>>>>> 
>>>>>> I think it's better to provide the following statistics.
>>>>>> Please let me know your comments.
>>>>>> 
>>>>>> ```
>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>> wal_records         | 2000224
>>>>>> wal_fpi             | 47
>>>>>> wal_bytes           | 248216337
>>>>>> wal_buffers_full    | 20954
>>>>>> wal_init_file       | 8
>>>>>> wal_write_backend   | 20960
>>>>>> wal_write_walwriter | 46
>>>>>> wal_write_time      | 51
>>>>>> wal_sync_backend    | 7
>>>>>> wal_sync_walwriter  | 8
>>>>>> wal_sync_time       | 0
>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>> ```
>>>>>> 
>>>>>> 1. Basic statistics of WAL activity
>>>>>> 
>>>>>> - wal_records: Total number of WAL records generated
>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>> 
>>>>>> To understand DB's performance, first, we will check the 
>>>>>> performance
>>>>>> trends for the entire database instance.
>>>>>> For example, if the number of wal_fpi becomes higher, users may 
>>>>>> tune
>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>> 
>>>>>> Although users can check the above statistics via EXPLAIN, 
>>>>>> auto_explain,
>>>>>> autovacuum and pg_stat_statements now,
>>>>>> if users want to see the performance trends  for the entire 
>>>>>> database,
>>>>>> they must recalculate the statistics.
>>>>>> 
>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>> 
>>>>>> 
>>>>>> 2.  WAL segment file creation
>>>>>> 
>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>> 
>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>>>> reported high,
>>>>>> to reduce the number of this initialization, we can tune 
>>>>>> WAL-related parameters
>>>>>> so that more "recycled" WAL files can be held.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 3. Number of when WAL is flushed
>>>>>> 
>>>>>> - wal_write_backend : Total number of WAL data written to the disk 
>>>>>> by backends
>>>>>> - wal_write_walwriter : Total number of WAL data written to the 
>>>>>> disk by walwriter
>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk 
>>>>>> by backends
>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk 
>>>>>> by walwrite
>>>>>> 
>>>>>> I think it's useful for tuning "synchronous_commit" and 
>>>>>> "commit_delay" for query executions.
>>>>>> If the number of WAL is flushed is high, users can know 
>>>>>> "synchronous_commit" is useful for the workload.
>>>>> 
>>>>> I just wonder how useful these counters are. Even without these 
>>>>> counters,
>>>>> we already know synchronous_commit=off is likely to cause the 
>>>>> better
>>>>> performance (but has the risk of data loss). So ISTM that these 
>>>>> counters are
>>>>> not so useful when tuning synchronous_commit.
>>>> 
>>>> Thanks, my understanding was wrong.
>>>> I agreed that your comments.
>>>> 
>>>> I merged the statistics of *_backend and *_walwriter.
>>>> I think the sum of them is useful to calculate the average per 
>>>> write/sync time.
>>>> For example, per write time is equals wal_write_time / wal_write.
>>> 
>>> Understood.
>>> 
>>> Thanks for updating the patch!
>> 
>> Thanks for your comments.
>> 
>>> patching file src/include/catalog/pg_proc.dat
>>> Hunk #1 FAILED at 5491.
>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>> src/include/catalog/pg_proc.dat.rej
>>> 
>>> I got this failure when applying the patch. Could you update the 
>>> patch?
>> 
>> Thanks, I updated the patch.
>> 
>>> -       Number of times WAL data was written to the disk because WAL
>>> buffers got full
>>> +       Total number of times WAL data written to the disk because 
>>> WAL
>>> buffers got full
>>> 
>>> Isn't "was" necessary between "data" and "written"?
>> 
>> Yes, I fixed it.
>> 
>>> +      <entry role="catalog_table_entry"><para 
>>> role="column_definition">
>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>> 
>>> Shouldn't the type of wal_bytes be numeric because the total number 
>>> of
>>> WAL bytes can exceed the range of bigint? I think that the type of
>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>> 
>> Thanks, I fixed it.
>> 
>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>> 
>>> +      <entry role="catalog_table_entry"><para 
>>> role="column_definition">
>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>> 
>>> Shouldn't the type of wal_xxx_time be double precision,
>>> like pg_stat_database.blk_write_time?
>> 
>> Thanks, I changed it.
>> 
>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>> wal_sync is incremented. Isn't this behavior confusing?
> 
> What do you think about this comment?

Sorry, I'll change to increment wal_sync and wal_sync_time only
if a specific fsync method is called.

> I found that we discussed track-WAL-IO-timing feature at the past 
> discussion
> about the similar feature [1]. But the feature was droppped from the 
> proposal
> patch because there was the performance concern. So probably we need to
> revisit the past discussion and benchmark the performance. Thought?
> 
> If track-WAL-IO-timing feature may cause performance regression,
> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
> from the patch and commit it at first.
> 
> [1]
> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com

Thanks, I'll check the thread.
I agree to add basic statistics at first and I attached the patch.

>>> 
>>> 
>>> +       Total amount of time that has been spent in the portion of
>>> +       WAL data was written to disk by backend and walwriter, in 
>>> milliseconds
>>> +       (if <xref linkend="guc-track-io-timing"/> is enabled, 
>>> otherwise zero)
>>> 
>>> With the patch, track_io_timing controls both IO for data files and
>>> WAL files. But we may want to track only either of them. So it's 
>>> better
>>> to extend track_io_timing so that we can specify the tracking target
>>> in the parameter? For example, we can make track_io_timing accept
>>> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
>>> track_wal_io_timing? Thought?
>> 
>> OK, I introduced the new GUC "track_wal_io_timing".
>> 
>>> I'm afraid that "by backend and walwriter" part can make us thinkg
>>> incorrectly that WAL writes by other processes like autovacuum
>>> are not tracked.
>> 
>> Sorry, I removed "by backend and walwriter".
> 
> Thanks for updating the patch!
> 
> +WalUsage prevWalUsage;
> 
> ISTM that we can declare this as static variable because
> it's used only in pgstat.c.

Thanks, I fixed it.

> +    memset(&walusage, 0, sizeof(WalUsage));
> +    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
> 
> This memset seems unnecessary.

I couldn't understand why this memset is unnecessary.
Since WalUsageAccumDiff not only calculates the difference but also adds 
the value,
I thought walusage needs to be initialized.


>      /* We assume this initializes to zeroes */
>      static const PgStat_MsgWal all_zeroes;
> 
> This declaration of the variable should be placed around
> the top of pgstat_send_wal().

Sorry, I fixed it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: Is postgres ready for 2038?
Next
From: Tom Lane
Date:
Subject: Re: Is postgres ready for 2038?