Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Date
Msg-id 11e81f0b7d90dd69077d3d964a9a4da9@oss.nttdata.com
In response to Re: Add statistics to pg_stat_wal view for wal related parameter tuning  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add statistics to pg_stat_wal view for wal related parameter tuning
List pgsql-hackers
On 2020-11-12 14:58, Fujii Masao wrote:
> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>> On 2020-10-30 11:50, Fujii Masao wrote:
>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>> Hi,
>>>> 
>>>> Thanks for your comments and advice. I updated the patch.
>>>> 
>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>> > of which is in performance-critical path especially in
>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>> > at cluster-level even though the same information can be gathered at
>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>> > for that in your proposal.
>>>>> 
>>>>> We should avoid that duplication as much as possible, even if both
>>>>> numbers are important.
>>>>> 
>>>>>> Also, about performance, I thought the impact would be small because the
>>>>>> stats are only incremented in memory. If I can reuse pgWalUsage's values,
>>>>>> which already collect these stats, there is no impact in XLogInsertRecord.
>>>>>> For example, how about having pg_stat_wal() calculate the accumulated
>>>>>> values of wal_records, wal_fpi, and wal_bytes from pgWalUsage?
>>>>> 
>>>>> I don't think that works as-is, but it would work if pgstat_send_wal()
>>>>> took the difference of those values between two successive calls.
>>>>> 
>>>>> WalUsage prevWalUsage;
>>>>> ...
>>>>> pgstat_send_wal()
>>>>> {
>>>>> ..
>>>>>    /* fill in some values using pgWalUsage */
>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>    WalStats.m_wal_fpi     = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>> ...
>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>
>>>>>    /* remember the current numbers */
>>>>>    prevWalUsage = pgWalUsage;
>>>>> }
>>>> 
>>>> Thanks to Horiguchi-san's advice, I changed the patch to reuse pgWalUsage,
>>>> which is already defined, and eliminated the extra overhead.
>>> 
>>> +    /* fill in some values using pgWalUsage */
>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>> 
>>> It's better to use WalUsageAccumDiff() here?
>> 
>> Yes, thanks. I fixed it.
>> 
>>> prevWalUsage needs to be initialized with pgWalUsage?
>>> 
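For reference, here is a minimal sketch (not the exact patch) of how
pgstat_send_wal() can fill WalStats from pgWalUsage using WalUsageAccumDiff().
The m_wal_* field names follow the patch and may still change, and I assume
here that prevWalUsage is initialized from pgWalUsage at process startup:

```c
#include "executor/instrument.h"  /* WalUsage, pgWalUsage, WalUsageAccumDiff() */

static WalUsage prevWalUsage;     /* snapshot of pgWalUsage at the last send */

void
pgstat_send_wal(void)
{
    WalUsage    walusage;

    /* compute the delta generated since the previous report */
    MemSet(&walusage, 0, sizeof(walusage));
    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);

    /* WalStats is the PgStat_MsgWal message buffer added by the patch */
    WalStats.m_wal_records = walusage.wal_records;
    WalStats.m_wal_fpi     = walusage.wal_fpi;
    WalStats.m_wal_bytes   = walusage.wal_bytes;

    pgstat_send(&WalStats, sizeof(WalStats));

    /* remember what has been reported, so the next call sends only the diff */
    prevWalUsage = pgWalUsage;

    /* clear the per-message counters for the next cycle */
    MemSet(&WalStats, 0, sizeof(WalStats));
}
```
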
>>> +                if (AmWalWriterProcess()){
>>> +                    WalStats.m_wal_write_walwriter++;
>>> +                }
>>> +                else
>>> +                {
>>> +                    WalStats.m_wal_write_backend++;
>>> +                }
>>> 
>>> I think that it's better not to separate m_wal_write_xxx into two for
>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>> counter and make pgstat_send_wal() send also the process type to
>>> the stats collector. Then the stats collector can accumulate the counters
>>> per process type if necessary. If we adopt this approach, we can easily
>>> extend pg_stat_wal so that any fields can be reported per process type.
>> 
>> I'll remove the above source code because these counters are not 
>> useful.
>> 
>> 
>> On 2020-10-30 12:00, Fujii Masao wrote:
>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>> Hi,
>>>> 
>>>> I think we need to add some statistics to pg_stat_wal view.
>>>> 
>>>> Although there are some parameters related to WAL,
>>>> there are few statistics for tuning them.
>>>> 
>>>> I think it's better to provide the following statistics.
>>>> Please let me know your comments.
>>>> 
>>>> ```
>>>> postgres=# SELECT * from pg_stat_wal;
>>>> -[ RECORD 1 ]-------+------------------------------
>>>> wal_records         | 2000224
>>>> wal_fpi             | 47
>>>> wal_bytes           | 248216337
>>>> wal_buffers_full    | 20954
>>>> wal_init_file       | 8
>>>> wal_write_backend   | 20960
>>>> wal_write_walwriter | 46
>>>> wal_write_time      | 51
>>>> wal_sync_backend    | 7
>>>> wal_sync_walwriter  | 8
>>>> wal_sync_time       | 0
>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>> ```
>>>> 
>>>> 1. Basic statistics of WAL activity
>>>> 
>>>> - wal_records: Total number of WAL records generated
>>>> - wal_fpi: Total number of WAL full page images generated
>>>> - wal_bytes: Total amount of WAL bytes generated
>>>> 
>>>> To understand the database's performance, we first check the performance
>>>> trends for the entire database instance.
>>>> For example, if wal_fpi becomes high, users may tune
>>>> "wal_compression", "checkpoint_timeout", and so on.
>>>> 
>>>> Although users can already check the above statistics via EXPLAIN,
>>>> auto_explain, autovacuum, and pg_stat_statements,
>>>> if they want to see the performance trends for the entire database,
>>>> they must recalculate the statistics themselves.
>>>> 
>>>> I think it is useful to add the sum of the basic statistics.
>>>> 
>>>> 
>>>> 2.  WAL segment file creation
>>>> 
>>>> - wal_init_file: Total number of WAL segment files created.
>>>> 
>>>> Creating a new WAL file may have an impact on the performance of
>>>> a write-heavy workload that generates lots of WAL. If this number is
>>>> reported high, we can tune WAL-related parameters so that more "recycled"
>>>> WAL files are kept, which reduces the number of these initializations.
>>>> 
>>>> 
>>>> 
>>>> 3. Number of times WAL is flushed
>>>> 
>>>> - wal_write_backend : Total number of times WAL data was written to the
>>>> disk by backends
>>>> - wal_write_walwriter : Total number of times WAL data was written to the
>>>> disk by the walwriter
>>>> - wal_sync_backend : Total number of times WAL data was synced to the
>>>> disk by backends
>>>> - wal_sync_walwriter : Total number of times WAL data was synced to the
>>>> disk by the walwriter
>>>> 
>>>> I think these are useful for tuning "synchronous_commit" and
>>>> "commit_delay" for query execution.
>>>> If the number of WAL flushes is high, users can see that tuning
>>>> "synchronous_commit" would help the workload.
>>> 
>>> I just wonder how useful these counters are. Even without these counters,
>>> we already know synchronous_commit=off is likely to give better
>>> performance (but has the risk of data loss). So ISTM that these counters
>>> are not so useful when tuning synchronous_commit.
>> 
>> Thanks, my understanding was wrong.
>> I agree with your comments.
>> 
>> I merged the *_backend and *_walwriter statistics.
>> I think their sum is useful for calculating the average time per write/sync.
>> For example, the average time per write equals wal_write_time / wal_write.
> 
> Understood.
> 
> Thanks for updating the patch!

Thanks for your comments.

> patching file src/include/catalog/pg_proc.dat
> Hunk #1 FAILED at 5491.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/catalog/pg_proc.dat.rej
> 
> I got this failure when applying the patch. Could you update the patch?

Thanks, I updated the patch.

> -       Number of times WAL data was written to the disk because WAL
> buffers got full
> +       Total number of times WAL data written to the disk because WAL
> buffers got full
> 
> Isn't "was" necessary between "data" and "written"?

Yes, I fixed it.

> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>wal_bytes</structfield> <type>bigint</type>
> 
> Shouldn't the type of wal_bytes be numeric because the total number of
> WAL bytes can exceed the range of bigint? I think that the type of
> pg_stat_statements.wal_bytes is also numeric for the same reason.

Thanks, I fixed it.

Since wal_bytes is now kept as uint64 instead of PgStat_Counter,
I changed the type of the corresponding fields in PgStat_MsgWal and
PgStat_WalStats too.
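
To show the idea, here is a rough sketch of the numeric conversion in
pg_stat_get_wal(); it mirrors the approach pg_stat_statements already uses
for its wal_bytes column, and the wal_stats pointer and values[] slot below
are only illustrative:

```c
    char    buf[256];
    Datum   wal_bytes;

    /*
     * wal_bytes is accumulated as uint64 internally; to expose it as numeric,
     * print it into a string and let numeric_in() parse it.
     */
    snprintf(buf, sizeof(buf), UINT64_FORMAT, wal_stats->wal_bytes);
    wal_bytes = DirectFunctionCall3(numeric_in,
                                    CStringGetDatum(buf),
                                    ObjectIdGetDatum(InvalidOid),
                                    Int32GetDatum(-1));
    values[2] = wal_bytes;      /* column position is illustrative only */
```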

> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>wal_write_time</structfield> <type>bigint</type>
> 
> Shouldn't the type of wal_xxx_time be double precision,
> like pg_stat_database.blk_write_time?

Thanks, I changed it.

> Even when fsync is set to off or wal_sync_method is set to open_sync,
> wal_sync is incremented. Isn't this behavior confusing?
> 
> 
> +       Total amount of time that has been spent in the portion of
> +       WAL data was written to disk by backend and walwriter, in 
> milliseconds
> +       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise 
> zero)
> 
> With the patch, track_io_timing controls both IO for data files and
> WAL files. But we may want to track only either of them. So it's better
> to extend track_io_timing so that we can specify the tracking target
> in the parameter? For example, we can make track_io_timing accept
> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
> track_wal_io_timing? Thought?

OK, I introduced the new GUC "track_wal_io_timing".
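
As a rough sketch of where the new GUC takes effect (the WalStats field names
follow the patch and may still change; written, fd, from, nleft, and
startoffset come from the surrounding XLogWrite() code), the write path can
be wrapped like this:

```c
    instr_time  start;

    /* measure WAL write I/O time only when track_wal_io_timing is enabled */
    if (track_wal_io_timing)
        INSTR_TIME_SET_CURRENT(start);

    written = pg_pwrite(fd, from, nleft, startoffset);

    if (track_wal_io_timing)
    {
        instr_time  duration;

        INSTR_TIME_SET_CURRENT(duration);
        INSTR_TIME_SUBTRACT(duration, start);
        /* accumulate the elapsed time in milliseconds */
        WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration);
    }

    /* count the write itself regardless of the timing GUC */
    WalStats.m_wal_write++;
```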

> I'm afraid that the "by backend and walwriter" part can make us think
> incorrectly that WAL writes by other processes like autovacuum
> are not tracked.

Sorry, I removed "by backend and walwriter".

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment
