Thread: Add statistics to pg_stat_wal view for wal related parameter tuning

Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
Hi,

I think we need to add some statistics to pg_stat_wal view.

Although there are some parameter related WAL,
there are few statistics for tuning them.

I think it's better to provide the following statistics.
Please let me know your comments.

```
postgres=# SELECT * from pg_stat_wal;
-[ RECORD 1 ]-------+------------------------------
wal_records         | 2000224
wal_fpi             | 47
wal_bytes           | 248216337
wal_buffers_full    | 20954
wal_init_file       | 8
wal_write_backend   | 20960
wal_write_walwriter | 46
wal_write_time      | 51
wal_sync_backend    | 7
wal_sync_walwriter  | 8
wal_sync_time       | 0
stats_reset         | 2020-10-20 11:04:51.307771+09
```

1. Basic statistics of WAL activity

- wal_records: Total number of WAL records generated
- wal_fpi: Total number of WAL full page images generated
- wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN, auto_explain,
autovacuum and pg_stat_statements now,
if users want to see the performance trends  for the entire database,
they must recalculate the statistics.

I think it is useful to add the sum of the basic statistics.


2.  WAL segment file creation

- wal_init_file: Total number of WAL segment files created.

To create a new WAL file may have an impact on the performance of
a write-heavy workload generating lots of WAL. If this number is 
reported high,
to reduce the number of this initialization, we can tune WAL-related 
parameters
so that more "recycled" WAL files can be held.



3. Number of when WAL is flushed

- wal_write_backend : Total number of WAL data written to the disk by 
backends
- wal_write_walwriter : Total number of WAL data written to the disk by 
walwriter
- wal_sync_backend : Total number of WAL data synced to the disk by 
backends
- wal_sync_walwriter : Total number of WAL data synced to the disk by 
walwrite

I think it's useful for tuning "synchronous_commit" and "commit_delay" 
for query executions.
If the number of WAL is flushed is high, users can know  
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and  
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.


4.  Wait time when WAL is flushed

- wal_write_time : Total amount of time that has been spent in the 
portion of
                                  WAL data was written to disk by backend 
and walwriter, in milliseconds
                                 (if track-io-timing is enabled, 
otherwise zero.)
- wal_sync_time : Total amount of time that has been spent in the 
portion of
                                  WAL data was synced to disk by backend 
and walwriter, in milliseconds
                                 (if track-io-timing is enabled, 
otherwise zero.)

If the time becomes much higher, users can detect the possibility of 
disk failure.
Since users can see how much flush time occupies of the query execution 
time,
it may lead to query tuning and so on.


Best Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Amit Kapila
Date:
On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Hi,
>
> I think we need to add some statistics to pg_stat_wal view.
>
> Although there are some parameter related WAL,
> there are few statistics for tuning them.
>
> I think it's better to provide the following statistics.
> Please let me know your comments.
>
> ```
> postgres=# SELECT * from pg_stat_wal;
> -[ RECORD 1 ]-------+------------------------------
> wal_records         | 2000224
> wal_fpi             | 47
> wal_bytes           | 248216337
> wal_buffers_full    | 20954
> wal_init_file       | 8
> wal_write_backend   | 20960
> wal_write_walwriter | 46
> wal_write_time      | 51
> wal_sync_backend    | 7
> wal_sync_walwriter  | 8
> wal_sync_time       | 0
> stats_reset         | 2020-10-20 11:04:51.307771+09
> ```
>
> 1. Basic statistics of WAL activity
>
> - wal_records: Total number of WAL records generated
> - wal_fpi: Total number of WAL full page images generated
> - wal_bytes: Total amount of WAL bytes generated
>
> To understand DB's performance, first, we will check the performance
> trends for the entire database instance.
> For example, if the number of wal_fpi becomes higher, users may tune
> "wal_compression", "checkpoint_timeout" and so on.
>
> Although users can check the above statistics via EXPLAIN, auto_explain,
> autovacuum and pg_stat_statements now,
> if users want to see the performance trends  for the entire database,
> they must recalculate the statistics.
>

Here, do you mean to say 'entire cluster' instead of 'entire database'
because it seems these stats are getting collected for the entire
cluster?

> I think it is useful to add the sum of the basic statistics.
>

There is an argument that it is better to view these stats at the
statement-level so that one can know which statements are causing most
WAL and then try to rate-limit them if required in the application and
anyway they can get the aggregate of all the WAL if they want. We have
added these stats in PG-13, so do we have any evidence that the
already added stats don't provide enough information? I understand
that you are trying to display the accumulated stats here which if
required users/DBA need to compute with the currently provided stats.
OTOH, sometimes adding more ways to do some things causes difficulty
for users to understand and learn.

I see that we also need to add extra code to capture these stats (some
of which is in performance-critical path especially in
XLogInsertRecord) which again makes me a bit uncomfortable. It might
be that it is all fine as it is very important to collect these stats
at cluster-level in spite that the same information can be gathered at
statement-level to help customers but I don't see a very strong case
for that in your proposal.

-- 
With Regards,
Amit Kapila.



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-10-20 12:46, Amit Kapila wrote:
> On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda 
> <ikedamsh@oss.nttdata.com> wrote:
>> 
>> Hi,
>> 
>> I think we need to add some statistics to pg_stat_wal view.
>> 
>> Although there are some parameter related WAL,
>> there are few statistics for tuning them.
>> 
>> I think it's better to provide the following statistics.
>> Please let me know your comments.
>> 
>> ```
>> postgres=# SELECT * from pg_stat_wal;
>> -[ RECORD 1 ]-------+------------------------------
>> wal_records         | 2000224
>> wal_fpi             | 47
>> wal_bytes           | 248216337
>> wal_buffers_full    | 20954
>> wal_init_file       | 8
>> wal_write_backend   | 20960
>> wal_write_walwriter | 46
>> wal_write_time      | 51
>> wal_sync_backend    | 7
>> wal_sync_walwriter  | 8
>> wal_sync_time       | 0
>> stats_reset         | 2020-10-20 11:04:51.307771+09
>> ```
>> 
>> 1. Basic statistics of WAL activity
>> 
>> - wal_records: Total number of WAL records generated
>> - wal_fpi: Total number of WAL full page images generated
>> - wal_bytes: Total amount of WAL bytes generated
>> 
>> To understand DB's performance, first, we will check the performance
>> trends for the entire database instance.
>> For example, if the number of wal_fpi becomes higher, users may tune
>> "wal_compression", "checkpoint_timeout" and so on.
>> 
>> Although users can check the above statistics via EXPLAIN, 
>> auto_explain,
>> autovacuum and pg_stat_statements now,
>> if users want to see the performance trends  for the entire database,
>> they must recalculate the statistics.
>> 
> 
> Here, do you mean to say 'entire cluster' instead of 'entire database'
> because it seems these stats are getting collected for the entire
> cluster?

Thanks for your comments.
Yes, I wanted to say 'entire cluster'.

>> I think it is useful to add the sum of the basic statistics.
>> 
> 
> There is an argument that it is better to view these stats at the
> statement-level so that one can know which statements are causing most
> WAL and then try to rate-limit them if required in the application and
> anyway they can get the aggregate of all the WAL if they want. We have
> added these stats in PG-13, so do we have any evidence that the
> already added stats don't provide enough information? I understand
> that you are trying to display the accumulated stats here which if
> required users/DBA need to compute with the currently provided stats.
> OTOH, sometimes adding more ways to do some things causes difficulty
> for users to understand and learn.

I agreed that the statement-level stat is important and I understood 
that we can
know the aggregated WAL stats of pg_stat_statement view and autovacuum's 
log.
But now, WAL stats generated by autovacuum can be output to logs and it 
is not
easy to aggregate them. Since WAL writes impacts for the entire cluster, 
I thought
it's natural to provide accumulated value.

> I see that we also need to add extra code to capture these stats (some
> of which is in performance-critical path especially in
> XLogInsertRecord) which again makes me a bit uncomfortable. It might
> be that it is all fine as it is very important to collect these stats
> at cluster-level in spite that the same information can be gathered at
> statement-level to help customers but I don't see a very strong case
> for that in your proposal.

Also about performance, I thought there are few impacts because it
increments stats in memory. If I can implement to reuse pgWalUsage's
value which already collects these stats, there is no impact in 
XLogInsertRecord.
For example, how about pg_stat_wal() calculates the accumulated
value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's value?

Regards
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Amit Kapila
Date:
On Tue, Oct 20, 2020 at 12:41 PM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:
>
> On 2020-10-20 12:46, Amit Kapila wrote:
> > On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda
> >> 1. Basic statistics of WAL activity
> >>
> >> - wal_records: Total number of WAL records generated
> >> - wal_fpi: Total number of WAL full page images generated
> >> - wal_bytes: Total amount of WAL bytes generated
> >>
> >> To understand DB's performance, first, we will check the performance
> >> trends for the entire database instance.
> >> For example, if the number of wal_fpi becomes higher, users may tune
> >> "wal_compression", "checkpoint_timeout" and so on.
> >>
> >> Although users can check the above statistics via EXPLAIN,
> >> auto_explain,
> >> autovacuum and pg_stat_statements now,
> >> if users want to see the performance trends  for the entire database,
> >> they must recalculate the statistics.
> >>
> >
> > Here, do you mean to say 'entire cluster' instead of 'entire database'
> > because it seems these stats are getting collected for the entire
> > cluster?
>
> Thanks for your comments.
> Yes, I wanted to say 'entire cluster'.
>
> >> I think it is useful to add the sum of the basic statistics.
> >>
> >
> > There is an argument that it is better to view these stats at the
> > statement-level so that one can know which statements are causing most
> > WAL and then try to rate-limit them if required in the application and
> > anyway they can get the aggregate of all the WAL if they want. We have
> > added these stats in PG-13, so do we have any evidence that the
> > already added stats don't provide enough information? I understand
> > that you are trying to display the accumulated stats here which if
> > required users/DBA need to compute with the currently provided stats.
> > OTOH, sometimes adding more ways to do some things causes difficulty
> > for users to understand and learn.
>
> I agreed that the statement-level stat is important and I understood
> that we can
> know the aggregated WAL stats of pg_stat_statement view and autovacuum's
> log.
> But now, WAL stats generated by autovacuum can be output to logs and it
> is not
> easy to aggregate them. Since WAL writes impacts for the entire cluster,
> I thought
> it's natural to provide accumulated value.
>

I think it is other way i.e if we would have accumulated stats then it
makes sense to provide those at statement-level because one would like
to know the exact cause of more WAL activity. Say it is due to an
autovacuum or due to the particular set of statements then it would
easier for users to do something about it.

-- 
With Regards,
Amit Kapila.



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
"lchch1990@sina.cn"
Date:

I think it's really a more convenient way to collect wal usage information,
with it we can query when I want. Several points on my side:

1. It will be nice If you provide a chance to reset the information in WalStats,
so that we can reset it without restart the database.

2. I think 'wal_write_backend' is mean wal write times happen in
backends. The describe in document is not so clear, suggest rewrite it.

3. I do not think it's a correct describe in document for 'wal_buffers_full'.

4. Quite strange to collect twice in XLogInsertRecord() for xl_tot_len,
m_wal_records, m_wal_fpi.

5. I notice some code in issue_xlog_fsync() function to collect sync info,
a standby may call the issue_xlog_fsync() too, which do not want to
to collect this info. I think this need some change avoid run by standby
side. 


Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Kyotaro Horiguchi
Date:
At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
> On 2020-10-20 12:46, Amit Kapila wrote:
> > I see that we also need to add extra code to capture these stats (some
> > of which is in performance-critical path especially in
> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
> > be that it is all fine as it is very important to collect these stats
> > at cluster-level in spite that the same information can be gathered at
> > statement-level to help customers but I don't see a very strong case
> > for that in your proposal.

We should avoid that duplication as possible even if the both number
are important.

> Also about performance, I thought there are few impacts because it
> increments stats in memory. If I can implement to reuse pgWalUsage's
> value which already collects these stats, there is no impact in
> XLogInsertRecord.
> For example, how about pg_stat_wal() calculates the accumulated
> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
> value?

I don't think that works, but it would work that pgstat_send_wal()
takes the difference of that values between two successive calls.

WalUsage prevWalUsage;
...
pgstat_send_wal()
{
..
   /* fill in some values using pgWalUsage */
   WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
   WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
   WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
...
   pgstat_send(&WalStats, sizeof(WalStats));

   /* remember the current numbers */
   prevWalUsage = pgWalUsage;   


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-10-21 13:41, Amit Kapila wrote:
> On Tue, Oct 20, 2020 at 12:41 PM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
>> 
>> On 2020-10-20 12:46, Amit Kapila wrote:
>> > On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda
>> >> 1. Basic statistics of WAL activity
>> >>
>> >> - wal_records: Total number of WAL records generated
>> >> - wal_fpi: Total number of WAL full page images generated
>> >> - wal_bytes: Total amount of WAL bytes generated
>> >>
>> >> To understand DB's performance, first, we will check the performance
>> >> trends for the entire database instance.
>> >> For example, if the number of wal_fpi becomes higher, users may tune
>> >> "wal_compression", "checkpoint_timeout" and so on.
>> >>
>> >> Although users can check the above statistics via EXPLAIN,
>> >> auto_explain,
>> >> autovacuum and pg_stat_statements now,
>> >> if users want to see the performance trends  for the entire database,
>> >> they must recalculate the statistics.
>> >>
>> >
>> > Here, do you mean to say 'entire cluster' instead of 'entire database'
>> > because it seems these stats are getting collected for the entire
>> > cluster?
>> 
>> Thanks for your comments.
>> Yes, I wanted to say 'entire cluster'.
>> 
>> >> I think it is useful to add the sum of the basic statistics.
>> >>
>> >
>> > There is an argument that it is better to view these stats at the
>> > statement-level so that one can know which statements are causing most
>> > WAL and then try to rate-limit them if required in the application and
>> > anyway they can get the aggregate of all the WAL if they want. We have
>> > added these stats in PG-13, so do we have any evidence that the
>> > already added stats don't provide enough information? I understand
>> > that you are trying to display the accumulated stats here which if
>> > required users/DBA need to compute with the currently provided stats.
>> > OTOH, sometimes adding more ways to do some things causes difficulty
>> > for users to understand and learn.
>> 
>> I agreed that the statement-level stat is important and I understood
>> that we can
>> know the aggregated WAL stats of pg_stat_statement view and 
>> autovacuum's
>> log.
>> But now, WAL stats generated by autovacuum can be output to logs and 
>> it
>> is not
>> easy to aggregate them. Since WAL writes impacts for the entire 
>> cluster,
>> I thought
>> it's natural to provide accumulated value.
>> 
> 
> I think it is other way i.e if we would have accumulated stats then it
> makes sense to provide those at statement-level because one would like
> to know the exact cause of more WAL activity. Say it is due to an
> autovacuum or due to the particular set of statements then it would
> easier for users to do something about it.

OK, I'll remove them.
Do you have any comments for other statistics?

-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-10-21 15:54, lchch1990@sina.cn wrote:
> I think it's really a more convenient way to collect wal usage
> information,
> with it we can query when I want. Several points on my side:

Thanks for your comments.


> 1. It will be nice If you provide a chance to reset the information in
> WalStats,
> so that we can reset it without restart the database.

I agree.

Now, pg_stat_wal supports reset all informantion in WalStats
using pg_stat_reset_shared('wal') function.

Isn't it enough?


> 2. I think 'wal_write_backend' is mean wal write times happen in
> backends. The describe in document is not so clear, suggest rewrite
> it.

OK, I'll rewrite to "Total number of times backends write WAL data to 
the disk".


> 3. I do not think it's a correct describe in document for
> 'wal_buffers_full'.

Where should I rewrite the description? If my understanding is not 
correct, please let me know.


> 4. Quite strange to collect twice in XLogInsertRecord() for
> xl_tot_len,
> m_wal_records, m_wal_fpi.

Yes, Amit-san pointed me that too.
I'll remove them from pg_stat_wal since pg_stat_statements and vacuum 
log
already shows the related statistics and there is a comment it's enough.

Anyway, if you need to the accumulated above statistics in pg_stat_wal,
please let me know.


> 5. I notice some code in issue_xlog_fsync() function to collect sync
> info,
> a standby may call the issue_xlog_fsync() too, which do not want to
> to collect this info. I think this need some change avoid run by
> standby
> side.

Thanks, I will fix it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote in
>> On 2020-10-20 12:46, Amit Kapila wrote:
>> > I see that we also need to add extra code to capture these stats (some
>> > of which is in performance-critical path especially in
>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>> > be that it is all fine as it is very important to collect these stats
>> > at cluster-level in spite that the same information can be gathered at
>> > statement-level to help customers but I don't see a very strong case
>> > for that in your proposal.
> 
> We should avoid that duplication as possible even if the both number
> are important.
> 
>> Also about performance, I thought there are few impacts because it
>> increments stats in memory. If I can implement to reuse pgWalUsage's
>> value which already collects these stats, there is no impact in
>> XLogInsertRecord.
>> For example, how about pg_stat_wal() calculates the accumulated
>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>> value?
> 
> I don't think that works, but it would work that pgstat_send_wal()
> takes the difference of that values between two successive calls.
> 
> WalUsage prevWalUsage;
> ...
> pgstat_send_wal()
> {
> ..
>    /* fill in some values using pgWalUsage */
>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
> prevWalUsage.wal_bytes;
>    WalStats.m_wal_records = pgWalUsage.wal_records - 
> prevWalUsage.wal_records;
>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
> prevWalUsage.wal_fpi;
> ...
>    pgstat_send(&WalStats, sizeof(WalStats));
> 
>    /* remember the current numbers */
>    prevWalUsage = pgWalUsage;

Thanks for your advice. This code can avoid the performance impact of 
critical code.

By the way, what do you think to add these statistics to the pg_stat_wal 
view?
I thought to remove the above statistics because there is advice that 
PG13's features,
for example, pg_stat_statement view, vacuum log, and so on can cover 
use-cases.

regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Kyotaro Horiguchi
Date:
At Thu, 22 Oct 2020 10:44:53 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
> > At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
> > <ikedamsh@oss.nttdata.com> wrote in
> >> On 2020-10-20 12:46, Amit Kapila wrote:
> >> > I see that we also need to add extra code to capture these stats (some
> >> > of which is in performance-critical path especially in
> >> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
> >> > be that it is all fine as it is very important to collect these stats
> >> > at cluster-level in spite that the same information can be gathered at
> >> > statement-level to help customers but I don't see a very strong case
> >> > for that in your proposal.
> > We should avoid that duplication as possible even if the both number
> > are important.
> > 
> >> Also about performance, I thought there are few impacts because it
> >> increments stats in memory. If I can implement to reuse pgWalUsage's
> >> value which already collects these stats, there is no impact in
> >> XLogInsertRecord.
> >> For example, how about pg_stat_wal() calculates the accumulated
> >> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
> >> value?
> > I don't think that works, but it would work that pgstat_send_wal()
> > takes the difference of that values between two successive calls.
> > WalUsage prevWalUsage;
> > ...
> > pgstat_send_wal()
> > {
> > ..
> >    /* fill in some values using pgWalUsage */
> >    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
> >    WalStats.m_wal_records = pgWalUsage.wal_records -
> >    prevWalUsage.wal_records;
> >    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
> > ...
> >    pgstat_send(&WalStats, sizeof(WalStats));
> >    /* remember the current numbers */
> >    prevWalUsage = pgWalUsage;
> 
> Thanks for your advice. This code can avoid the performance impact of
> critical code.
> 
> By the way, what do you think to add these statistics to the
> pg_stat_wal view?
> I thought to remove the above statistics because there is advice that
> PG13's features,
> for example, pg_stat_statement view, vacuum log, and so on can cover
> use-cases.


At Thu, 22 Oct 2020 10:09:21 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
> >> I agreed that the statement-level stat is important and I understood
> >> that we can
> >> know the aggregated WAL stats of pg_stat_statement view and
> >> autovacuum's
> >> log.
> >> But now, WAL stats generated by autovacuum can be output to logs and
> >> it
> >> is not
> >> easy to aggregate them. Since WAL writes impacts for the entire
> >> cluster,
> >> I thought
> >> it's natural to provide accumulated value.
> >> 
> > I think it is other way i.e if we would have accumulated stats then it
> > makes sense to provide those at statement-level because one would like
> > to know the exact cause of more WAL activity. Say it is due to an
> > autovacuum or due to the particular set of statements then it would
> > easier for users to do something about it.
> 
> OK, I'll remove them.
> Do you have any comments for other statistics?

That discussion comes from the fact that the code adds duplicate code
in a hot path.  If we that extra cost doesn't exist, we are free to
add the accumulated values in pg_stat_wal. I think they are useful for
stats-collecting tools as far as we can do that without such an extra
cost.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
Hi,

Thanks for your comments and advice. I updated the patch.

On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote in
>> On 2020-10-20 12:46, Amit Kapila wrote:
>> > I see that we also need to add extra code to capture these stats (some
>> > of which is in performance-critical path especially in
>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>> > be that it is all fine as it is very important to collect these stats
>> > at cluster-level in spite that the same information can be gathered at
>> > statement-level to help customers but I don't see a very strong case
>> > for that in your proposal.
> 
> We should avoid that duplication as possible even if the both number
> are important.
> 
>> Also about performance, I thought there are few impacts because it
>> increments stats in memory. If I can implement to reuse pgWalUsage's
>> value which already collects these stats, there is no impact in
>> XLogInsertRecord.
>> For example, how about pg_stat_wal() calculates the accumulated
>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>> value?
> 
> I don't think that works, but it would work that pgstat_send_wal()
> takes the difference of that values between two successive calls.
> 
> WalUsage prevWalUsage;
> ...
> pgstat_send_wal()
> {
> ..
>    /* fill in some values using pgWalUsage */
>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
> prevWalUsage.wal_bytes;
>    WalStats.m_wal_records = pgWalUsage.wal_records - 
> prevWalUsage.wal_records;
>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
> prevWalUsage.wal_fpi;
> ...
>    pgstat_send(&WalStats, sizeof(WalStats));
> 
>    /* remember the current numbers */
>    prevWalUsage = pgWalUsage;

Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
which is already defined and eliminates the extra overhead.

> 5. I notice some code in issue_xlog_fsync() function to collect sync 
> info,
> a standby may call the issue_xlog_fsync() too, which do not want to
> to collect this info. I think this need some change avoid run by
> standby side.

IIUC, issue_xlog_fsync is called by wal receiver on standby side.
But it doesn't send collected statistics to the stats collecter.
So, I think it's not necessary to change the code to avoid collecting 
the stats on the standby side.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/10/29 17:03, Masahiro Ikeda wrote:
> Hi,
> 
> Thanks for your comments and advice. I updated the patch.
> 
> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> wrote in
>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>> > I see that we also need to add extra code to capture these stats (some
>>> > of which is in performance-critical path especially in
>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>> > be that it is all fine as it is very important to collect these stats
>>> > at cluster-level in spite that the same information can be gathered at
>>> > statement-level to help customers but I don't see a very strong case
>>> > for that in your proposal.
>>
>> We should avoid that duplication as possible even if the both number
>> are important.
>>
>>> Also about performance, I thought there are few impacts because it
>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>> value which already collects these stats, there is no impact in
>>> XLogInsertRecord.
>>> For example, how about pg_stat_wal() calculates the accumulated
>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>> value?
>>
>> I don't think that works, but it would work that pgstat_send_wal()
>> takes the difference of that values between two successive calls.
>>
>> WalUsage prevWalUsage;
>> ...
>> pgstat_send_wal()
>> {
>> ..
>>    /* fill in some values using pgWalUsage */
>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>> ...
>>    pgstat_send(&WalStats, sizeof(WalStats));
>>
>>    /* remember the current numbers */
>>    prevWalUsage = pgWalUsage;
> 
> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
> which is already defined and eliminates the extra overhead.

+    /* fill in some values using pgWalUsage */
+    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
+    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
+    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;

It's better to use WalUsageAccumDiff() here?

prevWalUsage needs to be initialized with pgWalUsage?

+                if (AmWalWriterProcess()){
+                    WalStats.m_wal_write_walwriter++;
+                }
+                else
+                {
+                    WalStats.m_wal_write_backend++;
+                }

I think that it's better not to separate m_wal_write_xxx into two for
walwriter and other processes. Instead, we can use one m_wal_write_xxx
counter and make pgstat_send_wal() send also the process type to
the stats collector. Then the stats collector can accumulate the counters
per process type if necessary. If we adopt this approach, we can easily
extend pg_stat_wal so that any fields can be reported per process type.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/10/20 11:31, Masahiro Ikeda wrote:
> Hi,
> 
> I think we need to add some statistics to pg_stat_wal view.
> 
> Although there are some parameter related WAL,
> there are few statistics for tuning them.
> 
> I think it's better to provide the following statistics.
> Please let me know your comments.
> 
> ```
> postgres=# SELECT * from pg_stat_wal;
> -[ RECORD 1 ]-------+------------------------------
> wal_records         | 2000224
> wal_fpi             | 47
> wal_bytes           | 248216337
> wal_buffers_full    | 20954
> wal_init_file       | 8
> wal_write_backend   | 20960
> wal_write_walwriter | 46
> wal_write_time      | 51
> wal_sync_backend    | 7
> wal_sync_walwriter  | 8
> wal_sync_time       | 0
> stats_reset         | 2020-10-20 11:04:51.307771+09
> ```
> 
> 1. Basic statistics of WAL activity
> 
> - wal_records: Total number of WAL records generated
> - wal_fpi: Total number of WAL full page images generated
> - wal_bytes: Total amount of WAL bytes generated
> 
> To understand DB's performance, first, we will check the performance
> trends for the entire database instance.
> For example, if the number of wal_fpi becomes higher, users may tune
> "wal_compression", "checkpoint_timeout" and so on.
> 
> Although users can check the above statistics via EXPLAIN, auto_explain,
> autovacuum and pg_stat_statements now,
> if users want to see the performance trends  for the entire database,
> they must recalculate the statistics.
> 
> I think it is useful to add the sum of the basic statistics.
> 
> 
> 2.  WAL segment file creation
> 
> - wal_init_file: Total number of WAL segment files created.
> 
> To create a new WAL file may have an impact on the performance of
> a write-heavy workload generating lots of WAL. If this number is reported high,
> to reduce the number of this initialization, we can tune WAL-related parameters
> so that more "recycled" WAL files can be held.
> 
> 
> 
> 3. Number of when WAL is flushed
> 
> - wal_write_backend : Total number of WAL data written to the disk by backends
> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
> - wal_sync_backend : Total number of WAL data synced to the disk by backends
> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
> 
> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.

I just wonder how useful these counters are. Even without these counters,
we already know synchronous_commit=off is likely to cause the better
performance (but has the risk of data loss). So ISTM that these counters are
not so useful when tuning synchronous_commit.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-10-30 11:50, Fujii Masao wrote:
> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>> Hi,
>> 
>> Thanks for your comments and advice. I updated the patch.
>> 
>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>> <ikedamsh@oss.nttdata.com> wrote in
>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>> > I see that we also need to add extra code to capture these stats (some
>>>> > of which is in performance-critical path especially in
>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>> > be that it is all fine as it is very important to collect these stats
>>>> > at cluster-level in spite that the same information can be gathered at
>>>> > statement-level to help customers but I don't see a very strong case
>>>> > for that in your proposal.
>>> 
>>> We should avoid that duplication as possible even if the both number
>>> are important.
>>> 
>>>> Also about performance, I thought there are few impacts because it
>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>> value which already collects these stats, there is no impact in
>>>> XLogInsertRecord.
>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>> value?
>>> 
>>> I don't think that works, but it would work that pgstat_send_wal()
>>> takes the difference of that values between two successive calls.
>>> 
>>> WalUsage prevWalUsage;
>>> ...
>>> pgstat_send_wal()
>>> {
>>> ..
>>>    /* fill in some values using pgWalUsage */
>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>> prevWalUsage.wal_bytes;
>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>> prevWalUsage.wal_records;
>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>> prevWalUsage.wal_fpi;
>>> ...
>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>> 
>>>    /* remember the current numbers */
>>>    prevWalUsage = pgWalUsage;
>> 
>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>> which is already defined and eliminates the extra overhead.
> 
> +    /* fill in some values using pgWalUsage */
> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
> prevWalUsage.wal_records;
> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
> 
> It's better to use WalUsageAccumDiff() here?

Yes, thanks. I fixed it.

> prevWalUsage needs to be initialized with pgWalUsage?
> 
> +                if (AmWalWriterProcess()){
> +                    WalStats.m_wal_write_walwriter++;
> +                }
> +                else
> +                {
> +                    WalStats.m_wal_write_backend++;
> +                }
> 
> I think that it's better not to separate m_wal_write_xxx into two for
> walwriter and other processes. Instead, we can use one m_wal_write_xxx
> counter and make pgstat_send_wal() send also the process type to
> the stats collector. Then the stats collector can accumulate the 
> counters
> per process type if necessary. If we adopt this approach, we can easily
> extend pg_stat_wal so that any fields can be reported per process type.

I'll remove the above source code because these counters are not useful.


On 2020-10-30 12:00, Fujii Masao wrote:
> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>> Hi,
>> 
>> I think we need to add some statistics to pg_stat_wal view.
>> 
>> Although there are some parameter related WAL,
>> there are few statistics for tuning them.
>> 
>> I think it's better to provide the following statistics.
>> Please let me know your comments.
>> 
>> ```
>> postgres=# SELECT * from pg_stat_wal;
>> -[ RECORD 1 ]-------+------------------------------
>> wal_records         | 2000224
>> wal_fpi             | 47
>> wal_bytes           | 248216337
>> wal_buffers_full    | 20954
>> wal_init_file       | 8
>> wal_write_backend   | 20960
>> wal_write_walwriter | 46
>> wal_write_time      | 51
>> wal_sync_backend    | 7
>> wal_sync_walwriter  | 8
>> wal_sync_time       | 0
>> stats_reset         | 2020-10-20 11:04:51.307771+09
>> ```
>> 
>> 1. Basic statistics of WAL activity
>> 
>> - wal_records: Total number of WAL records generated
>> - wal_fpi: Total number of WAL full page images generated
>> - wal_bytes: Total amount of WAL bytes generated
>> 
>> To understand DB's performance, first, we will check the performance
>> trends for the entire database instance.
>> For example, if the number of wal_fpi becomes higher, users may tune
>> "wal_compression", "checkpoint_timeout" and so on.
>> 
>> Although users can check the above statistics via EXPLAIN, 
>> auto_explain,
>> autovacuum and pg_stat_statements now,
>> if users want to see the performance trends  for the entire database,
>> they must recalculate the statistics.
>> 
>> I think it is useful to add the sum of the basic statistics.
>> 
>> 
>> 2.  WAL segment file creation
>> 
>> - wal_init_file: Total number of WAL segment files created.
>> 
>> To create a new WAL file may have an impact on the performance of
>> a write-heavy workload generating lots of WAL. If this number is 
>> reported high,
>> to reduce the number of this initialization, we can tune WAL-related 
>> parameters
>> so that more "recycled" WAL files can be held.
>> 
>> 
>> 
>> 3. Number of when WAL is flushed
>> 
>> - wal_write_backend : Total number of WAL data written to the disk by 
>> backends
>> - wal_write_walwriter : Total number of WAL data written to the disk 
>> by walwriter
>> - wal_sync_backend : Total number of WAL data synced to the disk by 
>> backends
>> - wal_sync_walwriter : Total number of WAL data synced to the disk by 
>> walwrite
>> 
>> I think it's useful for tuning "synchronous_commit" and "commit_delay" 
>> for query executions.
>> If the number of WAL is flushed is high, users can know 
>> "synchronous_commit" is useful for the workload.
> 
> I just wonder how useful these counters are. Even without these 
> counters,
> we already know synchronous_commit=off is likely to cause the better
> performance (but has the risk of data loss). So ISTM that these 
> counters are
> not so useful when tuning synchronous_commit.

Thanks, my understanding was wrong.
I agreed that your comments.

I merged the statistics of *_backend and *_walwriter.
I think the sum of them is useful to calculate the average per 
write/sync time.
For example, per write time is equals wal_write_time / wal_write.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/11/06 10:25, Masahiro Ikeda wrote:
> On 2020-10-30 11:50, Fujii Masao wrote:
>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>> Hi,
>>>
>>> Thanks for your comments and advice. I updated the patch.
>>>
>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>> > of which is in performance-critical path especially in
>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>> > be that it is all fine as it is very important to collect these stats
>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>> > statement-level to help customers but I don't see a very strong case
>>>>> > for that in your proposal.
>>>>
>>>> We should avoid that duplication as possible even if the both number
>>>> are important.
>>>>
>>>>> Also about performance, I thought there are few impacts because it
>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>> value which already collects these stats, there is no impact in
>>>>> XLogInsertRecord.
>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>> value?
>>>>
>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>> takes the difference of that values between two successive calls.
>>>>
>>>> WalUsage prevWalUsage;
>>>> ...
>>>> pgstat_send_wal()
>>>> {
>>>> ..
>>>>    /* fill in some values using pgWalUsage */
>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>> ...
>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>
>>>>    /* remember the current numbers */
>>>>    prevWalUsage = pgWalUsage;
>>>
>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>> which is already defined and eliminates the extra overhead.
>>
>> +    /* fill in some values using pgWalUsage */
>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>
>> It's better to use WalUsageAccumDiff() here?
> 
> Yes, thanks. I fixed it.
> 
>> prevWalUsage needs to be initialized with pgWalUsage?
>>
>> +                if (AmWalWriterProcess()){
>> +                    WalStats.m_wal_write_walwriter++;
>> +                }
>> +                else
>> +                {
>> +                    WalStats.m_wal_write_backend++;
>> +                }
>>
>> I think that it's better not to separate m_wal_write_xxx into two for
>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>> counter and make pgstat_send_wal() send also the process type to
>> the stats collector. Then the stats collector can accumulate the counters
>> per process type if necessary. If we adopt this approach, we can easily
>> extend pg_stat_wal so that any fields can be reported per process type.
> 
> I'll remove the above source code because these counters are not useful.
> 
> 
> On 2020-10-30 12:00, Fujii Masao wrote:
>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>> Hi,
>>>
>>> I think we need to add some statistics to pg_stat_wal view.
>>>
>>> Although there are some parameter related WAL,
>>> there are few statistics for tuning them.
>>>
>>> I think it's better to provide the following statistics.
>>> Please let me know your comments.
>>>
>>> ```
>>> postgres=# SELECT * from pg_stat_wal;
>>> -[ RECORD 1 ]-------+------------------------------
>>> wal_records         | 2000224
>>> wal_fpi             | 47
>>> wal_bytes           | 248216337
>>> wal_buffers_full    | 20954
>>> wal_init_file       | 8
>>> wal_write_backend   | 20960
>>> wal_write_walwriter | 46
>>> wal_write_time      | 51
>>> wal_sync_backend    | 7
>>> wal_sync_walwriter  | 8
>>> wal_sync_time       | 0
>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>> ```
>>>
>>> 1. Basic statistics of WAL activity
>>>
>>> - wal_records: Total number of WAL records generated
>>> - wal_fpi: Total number of WAL full page images generated
>>> - wal_bytes: Total amount of WAL bytes generated
>>>
>>> To understand DB's performance, first, we will check the performance
>>> trends for the entire database instance.
>>> For example, if the number of wal_fpi becomes higher, users may tune
>>> "wal_compression", "checkpoint_timeout" and so on.
>>>
>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>> autovacuum and pg_stat_statements now,
>>> if users want to see the performance trends  for the entire database,
>>> they must recalculate the statistics.
>>>
>>> I think it is useful to add the sum of the basic statistics.
>>>
>>>
>>> 2.  WAL segment file creation
>>>
>>> - wal_init_file: Total number of WAL segment files created.
>>>
>>> To create a new WAL file may have an impact on the performance of
>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>> so that more "recycled" WAL files can be held.
>>>
>>>
>>>
>>> 3. Number of when WAL is flushed
>>>
>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>
>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>
>> I just wonder how useful these counters are. Even without these counters,
>> we already know synchronous_commit=off is likely to cause the better
>> performance (but has the risk of data loss). So ISTM that these counters are
>> not so useful when tuning synchronous_commit.
> 
> Thanks, my understanding was wrong.
> I agreed that your comments.
> 
> I merged the statistics of *_backend and *_walwriter.
> I think the sum of them is useful to calculate the average per write/sync time.
> For example, per write time is equals wal_write_time / wal_write.

Understood.

Thanks for updating the patch!

patching file src/include/catalog/pg_proc.dat
Hunk #1 FAILED at 5491.
1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.dat.rej

I got this failure when applying the patch. Could you update the patch?


-       Number of times WAL data was written to the disk because WAL buffers got full
+       Total number of times WAL data written to the disk because WAL buffers got full

Isn't "was" necessary between "data" and "written"?


+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_bytes</structfield> <type>bigint</type>

Shouldn't the type of wal_bytes be numeric because the total number of
WAL bytes can exceed the range of bigint? I think that the type of
pg_stat_statements.wal_bytes is also numeric for the same reason.


+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write_time</structfield> <type>bigint</type>

Shouldn't the type of wal_xxx_time be double precision,
like pg_stat_database.blk_write_time?


Even when fsync is set to off or wal_sync_method is set to open_sync,
wal_sync is incremented. Isn't this behavior confusing?


+       Total amount of time that has been spent in the portion of
+       WAL data was written to disk by backend and walwriter, in milliseconds
+       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)

With the patch, track_io_timing controls both IO for data files and
WAL files. But we may want to track only either of them. So it's better
to extend track_io_timing so that we can specify the tracking target
in the parameter? For example, we can make track_io_timing accept
data, wal and all. Or we should introduce new GUC for WAL, e.g.,
track_wal_io_timing? Thought?

I'm afraid that "by backend and walwriter" part can make us thinkg
incorrectly that WAL writes by other processes like autovacuum
are not tracked.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/11/12 14:58, Fujii Masao wrote:
> 
> 
> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>> On 2020-10-30 11:50, Fujii Masao wrote:
>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>> Hi,
>>>>
>>>> Thanks for your comments and advice. I updated the patch.
>>>>
>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>> > of which is in performance-critical path especially in
>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>> > for that in your proposal.
>>>>>
>>>>> We should avoid that duplication as possible even if the both number
>>>>> are important.
>>>>>
>>>>>> Also about performance, I thought there are few impacts because it
>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>> value which already collects these stats, there is no impact in
>>>>>> XLogInsertRecord.
>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>> value?
>>>>>
>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>> takes the difference of that values between two successive calls.
>>>>>
>>>>> WalUsage prevWalUsage;
>>>>> ...
>>>>> pgstat_send_wal()
>>>>> {
>>>>> ..
>>>>>    /* fill in some values using pgWalUsage */
>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>> ...
>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>
>>>>>    /* remember the current numbers */
>>>>>    prevWalUsage = pgWalUsage;
>>>>
>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>> which is already defined and eliminates the extra overhead.
>>>
>>> +    /* fill in some values using pgWalUsage */
>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>
>>> It's better to use WalUsageAccumDiff() here?
>>
>> Yes, thanks. I fixed it.
>>
>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>
>>> +                if (AmWalWriterProcess()){
>>> +                    WalStats.m_wal_write_walwriter++;
>>> +                }
>>> +                else
>>> +                {
>>> +                    WalStats.m_wal_write_backend++;
>>> +                }
>>>
>>> I think that it's better not to separate m_wal_write_xxx into two for
>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>> counter and make pgstat_send_wal() send also the process type to
>>> the stats collector. Then the stats collector can accumulate the counters
>>> per process type if necessary. If we adopt this approach, we can easily
>>> extend pg_stat_wal so that any fields can be reported per process type.
>>
>> I'll remove the above source code because these counters are not useful.
>>
>>
>> On 2020-10-30 12:00, Fujii Masao wrote:
>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>> Hi,
>>>>
>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>
>>>> Although there are some parameter related WAL,
>>>> there are few statistics for tuning them.
>>>>
>>>> I think it's better to provide the following statistics.
>>>> Please let me know your comments.
>>>>
>>>> ```
>>>> postgres=# SELECT * from pg_stat_wal;
>>>> -[ RECORD 1 ]-------+------------------------------
>>>> wal_records         | 2000224
>>>> wal_fpi             | 47
>>>> wal_bytes           | 248216337
>>>> wal_buffers_full    | 20954
>>>> wal_init_file       | 8
>>>> wal_write_backend   | 20960
>>>> wal_write_walwriter | 46
>>>> wal_write_time      | 51
>>>> wal_sync_backend    | 7
>>>> wal_sync_walwriter  | 8
>>>> wal_sync_time       | 0
>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>> ```
>>>>
>>>> 1. Basic statistics of WAL activity
>>>>
>>>> - wal_records: Total number of WAL records generated
>>>> - wal_fpi: Total number of WAL full page images generated
>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>
>>>> To understand DB's performance, first, we will check the performance
>>>> trends for the entire database instance.
>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>
>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>> autovacuum and pg_stat_statements now,
>>>> if users want to see the performance trends  for the entire database,
>>>> they must recalculate the statistics.
>>>>
>>>> I think it is useful to add the sum of the basic statistics.
>>>>
>>>>
>>>> 2.  WAL segment file creation
>>>>
>>>> - wal_init_file: Total number of WAL segment files created.
>>>>
>>>> To create a new WAL file may have an impact on the performance of
>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>> so that more "recycled" WAL files can be held.
>>>>
>>>>
>>>>
>>>> 3. Number of when WAL is flushed
>>>>
>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>
>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>
>>> I just wonder how useful these counters are. Even without these counters,
>>> we already know synchronous_commit=off is likely to cause the better
>>> performance (but has the risk of data loss). So ISTM that these counters are
>>> not so useful when tuning synchronous_commit.
>>
>> Thanks, my understanding was wrong.
>> I agreed that your comments.
>>
>> I merged the statistics of *_backend and *_walwriter.
>> I think the sum of them is useful to calculate the average per write/sync time.
>> For example, per write time is equals wal_write_time / wal_write.
> 
> Understood.
> 
> Thanks for updating the patch!
> 
> patching file src/include/catalog/pg_proc.dat
> Hunk #1 FAILED at 5491.
> 1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.dat.rej
> 
> I got this failure when applying the patch. Could you update the patch?
> 
> 
> -       Number of times WAL data was written to the disk because WAL buffers got full
> +       Total number of times WAL data written to the disk because WAL buffers got full
> 
> Isn't "was" necessary between "data" and "written"?
> 
> 
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>wal_bytes</structfield> <type>bigint</type>
> 
> Shouldn't the type of wal_bytes be numeric because the total number of
> WAL bytes can exceed the range of bigint? I think that the type of
> pg_stat_statements.wal_bytes is also numeric for the same reason.
> 
> 
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>wal_write_time</structfield> <type>bigint</type>
> 
> Shouldn't the type of wal_xxx_time be double precision,
> like pg_stat_database.blk_write_time?
> 
> 
> Even when fsync is set to off or wal_sync_method is set to open_sync,
> wal_sync is incremented. Isn't this behavior confusing?
> 
> 
> +       Total amount of time that has been spent in the portion of
> +       WAL data was written to disk by backend and walwriter, in milliseconds
> +       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
> 
> With the patch, track_io_timing controls both IO for data files and
> WAL files. But we may want to track only either of them. So it's better
> to extend track_io_timing so that we can specify the tracking target
> in the parameter? For example, we can make track_io_timing accept
> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
> track_wal_io_timing? Thought?
> 
> I'm afraid that "by backend and walwriter" part can make us thinkg
> incorrectly that WAL writes by other processes like autovacuum
> are not tracked.

  pgstat_send_wal(void)
  {
+    /* fill in some values using pgWalUsage */
+    WalUsage walusage;
+    memset(&walusage, 0, sizeof(WalUsage));
+    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);

At the first call to pgstat_send_wal(), prevWalUsage has not been set to
the previous value of pgWalUsage. So the calculation result of
WalUsageAccumDiff() can be incorrect. To address this issue,
prevWalUsage should be set to pgWalUsage or both should be initialized
with 0 at the beginning of the process, for example?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
"lchch1990@sina.cn"
Date:

>Now, pg_stat_wal supports reset all informantion in WalStats
>using pg_stat_reset_shared('wal') function.
>Isn't it enough?
Yes it ok, sorry I miss this infomation.


>> 3. I do not think it's a correct describe in document for
>> 'wal_buffers_full'.
 
>Where should I rewrite the description? If my understanding is not
>correct, please let me know.
Sorry I have not described it clearly, because I can not understand the meaning of this
column after I read the describe in document.
And now I read the source code of walwrite and found the place where 'wal_buffers_full'
added is for a backend to wait a wal buffer which is occupied by other wal page, so the 
backend flush the old page in the wal buffer(after wait it can).
So i think the origin decribe in document is not so in point, we can describe it such as
'Total number of times WAL data written to the disk because a backend yelled a wal buffer
for an advanced wal page.

Sorry if my understand is wrong.



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-11-12 16:27, Fujii Masao wrote:
> On 2020/11/12 14:58, Fujii Masao wrote:
>> 
>> 
>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>> Hi,
>>>>> 
>>>>> Thanks for your comments and advice. I updated the patch.
>>>>> 
>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>> > of which is in performance-critical path especially in
>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>> > for that in your proposal.
>>>>>> 
>>>>>> We should avoid that duplication as possible even if the both 
>>>>>> number
>>>>>> are important.
>>>>>> 
>>>>>>> Also about performance, I thought there are few impacts because 
>>>>>>> it
>>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>>> pgWalUsage's
>>>>>>> value which already collects these stats, there is no impact in
>>>>>>> XLogInsertRecord.
>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>> value?
>>>>>> 
>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>> takes the difference of that values between two successive calls.
>>>>>> 
>>>>>> WalUsage prevWalUsage;
>>>>>> ...
>>>>>> pgstat_send_wal()
>>>>>> {
>>>>>> ..
>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>>> prevWalUsage.wal_bytes;
>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>> prevWalUsage.wal_records;
>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>>> prevWalUsage.wal_fpi;
>>>>>> ...
>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>> 
>>>>>>    /* remember the current numbers */
>>>>>>    prevWalUsage = pgWalUsage;
>>>>> 
>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>> which is already defined and eliminates the extra overhead.
>>>> 
>>>> +    /* fill in some values using pgWalUsage */
>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>>> prevWalUsage.wal_bytes;
>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>> prevWalUsage.wal_records;
>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>> 
>>>> It's better to use WalUsageAccumDiff() here?
>>> 
>>> Yes, thanks. I fixed it.
>>> 
>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>> 
>>>> +                if (AmWalWriterProcess()){
>>>> +                    WalStats.m_wal_write_walwriter++;
>>>> +                }
>>>> +                else
>>>> +                {
>>>> +                    WalStats.m_wal_write_backend++;
>>>> +                }
>>>> 
>>>> I think that it's better not to separate m_wal_write_xxx into two 
>>>> for
>>>> walwriter and other processes. Instead, we can use one 
>>>> m_wal_write_xxx
>>>> counter and make pgstat_send_wal() send also the process type to
>>>> the stats collector. Then the stats collector can accumulate the 
>>>> counters
>>>> per process type if necessary. If we adopt this approach, we can 
>>>> easily
>>>> extend pg_stat_wal so that any fields can be reported per process 
>>>> type.
>>> 
>>> I'll remove the above source code because these counters are not 
>>> useful.
>>> 
>>> 
>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>> Hi,
>>>>> 
>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>> 
>>>>> Although there are some parameter related WAL,
>>>>> there are few statistics for tuning them.
>>>>> 
>>>>> I think it's better to provide the following statistics.
>>>>> Please let me know your comments.
>>>>> 
>>>>> ```
>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>> wal_records         | 2000224
>>>>> wal_fpi             | 47
>>>>> wal_bytes           | 248216337
>>>>> wal_buffers_full    | 20954
>>>>> wal_init_file       | 8
>>>>> wal_write_backend   | 20960
>>>>> wal_write_walwriter | 46
>>>>> wal_write_time      | 51
>>>>> wal_sync_backend    | 7
>>>>> wal_sync_walwriter  | 8
>>>>> wal_sync_time       | 0
>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>> ```
>>>>> 
>>>>> 1. Basic statistics of WAL activity
>>>>> 
>>>>> - wal_records: Total number of WAL records generated
>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>> 
>>>>> To understand DB's performance, first, we will check the 
>>>>> performance
>>>>> trends for the entire database instance.
>>>>> For example, if the number of wal_fpi becomes higher, users may 
>>>>> tune
>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>> 
>>>>> Although users can check the above statistics via EXPLAIN, 
>>>>> auto_explain,
>>>>> autovacuum and pg_stat_statements now,
>>>>> if users want to see the performance trends  for the entire 
>>>>> database,
>>>>> they must recalculate the statistics.
>>>>> 
>>>>> I think it is useful to add the sum of the basic statistics.
>>>>> 
>>>>> 
>>>>> 2.  WAL segment file creation
>>>>> 
>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>> 
>>>>> To create a new WAL file may have an impact on the performance of
>>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>>> reported high,
>>>>> to reduce the number of this initialization, we can tune 
>>>>> WAL-related parameters
>>>>> so that more "recycled" WAL files can be held.
>>>>> 
>>>>> 
>>>>> 
>>>>> 3. Number of when WAL is flushed
>>>>> 
>>>>> - wal_write_backend : Total number of WAL data written to the disk 
>>>>> by backends
>>>>> - wal_write_walwriter : Total number of WAL data written to the 
>>>>> disk by walwriter
>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by 
>>>>> backends
>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk 
>>>>> by walwrite
>>>>> 
>>>>> I think it's useful for tuning "synchronous_commit" and 
>>>>> "commit_delay" for query executions.
>>>>> If the number of WAL is flushed is high, users can know 
>>>>> "synchronous_commit" is useful for the workload.
>>>> 
>>>> I just wonder how useful these counters are. Even without these 
>>>> counters,
>>>> we already know synchronous_commit=off is likely to cause the better
>>>> performance (but has the risk of data loss). So ISTM that these 
>>>> counters are
>>>> not so useful when tuning synchronous_commit.
>>> 
>>> Thanks, my understanding was wrong.
>>> I agreed that your comments.
>>> 
>>> I merged the statistics of *_backend and *_walwriter.
>>> I think the sum of them is useful to calculate the average per 
>>> write/sync time.
>>> For example, per write time is equals wal_write_time / wal_write.
>> 
>> Understood.
>> 
>> Thanks for updating the patch!
>> 
>> patching file src/include/catalog/pg_proc.dat
>> Hunk #1 FAILED at 5491.
>> 1 out of 1 hunk FAILED -- saving rejects to file 
>> src/include/catalog/pg_proc.dat.rej
>> 
>> I got this failure when applying the patch. Could you update the 
>> patch?
>> 
>> 
>> -       Number of times WAL data was written to the disk because WAL 
>> buffers got full
>> +       Total number of times WAL data written to the disk because WAL 
>> buffers got full
>> 
>> Isn't "was" necessary between "data" and "written"?
>> 
>> 
>> +      <entry role="catalog_table_entry"><para 
>> role="column_definition">
>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>> 
>> Shouldn't the type of wal_bytes be numeric because the total number of
>> WAL bytes can exceed the range of bigint? I think that the type of
>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>> 
>> 
>> +      <entry role="catalog_table_entry"><para 
>> role="column_definition">
>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>> 
>> Shouldn't the type of wal_xxx_time be double precision,
>> like pg_stat_database.blk_write_time?
>> 
>> 
>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>> wal_sync is incremented. Isn't this behavior confusing?
>> 
>> 
>> +       Total amount of time that has been spent in the portion of
>> +       WAL data was written to disk by backend and walwriter, in 
>> milliseconds
>> +       (if <xref linkend="guc-track-io-timing"/> is enabled, 
>> otherwise zero)
>> 
>> With the patch, track_io_timing controls both IO for data files and
>> WAL files. But we may want to track only either of them. So it's 
>> better
>> to extend track_io_timing so that we can specify the tracking target
>> in the parameter? For example, we can make track_io_timing accept
>> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
>> track_wal_io_timing? Thought?
>> 
>> I'm afraid that "by backend and walwriter" part can make us thinkg
>> incorrectly that WAL writes by other processes like autovacuum
>> are not tracked.
> 
>  pgstat_send_wal(void)
>  {
> +    /* fill in some values using pgWalUsage */
> +    WalUsage walusage;
> +    memset(&walusage, 0, sizeof(WalUsage));
> +    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
> 
> At the first call to pgstat_send_wal(), prevWalUsage has not been set 
> to
> the previous value of pgWalUsage. So the calculation result of
> WalUsageAccumDiff() can be incorrect. To address this issue,
> prevWalUsage should be set to pgWalUsage or both should be initialized
> with 0 at the beginning of the process, for example?

I forgot to handle it, thanks.
Although I initialized it in pgstat_initialize(),
if there is better way please let me know.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-11-12 14:58, Fujii Masao wrote:
> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>> On 2020-10-30 11:50, Fujii Masao wrote:
>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>> Hi,
>>>> 
>>>> Thanks for your comments and advice. I updated the patch.
>>>> 
>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>> > of which is in performance-critical path especially in
>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>> > for that in your proposal.
>>>>> 
>>>>> We should avoid that duplication as possible even if the both 
>>>>> number
>>>>> are important.
>>>>> 
>>>>>> Also about performance, I thought there are few impacts because it
>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>> pgWalUsage's
>>>>>> value which already collects these stats, there is no impact in
>>>>>> XLogInsertRecord.
>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>> value?
>>>>> 
>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>> takes the difference of that values between two successive calls.
>>>>> 
>>>>> WalUsage prevWalUsage;
>>>>> ...
>>>>> pgstat_send_wal()
>>>>> {
>>>>> ..
>>>>>    /* fill in some values using pgWalUsage */
>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>> prevWalUsage.wal_bytes;
>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>> prevWalUsage.wal_records;
>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>> prevWalUsage.wal_fpi;
>>>>> ...
>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>> 
>>>>>    /* remember the current numbers */
>>>>>    prevWalUsage = pgWalUsage;
>>>> 
>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>> which is already defined and eliminates the extra overhead.
>>> 
>>> +    /* fill in some values using pgWalUsage */
>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>> prevWalUsage.wal_bytes;
>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>> prevWalUsage.wal_records;
>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>> 
>>> It's better to use WalUsageAccumDiff() here?
>> 
>> Yes, thanks. I fixed it.
>> 
>>> prevWalUsage needs to be initialized with pgWalUsage?
>>> 
>>> +                if (AmWalWriterProcess()){
>>> +                    WalStats.m_wal_write_walwriter++;
>>> +                }
>>> +                else
>>> +                {
>>> +                    WalStats.m_wal_write_backend++;
>>> +                }
>>> 
>>> I think that it's better not to separate m_wal_write_xxx into two for
>>> walwriter and other processes. Instead, we can use one 
>>> m_wal_write_xxx
>>> counter and make pgstat_send_wal() send also the process type to
>>> the stats collector. Then the stats collector can accumulate the 
>>> counters
>>> per process type if necessary. If we adopt this approach, we can 
>>> easily
>>> extend pg_stat_wal so that any fields can be reported per process 
>>> type.
>> 
>> I'll remove the above source code because these counters are not 
>> useful.
>> 
>> 
>> On 2020-10-30 12:00, Fujii Masao wrote:
>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>> Hi,
>>>> 
>>>> I think we need to add some statistics to pg_stat_wal view.
>>>> 
>>>> Although there are some parameter related WAL,
>>>> there are few statistics for tuning them.
>>>> 
>>>> I think it's better to provide the following statistics.
>>>> Please let me know your comments.
>>>> 
>>>> ```
>>>> postgres=# SELECT * from pg_stat_wal;
>>>> -[ RECORD 1 ]-------+------------------------------
>>>> wal_records         | 2000224
>>>> wal_fpi             | 47
>>>> wal_bytes           | 248216337
>>>> wal_buffers_full    | 20954
>>>> wal_init_file       | 8
>>>> wal_write_backend   | 20960
>>>> wal_write_walwriter | 46
>>>> wal_write_time      | 51
>>>> wal_sync_backend    | 7
>>>> wal_sync_walwriter  | 8
>>>> wal_sync_time       | 0
>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>> ```
>>>> 
>>>> 1. Basic statistics of WAL activity
>>>> 
>>>> - wal_records: Total number of WAL records generated
>>>> - wal_fpi: Total number of WAL full page images generated
>>>> - wal_bytes: Total amount of WAL bytes generated
>>>> 
>>>> To understand DB's performance, first, we will check the performance
>>>> trends for the entire database instance.
>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>> 
>>>> Although users can check the above statistics via EXPLAIN, 
>>>> auto_explain,
>>>> autovacuum and pg_stat_statements now,
>>>> if users want to see the performance trends  for the entire 
>>>> database,
>>>> they must recalculate the statistics.
>>>> 
>>>> I think it is useful to add the sum of the basic statistics.
>>>> 
>>>> 
>>>> 2.  WAL segment file creation
>>>> 
>>>> - wal_init_file: Total number of WAL segment files created.
>>>> 
>>>> To create a new WAL file may have an impact on the performance of
>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>> reported high,
>>>> to reduce the number of this initialization, we can tune WAL-related 
>>>> parameters
>>>> so that more "recycled" WAL files can be held.
>>>> 
>>>> 
>>>> 
>>>> 3. Number of when WAL is flushed
>>>> 
>>>> - wal_write_backend : Total number of WAL data written to the disk 
>>>> by backends
>>>> - wal_write_walwriter : Total number of WAL data written to the disk 
>>>> by walwriter
>>>> - wal_sync_backend : Total number of WAL data synced to the disk by 
>>>> backends
>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk 
>>>> by walwrite
>>>> 
>>>> I think it's useful for tuning "synchronous_commit" and 
>>>> "commit_delay" for query executions.
>>>> If the number of WAL is flushed is high, users can know 
>>>> "synchronous_commit" is useful for the workload.
>>> 
>>> I just wonder how useful these counters are. Even without these 
>>> counters,
>>> we already know synchronous_commit=off is likely to cause the better
>>> performance (but has the risk of data loss). So ISTM that these 
>>> counters are
>>> not so useful when tuning synchronous_commit.
>> 
>> Thanks, my understanding was wrong.
>> I agreed that your comments.
>> 
>> I merged the statistics of *_backend and *_walwriter.
>> I think the sum of them is useful to calculate the average per 
>> write/sync time.
>> For example, per write time is equals wal_write_time / wal_write.
> 
> Understood.
> 
> Thanks for updating the patch!

Thanks for your comments.

> patching file src/include/catalog/pg_proc.dat
> Hunk #1 FAILED at 5491.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/catalog/pg_proc.dat.rej
> 
> I got this failure when applying the patch. Could you update the patch?

Thanks, I updated the patch.

> -       Number of times WAL data was written to the disk because WAL
> buffers got full
> +       Total number of times WAL data written to the disk because WAL
> buffers got full
> 
> Isn't "was" necessary between "data" and "written"?

Yes, I fixed it.

> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>wal_bytes</structfield> <type>bigint</type>
> 
> Shouldn't the type of wal_bytes be numeric because the total number of
> WAL bytes can exceed the range of bigint? I think that the type of
> pg_stat_statements.wal_bytes is also numeric for the same reason.

Thanks, I fixed it.

Since I cast the type of wal_bytes from PgStat_Counter to uint64,
I changed the type of PgStat_MsgWal and PgStat_WalStats too.

> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>wal_write_time</structfield> <type>bigint</type>
> 
> Shouldn't the type of wal_xxx_time be double precision,
> like pg_stat_database.blk_write_time?

Thanks, I changed it.

> Even when fsync is set to off or wal_sync_method is set to open_sync,
> wal_sync is incremented. Isn't this behavior confusing?
> 
> 
> +       Total amount of time that has been spent in the portion of
> +       WAL data was written to disk by backend and walwriter, in 
> milliseconds
> +       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise 
> zero)
> 
> With the patch, track_io_timing controls both IO for data files and
> WAL files. But we may want to track only either of them. So it's better
> to extend track_io_timing so that we can specify the tracking target
> in the parameter? For example, we can make track_io_timing accept
> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
> track_wal_io_timing? Thought?

OK, I introduced the new GUC "track_wal_io_timing".

> I'm afraid that "by backend and walwriter" part can make us thinkg
> incorrectly that WAL writes by other processes like autovacuum
> are not tracked.

Sorry, I removed "by backend and walwriter".

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-11-13 12:32, lchch1990@sina.cn wrote:
>> Now, pg_stat_wal supports reset all informantion in WalStats
>> using pg_stat_reset_shared('wal') function.
>> Isn't it enough?
> Yes it ok, sorry I miss this infomation.

OK.

>>> 3. I do not think it's a correct describe in document for
>>> 'wal_buffers_full'.
> 
>> Where should I rewrite the description? If my understanding is not
>> correct, please let me know.
> Sorry I have not described it clearly, because I can not understand
> the meaning of this
> column after I read the describe in document.
> And now I read the source code of walwrite and found the place where
> 'wal_buffers_full'
> added is for a backend to wait a wal buffer which is occupied by other
> wal page, so the
> backend flush the old page in the wal buffer(after wait it can).
> So i think the origin decribe in document is not so in point, we can
> describe it such as
> 'Total number of times WAL data written to the disk because a backend
> yelled a wal buffer
> for an advanced wal page.
> 
> Sorry if my understand is wrong.

Thanks for your comments.

You're understanding is almost the same as mine.
It describes when not only backends but also other backgrounds 
initialize a new wal page,
wal buffer's space is already used and there is no space.

> 'Total number of times WAL data written to the disk because a backend
> yelled a wal buffer for an advanced wal page'

Thanks for your suggestion.
I wondered that users may confuse about how to use "wal_buffers_full" 
and how to tune parameters.

I thought the reason which wal buffer has no space is
important for users to tune the wal_buffers parameter.

How about the following comments?

'Total number of times WAL data was written to the disk because WAL 
buffers got full
  when to initialize a new WAL page'


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/11/16 16:35, Masahiro Ikeda wrote:
> On 2020-11-12 14:58, Fujii Masao wrote:
>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>
>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>> > of which is in performance-critical path especially in
>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>> > for that in your proposal.
>>>>>>
>>>>>> We should avoid that duplication as possible even if the both number
>>>>>> are important.
>>>>>>
>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>> value which already collects these stats, there is no impact in
>>>>>>> XLogInsertRecord.
>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>> value?
>>>>>>
>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>> takes the difference of that values between two successive calls.
>>>>>>
>>>>>> WalUsage prevWalUsage;
>>>>>> ...
>>>>>> pgstat_send_wal()
>>>>>> {
>>>>>> ..
>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>>> ...
>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>
>>>>>>    /* remember the current numbers */
>>>>>>    prevWalUsage = pgWalUsage;
>>>>>
>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>> which is already defined and eliminates the extra overhead.
>>>>
>>>> +    /* fill in some values using pgWalUsage */
>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>
>>>> It's better to use WalUsageAccumDiff() here?
>>>
>>> Yes, thanks. I fixed it.
>>>
>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>
>>>> +                if (AmWalWriterProcess()){
>>>> +                    WalStats.m_wal_write_walwriter++;
>>>> +                }
>>>> +                else
>>>> +                {
>>>> +                    WalStats.m_wal_write_backend++;
>>>> +                }
>>>>
>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>> counter and make pgstat_send_wal() send also the process type to
>>>> the stats collector. Then the stats collector can accumulate the counters
>>>> per process type if necessary. If we adopt this approach, we can easily
>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>
>>> I'll remove the above source code because these counters are not useful.
>>>
>>>
>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>> Hi,
>>>>>
>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>
>>>>> Although there are some parameter related WAL,
>>>>> there are few statistics for tuning them.
>>>>>
>>>>> I think it's better to provide the following statistics.
>>>>> Please let me know your comments.
>>>>>
>>>>> ```
>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>> wal_records         | 2000224
>>>>> wal_fpi             | 47
>>>>> wal_bytes           | 248216337
>>>>> wal_buffers_full    | 20954
>>>>> wal_init_file       | 8
>>>>> wal_write_backend   | 20960
>>>>> wal_write_walwriter | 46
>>>>> wal_write_time      | 51
>>>>> wal_sync_backend    | 7
>>>>> wal_sync_walwriter  | 8
>>>>> wal_sync_time       | 0
>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>> ```
>>>>>
>>>>> 1. Basic statistics of WAL activity
>>>>>
>>>>> - wal_records: Total number of WAL records generated
>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>
>>>>> To understand DB's performance, first, we will check the performance
>>>>> trends for the entire database instance.
>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>
>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>> autovacuum and pg_stat_statements now,
>>>>> if users want to see the performance trends  for the entire database,
>>>>> they must recalculate the statistics.
>>>>>
>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>
>>>>>
>>>>> 2.  WAL segment file creation
>>>>>
>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>
>>>>> To create a new WAL file may have an impact on the performance of
>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>> so that more "recycled" WAL files can be held.
>>>>>
>>>>>
>>>>>
>>>>> 3. Number of when WAL is flushed
>>>>>
>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>>
>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>
>>>> I just wonder how useful these counters are. Even without these counters,
>>>> we already know synchronous_commit=off is likely to cause the better
>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>> not so useful when tuning synchronous_commit.
>>>
>>> Thanks, my understanding was wrong.
>>> I agreed that your comments.
>>>
>>> I merged the statistics of *_backend and *_walwriter.
>>> I think the sum of them is useful to calculate the average per write/sync time.
>>> For example, per write time is equals wal_write_time / wal_write.
>>
>> Understood.
>>
>> Thanks for updating the patch!
> 
> Thanks for your comments.
> 
>> patching file src/include/catalog/pg_proc.dat
>> Hunk #1 FAILED at 5491.
>> 1 out of 1 hunk FAILED -- saving rejects to file
>> src/include/catalog/pg_proc.dat.rej
>>
>> I got this failure when applying the patch. Could you update the patch?
> 
> Thanks, I updated the patch.
> 
>> -       Number of times WAL data was written to the disk because WAL
>> buffers got full
>> +       Total number of times WAL data written to the disk because WAL
>> buffers got full
>>
>> Isn't "was" necessary between "data" and "written"?
> 
> Yes, I fixed it.
> 
>> +      <entry role="catalog_table_entry"><para role="column_definition">
>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>
>> Shouldn't the type of wal_bytes be numeric because the total number of
>> WAL bytes can exceed the range of bigint? I think that the type of
>> pg_stat_statements.wal_bytes is also numeric for the same reason.
> 
> Thanks, I fixed it.
> 
> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
> 
>> +      <entry role="catalog_table_entry"><para role="column_definition">
>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>
>> Shouldn't the type of wal_xxx_time be double precision,
>> like pg_stat_database.blk_write_time?
> 
> Thanks, I changed it.
> 
>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>> wal_sync is incremented. Isn't this behavior confusing?

What do you think about this comment?

I found that we discussed track-WAL-IO-timing feature at the past discussion
about the similar feature [1]. But the feature was droppped from the proposal
patch because there was the performance concern. So probably we need to
revisit the past discussion and benchmark the performance. Thought?

If track-WAL-IO-timing feature may cause performance regression,
it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
from the patch and commit it at first.

[1]
https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com


>>
>>
>> +       Total amount of time that has been spent in the portion of
>> +       WAL data was written to disk by backend and walwriter, in milliseconds
>> +       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
>>
>> With the patch, track_io_timing controls both IO for data files and
>> WAL files. But we may want to track only either of them. So it's better
>> to extend track_io_timing so that we can specify the tracking target
>> in the parameter? For example, we can make track_io_timing accept
>> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
>> track_wal_io_timing? Thought?
> 
> OK, I introduced the new GUC "track_wal_io_timing".
> 
>> I'm afraid that "by backend and walwriter" part can make us thinkg
>> incorrectly that WAL writes by other processes like autovacuum
>> are not tracked.
> 
> Sorry, I removed "by backend and walwriter".

Thanks for updating the patch!

+WalUsage prevWalUsage;

ISTM that we can declare this as static variable because
it's used only in pgstat.c.

+    memset(&walusage, 0, sizeof(WalUsage));
+    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);

This memset seems unnecessary.

      /* We assume this initializes to zeroes */
      static const PgStat_MsgWal all_zeroes;

This declaration of the variable should be placed around
the top of pgstat_send_wal().

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/11/16 18:24, Masahiro Ikeda wrote:
> On 2020-11-13 12:32, lchch1990@sina.cn wrote:
>>> Now, pg_stat_wal supports reset all informantion in WalStats
>>> using pg_stat_reset_shared('wal') function.
>>> Isn't it enough?
>> Yes it ok, sorry I miss this infomation.
> 
> OK.
> 
>>>> 3. I do not think it's a correct describe in document for
>>>> 'wal_buffers_full'.
>>
>>> Where should I rewrite the description? If my understanding is not
>>> correct, please let me know.
>> Sorry I have not described it clearly, because I can not understand
>> the meaning of this
>> column after I read the describe in document.
>> And now I read the source code of walwrite and found the place where
>> 'wal_buffers_full'
>> added is for a backend to wait a wal buffer which is occupied by other
>> wal page, so the
>> backend flush the old page in the wal buffer(after wait it can).
>> So i think the origin decribe in document is not so in point, we can
>> describe it such as
>> 'Total number of times WAL data written to the disk because a backend
>> yelled a wal buffer
>> for an advanced wal page.
>>
>> Sorry if my understand is wrong.
> 
> Thanks for your comments.
> 
> You're understanding is almost the same as mine.
> It describes when not only backends but also other backgrounds initialize a new wal page,
> wal buffer's space is already used and there is no space.
> 
>> 'Total number of times WAL data written to the disk because a backend
>> yelled a wal buffer for an advanced wal page'
> 
> Thanks for your suggestion.
> I wondered that users may confuse about how to use "wal_buffers_full" and how to tune parameters.
> 
> I thought the reason which wal buffer has no space is
> important for users to tune the wal_buffers parameter.
> 
> How about the following comments?
> 
> 'Total number of times WAL data was written to the disk because WAL buffers got full
>   when to initialize a new WAL page'

Or what about the following?

Total number of times WAL data was written to the disk, to claim the buffer page to insert new WAL data when the WAL
buffersgot filled up with unwritten WAL data.
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
"lchch1990@sina.cn"
Date:

>> Thanks for your comments.
>>
>> You're understanding is almost the same as mine.
>> It describes when not only backends but also other backgrounds initialize a new wal page,
>> wal buffer's space is already used and there is no space.
>>
>>> 'Total number of times WAL data written to the disk because a backend
>>> yelled a wal buffer for an advanced wal page'
>>
>> Thanks for your suggestion.
>> I wondered that users may confuse about how to use "wal_buffers_full" and how to tune parameters.
>>
>> I thought the reason which wal buffer has no space is
>> important for users to tune the wal_buffers parameter.
>>
>> How about the following comments?
>>
>> 'Total number of times WAL data was written to the disk because WAL buffers got full
>>   when to initialize a new WAL page'
>Or what about the following?
>Total number of times WAL data was written to the disk, to claim the buffer page to insert new
>WAL data when the WAL buffers got filled up with unwritten WAL data.
As my understand we can not say 'full' because every wal page mapped a special wal buffer slot.
When a wal page need to be write, but the buffer slot was occupied by other wal page. It need to
wait the wal buffer slot released. So i think we should say it 'occupied' not 'full'.

Maybe:
Total number of times WAL data was written to the disk, to claim the buffer page to insert new
WAL data when the special WAL buffer occupied by other page.


Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-11-17 11:46, Fujii Masao wrote:
> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>> On 2020-11-12 14:58, Fujii Masao wrote:
>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>> 
>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>> > for that in your proposal.
>>>>>>> 
>>>>>>> We should avoid that duplication as possible even if the both 
>>>>>>> number
>>>>>>> are important.
>>>>>>> 
>>>>>>>> Also about performance, I thought there are few impacts because 
>>>>>>>> it
>>>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>>>> pgWalUsage's
>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>> XLogInsertRecord.
>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>> value?
>>>>>>> 
>>>>>>> I don't think that works, but it would work that 
>>>>>>> pgstat_send_wal()
>>>>>>> takes the difference of that values between two successive calls.
>>>>>>> 
>>>>>>> WalUsage prevWalUsage;
>>>>>>> ...
>>>>>>> pgstat_send_wal()
>>>>>>> {
>>>>>>> ..
>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>>>> prevWalUsage.wal_bytes;
>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>>> prevWalUsage.wal_records;
>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>>>> prevWalUsage.wal_fpi;
>>>>>>> ...
>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>> 
>>>>>>>    /* remember the current numbers */
>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>> 
>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>> which is already defined and eliminates the extra overhead.
>>>>> 
>>>>> +    /* fill in some values using pgWalUsage */
>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>>>> prevWalUsage.wal_bytes;
>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>> prevWalUsage.wal_records;
>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - 
>>>>> prevWalUsage.wal_fpi;
>>>>> 
>>>>> It's better to use WalUsageAccumDiff() here?
>>>> 
>>>> Yes, thanks. I fixed it.
>>>> 
>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>> 
>>>>> +                if (AmWalWriterProcess()){
>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>> +                }
>>>>> +                else
>>>>> +                {
>>>>> +                    WalStats.m_wal_write_backend++;
>>>>> +                }
>>>>> 
>>>>> I think that it's better not to separate m_wal_write_xxx into two 
>>>>> for
>>>>> walwriter and other processes. Instead, we can use one 
>>>>> m_wal_write_xxx
>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>> the stats collector. Then the stats collector can accumulate the 
>>>>> counters
>>>>> per process type if necessary. If we adopt this approach, we can 
>>>>> easily
>>>>> extend pg_stat_wal so that any fields can be reported per process 
>>>>> type.
>>>> 
>>>> I'll remove the above source code because these counters are not 
>>>> useful.
>>>> 
>>>> 
>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>> 
>>>>>> Although there are some parameter related WAL,
>>>>>> there are few statistics for tuning them.
>>>>>> 
>>>>>> I think it's better to provide the following statistics.
>>>>>> Please let me know your comments.
>>>>>> 
>>>>>> ```
>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>> wal_records         | 2000224
>>>>>> wal_fpi             | 47
>>>>>> wal_bytes           | 248216337
>>>>>> wal_buffers_full    | 20954
>>>>>> wal_init_file       | 8
>>>>>> wal_write_backend   | 20960
>>>>>> wal_write_walwriter | 46
>>>>>> wal_write_time      | 51
>>>>>> wal_sync_backend    | 7
>>>>>> wal_sync_walwriter  | 8
>>>>>> wal_sync_time       | 0
>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>> ```
>>>>>> 
>>>>>> 1. Basic statistics of WAL activity
>>>>>> 
>>>>>> - wal_records: Total number of WAL records generated
>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>> 
>>>>>> To understand DB's performance, first, we will check the 
>>>>>> performance
>>>>>> trends for the entire database instance.
>>>>>> For example, if the number of wal_fpi becomes higher, users may 
>>>>>> tune
>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>> 
>>>>>> Although users can check the above statistics via EXPLAIN, 
>>>>>> auto_explain,
>>>>>> autovacuum and pg_stat_statements now,
>>>>>> if users want to see the performance trends  for the entire 
>>>>>> database,
>>>>>> they must recalculate the statistics.
>>>>>> 
>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>> 
>>>>>> 
>>>>>> 2.  WAL segment file creation
>>>>>> 
>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>> 
>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>>>> reported high,
>>>>>> to reduce the number of this initialization, we can tune 
>>>>>> WAL-related parameters
>>>>>> so that more "recycled" WAL files can be held.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 3. Number of when WAL is flushed
>>>>>> 
>>>>>> - wal_write_backend : Total number of WAL data written to the disk 
>>>>>> by backends
>>>>>> - wal_write_walwriter : Total number of WAL data written to the 
>>>>>> disk by walwriter
>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk 
>>>>>> by backends
>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk 
>>>>>> by walwrite
>>>>>> 
>>>>>> I think it's useful for tuning "synchronous_commit" and 
>>>>>> "commit_delay" for query executions.
>>>>>> If the number of WAL is flushed is high, users can know 
>>>>>> "synchronous_commit" is useful for the workload.
>>>>> 
>>>>> I just wonder how useful these counters are. Even without these 
>>>>> counters,
>>>>> we already know synchronous_commit=off is likely to cause the 
>>>>> better
>>>>> performance (but has the risk of data loss). So ISTM that these 
>>>>> counters are
>>>>> not so useful when tuning synchronous_commit.
>>>> 
>>>> Thanks, my understanding was wrong.
>>>> I agreed that your comments.
>>>> 
>>>> I merged the statistics of *_backend and *_walwriter.
>>>> I think the sum of them is useful to calculate the average per 
>>>> write/sync time.
>>>> For example, per write time is equals wal_write_time / wal_write.
>>> 
>>> Understood.
>>> 
>>> Thanks for updating the patch!
>> 
>> Thanks for your comments.
>> 
>>> patching file src/include/catalog/pg_proc.dat
>>> Hunk #1 FAILED at 5491.
>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>> src/include/catalog/pg_proc.dat.rej
>>> 
>>> I got this failure when applying the patch. Could you update the 
>>> patch?
>> 
>> Thanks, I updated the patch.
>> 
>>> -       Number of times WAL data was written to the disk because WAL
>>> buffers got full
>>> +       Total number of times WAL data written to the disk because 
>>> WAL
>>> buffers got full
>>> 
>>> Isn't "was" necessary between "data" and "written"?
>> 
>> Yes, I fixed it.
>> 
>>> +      <entry role="catalog_table_entry"><para 
>>> role="column_definition">
>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>> 
>>> Shouldn't the type of wal_bytes be numeric because the total number 
>>> of
>>> WAL bytes can exceed the range of bigint? I think that the type of
>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>> 
>> Thanks, I fixed it.
>> 
>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>> 
>>> +      <entry role="catalog_table_entry"><para 
>>> role="column_definition">
>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>> 
>>> Shouldn't the type of wal_xxx_time be double precision,
>>> like pg_stat_database.blk_write_time?
>> 
>> Thanks, I changed it.
>> 
>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>> wal_sync is incremented. Isn't this behavior confusing?
> 
> What do you think about this comment?

Sorry, I'll change to increment wal_sync and wal_sync_time only
if a specific fsync method is called.

> I found that we discussed track-WAL-IO-timing feature at the past 
> discussion
> about the similar feature [1]. But the feature was droppped from the 
> proposal
> patch because there was the performance concern. So probably we need to
> revisit the past discussion and benchmark the performance. Thought?
> 
> If track-WAL-IO-timing feature may cause performance regression,
> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
> from the patch and commit it at first.
> 
> [1]
> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com

Thanks, I'll check the thread.
I agree to add basic statistics at first and I attached the patch.

>>> 
>>> 
>>> +       Total amount of time that has been spent in the portion of
>>> +       WAL data was written to disk by backend and walwriter, in 
>>> milliseconds
>>> +       (if <xref linkend="guc-track-io-timing"/> is enabled, 
>>> otherwise zero)
>>> 
>>> With the patch, track_io_timing controls both IO for data files and
>>> WAL files. But we may want to track only either of them. So it's 
>>> better
>>> to extend track_io_timing so that we can specify the tracking target
>>> in the parameter? For example, we can make track_io_timing accept
>>> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
>>> track_wal_io_timing? Thought?
>> 
>> OK, I introduced the new GUC "track_wal_io_timing".
>> 
>>> I'm afraid that "by backend and walwriter" part can make us thinkg
>>> incorrectly that WAL writes by other processes like autovacuum
>>> are not tracked.
>> 
>> Sorry, I removed "by backend and walwriter".
> 
> Thanks for updating the patch!
> 
> +WalUsage prevWalUsage;
> 
> ISTM that we can declare this as static variable because
> it's used only in pgstat.c.

Thanks, I fixed it.

> +    memset(&walusage, 0, sizeof(WalUsage));
> +    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
> 
> This memset seems unnecessary.

I couldn't understand why this memset is unnecessary.
Since WalUsageAccumDiff not only calculates the difference but also adds 
the value,
I thought walusage needs to be initialized.


>      /* We assume this initializes to zeroes */
>      static const PgStat_MsgWal all_zeroes;
> 
> This declaration of the variable should be placed around
> the top of pgstat_send_wal().

Sorry, I fixed it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-11-17 12:53, lchch1990@sina.cn wrote:
>>> Thanks for your comments.
>>> 
>>> You're understanding is almost the same as mine.
>>> It describes when not only backends but also other backgrounds
> initialize a new wal page,
>>> wal buffer's space is already used and there is no space.
>>> 
>>>> 'Total number of times WAL data written to the disk because a
> backend
>>>> yelled a wal buffer for an advanced wal page'
>>> 
>>> Thanks for your suggestion.
>>> I wondered that users may confuse about how to use
> "wal_buffers_full" and how to tune parameters.
>>> 
>>> I thought the reason which wal buffer has no space is
>>> important for users to tune the wal_buffers parameter.
>>> 
>>> How about the following comments?
>>> 
>>> 'Total number of times WAL data was written to the disk because WAL
> buffers got full
>>>   when to initialize a new WAL page'
>> Or what about the following?
>> Total number of times WAL data was written to the disk, to claim the
> buffer page to insert new
>> WAL data when the WAL buffers got filled up with unwritten WAL data.
> As my understand we can not say 'full' because every wal page mapped a
> special wal buffer slot.
> When a wal page need to be write, but the buffer slot was occupied by
> other wal page. It need to
> wait the wal buffer slot released. So i think we should say it
> 'occupied' not 'full'.
> 
> Maybe:
> Total number of times WAL data was written to the disk, to claim the
> buffer page to insert new
> WAL data when the special WAL buffer occupied by other page.

OK, I will change the above sentence since there are some sentences
like "space occupied by", "disk blocks occupied", and so on in the 
documents.

Do we need to change the column name from "wal_buffers_full"
to another name like "wal_buffers_all_occupied"?

Regards
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/11/19 16:31, Masahiro Ikeda wrote:
> On 2020-11-17 11:46, Fujii Masao wrote:
>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>
>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>> > for that in your proposal.
>>>>>>>>
>>>>>>>> We should avoid that duplication as possible even if the both number
>>>>>>>> are important.
>>>>>>>>
>>>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>>> XLogInsertRecord.
>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>>> value?
>>>>>>>>
>>>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>>>> takes the difference of that values between two successive calls.
>>>>>>>>
>>>>>>>> WalUsage prevWalUsage;
>>>>>>>> ...
>>>>>>>> pgstat_send_wal()
>>>>>>>> {
>>>>>>>> ..
>>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>>>>> ...
>>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>
>>>>>>>>    /* remember the current numbers */
>>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>>>
>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>
>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>
>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>
>>>>> Yes, thanks. I fixed it.
>>>>>
>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>
>>>>>> +                if (AmWalWriterProcess()){
>>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>>> +                }
>>>>>> +                else
>>>>>> +                {
>>>>>> +                    WalStats.m_wal_write_backend++;
>>>>>> +                }
>>>>>>
>>>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>> the stats collector. Then the stats collector can accumulate the counters
>>>>>> per process type if necessary. If we adopt this approach, we can easily
>>>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>>>
>>>>> I'll remove the above source code because these counters are not useful.
>>>>>
>>>>>
>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>
>>>>>>> Although there are some parameter related WAL,
>>>>>>> there are few statistics for tuning them.
>>>>>>>
>>>>>>> I think it's better to provide the following statistics.
>>>>>>> Please let me know your comments.
>>>>>>>
>>>>>>> ```
>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>> wal_records         | 2000224
>>>>>>> wal_fpi             | 47
>>>>>>> wal_bytes           | 248216337
>>>>>>> wal_buffers_full    | 20954
>>>>>>> wal_init_file       | 8
>>>>>>> wal_write_backend   | 20960
>>>>>>> wal_write_walwriter | 46
>>>>>>> wal_write_time      | 51
>>>>>>> wal_sync_backend    | 7
>>>>>>> wal_sync_walwriter  | 8
>>>>>>> wal_sync_time       | 0
>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>> ```
>>>>>>>
>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>
>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>
>>>>>>> To understand DB's performance, first, we will check the performance
>>>>>>> trends for the entire database instance.
>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>
>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>> if users want to see the performance trends  for the entire database,
>>>>>>> they must recalculate the statistics.
>>>>>>>
>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>
>>>>>>>
>>>>>>> 2.  WAL segment file creation
>>>>>>>
>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>
>>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3. Number of when WAL is flushed
>>>>>>>
>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>>>>
>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>>>
>>>>>> I just wonder how useful these counters are. Even without these counters,
>>>>>> we already know synchronous_commit=off is likely to cause the better
>>>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>>>> not so useful when tuning synchronous_commit.
>>>>>
>>>>> Thanks, my understanding was wrong.
>>>>> I agreed that your comments.
>>>>>
>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>> I think the sum of them is useful to calculate the average per write/sync time.
>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>
>>>> Understood.
>>>>
>>>> Thanks for updating the patch!
>>>
>>> Thanks for your comments.
>>>
>>>> patching file src/include/catalog/pg_proc.dat
>>>> Hunk #1 FAILED at 5491.
>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>> src/include/catalog/pg_proc.dat.rej
>>>>
>>>> I got this failure when applying the patch. Could you update the patch?
>>>
>>> Thanks, I updated the patch.
>>>
>>>> -       Number of times WAL data was written to the disk because WAL
>>>> buffers got full
>>>> +       Total number of times WAL data written to the disk because WAL
>>>> buffers got full
>>>>
>>>> Isn't "was" necessary between "data" and "written"?
>>>
>>> Yes, I fixed it.
>>>
>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>
>>>> Shouldn't the type of wal_bytes be numeric because the total number of
>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>
>>> Thanks, I fixed it.
>>>
>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>
>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>>>
>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>> like pg_stat_database.blk_write_time?
>>>
>>> Thanks, I changed it.
>>>
>>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>>> wal_sync is incremented. Isn't this behavior confusing?
>>
>> What do you think about this comment?
> 
> Sorry, I'll change to increment wal_sync and wal_sync_time only
> if a specific fsync method is called.
> 
>> I found that we discussed track-WAL-IO-timing feature at the past discussion
>> about the similar feature [1]. But the feature was droppped from the proposal
>> patch because there was the performance concern. So probably we need to
>> revisit the past discussion and benchmark the performance. Thought?
>>
>> If track-WAL-IO-timing feature may cause performance regression,
>> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
>> from the patch and commit it at first.
>>
>> [1]
>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
> 
> Thanks, I'll check the thread.
> I agree to add basic statistics at first and I attached the patch.

Thanks!

+        /* Send WAL statistics */
+        pgstat_send_wal();

This is not necessary because walwriter generates no WAL data?

> 
>>>>
>>>>
>>>> +       Total amount of time that has been spent in the portion of
>>>> +       WAL data was written to disk by backend and walwriter, in milliseconds
>>>> +       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
>>>>
>>>> With the patch, track_io_timing controls both IO for data files and
>>>> WAL files. But we may want to track only either of them. So it's better
>>>> to extend track_io_timing so that we can specify the tracking target
>>>> in the parameter? For example, we can make track_io_timing accept
>>>> data, wal and all. Or we should introduce new GUC for WAL, e.g.,
>>>> track_wal_io_timing? Thought?
>>>
>>> OK, I introduced the new GUC "track_wal_io_timing".
>>>
>>>> I'm afraid that "by backend and walwriter" part can make us thinkg
>>>> incorrectly that WAL writes by other processes like autovacuum
>>>> are not tracked.
>>>
>>> Sorry, I removed "by backend and walwriter".
>>
>> Thanks for updating the patch!
>>
>> +WalUsage prevWalUsage;
>>
>> ISTM that we can declare this as static variable because
>> it's used only in pgstat.c.
> 
> Thanks, I fixed it.
> 
>> +    memset(&walusage, 0, sizeof(WalUsage));
>> +    WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
>>
>> This memset seems unnecessary.
> 
> I couldn't understand why this memset is unnecessary.
> Since WalUsageAccumDiff not only calculates the difference but also adds the value,
> I thought walusage needs to be initialized.

Yes, you're right! Sorry for noise...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-11-25 20:19, Fujii Masao wrote:
> On 2020/11/19 16:31, Masahiro Ikeda wrote:
>> On 2020-11-17 11:46, Fujii Masao wrote:
>>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>> 
>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>>> > for that in your proposal.
>>>>>>>>> 
>>>>>>>>> We should avoid that duplication as possible even if the both 
>>>>>>>>> number
>>>>>>>>> are important.
>>>>>>>>> 
>>>>>>>>>> Also about performance, I thought there are few impacts 
>>>>>>>>>> because it
>>>>>>>>>> increments stats in memory. If I can implement to reuse 
>>>>>>>>>> pgWalUsage's
>>>>>>>>>> value which already collects these stats, there is no impact 
>>>>>>>>>> in
>>>>>>>>>> XLogInsertRecord.
>>>>>>>>>> For example, how about pg_stat_wal() calculates the 
>>>>>>>>>> accumulated
>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use 
>>>>>>>>>> pgWalUsage's
>>>>>>>>>> value?
>>>>>>>>> 
>>>>>>>>> I don't think that works, but it would work that 
>>>>>>>>> pgstat_send_wal()
>>>>>>>>> takes the difference of that values between two successive 
>>>>>>>>> calls.
>>>>>>>>> 
>>>>>>>>> WalUsage prevWalUsage;
>>>>>>>>> ...
>>>>>>>>> pgstat_send_wal()
>>>>>>>>> {
>>>>>>>>> ..
>>>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - 
>>>>>>>>> prevWalUsage.wal_bytes;
>>>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>>>>> prevWalUsage.wal_records;
>>>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - 
>>>>>>>>> prevWalUsage.wal_fpi;
>>>>>>>>> ...
>>>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>> 
>>>>>>>>>    /* remember the current numbers */
>>>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>>>> 
>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>> 
>>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - 
>>>>>>> prevWalUsage.wal_bytes;
>>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - 
>>>>>>> prevWalUsage.wal_records;
>>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - 
>>>>>>> prevWalUsage.wal_fpi;
>>>>>>> 
>>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>> 
>>>>>> Yes, thanks. I fixed it.
>>>>>> 
>>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>> 
>>>>>>> +                if (AmWalWriterProcess()){
>>>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>>>> +                }
>>>>>>> +                else
>>>>>>> +                {
>>>>>>> +                    WalStats.m_wal_write_backend++;
>>>>>>> +                }
>>>>>>> 
>>>>>>> I think that it's better not to separate m_wal_write_xxx into two 
>>>>>>> for
>>>>>>> walwriter and other processes. Instead, we can use one 
>>>>>>> m_wal_write_xxx
>>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>>> the stats collector. Then the stats collector can accumulate the 
>>>>>>> counters
>>>>>>> per process type if necessary. If we adopt this approach, we can 
>>>>>>> easily
>>>>>>> extend pg_stat_wal so that any fields can be reported per process 
>>>>>>> type.
>>>>>> 
>>>>>> I'll remove the above source code because these counters are not 
>>>>>> useful.
>>>>>> 
>>>>>> 
>>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>> 
>>>>>>>> Although there are some parameter related WAL,
>>>>>>>> there are few statistics for tuning them.
>>>>>>>> 
>>>>>>>> I think it's better to provide the following statistics.
>>>>>>>> Please let me know your comments.
>>>>>>>> 
>>>>>>>> ```
>>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>>> wal_records         | 2000224
>>>>>>>> wal_fpi             | 47
>>>>>>>> wal_bytes           | 248216337
>>>>>>>> wal_buffers_full    | 20954
>>>>>>>> wal_init_file       | 8
>>>>>>>> wal_write_backend   | 20960
>>>>>>>> wal_write_walwriter | 46
>>>>>>>> wal_write_time      | 51
>>>>>>>> wal_sync_backend    | 7
>>>>>>>> wal_sync_walwriter  | 8
>>>>>>>> wal_sync_time       | 0
>>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>>> ```
>>>>>>>> 
>>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>> 
>>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>> 
>>>>>>>> To understand DB's performance, first, we will check the 
>>>>>>>> performance
>>>>>>>> trends for the entire database instance.
>>>>>>>> For example, if the number of wal_fpi becomes higher, users may 
>>>>>>>> tune
>>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>> 
>>>>>>>> Although users can check the above statistics via EXPLAIN, 
>>>>>>>> auto_explain,
>>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>>> if users want to see the performance trends  for the entire 
>>>>>>>> database,
>>>>>>>> they must recalculate the statistics.
>>>>>>>> 
>>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2.  WAL segment file creation
>>>>>>>> 
>>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>> 
>>>>>>>> To create a new WAL file may have an impact on the performance 
>>>>>>>> of
>>>>>>>> a write-heavy workload generating lots of WAL. If this number is 
>>>>>>>> reported high,
>>>>>>>> to reduce the number of this initialization, we can tune 
>>>>>>>> WAL-related parameters
>>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 3. Number of when WAL is flushed
>>>>>>>> 
>>>>>>>> - wal_write_backend : Total number of WAL data written to the 
>>>>>>>> disk by backends
>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the 
>>>>>>>> disk by walwriter
>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk 
>>>>>>>> by backends
>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the 
>>>>>>>> disk by walwrite
>>>>>>>> 
>>>>>>>> I think it's useful for tuning "synchronous_commit" and 
>>>>>>>> "commit_delay" for query executions.
>>>>>>>> If the number of WAL is flushed is high, users can know 
>>>>>>>> "synchronous_commit" is useful for the workload.
>>>>>>> 
>>>>>>> I just wonder how useful these counters are. Even without these 
>>>>>>> counters,
>>>>>>> we already know synchronous_commit=off is likely to cause the 
>>>>>>> better
>>>>>>> performance (but has the risk of data loss). So ISTM that these 
>>>>>>> counters are
>>>>>>> not so useful when tuning synchronous_commit.
>>>>>> 
>>>>>> Thanks, my understanding was wrong.
>>>>>> I agreed that your comments.
>>>>>> 
>>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>>> I think the sum of them is useful to calculate the average per 
>>>>>> write/sync time.
>>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>> 
>>>>> Understood.
>>>>> 
>>>>> Thanks for updating the patch!
>>>> 
>>>> Thanks for your comments.
>>>> 
>>>>> patching file src/include/catalog/pg_proc.dat
>>>>> Hunk #1 FAILED at 5491.
>>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>>> src/include/catalog/pg_proc.dat.rej
>>>>> 
>>>>> I got this failure when applying the patch. Could you update the 
>>>>> patch?
>>>> 
>>>> Thanks, I updated the patch.
>>>> 
>>>>> -       Number of times WAL data was written to the disk because 
>>>>> WAL
>>>>> buffers got full
>>>>> +       Total number of times WAL data written to the disk because 
>>>>> WAL
>>>>> buffers got full
>>>>> 
>>>>> Isn't "was" necessary between "data" and "written"?
>>>> 
>>>> Yes, I fixed it.
>>>> 
>>>>> +      <entry role="catalog_table_entry"><para 
>>>>> role="column_definition">
>>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>> 
>>>>> Shouldn't the type of wal_bytes be numeric because the total number 
>>>>> of
>>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>> 
>>>> Thanks, I fixed it.
>>>> 
>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>> 
>>>>> +      <entry role="catalog_table_entry"><para 
>>>>> role="column_definition">
>>>>> +       <structfield>wal_write_time</structfield> 
>>>>> <type>bigint</type>
>>>>> 
>>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>>> like pg_stat_database.blk_write_time?
>>>> 
>>>> Thanks, I changed it.
>>>> 
>>>>> Even when fsync is set to off or wal_sync_method is set to 
>>>>> open_sync,
>>>>> wal_sync is incremented. Isn't this behavior confusing?
>>> 
>>> What do you think about this comment?
>> 
>> Sorry, I'll change to increment wal_sync and wal_sync_time only
>> if a specific fsync method is called.
>> 
>>> I found that we discussed track-WAL-IO-timing feature at the past 
>>> discussion
>>> about the similar feature [1]. But the feature was droppped from the 
>>> proposal
>>> patch because there was the performance concern. So probably we need 
>>> to
>>> revisit the past discussion and benchmark the performance. Thought?
>>> 
>>> If track-WAL-IO-timing feature may cause performance regression,
>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes 
>>> parts
>>> from the patch and commit it at first.
>>> 
>>> [1]
>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>> 
>> Thanks, I'll check the thread.
>> I agree to add basic statistics at first and I attached the patch.
> 
> Thanks!
> 
> +        /* Send WAL statistics */
> +        pgstat_send_wal();
> 
> This is not necessary because walwriter generates no WAL data?

No, it's not necessary.
Thanks. I fixed it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/11/26 16:07, Masahiro Ikeda wrote:
> On 2020-11-25 20:19, Fujii Masao wrote:
>> On 2020/11/19 16:31, Masahiro Ikeda wrote:
>>> On 2020-11-17 11:46, Fujii Masao wrote:
>>>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>>>
>>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>>>> > for that in your proposal.
>>>>>>>>>>
>>>>>>>>>> We should avoid that duplication as possible even if the both number
>>>>>>>>>> are important.
>>>>>>>>>>
>>>>>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>>>>> XLogInsertRecord.
>>>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>>>>> value?
>>>>>>>>>>
>>>>>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>>>>>> takes the difference of that values between two successive calls.
>>>>>>>>>>
>>>>>>>>>> WalUsage prevWalUsage;
>>>>>>>>>> ...
>>>>>>>>>> pgstat_send_wal()
>>>>>>>>>> {
>>>>>>>>>> ..
>>>>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>>>>>>> ...
>>>>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>>>
>>>>>>>>>>    /* remember the current numbers */
>>>>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>>>>>
>>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>>>
>>>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>>>
>>>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>>>
>>>>>>> Yes, thanks. I fixed it.
>>>>>>>
>>>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>>>
>>>>>>>> +                if (AmWalWriterProcess()){
>>>>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>>>>> +                }
>>>>>>>> +                else
>>>>>>>> +                {
>>>>>>>> +                    WalStats.m_wal_write_backend++;
>>>>>>>> +                }
>>>>>>>>
>>>>>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>>>> the stats collector. Then the stats collector can accumulate the counters
>>>>>>>> per process type if necessary. If we adopt this approach, we can easily
>>>>>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>>>>>
>>>>>>> I'll remove the above source code because these counters are not useful.
>>>>>>>
>>>>>>>
>>>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>>>
>>>>>>>>> Although there are some parameter related WAL,
>>>>>>>>> there are few statistics for tuning them.
>>>>>>>>>
>>>>>>>>> I think it's better to provide the following statistics.
>>>>>>>>> Please let me know your comments.
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>>>> wal_records         | 2000224
>>>>>>>>> wal_fpi             | 47
>>>>>>>>> wal_bytes           | 248216337
>>>>>>>>> wal_buffers_full    | 20954
>>>>>>>>> wal_init_file       | 8
>>>>>>>>> wal_write_backend   | 20960
>>>>>>>>> wal_write_walwriter | 46
>>>>>>>>> wal_write_time      | 51
>>>>>>>>> wal_sync_backend    | 7
>>>>>>>>> wal_sync_walwriter  | 8
>>>>>>>>> wal_sync_time       | 0
>>>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>>>
>>>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>>>
>>>>>>>>> To understand DB's performance, first, we will check the performance
>>>>>>>>> trends for the entire database instance.
>>>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>>>
>>>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>>>> if users want to see the performance trends  for the entire database,
>>>>>>>>> they must recalculate the statistics.
>>>>>>>>>
>>>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.  WAL segment file creation
>>>>>>>>>
>>>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>>>
>>>>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3. Number of when WAL is flushed
>>>>>>>>>
>>>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>>>>>>
>>>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>>>>>
>>>>>>>> I just wonder how useful these counters are. Even without these counters,
>>>>>>>> we already know synchronous_commit=off is likely to cause the better
>>>>>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>>>>>> not so useful when tuning synchronous_commit.
>>>>>>>
>>>>>>> Thanks, my understanding was wrong.
>>>>>>> I agreed that your comments.
>>>>>>>
>>>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>>>> I think the sum of them is useful to calculate the average per write/sync time.
>>>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>>>
>>>>>> Understood.
>>>>>>
>>>>>> Thanks for updating the patch!
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>>> patching file src/include/catalog/pg_proc.dat
>>>>>> Hunk #1 FAILED at 5491.
>>>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>>>> src/include/catalog/pg_proc.dat.rej
>>>>>>
>>>>>> I got this failure when applying the patch. Could you update the patch?
>>>>>
>>>>> Thanks, I updated the patch.
>>>>>
>>>>>> -       Number of times WAL data was written to the disk because WAL
>>>>>> buffers got full
>>>>>> +       Total number of times WAL data written to the disk because WAL
>>>>>> buffers got full
>>>>>>
>>>>>> Isn't "was" necessary between "data" and "written"?
>>>>>
>>>>> Yes, I fixed it.
>>>>>
>>>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>>>
>>>>>> Shouldn't the type of wal_bytes be numeric because the total number of
>>>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>>>
>>>>> Thanks, I fixed it.
>>>>>
>>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>>>
>>>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>>>>>
>>>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>>>> like pg_stat_database.blk_write_time?
>>>>>
>>>>> Thanks, I changed it.
>>>>>
>>>>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>>>>> wal_sync is incremented. Isn't this behavior confusing?
>>>>
>>>> What do you think about this comment?
>>>
>>> Sorry, I'll change to increment wal_sync and wal_sync_time only
>>> if a specific fsync method is called.
>>>
>>>> I found that we discussed track-WAL-IO-timing feature at the past discussion
>>>> about the similar feature [1]. But the feature was droppped from the proposal
>>>> patch because there was the performance concern. So probably we need to
>>>> revisit the past discussion and benchmark the performance. Thought?
>>>>
>>>> If track-WAL-IO-timing feature may cause performance regression,
>>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
>>>> from the patch and commit it at first.
>>>>
>>>> [1]
>>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>>>
>>> Thanks, I'll check the thread.
>>> I agree to add basic statistics at first and I attached the patch.
>>
>> Thanks!
>>
>> +        /* Send WAL statistics */
>> +        pgstat_send_wal();
>>
>> This is not necessary because walwriter generates no WAL data?
> 
> No, it's not necessary.
> Thanks. I fixed it.

Thanks for updating the patch! I applied cosmetic changes to it.
For example, I added more comments. Patch attached.
Barring no objection, I will commit this patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Fujii Masao
Date:

On 2020/12/01 14:01, Fujii Masao wrote:
> 
> 
> On 2020/11/26 16:07, Masahiro Ikeda wrote:
>> On 2020-11-25 20:19, Fujii Masao wrote:
>>> On 2020/11/19 16:31, Masahiro Ikeda wrote:
>>>> On 2020-11-17 11:46, Fujii Masao wrote:
>>>>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>>>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>>>>
>>>>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>>>>> > for that in your proposal.
>>>>>>>>>>>
>>>>>>>>>>> We should avoid that duplication as possible even if the both number
>>>>>>>>>>> are important.
>>>>>>>>>>>
>>>>>>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>>>>>> XLogInsertRecord.
>>>>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>>>>>> value?
>>>>>>>>>>>
>>>>>>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>>>>>>> takes the difference of that values between two successive calls.
>>>>>>>>>>>
>>>>>>>>>>> WalUsage prevWalUsage;
>>>>>>>>>>> ...
>>>>>>>>>>> pgstat_send_wal()
>>>>>>>>>>> {
>>>>>>>>>>> ..
>>>>>>>>>>>    /* fill in some values using pgWalUsage */
>>>>>>>>>>>    WalStats.m_wal_bytes   = pgWalUsage.wal_bytes   - prevWalUsage.wal_bytes;
>>>>>>>>>>>    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>>>>    WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi     - prevWalUsage.wal_fpi;
>>>>>>>>>>> ...
>>>>>>>>>>>    pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>>>>
>>>>>>>>>>>    /* remember the current numbers */
>>>>>>>>>>>    prevWalUsage = pgWalUsage;
>>>>>>>>>>
>>>>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>>>>
>>>>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>>>>
>>>>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>>>>
>>>>>>>> Yes, thanks. I fixed it.
>>>>>>>>
>>>>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>>>>
>>>>>>>>> +                if (AmWalWriterProcess()){
>>>>>>>>> +                    WalStats.m_wal_write_walwriter++;
>>>>>>>>> +                }
>>>>>>>>> +                else
>>>>>>>>> +                {
>>>>>>>>> +                    WalStats.m_wal_write_backend++;
>>>>>>>>> +                }
>>>>>>>>>
>>>>>>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>>>>> the stats collector. Then the stats collector can accumulate the counters
>>>>>>>>> per process type if necessary. If we adopt this approach, we can easily
>>>>>>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>>>>>>
>>>>>>>> I'll remove the above source code because these counters are not useful.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>>>>
>>>>>>>>>> Although there are some parameter related WAL,
>>>>>>>>>> there are few statistics for tuning them.
>>>>>>>>>>
>>>>>>>>>> I think it's better to provide the following statistics.
>>>>>>>>>> Please let me know your comments.
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>>>>> wal_records         | 2000224
>>>>>>>>>> wal_fpi             | 47
>>>>>>>>>> wal_bytes           | 248216337
>>>>>>>>>> wal_buffers_full    | 20954
>>>>>>>>>> wal_init_file       | 8
>>>>>>>>>> wal_write_backend   | 20960
>>>>>>>>>> wal_write_walwriter | 46
>>>>>>>>>> wal_write_time      | 51
>>>>>>>>>> wal_sync_backend    | 7
>>>>>>>>>> wal_sync_walwriter  | 8
>>>>>>>>>> wal_sync_time       | 0
>>>>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>>>>
>>>>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>>>>
>>>>>>>>>> To understand DB's performance, first, we will check the performance
>>>>>>>>>> trends for the entire database instance.
>>>>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>>>>
>>>>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>>>>> if users want to see the performance trends  for the entire database,
>>>>>>>>>> they must recalculate the statistics.
>>>>>>>>>>
>>>>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2.  WAL segment file creation
>>>>>>>>>>
>>>>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>>>>
>>>>>>>>>> To create a new WAL file may have an impact on the performance of
>>>>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 3. Number of when WAL is flushed
>>>>>>>>>>
>>>>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite
>>>>>>>>>>
>>>>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>>>>>>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>>>>>>
>>>>>>>>> I just wonder how useful these counters are. Even without these counters,
>>>>>>>>> we already know synchronous_commit=off is likely to cause the better
>>>>>>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>>>>>>> not so useful when tuning synchronous_commit.
>>>>>>>>
>>>>>>>> Thanks, my understanding was wrong.
>>>>>>>> I agreed that your comments.
>>>>>>>>
>>>>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>>>>> I think the sum of them is useful to calculate the average per write/sync time.
>>>>>>>> For example, per write time is equals wal_write_time / wal_write.
>>>>>>>
>>>>>>> Understood.
>>>>>>>
>>>>>>> Thanks for updating the patch!
>>>>>>
>>>>>> Thanks for your comments.
>>>>>>
>>>>>>> patching file src/include/catalog/pg_proc.dat
>>>>>>> Hunk #1 FAILED at 5491.
>>>>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>>>>> src/include/catalog/pg_proc.dat.rej
>>>>>>>
>>>>>>> I got this failure when applying the patch. Could you update the patch?
>>>>>>
>>>>>> Thanks, I updated the patch.
>>>>>>
>>>>>>> -       Number of times WAL data was written to the disk because WAL
>>>>>>> buffers got full
>>>>>>> +       Total number of times WAL data written to the disk because WAL
>>>>>>> buffers got full
>>>>>>>
>>>>>>> Isn't "was" necessary between "data" and "written"?
>>>>>>
>>>>>> Yes, I fixed it.
>>>>>>
>>>>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>>>>
>>>>>>> Shouldn't the type of wal_bytes be numeric because the total number of
>>>>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>>>>
>>>>>> Thanks, I fixed it.
>>>>>>
>>>>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>>>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>>>>
>>>>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>>>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>>>>>>
>>>>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>>>>> like pg_stat_database.blk_write_time?
>>>>>>
>>>>>> Thanks, I changed it.
>>>>>>
>>>>>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>>>>>> wal_sync is incremented. Isn't this behavior confusing?
>>>>>
>>>>> What do you think about this comment?
>>>>
>>>> Sorry, I'll change to increment wal_sync and wal_sync_time only
>>>> if a specific fsync method is called.
>>>>
>>>>> I found that we discussed track-WAL-IO-timing feature at the past discussion
>>>>> about the similar feature [1]. But the feature was droppped from the proposal
>>>>> patch because there was the performance concern. So probably we need to
>>>>> revisit the past discussion and benchmark the performance. Thought?
>>>>>
>>>>> If track-WAL-IO-timing feature may cause performance regression,
>>>>> it might be an idea to extract wal_records, wal_fpi and wal_bytes parts
>>>>> from the patch and commit it at first.
>>>>>
>>>>> [1]
>>>>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>>>>
>>>> Thanks, I'll check the thread.
>>>> I agree to add basic statistics at first and I attached the patch.
>>>
>>> Thanks!
>>>
>>> +        /* Send WAL statistics */
>>> +        pgstat_send_wal();
>>>
>>> This is not necessary because walwriter generates no WAL data?
>>
>> No, it's not necessary.
>> Thanks. I fixed it.
> 
> Thanks for updating the patch! I applied cosmetic changes to it.
> For example, I added more comments. Patch attached.
> Barring no objection, I will commit this patch.

Pushed. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Andres Freund
Date:
Hi,

On 2020-12-02 13:52:43 +0900, Fujii Masao wrote:
> Pushed. Thanks!

Why are wal_records/fpi long, instead of uint64?
    long        wal_records;    /* # of WAL records produced */
    long        wal_fpi;        /* # of WAL full page images produced */
    uint64        wal_bytes;        /* size of WAL records produced */

long is only 4 byte e.g. on windows, and it is entirely possible to wrap
a 4 byte record counter. It's also somewhat weird that wal_bytes is
unsigned, but the others are signed?

This is made doubly weird because on the SQL level you chose to make
wal_records, wal_fpi bigint. And wal_bytes numeric?

Greetings,

Andres Freund



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Andres Freund
Date:
Hi,

On 2020-12-21 13:16:50 -0800, Andres Freund wrote:
> On 2020-12-02 13:52:43 +0900, Fujii Masao wrote:
> > Pushed. Thanks!
>
> Why are wal_records/fpi long, instead of uint64?
>     long        wal_records;    /* # of WAL records produced */
>     long        wal_fpi;        /* # of WAL full page images produced */
>     uint64        wal_bytes;        /* size of WAL records produced */
>
> long is only 4 byte e.g. on windows, and it is entirely possible to wrap
> a 4 byte record counter. It's also somewhat weird that wal_bytes is
> unsigned, but the others are signed?
>
> This is made doubly weird because on the SQL level you chose to make
> wal_records, wal_fpi bigint. And wal_bytes numeric?

Some more things:
- There's both PgStat_MsgWal WalStats; and static PgStat_WalStats walStats;
  that seems *WAY* too confusing. And the former imo shouldn't be
  global.
- AdvanceXLInsertBuffer() does WalStats.m_wal_buffers_full, but as far
  as I can tell there's nothing actually sending that?

- Andres



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
Thanks for your comments.

On 2020-12-22 09:39, Andres Freund wrote:
> Hi,
> 
> On 2020-12-21 13:16:50 -0800, Andres Freund wrote:
>> On 2020-12-02 13:52:43 +0900, Fujii Masao wrote:
>> > Pushed. Thanks!
>> 
>> Why are wal_records/fpi long, instead of uint64?
>>     long        wal_records;    /* # of WAL records produced */
>>     long        wal_fpi;        /* # of WAL full page images produced */
>>     uint64        wal_bytes;        /* size of WAL records produced */
>> 
>> long is only 4 byte e.g. on windows, and it is entirely possible to 
>> wrap
>> a 4 byte record counter. It's also somewhat weird that wal_bytes is
>> unsigned, but the others are signed?
>> 
>> This is made doubly weird because on the SQL level you chose to make
>> wal_records, wal_fpi bigint. And wal_bytes numeric?

I'm sorry I don't know the reason.

The following thread is related to the patch and the type of wal_bytes
is changed from long to uint64 because XLogRecPtr is uint64.
https://www.postgresql.org/message-id/flat/20200402144438.GF64485%40nol#1f0127c98df430104c63426fdc285c20

I assumed that the reason why the type of wal_records/fpi is long
is BufferUsage have the members (i.e, shared_blks_hit) of the same 
types.

So, I think it's better if to change the type of wal_records/fpi from 
long to uint64,
to change the types of BufferUsage's members too.


> Some more things:
> - There's both PgStat_MsgWal WalStats; and static PgStat_WalStats 
> walStats;
>   that seems *WAY* too confusing. And the former imo shouldn't be
>   global.

Sorry for the confusing name.
I referenced the following variable name.

  static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
  static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];

How about to change from "PgStat_MsgWal WalStats"
to "PgStat_MsgWal MsgWalStats"?

Is it better to change the following name too?
  "PgStat_MsgBgWriter BgWriterStats;"
  "static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];"

Since PgStat_MsgWal is called in xlog.c and pgstat.c,
I thought it's should be global.

> - AdvanceXLInsertBuffer() does WalStats.m_wal_buffers_full, but as far
>   as I can tell there's nothing actually sending that?

IIUC, when pgstat_send_wal() is called by backends and so on,
it is sent to the statistic collector and it is exposed via pg_stat_wal 
view.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From
Masahiro Ikeda
Date:
On 2020-12-22 11:16, Masahiro Ikeda wrote:
> Thanks for your comments.
> 
> On 2020-12-22 09:39, Andres Freund wrote:
>> Hi,
>> 
>> On 2020-12-21 13:16:50 -0800, Andres Freund wrote:
>>> On 2020-12-02 13:52:43 +0900, Fujii Masao wrote:
>>> > Pushed. Thanks!
>>> 
>>> Why are wal_records/fpi long, instead of uint64?
>>>     long        wal_records;    /* # of WAL records produced */
>>>     long        wal_fpi;        /* # of WAL full page images produced */
>>>     uint64        wal_bytes;        /* size of WAL records produced */
>>> 
>>> long is only 4 byte e.g. on windows, and it is entirely possible to 
>>> wrap
>>> a 4 byte record counter. It's also somewhat weird that wal_bytes is
>>> unsigned, but the others are signed?
>>> 
>>> This is made doubly weird because on the SQL level you chose to make
>>> wal_records, wal_fpi bigint. And wal_bytes numeric?
> 
> I'm sorry I don't know the reason.
> 
> The following thread is related to the patch and the type of wal_bytes
> is changed from long to uint64 because XLogRecPtr is uint64.
> https://www.postgresql.org/message-id/flat/20200402144438.GF64485%40nol#1f0127c98df430104c63426fdc285c20
> 
> I assumed that the reason why the type of wal_records/fpi is long
> is BufferUsage have the members (i.e, shared_blks_hit) of the same 
> types.
> 
> So, I think it's better if to change the type of wal_records/fpi from
> long to uint64,
> to change the types of BufferUsage's members too.

I've done a little more research so I'll share the results.

IUCC, theoretically this leads to caliculate the statistics less,
but actually, it's not happened.

The above "wal_records", "wal_fpi" are accumulation values and when 
WalUsageAccumDiff()
is called, we can know how many wals are generated for specific 
executions,
for example, planning/executing a query, processing a utility command, 
and vacuuming one relation.

The following variable has accumulated "wal_records" and "wal_fpi" per 
process.

```
typedef struct WalUsage
{
    long        wal_records;    /* # of WAL records produced */
    long        wal_fpi;        /* # of WAL full page images produced */
    uint64        wal_bytes;        /* size of WAL records produced */
} WalUsage;

WalUsage    pgWalUsage;
```

Although this may be overflow, it doesn't affect to caliculate the 
difference
of wal usage between some execution points. If to generate over 2 
billion wal
records per executions, 4 bytes is not enough and collected statistics 
will be
lost, but I think it's not happened.


In addition, "wal_records" and "wal_fpi" values sent by processes are
accumulated in the statistic collector and their types are 
PgStat_Counter(int64).

```
typedef struct PgStat_WalStats
{
    PgStat_Counter wal_records;
    PgStat_Counter wal_fpi;
    uint64        wal_bytes;
    PgStat_Counter wal_buffers_full;
    TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
```


>> Some more things:
>> - There's both PgStat_MsgWal WalStats; and static PgStat_WalStats 
>> walStats;
>>   that seems *WAY* too confusing. And the former imo shouldn't be
>>   global.
> 
> Sorry for the confusing name.
> I referenced the following variable name.
> 
>  static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
>  static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
> 
> How about to change from "PgStat_MsgWal WalStats"
> to "PgStat_MsgWal MsgWalStats"?
> 
> Is it better to change the following name too?
>  "PgStat_MsgBgWriter BgWriterStats;"
>  "static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];"
> 
> Since PgStat_MsgWal is called in xlog.c and pgstat.c,
> I thought it's should be global.

I made an attached patch to rename the above variable names.
What do you think?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment