Thread: New statistics for tuning WAL buffer size

New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
Hi,

It's important to provide the metrics for tuning the size of WAL 
buffers.
For now, it's lack of the statistics how often processes wait to write 
WAL because WAL buffer is full.

If those situation are often occurred, WAL buffer is too small for the 
workload.
DBAs must to tune the WAL buffer size for performance improvement.

There are related threads, but those are not merged.
https://www.postgresql.org/message-id/4FF824F3.5090407@uptime.jp
https://www.postgresql.org/message-id/flat/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com

What do you think?
If we can have a consensus, I will make a PoC patch.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
> It's important to provide the metrics for tuning the size of WAL buffers.
> For now, it's lack of the statistics how often processes wait to write WAL
> because WAL buffer is full.
>
> If those situation are often occurred, WAL buffer is too small for the workload.
> DBAs must to tune the WAL buffer size for performance improvement.

Yes, it's helpful to know if we need to enlarge the WAL buffer.  That's why our colleague HariBabu proposed the patch.
We'dbe happy if it could be committed in some form. 


> There are related threads, but those are not merged.
> https://www.postgresql.org/message-id/4FF824F3.5090407@uptime.jp
> https://www.postgresql.org/message-id/flat/CAJrrPGc6APFUGYNcPe4qcNx
> pL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com

What's the difference between those patches?  What blocked them from being committed?


Regards
Takayuki Tsunakawa




RE: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-08-18 16:35, tsunakawa.takay@fujitsu.com wrote:
> From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
>> It's important to provide the metrics for tuning the size of WAL 
>> buffers.
>> For now, it's lack of the statistics how often processes wait to write 
>> WAL
>> because WAL buffer is full.
>> 
>> If those situation are often occurred, WAL buffer is too small for the 
>> workload.
>> DBAs must to tune the WAL buffer size for performance improvement.
> 
> Yes, it's helpful to know if we need to enlarge the WAL buffer.
> That's why our colleague HariBabu proposed the patch.  We'd be happy
> if it could be committed in some form.
> 
>> There are related threads, but those are not merged.
>> https://www.postgresql.org/message-id/4FF824F3.5090407@uptime.jp
>> https://www.postgresql.org/message-id/flat/CAJrrPGc6APFUGYNcPe4qcNx
>> pL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com
> 
> What's the difference between those patches?  What blocked them from
> being committed?

Thanks for replying.

Since the above threads are not active now and those patches can't be 
applied HEAD,
I made this thread. If it is better to reply the above thread, I will do 
so.

If my understanding is correct, we have to measure the performance 
impact first.
Do you know HariBabu is now trying to solve it? If not, I will try to 
modify patches to apply HEAD.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
> If my understanding is correct, we have to measure the performance
> impact first.
> Do you know HariBabu is now trying to solve it? If not, I will try to
> modify patches to apply HEAD.

No, he's not doing it anymore.  It'd be great if you could resume it.  However, I recommend sharing your understanding
aboutwhat were the issues with those two threads and how you're trying to solve them.  Was the performance overhead the
blockerin both of the threads? 


Regards
Takayuki Tsunakawa






RE: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote:
> From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
>> If my understanding is correct, we have to measure the performance
>> impact first.
>> Do you know HariBabu is now trying to solve it? If not, I will try to
>> modify patches to apply HEAD.
> 
> No, he's not doing it anymore.  It'd be great if you could resume it.

OK, thanks.

> However, I recommend sharing your understanding about what were the
> issues with those two threads and how you're trying to solve them.
> Was the performance overhead the blocker in both of the threads?

In my understanding, some comments are not solved in both of the 
threads.
I think the following works are remained.

1) Modify patches to apply HEAD
2) Get consensus what metrics we collect and how to use them for tuning.
3) Measure performance impact and if it leads poor performance, we solve 
it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/08/19 14:10, Masahiro Ikeda wrote:
> On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote:
>> From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
>>> If my understanding is correct, we have to measure the performance
>>> impact first.
>>> Do you know HariBabu is now trying to solve it? If not, I will try to
>>> modify patches to apply HEAD.
>>
>> No, he's not doing it anymore.  It'd be great if you could resume it.
> 
> OK, thanks.
> 
>> However, I recommend sharing your understanding about what were the
>> issues with those two threads and how you're trying to solve them.
>> Was the performance overhead the blocker in both of the threads?
> 
> In my understanding, some comments are not solved in both of the threads.
> I think the following works are remained.
> 
> 1) Modify patches to apply HEAD
> 2) Get consensus what metrics we collect and how to use them for tuning.

I agree to expose the number of WAL write caused by full of WAL buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that number
into two fields in his patch; one is the number of WAL write by backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that number.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/08/20 20:01, Fujii Masao wrote:
> 
> 
> On 2020/08/19 14:10, Masahiro Ikeda wrote:
>> On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote:
>>> From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
>>>> If my understanding is correct, we have to measure the performance
>>>> impact first.
>>>> Do you know HariBabu is now trying to solve it? If not, I will try to
>>>> modify patches to apply HEAD.
>>>
>>> No, he's not doing it anymore.  It'd be great if you could resume it.
>>
>> OK, thanks.
>>
>>> However, I recommend sharing your understanding about what were the
>>> issues with those two threads and how you're trying to solve them.
>>> Was the performance overhead the blocker in both of the threads?
>>
>> In my understanding, some comments are not solved in both of the threads.
>> I think the following works are remained.
>>
>> 1) Modify patches to apply HEAD
>> 2) Get consensus what metrics we collect and how to use them for tuning.
> 
> I agree to expose the number of WAL write caused by full of WAL buffers.
> It's helpful when tuning wal_buffers size. Haribabu separated that number
> into two fields in his patch; one is the number of WAL write by backend,
> and another is by background processes and workers. But I'm not sure
> how useful such separation is. I'm ok with just one field for that number.

Just idea; it may be worth exposing the number of when new WAL file is
created and zero-filled. This initialization may have impact on
the performance of write-heavy workload generating lots of WAL. If this
number is reported high, to reduce the number of this initialization,
we can tune WAL-related parameters so that more "recycled" WAL files
can be hold.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Fujii Masao <masao.fujii@oss.nttdata.com>
> I agree to expose the number of WAL write caused by full of WAL buffers.
> It's helpful when tuning wal_buffers size. Haribabu separated that number
> into two fields in his patch; one is the number of WAL write by backend,
> and another is by background processes and workers. But I'm not sure
> how useful such separation is. I'm ok with just one field for that number.

I agree with you.  I don't think we need to separate the numbers for foreground processes and background ones.  WAL
bufferis a single resource.  So "Writes due to full WAL buffer are happening.  We may be able to boost performance by
increasingwal_buffers" would be enough.
 


Regards
Takayuki Tsunakawa


RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Fujii Masao <masao.fujii@oss.nttdata.com>
> Just idea; it may be worth exposing the number of when new WAL file is
> created and zero-filled. This initialization may have impact on
> the performance of write-heavy workload generating lots of WAL. If this
> number is reported high, to reduce the number of this initialization,
> we can tune WAL-related parameters so that more "recycled" WAL files
> can be hold.

Sounds good.  Actually, I want to know how much those zeroing affected the transaction response times, but it may be
thetarget of the wait event statistics that Imai-san is addressing.
 

(I wonder how the fallocate() patch went that tries to minimize the zeroing time.)


Regards
Takayuki Tsunakawa


Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/08/21 12:08, tsunakawa.takay@fujitsu.com wrote:
> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>> Just idea; it may be worth exposing the number of when new WAL file is
>> created and zero-filled. This initialization may have impact on
>> the performance of write-heavy workload generating lots of WAL. If this
>> number is reported high, to reduce the number of this initialization,
>> we can tune WAL-related parameters so that more "recycled" WAL files
>> can be hold.
> 
> Sounds good.  Actually, I want to know how much those zeroing affected the transaction response times, but it may be
thetarget of the wait event statistics that Imai-san is addressing.
 

Maybe, so I'm ok if the first pg_stat_walwriter patch doesn't expose
this number. We can extend it to include that later after we confirm
that number is really useful.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
Hi, thanks for useful comments.

>> I agree to expose the number of WAL write caused by full of WAL 
>> buffers.
>> It's helpful when tuning wal_buffers size. Haribabu separated that 
>> number
>> into two fields in his patch; one is the number of WAL write by 
>> backend,
>> and another is by background processes and workers. But I'm not sure
>> how useful such separation is. I'm ok with just one field for that 
>> number.
> I agree with you.  I don't think we need to separate the numbers for 
> foreground processes and background ones.  WAL buffer is a single 
> resource.  So "Writes due to full WAL buffer are happening.  We may be 
> able to boost performance by increasing wal_buffers" would be enough.

I made a patch to expose the number of WAL write caused by full of WAL 
buffers.
I'm going to submit this patch to commitfests.

As Fujii-san and Tsunakawa-san said, it expose the total number
since I agreed that we don't need to separate the numbers for
foreground processes and background ones.

By the way, do we need to add another metrics related to WAL?
For example, is the total number of WAL writes to the buffers useful to 
calculate the dirty WAL write ratio?

Is it enough as a first step?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-08-24 20:45, Masahiro Ikeda wrote:
> Hi, thanks for useful comments.
> 
>>> I agree to expose the number of WAL write caused by full of WAL 
>>> buffers.
>>> It's helpful when tuning wal_buffers size. Haribabu separated that 
>>> number
>>> into two fields in his patch; one is the number of WAL write by 
>>> backend,
>>> and another is by background processes and workers. But I'm not sure
>>> how useful such separation is. I'm ok with just one field for that 
>>> number.
>> I agree with you.  I don't think we need to separate the numbers for 
>> foreground processes and background ones.  WAL buffer is a single 
>> resource.  So "Writes due to full WAL buffer are happening.  We may be 
>> able to boost performance by increasing wal_buffers" would be enough.
> 
> I made a patch to expose the number of WAL write caused by full of WAL 
> buffers.
> I'm going to submit this patch to commitfests.
> 
> As Fujii-san and Tsunakawa-san said, it expose the total number
> since I agreed that we don't need to separate the numbers for
> foreground processes and background ones.
> 
> By the way, do we need to add another metrics related to WAL?
> For example, is the total number of WAL writes to the buffers useful
> to calculate the dirty WAL write ratio?
> 
> Is it enough as a first step?

I forgot to rebase the current master.
I've attached the rebased patch.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/08/24 21:00, Masahiro Ikeda wrote:
> On 2020-08-24 20:45, Masahiro Ikeda wrote:
>> Hi, thanks for useful comments.
>>
>>>> I agree to expose the number of WAL write caused by full of WAL buffers.
>>>> It's helpful when tuning wal_buffers size. Haribabu separated that number
>>>> into two fields in his patch; one is the number of WAL write by backend,
>>>> and another is by background processes and workers. But I'm not sure
>>>> how useful such separation is. I'm ok with just one field for that number.
>>> I agree with you.  I don't think we need to separate the numbers for foreground processes and background ones.  WAL
bufferis a single resource.  So "Writes due to full WAL buffer are happening.  We may be able to boost performance by
increasingwal_buffers" would be enough.
 
>>
>> I made a patch to expose the number of WAL write caused by full of WAL buffers.
>> I'm going to submit this patch to commitfests.
>>
>> As Fujii-san and Tsunakawa-san said, it expose the total number
>> since I agreed that we don't need to separate the numbers for
>> foreground processes and background ones.
>>
>> By the way, do we need to add another metrics related to WAL?
>> For example, is the total number of WAL writes to the buffers useful
>> to calculate the dirty WAL write ratio?
>>
>> Is it enough as a first step?
> 
> I forgot to rebase the current master.
> I've attached the rebased patch.

Thanks for the patch!

+/* ----------
+ * Backend types
+ * ----------

You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.

../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]


The contents of pg_stat_walwrites are reset when the server
is restarted. Isn't this problematic? IMO since pg_stat_walwrites
is a collected statistics view, basically its contents should be
kept even in the case of server restart.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
> +/* ----------
> + * Backend types
> + * ----------
> 
> You seem to forget to add "*/" into the above comment.
> This issue could cause the following compiler warning.
> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment 
> [-Wcomment]

Thanks for the comment. I fixed.

> The contents of pg_stat_walwrites are reset when the server
> is restarted. Isn't this problematic? IMO since pg_stat_walwrites
> is a collected statistics view, basically its contents should be
> kept even in the case of server restart.

I agree your opinion.
I modified to use the statistics collector and persist the wal 
statistics.


I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match naming scheme with other views like 
pg_stat_bgwriter,
which is for bgwriter statistics but it has the statistics related to 
backend.

The pg_stat_walwriter is not security restricted now, so ordinary users 
can access it.
I has the same security level as pg_stat_archiver.If you have any 
comments, please let me know.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/02 18:56, Masahiro Ikeda wrote:
>> +/* ----------
>> + * Backend types
>> + * ----------
>>
>> You seem to forget to add "*/" into the above comment.
>> This issue could cause the following compiler warning.
>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
> 
> Thanks for the comment. I fixed.

Thanks for the fix! But why are those comments necessary?


> 
>> The contents of pg_stat_walwrites are reset when the server
>> is restarted. Isn't this problematic? IMO since pg_stat_walwrites
>> is a collected statistics view, basically its contents should be
>> kept even in the case of server restart.
> 
> I agree your opinion.
> I modified to use the statistics collector and persist the wal statistics.
> 
> 
> I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
> I think it is better to match naming scheme with other views like pg_stat_bgwriter,
> which is for bgwriter statistics but it has the statistics related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.


> 
> The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
> I has the same security level as pg_stat_archiver.If you have any comments, please let me know.

+       <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?


+/* ----------
+ * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.

This comment seems not accurate because backends also send it.

+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;

This comment seems not right because the counter is not updated in XLogWrite().

+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;

What about changing this comment to "There must be only one record"?

+                    WalWriterStats.m_xlog_dirty_writes++;
                      LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing the lock?

+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
  CREATE VIEW pg_stat_progress_vacuum AS

In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.

      }
-
      /*
       * We found an existing collector stats file. Read it and put all the

You seem to accidentally have removed the empty line here.

-                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+                 errhint("Target must be \"archiver\" or \"bgwriter\" or \"walwriter\".")));

There are two "or" in the message, but the former should be replaced with ","?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Fujii Masao <masao.fujii@oss.nttdata.com>
> > I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
> > I think it is better to match naming scheme with other views like
> pg_stat_bgwriter,
> > which is for bgwriter statistics but it has the statistics related to backend.
>
> I prefer the view name pg_stat_walwriter for the consistency with
> other view names. But we also have pg_stat_wal_receiver. Which
> makes me think that maybe pg_stat_wal_writer is better for
> the consistency. Thought? IMO either of them works for me.
> I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity.  Likewise, pg_stat_walwriter
leadsto misunderstanding because its information is not limited to WAL writer. 

How about simply pg_stat_wal?  In the future, we may want to include WAL reads in this view, e.g. reading undo logs in
zheap.


Regards
Takayuki Tsunakawa






Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>> I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
>>> I think it is better to match naming scheme with other views like
>> pg_stat_bgwriter,
>>> which is for bgwriter statistics but it has the statistics related to backend.
>>
>> I prefer the view name pg_stat_walwriter for the consistency with
>> other view names. But we also have pg_stat_wal_receiver. Which
>> makes me think that maybe pg_stat_wal_writer is better for
>> the consistency. Thought? IMO either of them works for me.
>> I'd like to hear more opinons about this.
> 
> I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity.  Likewise, pg_stat_walwriter
leadsto misunderstanding because its information is not limited to WAL writer.
 
> 
> How about simply pg_stat_wal?  In the future, we may want to include WAL reads in this view, e.g. reading undo logs
inzheap.
 

Sounds reasonable.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Magnus Hagander
Date:
On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>> I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
>>> I think it is better to match naming scheme with other views like
>> pg_stat_bgwriter,
>>> which is for bgwriter statistics but it has the statistics related to backend.
>>
>> I prefer the view name pg_stat_walwriter for the consistency with
>> other view names. But we also have pg_stat_wal_receiver. Which
>> makes me think that maybe pg_stat_wal_writer is better for
>> the consistency. Thought? IMO either of them works for me.
>> I'd like to hear more opinons about this.
>
> I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity.  Likewise, pg_stat_walwriter leads to misunderstanding because its information is not limited to WAL writer.
>
> How about simply pg_stat_wal?  In the future, we may want to include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now -- it became even more apparent when the checkpointer was split out to it's own process, and that's not exactly a recent change. And it had allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than which process deals with it ("walwriter") is correct, unless the statistics can be known to only *ever* affect one type of process. (And then different processes can affect different columns in the view). As a general rule -- and that's from what I can tell exactly what's being proposed.

--

Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>> +/* ----------
>>> + * Backend types
>>> + * ----------
>>> 
>>> You seem to forget to add "*/" into the above comment.
>>> This issue could cause the following compiler warning.
>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment 
>>> [-Wcomment]
>> 
>> Thanks for the comment. I fixed.
> 
> Thanks for the fix! But why are those comments necessary?

Sorry about that. This comment is not necessary.
I removed it.

>> The pg_stat_walwriter is not security restricted now, so ordinary 
>> users can access it.
>> It has the same security level as pg_stat_archiver. If you have any 
>> comments, please let me know.
> 
> +       <structfield>dirty_writes</structfield> <type>bigint</type>
> 
> I guess that the column name "dirty_writes" derived from
> the DTrace probe name. Isn't this name confusing? We should
> rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".

> +/* ----------
> + * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.
> 
> This comment seems not accurate because backends also send it.
> 
> +/*
> + * WAL writes statistics counter is updated in XLogWrite function
> + */
> +extern PgStat_MsgWalWriter WalWriterStats;
> 
> This comment seems not right because the counter is not updated in 
> XLogWrite().

Right. I fixed it to "Sent by each backend and background workers to 
update WAL statistics."
In the future, other statistics will be included so I remove the 
function's name.


> +-- There will surely and maximum one record
> +select count(*) = 1 as ok from pg_stat_walwriter;
> 
> What about changing this comment to "There must be only one record"?

Thanks, I fixed.

> +                    WalWriterStats.m_xlog_dirty_writes++;
>                      LWLockRelease(WALWriteLock);
> 
> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
> with WALWriteLock, isn't it better to increment that after releasing 
> the lock?

Thanks, I fixed.

> +CREATE VIEW pg_stat_walwriter AS
> +    SELECT
> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
> +
>  CREATE VIEW pg_stat_progress_vacuum AS
> 
> In system_views.sql, the definition of pg_stat_walwriter should be
> placed just after that of pg_stat_bgwriter not 
> pg_stat_progress_analyze.

OK, I fixed it.

>      }
> -
>      /*
>       * We found an existing collector stats file. Read it and put all the
> 
> You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.

> -                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
> +                 errhint("Target must be \"archiver\" or \"bgwriter\" or
> \"walwriter\".")));
> 
> There are two "or" in the message, but the former should be replaced 
> with ","?

Thanks, I fixed.


On 2020-09-05 18:40, Magnus Hagander wrote:
> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
> 
>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>> I changed the view name from pg_stat_walwrites to
>> pg_stat_walwriter.
>>>>> I think it is better to match naming scheme with other views
>> like
>>>> pg_stat_bgwriter,
>>>>> which is for bgwriter statistics but it has the statistics
>> related to backend.
>>>> 
>>>> I prefer the view name pg_stat_walwriter for the consistency with
>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>> the consistency. Thought? IMO either of them works for me.
>>>> I'd like to hear more opinons about this.
>>> 
>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>> misunderstanding because its information is not limited to WAL
>> writer.
>>> 
>>> How about simply pg_stat_wal?  In the future, we may want to
>> include WAL reads in this view, e.g. reading undo logs in zheap.
>> 
>> Sounds reasonable.
> 
> +1.
> 
> pg_stat_bgwriter has had the "wrong name" for quite some time now --
> it became even more apparent when the checkpointer was split out to
> it's own process, and that's not exactly a recent change. And it had
> allocs in it from day one...
> 
> I think naming it for what the data in it is ("wal") rather than which
> process deals with it ("walwriter") is correct, unless the statistics
> can be known to only *ever* affect one type of process. (And then
> different processes can affect different columns in the view). As a
> general rule -- and that's from what I can tell exactly what's being
> proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".


I fixed the code to send the WAL statistics from not only backend and 
walwriter
but also checkpointer, walsender and autovacuum worker.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/07 9:58, Masahiro Ikeda wrote:
> Thanks for the review and advice!
> 
> On 2020-09-03 16:05, Fujii Masao wrote:
>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>> +/* ----------
>>>> + * Backend types
>>>> + * ----------
>>>>
>>>> You seem to forget to add "*/" into the above comment.
>>>> This issue could cause the following compiler warning.
>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
>>>
>>> Thanks for the comment. I fixed.
>>
>> Thanks for the fix! But why are those comments necessary?
> 
> Sorry about that. This comment is not necessary.
> I removed it.
> 
>>> The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
>>> It has the same security level as pg_stat_archiver. If you have any comments, please let me know.
>>
>> +       <structfield>dirty_writes</structfield> <type>bigint</type>
>>
>> I guess that the column name "dirty_writes" derived from
>> the DTrace probe name. Isn't this name confusing? We should
>> rename it to "wal_buffers_full" or something?
> 
> I agree and rename it to "wal_buffers_full".
> 
>> +/* ----------
>> + * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.
>>
>> This comment seems not accurate because backends also send it.
>>
>> +/*
>> + * WAL writes statistics counter is updated in XLogWrite function
>> + */
>> +extern PgStat_MsgWalWriter WalWriterStats;
>>
>> This comment seems not right because the counter is not updated in XLogWrite().
> 
> Right. I fixed it to "Sent by each backend and background workers to update WAL statistics."
> In the future, other statistics will be included so I remove the function's name.
> 
> 
>> +-- There will surely and maximum one record
>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>
>> What about changing this comment to "There must be only one record"?
> 
> Thanks, I fixed.
> 
>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>                      LWLockRelease(WALWriteLock);
>>
>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
>> with WALWriteLock, isn't it better to increment that after releasing the lock?
> 
> Thanks, I fixed.
> 
>> +CREATE VIEW pg_stat_walwriter AS
>> +    SELECT
>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>> +
>>  CREATE VIEW pg_stat_progress_vacuum AS
>>
>> In system_views.sql, the definition of pg_stat_walwriter should be
>> placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.
> 
> OK, I fixed it.
> 
>>      }
>> -
>>      /*
>>       * We found an existing collector stats file. Read it and put all the
>>
>> You seem to accidentally have removed the empty line here.
> 
> Sorry about that. I fixed it.
> 
>> -                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
>> +                 errhint("Target must be \"archiver\" or \"bgwriter\" or
>> \"walwriter\".")));
>>
>> There are two "or" in the message, but the former should be replaced with ","?
> 
> Thanks, I fixed.
> 
> 
> On 2020-09-05 18:40, Magnus Hagander wrote:
>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>
>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>> I changed the view name from pg_stat_walwrites to
>>> pg_stat_walwriter.
>>>>>> I think it is better to match naming scheme with other views
>>> like
>>>>> pg_stat_bgwriter,
>>>>>> which is for bgwriter statistics but it has the statistics
>>> related to backend.
>>>>>
>>>>> I prefer the view name pg_stat_walwriter for the consistency with
>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>> the consistency. Thought? IMO either of them works for me.
>>>>> I'd like to hear more opinons about this.
>>>>
>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>> misunderstanding because its information is not limited to WAL
>>> writer.
>>>>
>>>> How about simply pg_stat_wal?  In the future, we may want to
>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>
>>> Sounds reasonable.
>>
>> +1.
>>
>> pg_stat_bgwriter has had the "wrong name" for quite some time now --
>> it became even more apparent when the checkpointer was split out to
>> it's own process, and that's not exactly a recent change. And it had
>> allocs in it from day one...
>>
>> I think naming it for what the data in it is ("wal") rather than which
>> process deals with it ("walwriter") is correct, unless the statistics
>> can be known to only *ever* affect one type of process. (And then
>> different processes can affect different columns in the view). As a
>> general rule -- and that's from what I can tell exactly what's being
>> proposed.
> 
> Thanks for your comments. I agree with your opinions.
> I changed the view name to "pg_stat_wal".
> 
> 
> I fixed the code to send the WAL statistics from not only backend and walwriter
> but also checkpointer, walsender and autovacuum worker.

Good point! Thanks for updating the patch!


@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
                           onerel->rd_rel->relisshared,
                           Max(new_live_tuples, 0),
                           vacrelstats->new_dead_tuples);
+    pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service. It exits
after it does vacuum or analyze. So ISTM that it's not bad to report the stats
only at the exit, in autovacuum worker case. There is no need to add extra
code for WAL stats report by autovacuum worker. Thought?


@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
          else
              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
  
+        /* Send wal statistics */
+        pgstat_send_wal();

AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best
for that purpose.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-07 16:19, Fujii Masao wrote:
> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>> Thanks for the review and advice!
>> 
>> On 2020-09-03 16:05, Fujii Masao wrote:
>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>> +/* ----------
>>>>> + * Backend types
>>>>> + * ----------
>>>>> 
>>>>> You seem to forget to add "*/" into the above comment.
>>>>> This issue could cause the following compiler warning.
>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block 
>>>>> comment [-Wcomment]
>>>> 
>>>> Thanks for the comment. I fixed.
>>> 
>>> Thanks for the fix! But why are those comments necessary?
>> 
>> Sorry about that. This comment is not necessary.
>> I removed it.
>> 
>>>> The pg_stat_walwriter is not security restricted now, so ordinary 
>>>> users can access it.
>>>> It has the same security level as pg_stat_archiver. If you have any 
>>>> comments, please let me know.
>>> 
>>> +       <structfield>dirty_writes</structfield> <type>bigint</type>
>>> 
>>> I guess that the column name "dirty_writes" derived from
>>> the DTrace probe name. Isn't this name confusing? We should
>>> rename it to "wal_buffers_full" or something?
>> 
>> I agree and rename it to "wal_buffers_full".
>> 
>>> +/* ----------
>>> + * PgStat_MsgWalWriter            Sent by the walwriter to update 
>>> statistics.
>>> 
>>> This comment seems not accurate because backends also send it.
>>> 
>>> +/*
>>> + * WAL writes statistics counter is updated in XLogWrite function
>>> + */
>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>> 
>>> This comment seems not right because the counter is not updated in 
>>> XLogWrite().
>> 
>> Right. I fixed it to "Sent by each backend and background workers to 
>> update WAL statistics."
>> In the future, other statistics will be included so I remove the 
>> function's name.
>> 
>> 
>>> +-- There will surely and maximum one record
>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>> 
>>> What about changing this comment to "There must be only one record"?
>> 
>> Thanks, I fixed.
>> 
>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>                      LWLockRelease(WALWriteLock);
>>> 
>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
>>> with WALWriteLock, isn't it better to increment that after releasing 
>>> the lock?
>> 
>> Thanks, I fixed.
>> 
>>> +CREATE VIEW pg_stat_walwriter AS
>>> +    SELECT
>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>> +
>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>> 
>>> In system_views.sql, the definition of pg_stat_walwriter should be
>>> placed just after that of pg_stat_bgwriter not 
>>> pg_stat_progress_analyze.
>> 
>> OK, I fixed it.
>> 
>>>      }
>>> -
>>>      /*
>>>       * We found an existing collector stats file. Read it and put 
>>> all the
>>> 
>>> You seem to accidentally have removed the empty line here.
>> 
>> Sorry about that. I fixed it.
>> 
>>> -                 errhint("Target must be \"archiver\" or 
>>> \"bgwriter\".")));
>>> +                 errhint("Target must be \"archiver\" or 
>>> \"bgwriter\" or
>>> \"walwriter\".")));
>>> 
>>> There are two "or" in the message, but the former should be replaced 
>>> with ","?
>> 
>> Thanks, I fixed.
>> 
>> 
>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>> 
>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>>> I changed the view name from pg_stat_walwrites to
>>>> pg_stat_walwriter.
>>>>>>> I think it is better to match naming scheme with other views
>>>> like
>>>>>> pg_stat_bgwriter,
>>>>>>> which is for bgwriter statistics but it has the statistics
>>>> related to backend.
>>>>>> 
>>>>>> I prefer the view name pg_stat_walwriter for the consistency with
>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>> I'd like to hear more opinons about this.
>>>>> 
>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>> misunderstanding because its information is not limited to WAL
>>>> writer.
>>>>> 
>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>> 
>>>> Sounds reasonable.
>>> 
>>> +1.
>>> 
>>> pg_stat_bgwriter has had the "wrong name" for quite some time now --
>>> it became even more apparent when the checkpointer was split out to
>>> it's own process, and that's not exactly a recent change. And it had
>>> allocs in it from day one...
>>> 
>>> I think naming it for what the data in it is ("wal") rather than 
>>> which
>>> process deals with it ("walwriter") is correct, unless the statistics
>>> can be known to only *ever* affect one type of process. (And then
>>> different processes can affect different columns in the view). As a
>>> general rule -- and that's from what I can tell exactly what's being
>>> proposed.
>> 
>> Thanks for your comments. I agree with your opinions.
>> I changed the view name to "pg_stat_wal".
>> 
>> 
>> I fixed the code to send the WAL statistics from not only backend and 
>> walwriter
>> but also checkpointer, walsender and autovacuum worker.
> 
> Good point! Thanks for updating the patch!
> 
> 
> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams 
> *params,
>                           onerel->rd_rel->relisshared,
>                           Max(new_live_tuples, 0),
>                           vacrelstats->new_dead_tuples);
> +    pgstat_send_wal();
> 
> I guess that you changed heap_vacuum_rel() as above so that autovacuum
> workers can send WAL stats. But heap_vacuum_rel() can be called by
> the processes (e.g., backends) other than autovacuum workers? Also
> what happens if autovacuum workers just do ANALYZE only? In that case,
> heap_vacuum_rel() may not be called.
> 
> Currently autovacuum worker reports the stats at the exit via
> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
> is not the process that basically keeps running during the service. It 
> exits
> after it does vacuum or analyze. So ISTM that it's not bad to report 
> the stats
> only at the exit, in autovacuum worker case. There is no need to add 
> extra
> code for WAL stats report by autovacuum worker. Thought?

Thanks, I understood. I removed this code.

> 
> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>          else
>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>  +        /* Send wal statistics */
> +        pgstat_send_wal();
> 
> AFAIR logical walsender uses three loops in WalSndLoop(), 
> WalSndWriteData()
> and WalSndWaitForWal(). But could you tell me why added 
> pgstat_send_wal()
> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the 
> best
> for that purpose.

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/09 13:57, Masahiro Ikeda wrote:
> On 2020-09-07 16:19, Fujii Masao wrote:
>> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>>> Thanks for the review and advice!
>>>
>>> On 2020-09-03 16:05, Fujii Masao wrote:
>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>>> +/* ----------
>>>>>> + * Backend types
>>>>>> + * ----------
>>>>>>
>>>>>> You seem to forget to add "*/" into the above comment.
>>>>>> This issue could cause the following compiler warning.
>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
>>>>>
>>>>> Thanks for the comment. I fixed.
>>>>
>>>> Thanks for the fix! But why are those comments necessary?
>>>
>>> Sorry about that. This comment is not necessary.
>>> I removed it.
>>>
>>>>> The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
>>>>> It has the same security level as pg_stat_archiver. If you have any comments, please let me know.
>>>>
>>>> +       <structfield>dirty_writes</structfield> <type>bigint</type>
>>>>
>>>> I guess that the column name "dirty_writes" derived from
>>>> the DTrace probe name. Isn't this name confusing? We should
>>>> rename it to "wal_buffers_full" or something?
>>>
>>> I agree and rename it to "wal_buffers_full".
>>>
>>>> +/* ----------
>>>> + * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.
>>>>
>>>> This comment seems not accurate because backends also send it.
>>>>
>>>> +/*
>>>> + * WAL writes statistics counter is updated in XLogWrite function
>>>> + */
>>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>>>
>>>> This comment seems not right because the counter is not updated in XLogWrite().
>>>
>>> Right. I fixed it to "Sent by each backend and background workers to update WAL statistics."
>>> In the future, other statistics will be included so I remove the function's name.
>>>
>>>
>>>> +-- There will surely and maximum one record
>>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>>>
>>>> What about changing this comment to "There must be only one record"?
>>>
>>> Thanks, I fixed.
>>>
>>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>>                      LWLockRelease(WALWriteLock);
>>>>
>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
>>>> with WALWriteLock, isn't it better to increment that after releasing the lock?
>>>
>>> Thanks, I fixed.
>>>
>>>> +CREATE VIEW pg_stat_walwriter AS
>>>> +    SELECT
>>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>>> +
>>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>>>
>>>> In system_views.sql, the definition of pg_stat_walwriter should be
>>>> placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.
>>>
>>> OK, I fixed it.
>>>
>>>>      }
>>>> -
>>>>      /*
>>>>       * We found an existing collector stats file. Read it and put all the
>>>>
>>>> You seem to accidentally have removed the empty line here.
>>>
>>> Sorry about that. I fixed it.
>>>
>>>> -                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
>>>> +                 errhint("Target must be \"archiver\" or \"bgwriter\" or
>>>> \"walwriter\".")));
>>>>
>>>> There are two "or" in the message, but the former should be replaced with ","?
>>>
>>> Thanks, I fixed.
>>>
>>>
>>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>>>> I changed the view name from pg_stat_walwrites to
>>>>> pg_stat_walwriter.
>>>>>>>> I think it is better to match naming scheme with other views
>>>>> like
>>>>>>> pg_stat_bgwriter,
>>>>>>>> which is for bgwriter statistics but it has the statistics
>>>>> related to backend.
>>>>>>>
>>>>>>> I prefer the view name pg_stat_walwriter for the consistency with
>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>>> I'd like to hear more opinons about this.
>>>>>>
>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>>> misunderstanding because its information is not limited to WAL
>>>>> writer.
>>>>>>
>>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>>>
>>>>> Sounds reasonable.
>>>>
>>>> +1.
>>>>
>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now --
>>>> it became even more apparent when the checkpointer was split out to
>>>> it's own process, and that's not exactly a recent change. And it had
>>>> allocs in it from day one...
>>>>
>>>> I think naming it for what the data in it is ("wal") rather than which
>>>> process deals with it ("walwriter") is correct, unless the statistics
>>>> can be known to only *ever* affect one type of process. (And then
>>>> different processes can affect different columns in the view). As a
>>>> general rule -- and that's from what I can tell exactly what's being
>>>> proposed.
>>>
>>> Thanks for your comments. I agree with your opinions.
>>> I changed the view name to "pg_stat_wal".
>>>
>>>
>>> I fixed the code to send the WAL statistics from not only backend and walwriter
>>> but also checkpointer, walsender and autovacuum worker.
>>
>> Good point! Thanks for updating the patch!
>>
>>
>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
>>                           onerel->rd_rel->relisshared,
>>                           Max(new_live_tuples, 0),
>>                           vacrelstats->new_dead_tuples);
>> +    pgstat_send_wal();
>>
>> I guess that you changed heap_vacuum_rel() as above so that autovacuum
>> workers can send WAL stats. But heap_vacuum_rel() can be called by
>> the processes (e.g., backends) other than autovacuum workers? Also
>> what happens if autovacuum workers just do ANALYZE only? In that case,
>> heap_vacuum_rel() may not be called.
>>
>> Currently autovacuum worker reports the stats at the exit via
>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
>> is not the process that basically keeps running during the service. It exits
>> after it does vacuum or analyze. So ISTM that it's not bad to report the stats
>> only at the exit, in autovacuum worker case. There is no need to add extra
>> code for WAL stats report by autovacuum worker. Thought?
> 
> Thanks, I understood. I removed this code.
> 
>>
>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>>          else
>>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>>  +        /* Send wal statistics */
>> +        pgstat_send_wal();
>>
>> AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData()
>> and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal()
>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best
>> for that purpose.
> 
> I checked what function calls XLogBackgroundFlush() which calls
> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
> 
> I found that WalSndWaitForWal() calls it, so I added it.

Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the second argument opportunistic=true, so in this case
WALwrite by wal_buffers full seems to never happen. Right? If this understanding is right, WalSndWaitForWal() doesn't
needto call pgstat_send_wal(). Probably also walwriter doesn't need to do that.
 

The logical rep walsender can generate WAL and call AdvanceXLInsertBuffer() when it executes the replication commands
likeCREATE_REPLICATION_SLOT. But this case is already covered by pgstat_report_activity()->pgstat_send_wal() called in
PostgresMain(),with your patch. So no more calls to pgstat_send_wal() seems necessary for logical rep walsender. 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Kyotaro Horiguchi
Date:
Hello.

At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
> I checked what function calls XLogBackgroundFlush() which calls
> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
> 
> I found that WalSndWaitForWal() calls it, so I added it.
> Is it better to move it in WalSndLoop() like the attached patch?

By the way, we are counting some wal-related numbers in
pgWalUsage.(bytes, records, fpi).  Since now that we are going to have
a new view related to WAL statistics, wouln't it be more useful to
show them together in the view?

(Another reason to propose this is that a substantially one-column
 table may look not-great..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/11 12:17, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in
>> I checked what function calls XLogBackgroundFlush() which calls
>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>
>> I found that WalSndWaitForWal() calls it, so I added it.
>> Is it better to move it in WalSndLoop() like the attached patch?
> 
> By the way, we are counting some wal-related numbers in
> pgWalUsage.(bytes, records, fpi).  Since now that we are going to have
> a new view related to WAL statistics, wouln't it be more useful to
> show them together in the view?

Probably yes. But IMO it's better to commit the current patch first, and then add those stats into the view after
confirmingexposing them is useful.
 

BTW, to expose the total WAL bytes, I think it's better to just save the LSN at when pg_stat_wal is reset rather than
countingpgWalUsage.bytes. If we do that, we can easily total WAL bytes by subtracting that LSN from the latest LSN.
Alsosaving the LSN at the reset timing causes obviously less overhead than counting pgWalUsage.bytes.
 


> (Another reason to propose this is that a substantially one-column
>   table may look not-great..)

I'm ok with such "small" view. But if this is really problem, I'm ok to expose only functions
pg_stat_get_wal_buffers_full()and pg_stat_get_wal_stat_reset_time(), without the view, at first.
 

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Kyotaro Horiguchi
Date:
At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/09/11 12:17, Kyotaro Horiguchi wrote:
> > Hello.
> > At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda
> > <ikedamsh@oss.nttdata.com> wrote in
> >> I checked what function calls XLogBackgroundFlush() which calls
> >> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
> >>
> >> I found that WalSndWaitForWal() calls it, so I added it.
> >> Is it better to move it in WalSndLoop() like the attached patch?
> > By the way, we are counting some wal-related numbers in
> > pgWalUsage.(bytes, records, fpi).  Since now that we are going to have
> > a new view related to WAL statistics, wouln't it be more useful to
> > show them together in the view?
> 
> Probably yes. But IMO it's better to commit the current patch first,
> and then add those stats into the view after confirming exposing them
> is useful.

I'm fine with that.

> BTW, to expose the total WAL bytes, I think it's better to just save
> the LSN at when pg_stat_wal is reset rather than counting
> pgWalUsage.bytes. If we do that, we can easily total WAL bytes by
> subtracting that LSN from the latest LSN. Also saving the LSN at the
> reset timing causes obviously less overhead than counting
> pgWalUsage.bytes.

pgWalUsage is always counting so it doesn't add any overhead. But
since it cannot be reset, the value needs to be saved at reset time
like LSN. I don't mind either way we take from performance
perspective.

> > (Another reason to propose this is that a substantially one-column
> >   table may look not-great..)
> 
> I'm ok with such "small" view. But if this is really problem, I'm ok
> to expose only functions pg_stat_get_wal_buffers_full() and
> pg_stat_get_wal_stat_reset_time(), without the view, at first.

I don't mind that we have such small views as far as it is promised to
grow up:p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/11 16:54, Kyotaro Horiguchi wrote:
> At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/09/11 12:17, Kyotaro Horiguchi wrote:
>>> Hello.
>>> At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda
>>> <ikedamsh@oss.nttdata.com> wrote in
>>>> I checked what function calls XLogBackgroundFlush() which calls
>>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>>>
>>>> I found that WalSndWaitForWal() calls it, so I added it.
>>>> Is it better to move it in WalSndLoop() like the attached patch?
>>> By the way, we are counting some wal-related numbers in
>>> pgWalUsage.(bytes, records, fpi).  Since now that we are going to have
>>> a new view related to WAL statistics, wouln't it be more useful to
>>> show them together in the view?
>>
>> Probably yes. But IMO it's better to commit the current patch first,
>> and then add those stats into the view after confirming exposing them
>> is useful.
> 
> I'm fine with that.
> 
>> BTW, to expose the total WAL bytes, I think it's better to just save
>> the LSN at when pg_stat_wal is reset rather than counting
>> pgWalUsage.bytes. If we do that, we can easily total WAL bytes by
>> subtracting that LSN from the latest LSN. Also saving the LSN at the
>> reset timing causes obviously less overhead than counting
>> pgWalUsage.bytes.
> 
> pgWalUsage is always counting so it doesn't add any overhead.

Yes. And I'm a bit concerned about the overhead by frequent message sent for WAL bytes to the stats collector.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-11 01:40, Fujii Masao wrote:
> On 2020/09/09 13:57, Masahiro Ikeda wrote:
>> On 2020-09-07 16:19, Fujii Masao wrote:
>>> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>>>> Thanks for the review and advice!
>>>> 
>>>> On 2020-09-03 16:05, Fujii Masao wrote:
>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>>>> +/* ----------
>>>>>>> + * Backend types
>>>>>>> + * ----------
>>>>>>> 
>>>>>>> You seem to forget to add "*/" into the above comment.
>>>>>>> This issue could cause the following compiler warning.
>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block 
>>>>>>> comment [-Wcomment]
>>>>>> 
>>>>>> Thanks for the comment. I fixed.
>>>>> 
>>>>> Thanks for the fix! But why are those comments necessary?
>>>> 
>>>> Sorry about that. This comment is not necessary.
>>>> I removed it.
>>>> 
>>>>>> The pg_stat_walwriter is not security restricted now, so ordinary 
>>>>>> users can access it.
>>>>>> It has the same security level as pg_stat_archiver. If you have 
>>>>>> any comments, please let me know.
>>>>> 
>>>>> +       <structfield>dirty_writes</structfield> <type>bigint</type>
>>>>> 
>>>>> I guess that the column name "dirty_writes" derived from
>>>>> the DTrace probe name. Isn't this name confusing? We should
>>>>> rename it to "wal_buffers_full" or something?
>>>> 
>>>> I agree and rename it to "wal_buffers_full".
>>>> 
>>>>> +/* ----------
>>>>> + * PgStat_MsgWalWriter            Sent by the walwriter to update 
>>>>> statistics.
>>>>> 
>>>>> This comment seems not accurate because backends also send it.
>>>>> 
>>>>> +/*
>>>>> + * WAL writes statistics counter is updated in XLogWrite function
>>>>> + */
>>>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>>>> 
>>>>> This comment seems not right because the counter is not updated in 
>>>>> XLogWrite().
>>>> 
>>>> Right. I fixed it to "Sent by each backend and background workers to 
>>>> update WAL statistics."
>>>> In the future, other statistics will be included so I remove the 
>>>> function's name.
>>>> 
>>>> 
>>>>> +-- There will surely and maximum one record
>>>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>>>> 
>>>>> What about changing this comment to "There must be only one 
>>>>> record"?
>>>> 
>>>> Thanks, I fixed.
>>>> 
>>>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>>>                      LWLockRelease(WALWriteLock);
>>>>> 
>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be 
>>>>> protected
>>>>> with WALWriteLock, isn't it better to increment that after 
>>>>> releasing the lock?
>>>> 
>>>> Thanks, I fixed.
>>>> 
>>>>> +CREATE VIEW pg_stat_walwriter AS
>>>>> +    SELECT
>>>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>>>> +
>>>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>>>> 
>>>>> In system_views.sql, the definition of pg_stat_walwriter should be
>>>>> placed just after that of pg_stat_bgwriter not 
>>>>> pg_stat_progress_analyze.
>>>> 
>>>> OK, I fixed it.
>>>> 
>>>>>      }
>>>>> -
>>>>>      /*
>>>>>       * We found an existing collector stats file. Read it and put 
>>>>> all the
>>>>> 
>>>>> You seem to accidentally have removed the empty line here.
>>>> 
>>>> Sorry about that. I fixed it.
>>>> 
>>>>> -                 errhint("Target must be \"archiver\" or 
>>>>> \"bgwriter\".")));
>>>>> +                 errhint("Target must be \"archiver\" or 
>>>>> \"bgwriter\" or
>>>>> \"walwriter\".")));
>>>>> 
>>>>> There are two "or" in the message, but the former should be 
>>>>> replaced with ","?
>>>> 
>>>> Thanks, I fixed.
>>>> 
>>>> 
>>>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>> 
>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>>>>> I changed the view name from pg_stat_walwrites to
>>>>>> pg_stat_walwriter.
>>>>>>>>> I think it is better to match naming scheme with other views
>>>>>> like
>>>>>>>> pg_stat_bgwriter,
>>>>>>>>> which is for bgwriter statistics but it has the statistics
>>>>>> related to backend.
>>>>>>>> 
>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency 
>>>>>>>> with
>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>>>> I'd like to hear more opinons about this.
>>>>>>> 
>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>>>> misunderstanding because its information is not limited to WAL
>>>>>> writer.
>>>>>>> 
>>>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>>>> 
>>>>>> Sounds reasonable.
>>>>> 
>>>>> +1.
>>>>> 
>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now 
>>>>> --
>>>>> it became even more apparent when the checkpointer was split out to
>>>>> it's own process, and that's not exactly a recent change. And it 
>>>>> had
>>>>> allocs in it from day one...
>>>>> 
>>>>> I think naming it for what the data in it is ("wal") rather than 
>>>>> which
>>>>> process deals with it ("walwriter") is correct, unless the 
>>>>> statistics
>>>>> can be known to only *ever* affect one type of process. (And then
>>>>> different processes can affect different columns in the view). As a
>>>>> general rule -- and that's from what I can tell exactly what's 
>>>>> being
>>>>> proposed.
>>>> 
>>>> Thanks for your comments. I agree with your opinions.
>>>> I changed the view name to "pg_stat_wal".
>>>> 
>>>> 
>>>> I fixed the code to send the WAL statistics from not only backend 
>>>> and walwriter
>>>> but also checkpointer, walsender and autovacuum worker.
>>> 
>>> Good point! Thanks for updating the patch!
>>> 
>>> 
>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams 
>>> *params,
>>>                           onerel->rd_rel->relisshared,
>>>                           Max(new_live_tuples, 0),
>>>                           vacrelstats->new_dead_tuples);
>>> +    pgstat_send_wal();
>>> 
>>> I guess that you changed heap_vacuum_rel() as above so that 
>>> autovacuum
>>> workers can send WAL stats. But heap_vacuum_rel() can be called by
>>> the processes (e.g., backends) other than autovacuum workers? Also
>>> what happens if autovacuum workers just do ANALYZE only? In that 
>>> case,
>>> heap_vacuum_rel() may not be called.
>>> 
>>> Currently autovacuum worker reports the stats at the exit via
>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
>>> is not the process that basically keeps running during the service. 
>>> It exits
>>> after it does vacuum or analyze. So ISTM that it's not bad to report 
>>> the stats
>>> only at the exit, in autovacuum worker case. There is no need to add 
>>> extra
>>> code for WAL stats report by autovacuum worker. Thought?
>> 
>> Thanks, I understood. I removed this code.
>> 
>>> 
>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>>>          else
>>>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>>>  +        /* Send wal statistics */
>>> +        pgstat_send_wal();
>>> 
>>> AFAIR logical walsender uses three loops in WalSndLoop(), 
>>> WalSndWriteData()
>>> and WalSndWaitForWal(). But could you tell me why added 
>>> pgstat_send_wal()
>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is 
>>> the best
>>> for that purpose.
>> 
>> I checked what function calls XLogBackgroundFlush() which calls
>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>> 
>> I found that WalSndWaitForWal() calls it, so I added it.
> 
> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
> second argument opportunistic=true, so in this case WAL write by
> wal_buffers full seems to never happen. Right? If this understanding
> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
> Probably also walwriter doesn't need to do that.
> 
> The logical rep walsender can generate WAL and call
> AdvanceXLInsertBuffer() when it executes the replication commands like
> CREATE_REPLICATION_SLOT. But this case is already covered by
> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
> with your patch. So no more calls to pgstat_send_wal() seems necessary
> for logical rep walsender.

Thanks for your reviews. I didn't notice that.
I updated the patches.


On 2020-09-11 17:13, Fujii Masao wrote:
> On 2020/09/11 16:54, Kyotaro Horiguchi wrote:
>> At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao 
>> <masao.fujii@oss.nttdata.com> wrote in
>>> 
>>> 
>>> On 2020/09/11 12:17, Kyotaro Horiguchi wrote:
>>>> Hello.
>>>> At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda
>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>> I checked what function calls XLogBackgroundFlush() which calls
>>>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>>>> 
>>>>> I found that WalSndWaitForWal() calls it, so I added it.
>>>>> Is it better to move it in WalSndLoop() like the attached patch?
>>>> By the way, we are counting some wal-related numbers in
>>>> pgWalUsage.(bytes, records, fpi).  Since now that we are going to 
>>>> have
>>>> a new view related to WAL statistics, wouln't it be more useful to
>>>> show them together in the view?
>>> 
>>> Probably yes. But IMO it's better to commit the current patch first,
>>> and then add those stats into the view after confirming exposing them
>>> is useful.
>> 
>> I'm fine with that.
>> 
>>> BTW, to expose the total WAL bytes, I think it's better to just save
>>> the LSN at when pg_stat_wal is reset rather than counting
>>> pgWalUsage.bytes. If we do that, we can easily total WAL bytes by
>>> subtracting that LSN from the latest LSN. Also saving the LSN at the
>>> reset timing causes obviously less overhead than counting
>>> pgWalUsage.bytes.
>> 
>> pgWalUsage is always counting so it doesn't add any overhead.
> 
> Yes. And I'm a bit concerned about the overhead by frequent message
> sent for WAL bytes to the stats collector.

Thanks for the comments.
I agree that we need to add more wal-related statistics
after this patch is committed.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/15 15:52, Masahiro Ikeda wrote:
> On 2020-09-11 01:40, Fujii Masao wrote:
>> On 2020/09/09 13:57, Masahiro Ikeda wrote:
>>> On 2020-09-07 16:19, Fujii Masao wrote:
>>>> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>>>>> Thanks for the review and advice!
>>>>>
>>>>> On 2020-09-03 16:05, Fujii Masao wrote:
>>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>>>>> +/* ----------
>>>>>>>> + * Backend types
>>>>>>>> + * ----------
>>>>>>>>
>>>>>>>> You seem to forget to add "*/" into the above comment.
>>>>>>>> This issue could cause the following compiler warning.
>>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
>>>>>>>
>>>>>>> Thanks for the comment. I fixed.
>>>>>>
>>>>>> Thanks for the fix! But why are those comments necessary?
>>>>>
>>>>> Sorry about that. This comment is not necessary.
>>>>> I removed it.
>>>>>
>>>>>>> The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
>>>>>>> It has the same security level as pg_stat_archiver. If you have any comments, please let me know.
>>>>>>
>>>>>> +       <structfield>dirty_writes</structfield> <type>bigint</type>
>>>>>>
>>>>>> I guess that the column name "dirty_writes" derived from
>>>>>> the DTrace probe name. Isn't this name confusing? We should
>>>>>> rename it to "wal_buffers_full" or something?
>>>>>
>>>>> I agree and rename it to "wal_buffers_full".
>>>>>
>>>>>> +/* ----------
>>>>>> + * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.
>>>>>>
>>>>>> This comment seems not accurate because backends also send it.
>>>>>>
>>>>>> +/*
>>>>>> + * WAL writes statistics counter is updated in XLogWrite function
>>>>>> + */
>>>>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>>>>>
>>>>>> This comment seems not right because the counter is not updated in XLogWrite().
>>>>>
>>>>> Right. I fixed it to "Sent by each backend and background workers to update WAL statistics."
>>>>> In the future, other statistics will be included so I remove the function's name.
>>>>>
>>>>>
>>>>>> +-- There will surely and maximum one record
>>>>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>>>>>
>>>>>> What about changing this comment to "There must be only one record"?
>>>>>
>>>>> Thanks, I fixed.
>>>>>
>>>>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>>>>                      LWLockRelease(WALWriteLock);
>>>>>>
>>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
>>>>>> with WALWriteLock, isn't it better to increment that after releasing the lock?
>>>>>
>>>>> Thanks, I fixed.
>>>>>
>>>>>> +CREATE VIEW pg_stat_walwriter AS
>>>>>> +    SELECT
>>>>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>>>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>>>>> +
>>>>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>>>>>
>>>>>> In system_views.sql, the definition of pg_stat_walwriter should be
>>>>>> placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.
>>>>>
>>>>> OK, I fixed it.
>>>>>
>>>>>>      }
>>>>>> -
>>>>>>      /*
>>>>>>       * We found an existing collector stats file. Read it and put all the
>>>>>>
>>>>>> You seem to accidentally have removed the empty line here.
>>>>>
>>>>> Sorry about that. I fixed it.
>>>>>
>>>>>> -                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
>>>>>> +                 errhint("Target must be \"archiver\" or \"bgwriter\" or
>>>>>> \"walwriter\".")));
>>>>>>
>>>>>> There are two "or" in the message, but the former should be replaced with ","?
>>>>>
>>>>> Thanks, I fixed.
>>>>>
>>>>>
>>>>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>>>
>>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>>>>>> I changed the view name from pg_stat_walwrites to
>>>>>>> pg_stat_walwriter.
>>>>>>>>>> I think it is better to match naming scheme with other views
>>>>>>> like
>>>>>>>>> pg_stat_bgwriter,
>>>>>>>>>> which is for bgwriter statistics but it has the statistics
>>>>>>> related to backend.
>>>>>>>>>
>>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency with
>>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>>>>> I'd like to hear more opinons about this.
>>>>>>>>
>>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>>>>> misunderstanding because its information is not limited to WAL
>>>>>>> writer.
>>>>>>>>
>>>>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>>>>>
>>>>>>> Sounds reasonable.
>>>>>>
>>>>>> +1.
>>>>>>
>>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now --
>>>>>> it became even more apparent when the checkpointer was split out to
>>>>>> it's own process, and that's not exactly a recent change. And it had
>>>>>> allocs in it from day one...
>>>>>>
>>>>>> I think naming it for what the data in it is ("wal") rather than which
>>>>>> process deals with it ("walwriter") is correct, unless the statistics
>>>>>> can be known to only *ever* affect one type of process. (And then
>>>>>> different processes can affect different columns in the view). As a
>>>>>> general rule -- and that's from what I can tell exactly what's being
>>>>>> proposed.
>>>>>
>>>>> Thanks for your comments. I agree with your opinions.
>>>>> I changed the view name to "pg_stat_wal".
>>>>>
>>>>>
>>>>> I fixed the code to send the WAL statistics from not only backend and walwriter
>>>>> but also checkpointer, walsender and autovacuum worker.
>>>>
>>>> Good point! Thanks for updating the patch!
>>>>
>>>>
>>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
>>>>                           onerel->rd_rel->relisshared,
>>>>                           Max(new_live_tuples, 0),
>>>>                           vacrelstats->new_dead_tuples);
>>>> +    pgstat_send_wal();
>>>>
>>>> I guess that you changed heap_vacuum_rel() as above so that autovacuum
>>>> workers can send WAL stats. But heap_vacuum_rel() can be called by
>>>> the processes (e.g., backends) other than autovacuum workers? Also
>>>> what happens if autovacuum workers just do ANALYZE only? In that case,
>>>> heap_vacuum_rel() may not be called.
>>>>
>>>> Currently autovacuum worker reports the stats at the exit via
>>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
>>>> is not the process that basically keeps running during the service. It exits
>>>> after it does vacuum or analyze. So ISTM that it's not bad to report the stats
>>>> only at the exit, in autovacuum worker case. There is no need to add extra
>>>> code for WAL stats report by autovacuum worker. Thought?
>>>
>>> Thanks, I understood. I removed this code.
>>>
>>>>
>>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>>>>          else
>>>>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>>>>  +        /* Send wal statistics */
>>>> +        pgstat_send_wal();
>>>>
>>>> AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData()
>>>> and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal()
>>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best
>>>> for that purpose.
>>>
>>> I checked what function calls XLogBackgroundFlush() which calls
>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>>
>>> I found that WalSndWaitForWal() calls it, so I added it.
>>
>> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
>> second argument opportunistic=true, so in this case WAL write by
>> wal_buffers full seems to never happen. Right? If this understanding
>> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
>> Probably also walwriter doesn't need to do that.

Thanks for updating the patch! This patch adds pgstat_send_wal() in
walwriter main loop. But isn't this unnecessary because of the above reason?
That is, since walwriter calls AdvanceXLInsertBuffer() with
the second argument "opportunistic" = true via XLogBackgroundFlush(),
the event of full wal_buffers will never happen. No?


>>
>> The logical rep walsender can generate WAL and call
>> AdvanceXLInsertBuffer() when it executes the replication commands like
>> CREATE_REPLICATION_SLOT. But this case is already covered by
>> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
>> with your patch. So no more calls to pgstat_send_wal() seems necessary
>> for logical rep walsender.
> 
> Thanks for your reviews. I didn't notice that.
> I updated the patches.

Sorry, the above my analysis might be incorrect. During logical replication,
walsender may access to the system table. Which may cause HOT pruning
or killing of dead index tuple. Also which can cause WAL and
full wal_buffers event. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-15 17:10, Fujii Masao wrote:
> On 2020/09/15 15:52, Masahiro Ikeda wrote:
>> On 2020-09-11 01:40, Fujii Masao wrote:
>>> On 2020/09/09 13:57, Masahiro Ikeda wrote:
>>>> On 2020-09-07 16:19, Fujii Masao wrote:
>>>>> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>>>>>> Thanks for the review and advice!
>>>>>> 
>>>>>> On 2020-09-03 16:05, Fujii Masao wrote:
>>>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>>>>>> +/* ----------
>>>>>>>>> + * Backend types
>>>>>>>>> + * ----------
>>>>>>>>> 
>>>>>>>>> You seem to forget to add "*/" into the above comment.
>>>>>>>>> This issue could cause the following compiler warning.
>>>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block 
>>>>>>>>> comment [-Wcomment]
>>>>>>>> 
>>>>>>>> Thanks for the comment. I fixed.
>>>>>>> 
>>>>>>> Thanks for the fix! But why are those comments necessary?
>>>>>> 
>>>>>> Sorry about that. This comment is not necessary.
>>>>>> I removed it.
>>>>>> 
>>>>>>>> The pg_stat_walwriter is not security restricted now, so 
>>>>>>>> ordinary users can access it.
>>>>>>>> It has the same security level as pg_stat_archiver. If you have 
>>>>>>>> any comments, please let me know.
>>>>>>> 
>>>>>>> +       <structfield>dirty_writes</structfield> 
>>>>>>> <type>bigint</type>
>>>>>>> 
>>>>>>> I guess that the column name "dirty_writes" derived from
>>>>>>> the DTrace probe name. Isn't this name confusing? We should
>>>>>>> rename it to "wal_buffers_full" or something?
>>>>>> 
>>>>>> I agree and rename it to "wal_buffers_full".
>>>>>> 
>>>>>>> +/* ----------
>>>>>>> + * PgStat_MsgWalWriter            Sent by the walwriter to 
>>>>>>> update statistics.
>>>>>>> 
>>>>>>> This comment seems not accurate because backends also send it.
>>>>>>> 
>>>>>>> +/*
>>>>>>> + * WAL writes statistics counter is updated in XLogWrite 
>>>>>>> function
>>>>>>> + */
>>>>>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>>>>>> 
>>>>>>> This comment seems not right because the counter is not updated 
>>>>>>> in XLogWrite().
>>>>>> 
>>>>>> Right. I fixed it to "Sent by each backend and background workers 
>>>>>> to update WAL statistics."
>>>>>> In the future, other statistics will be included so I remove the 
>>>>>> function's name.
>>>>>> 
>>>>>> 
>>>>>>> +-- There will surely and maximum one record
>>>>>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>>>>>> 
>>>>>>> What about changing this comment to "There must be only one 
>>>>>>> record"?
>>>>>> 
>>>>>> Thanks, I fixed.
>>>>>> 
>>>>>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>>>>>                      LWLockRelease(WALWriteLock);
>>>>>>> 
>>>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be 
>>>>>>> protected
>>>>>>> with WALWriteLock, isn't it better to increment that after 
>>>>>>> releasing the lock?
>>>>>> 
>>>>>> Thanks, I fixed.
>>>>>> 
>>>>>>> +CREATE VIEW pg_stat_walwriter AS
>>>>>>> +    SELECT
>>>>>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>>>>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>>>>>> +
>>>>>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>>>>>> 
>>>>>>> In system_views.sql, the definition of pg_stat_walwriter should 
>>>>>>> be
>>>>>>> placed just after that of pg_stat_bgwriter not 
>>>>>>> pg_stat_progress_analyze.
>>>>>> 
>>>>>> OK, I fixed it.
>>>>>> 
>>>>>>>      }
>>>>>>> -
>>>>>>>      /*
>>>>>>>       * We found an existing collector stats file. Read it and 
>>>>>>> put all the
>>>>>>> 
>>>>>>> You seem to accidentally have removed the empty line here.
>>>>>> 
>>>>>> Sorry about that. I fixed it.
>>>>>> 
>>>>>>> -                 errhint("Target must be \"archiver\" or 
>>>>>>> \"bgwriter\".")));
>>>>>>> +                 errhint("Target must be \"archiver\" or 
>>>>>>> \"bgwriter\" or
>>>>>>> \"walwriter\".")));
>>>>>>> 
>>>>>>> There are two "or" in the message, but the former should be 
>>>>>>> replaced with ","?
>>>>>> 
>>>>>> Thanks, I fixed.
>>>>>> 
>>>>>> 
>>>>>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>>>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>>>> 
>>>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>>>>>>> I changed the view name from pg_stat_walwrites to
>>>>>>>> pg_stat_walwriter.
>>>>>>>>>>> I think it is better to match naming scheme with other views
>>>>>>>> like
>>>>>>>>>> pg_stat_bgwriter,
>>>>>>>>>>> which is for bgwriter statistics but it has the statistics
>>>>>>>> related to backend.
>>>>>>>>>> 
>>>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency 
>>>>>>>>>> with
>>>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>>>>>> I'd like to hear more opinons about this.
>>>>>>>>> 
>>>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>>>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>>>>>> misunderstanding because its information is not limited to WAL
>>>>>>>> writer.
>>>>>>>>> 
>>>>>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>>>>>> 
>>>>>>>> Sounds reasonable.
>>>>>>> 
>>>>>>> +1.
>>>>>>> 
>>>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now 
>>>>>>> --
>>>>>>> it became even more apparent when the checkpointer was split out 
>>>>>>> to
>>>>>>> it's own process, and that's not exactly a recent change. And it 
>>>>>>> had
>>>>>>> allocs in it from day one...
>>>>>>> 
>>>>>>> I think naming it for what the data in it is ("wal") rather than 
>>>>>>> which
>>>>>>> process deals with it ("walwriter") is correct, unless the 
>>>>>>> statistics
>>>>>>> can be known to only *ever* affect one type of process. (And then
>>>>>>> different processes can affect different columns in the view). As 
>>>>>>> a
>>>>>>> general rule -- and that's from what I can tell exactly what's 
>>>>>>> being
>>>>>>> proposed.
>>>>>> 
>>>>>> Thanks for your comments. I agree with your opinions.
>>>>>> I changed the view name to "pg_stat_wal".
>>>>>> 
>>>>>> 
>>>>>> I fixed the code to send the WAL statistics from not only backend 
>>>>>> and walwriter
>>>>>> but also checkpointer, walsender and autovacuum worker.
>>>>> 
>>>>> Good point! Thanks for updating the patch!
>>>>> 
>>>>> 
>>>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams 
>>>>> *params,
>>>>>                           onerel->rd_rel->relisshared,
>>>>>                           Max(new_live_tuples, 0),
>>>>>                           vacrelstats->new_dead_tuples);
>>>>> +    pgstat_send_wal();
>>>>> 
>>>>> I guess that you changed heap_vacuum_rel() as above so that 
>>>>> autovacuum
>>>>> workers can send WAL stats. But heap_vacuum_rel() can be called by
>>>>> the processes (e.g., backends) other than autovacuum workers? Also
>>>>> what happens if autovacuum workers just do ANALYZE only? In that 
>>>>> case,
>>>>> heap_vacuum_rel() may not be called.
>>>>> 
>>>>> Currently autovacuum worker reports the stats at the exit via
>>>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
>>>>> is not the process that basically keeps running during the service. 
>>>>> It exits
>>>>> after it does vacuum or analyze. So ISTM that it's not bad to 
>>>>> report the stats
>>>>> only at the exit, in autovacuum worker case. There is no need to 
>>>>> add extra
>>>>> code for WAL stats report by autovacuum worker. Thought?
>>>> 
>>>> Thanks, I understood. I removed this code.
>>>> 
>>>>> 
>>>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>>>>>          else
>>>>>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>>>>>  +        /* Send wal statistics */
>>>>> +        pgstat_send_wal();
>>>>> 
>>>>> AFAIR logical walsender uses three loops in WalSndLoop(), 
>>>>> WalSndWriteData()
>>>>> and WalSndWaitForWal(). But could you tell me why added 
>>>>> pgstat_send_wal()
>>>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is 
>>>>> the best
>>>>> for that purpose.
>>>> 
>>>> I checked what function calls XLogBackgroundFlush() which calls
>>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>>> 
>>>> I found that WalSndWaitForWal() calls it, so I added it.
>>> 
>>> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
>>> second argument opportunistic=true, so in this case WAL write by
>>> wal_buffers full seems to never happen. Right? If this understanding
>>> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
>>> Probably also walwriter doesn't need to do that.
> 
> Thanks for updating the patch! This patch adds pgstat_send_wal() in
> walwriter main loop. But isn't this unnecessary because of the above 
> reason?
> That is, since walwriter calls AdvanceXLInsertBuffer() with
> the second argument "opportunistic" = true via XLogBackgroundFlush(),
> the event of full wal_buffers will never happen. No?

Right, I fixed it.

>>> 
>>> The logical rep walsender can generate WAL and call
>>> AdvanceXLInsertBuffer() when it executes the replication commands 
>>> like
>>> CREATE_REPLICATION_SLOT. But this case is already covered by
>>> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
>>> with your patch. So no more calls to pgstat_send_wal() seems 
>>> necessary
>>> for logical rep walsender.
>> 
>> Thanks for your reviews. I didn't notice that.
>> I updated the patches.
> 
> Sorry, the above my analysis might be incorrect. During logical 
> replication,
> walsender may access to the system table. Which may cause HOT pruning
> or killing of dead index tuple. Also which can cause WAL and
> full wal_buffers event. Thought?

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

WalSndLoop()
  -> XLogSendLogical()
   -> LogicalDecodingProcessRecord()
    -> DecodeXactOp()
     -> DecodeCommit()
      -> ReorderBufferCommit()
       -> ReorderBufferProcessTXN()
        -> RelidByRelfilenode()
         -> systable_getnext()

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

Is it necessary to introduce a mechanism to send in bulk?
But I worried about how to implement is best. Is it good to send wal 
statistics per X recoreds?


I think there are other background processes that access the system 
tables,
so I organized which process must send wal metrics and added 
pgstat_send_wal() to the main loop of some background processes
for example, autovacuum launcher, logical replication launcher, and 
logical replication worker's one.

(*) [x]: it needs to send it
     [ ]: it don't need to send it

* [ ] postmaster
* [ ] background writer
* [x] checkpointer: it generates wal for checkpoint.
* [ ] walwriter
* [x] autovacuum launcher: it accesses to the system tables to get the 
database list.
* [x] autovacuum worker: it generates wal for vacuum.
* [ ] stats collector
* [x] backend: it generate wal for query execution.
* [ ] startup
* [ ] archiver
* [x] walsender: it accesses to the system tables if logical replication 
is performed.
* [ ] walreceiver
* [x] logical replication launcher: it accesses to the system tables to 
get the subscription list.
* [x] logical replication worker: it accesses to the system tables to 
get oid from relname.
* [x] parallel worker: it generates wal for query execution.

If my understanding is wrong, please let me know.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Kyotaro Horiguchi
Date:
At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
> Thanks. I confirmed that it causes HOT pruning or killing of
> dead index tuple if DecodeCommit() is called.
> 
> As you said, DecodeCommit() may access the system table.
...
> The wals are generated only when logical replication is performed.
> So, I added pgstat_send_wal() in XLogSendLogical().
> 
> But, I concerned that it causes poor performance
> since pgstat_send_wal() is called per wal record,

I think that's too frequent.  If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats.  Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Currently logrep-laucher, logrep-worker and autovac-launcher (and some
other processes?) don't seem (AFAICS) sending scan stats at all but
according to the discussion here, we should let such processes send
stats.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote in
>> Thanks. I confirmed that it causes HOT pruning or killing of
>> dead index tuple if DecodeCommit() is called.
>> 
>> As you said, DecodeCommit() may access the system table.
> ...
>> The wals are generated only when logical replication is performed.
>> So, I added pgstat_send_wal() in XLogSendLogical().
>> 
>> But, I concerned that it causes poor performance
>> since pgstat_send_wal() is called per wal record,
> 
> I think that's too frequent.  If we want to send any stats to the
> collector, it is usually done at commit time using
> pgstat_report_stat(), and the function avoids sending stats too
> frequently. For logrep-worker, apply_handle_commit() is calling it. It
> seems to be the place if we want to send the wal stats.  Or it may be
> better to call pgstat_send_wal() via pgstat_report_stat(), like
> pg_stat_slru().

Thanks for your comments.
Since I changed to use pgstat_report_stat() and DecodeCommit() is 
calling it,
the frequency to send statistics is not so high.

> Currently logrep-laucher, logrep-worker and autovac-launcher (and some
> other processes?) don't seem (AFAICS) sending scan stats at all but
> according to the discussion here, we should let such processes send
> stats.

I added pgstat_report_stat() to logrep-laucher and autovac-launcher.
As you said, logrep-worker already calls apply_handle_commit() and 
pgstat_report_stat().

The checkpointer doesn't seem to call pgstat_report_stat() currently,
but since there is a possibility to send wal statistics, I added 
pgstat_report_stat().

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/25 12:06, Masahiro Ikeda wrote:
> On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
>> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
>> <ikedamsh@oss.nttdata.com> wrote in
>>> Thanks. I confirmed that it causes HOT pruning or killing of
>>> dead index tuple if DecodeCommit() is called.
>>>
>>> As you said, DecodeCommit() may access the system table.
>> ...
>>> The wals are generated only when logical replication is performed.
>>> So, I added pgstat_send_wal() in XLogSendLogical().
>>>
>>> But, I concerned that it causes poor performance
>>> since pgstat_send_wal() is called per wal record,
>>
>> I think that's too frequent.  If we want to send any stats to the
>> collector, it is usually done at commit time using
>> pgstat_report_stat(), and the function avoids sending stats too
>> frequently. For logrep-worker, apply_handle_commit() is calling it. It
>> seems to be the place if we want to send the wal stats.  Or it may be
>> better to call pgstat_send_wal() via pgstat_report_stat(), like
>> pg_stat_slru().
> 
> Thanks for your comments.
> Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
> the frequency to send statistics is not so high.

On second thought, it's strange to include this change in pg_stat_wal patch.
Because pgstat_report_stat() sends various stats and that change would
affect not only pg_stat_wal but also other stats views. That is, if we really
want to make some processes call pgstat_report_stat() newly, which
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed the thought
and am inclined to be ok not to call pgstat_send_wal() in some background
processes that are very unlikely to generate WAL. For example, logical-rep
launcher, logical-rep walsender, and autovacuum launcher. Thought?


> 
>> Currently logrep-laucher, logrep-worker and autovac-launcher (and some
>> other processes?) don't seem (AFAICS) sending scan stats at all but
>> according to the discussion here, we should let such processes send
>> stats.
> 
> I added pgstat_report_stat() to logrep-laucher and autovac-launcher.
> As you said, logrep-worker already calls apply_handle_commit() and pgstat_report_stat().

Right.


> The checkpointer doesn't seem to call pgstat_report_stat() currently,
> but since there is a possibility to send wal statistics, I added pgstat_report_stat().

IMO it's better to call pgstat_send_wal() in the checkpointer, instead,
because of the above reason.

Thanks for updating the patch! I'd like to share my review comments.

+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.

Like the description for pg_stat_bgwriter, <link> tag should be used
instead of <xref>.


+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>

I prefer the following description. Thought?

"Number of times WAL data was written to the disk because wal_buffers got full"


+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>

A comma should be just after "view" (not just before "or").


+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;

What about merging the comments for BgWriterStats and WalStats into one because they are almost the same? For example,

-------------------------------
/*
  * BgWriter and WAL global statistics counters.
  * Stored directly in a stats message structure so they can be sent
  * without needing to copy things around.  We assume these init to zeroes.
  */
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
-------------------------------

BTW, originally there was the comment "(unused in other processes)"
for BgWriterStats. But it seems not true, so I removed it from
the above example.


+    rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+    (void) rc;                    /* we'll check for error with ferror */

Since the patch changes the pgstat file format,
PGSTAT_FILE_FORMAT_ID should also be changed?


-     * Clear out global and archiver statistics so they start from zero in
+     * Clear out global, archiver and wal statistics so they start from zero in

This is not the issue of this patch, but isn't it better to mention
also SLRU stats here? That is, what about "Clear out global, archiver,
WAL and SLRU statistics so they start from zero in"?


I found "wal statistics" and "wal stats" in some comments in the patch,
but isn't it better to use "WAL statistics" and "WAL stats", instead,
if there is no special reason to use lowercase?


+    /*
+     * Read wal stats struct
+     */
+    if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))

In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats
should be declared and be used to store the WAL stats read via fread(),
instead.


+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',

If we change the description of wal_buffers_full column in the document
as I proposed, we should also use the proposed description here.


+{ oid => '1137', descr => 'statistics: last reset for the walwriter',

"the walwriter" should be "WAL" or "WAL activity", etc?


+ * PgStat_MsgWal            Sent by each backend and background workers to update WAL statistics.

If your intention here is to mention background processes like checkpointer,
"each backend and background workers" should be "backends and background
processes"?


+    PgStat_Counter m_wal_buffers_full;    /* number of WAL write caused by full of WAL buffers */

I don't think this comment is necessary.


+    PgStat_Counter wal_buffers_full;    /* number of WAL write caused by full of WAL buffers */
+    TimestampTz stat_reset_timestamp;    /* last time when the stats reset */

I don't think these comments are necessary.


+/*
+ * WAL writes statistics counter is updated by backend and background workers

Same as above.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Amit Kapila
Date:
On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/09/25 12:06, Masahiro Ikeda wrote:
> > On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
> >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
> >> <ikedamsh@oss.nttdata.com> wrote in
> >>> Thanks. I confirmed that it causes HOT pruning or killing of
> >>> dead index tuple if DecodeCommit() is called.
> >>>
> >>> As you said, DecodeCommit() may access the system table.
> >> ...
> >>> The wals are generated only when logical replication is performed.
> >>> So, I added pgstat_send_wal() in XLogSendLogical().
> >>>
> >>> But, I concerned that it causes poor performance
> >>> since pgstat_send_wal() is called per wal record,
> >>
> >> I think that's too frequent.  If we want to send any stats to the
> >> collector, it is usually done at commit time using
> >> pgstat_report_stat(), and the function avoids sending stats too
> >> frequently. For logrep-worker, apply_handle_commit() is calling it. It
> >> seems to be the place if we want to send the wal stats.  Or it may be
> >> better to call pgstat_send_wal() via pgstat_report_stat(), like
> >> pg_stat_slru().
> >
> > Thanks for your comments.
> > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
> > the frequency to send statistics is not so high.
>
> On second thought, it's strange to include this change in pg_stat_wal patch.
> Because pgstat_report_stat() sends various stats and that change would
> affect not only pg_stat_wal but also other stats views. That is, if we really
> want to make some processes call pgstat_report_stat() newly, which
> should be implemented as a separate patch. But I'm not sure how useful
> this change is because probably the stats are almost negligibly small
> in those processes.
>
> This thought seems valid for pgstat_send_wal(). I changed the thought
> and am inclined to be ok not to call pgstat_send_wal() in some background
> processes that are very unlikely to generate WAL.
>

This makes sense to me. I think even if such background processes have
to write WAL due to wal_buffers, it will be accounted next time the
backend sends the stats.

One minor point, don't we need to reset the counter
WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
stats will be accounted multiple times.


-- 
With Regards,
Amit Kapila.



Re: New statistics for tuning WAL buffer size

From
Kyotaro Horiguchi
Date:
At Sat, 26 Sep 2020 15:48:49 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
> >
> > On 2020/09/25 12:06, Masahiro Ikeda wrote:
> > > On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
> > >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
> > >> <ikedamsh@oss.nttdata.com> wrote in
> > >>> Thanks. I confirmed that it causes HOT pruning or killing of
> > >>> dead index tuple if DecodeCommit() is called.
> > >>>
> > >>> As you said, DecodeCommit() may access the system table.
> > >> ...
> > >>> The wals are generated only when logical replication is performed.
> > >>> So, I added pgstat_send_wal() in XLogSendLogical().
> > >>>
> > >>> But, I concerned that it causes poor performance
> > >>> since pgstat_send_wal() is called per wal record,
> > >>
> > >> I think that's too frequent.  If we want to send any stats to the
> > >> collector, it is usually done at commit time using
> > >> pgstat_report_stat(), and the function avoids sending stats too
> > >> frequently. For logrep-worker, apply_handle_commit() is calling it. It
> > >> seems to be the place if we want to send the wal stats.  Or it may be
> > >> better to call pgstat_send_wal() via pgstat_report_stat(), like
> > >> pg_stat_slru().
> > >
> > > Thanks for your comments.
> > > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
> > > the frequency to send statistics is not so high.
> >
> > On second thought, it's strange to include this change in pg_stat_wal patch.
> > Because pgstat_report_stat() sends various stats and that change would
> > affect not only pg_stat_wal but also other stats views. That is, if we really
> > want to make some processes call pgstat_report_stat() newly, which
> > should be implemented as a separate patch. But I'm not sure how useful
> > this change is because probably the stats are almost negligibly small
> > in those processes.
> >
> > This thought seems valid for pgstat_send_wal(). I changed the thought
> > and am inclined to be ok not to call pgstat_send_wal() in some background
> > processes that are very unlikely to generate WAL.
> >
> 
> This makes sense to me. I think even if such background processes have

+1

> This makes sense to me. I think even if such background processes have
> to write WAL due to wal_buffers, it will be accounted next time the
> backend sends the stats.

Where do they send the stats? (I think it's ok to omit seding stats at
all for such low-wal/heap activity processes.)

> One minor point, don't we need to reset the counter
> WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
> stats will be accounted multiple times.

Isn't this doing that?

+    /*
+     * Clear out the statistics buffer, so it can be re-used.
+     */
+    MemSet(&WalStats, 0, sizeof(WalStats));

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-26 19:18, Amit Kapila wrote:
> On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>> 
>> On 2020/09/25 12:06, Masahiro Ikeda wrote:
>> > On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
>> >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
>> >> <ikedamsh@oss.nttdata.com> wrote in
>> >>> Thanks. I confirmed that it causes HOT pruning or killing of
>> >>> dead index tuple if DecodeCommit() is called.
>> >>>
>> >>> As you said, DecodeCommit() may access the system table.
>> >> ...
>> >>> The wals are generated only when logical replication is performed.
>> >>> So, I added pgstat_send_wal() in XLogSendLogical().
>> >>>
>> >>> But, I concerned that it causes poor performance
>> >>> since pgstat_send_wal() is called per wal record,
>> >>
>> >> I think that's too frequent.  If we want to send any stats to the
>> >> collector, it is usually done at commit time using
>> >> pgstat_report_stat(), and the function avoids sending stats too
>> >> frequently. For logrep-worker, apply_handle_commit() is calling it. It
>> >> seems to be the place if we want to send the wal stats.  Or it may be
>> >> better to call pgstat_send_wal() via pgstat_report_stat(), like
>> >> pg_stat_slru().
>> >
>> > Thanks for your comments.
>> > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
>> > the frequency to send statistics is not so high.
>> 
>> On second thought, it's strange to include this change in pg_stat_wal 
>> patch.
>> Because pgstat_report_stat() sends various stats and that change would
>> affect not only pg_stat_wal but also other stats views. That is, if we 
>> really
>> want to make some processes call pgstat_report_stat() newly, which
>> should be implemented as a separate patch. But I'm not sure how useful
>> this change is because probably the stats are almost negligibly small
>> in those processes.
>> 
>> This thought seems valid for pgstat_send_wal(). I changed the thought
>> and am inclined to be ok not to call pgstat_send_wal() in some 
>> background
>> processes that are very unlikely to generate WAL.
>> 

OK, I removed to pgstat_report_stat() for autovaccum launcher, 
logrep-worker and logrep-launcher.


> This makes sense to me. I think even if such background processes have
> to write WAL due to wal_buffers, it will be accounted next time the
> backend sends the stats.

Thanks for your comments.

IIUC, since each process counts WalStats.m_wal_buffers_full,
backend can't send the counter which other background processes have to 
write WAL due to wal_buffers.
Although we can't track all WAL activity, the impact on the statistics 
is minimal so we can ignore it.

> One minor point, don't we need to reset the counter
> WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
> stats will be accounted multiple times.

Now, the counter is reset in pgstat_send_wal.
Isn't it enough?


>> The checkpointer doesn't seem to call pgstat_report_stat() currently,
>> but since there is a possibility to send wal statistics, I added 
>> pgstat_report_stat().
> 
> IMO it's better to call pgstat_send_wal() in the checkpointer, instead,
> because of the above reason.

Ok, I changed.


> Thanks for updating the patch! I'd like to share my review comments.
> 
> +       <xref linkend="monitoring-pg-stat-wal-view"/> for details.
> 
> Like the description for pg_stat_bgwriter, <link> tag should be used
> instead of <xref>.

Thanks, fixed.

> +      <para>
> +       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> 
> are full
> +      </para></entry>
> 
> I prefer the following description. Thought?
> 
> "Number of times WAL data was written to the disk because wal_buffers 
> got full"

Ok, I changed.

> +        the <structname>pg_stat_archiver</structname> view ,or
> <literal>wal</literal>
> 
> A comma should be just after "view" (not just before "or").

Sorry, anyway I think a comma is not necessary.
I removed it.

> +/*
> + * WAL global statistics counter.
> + * This counter is incremented by both each backend and background.
> + * And then, sent to the stat collector process.
> + */
> +PgStat_MsgWal WalStats;
> 
> What about merging the comments for BgWriterStats and WalStats into
> one because they are almost the same? For example,
> 
> -------------------------------
> /*
>  * BgWriter and WAL global statistics counters.
>  * Stored directly in a stats message structure so they can be sent
>  * without needing to copy things around.  We assume these init to 
> zeroes.
>  */
> PgStat_MsgBgWriter BgWriterStats;
> PgStat_MsgWal WalStats;
> -------------------------------
> 
> BTW, originally there was the comment "(unused in other processes)"
> for BgWriterStats. But it seems not true, so I removed it from
> the above example.

Thanks, I changed.

> +    rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
> +    (void) rc;                    /* we'll check for error with ferror */
> 
> Since the patch changes the pgstat file format,
> PGSTAT_FILE_FORMAT_ID should also be changed?

Sorry about that.
I incremented PGSTAT_FILE_FORMAT_ID by +1.

> -     * Clear out global and archiver statistics so they start from zero 
> in
> +     * Clear out global, archiver and wal statistics so they start from 
> zero in
> 
> This is not the issue of this patch, but isn't it better to mention
> also SLRU stats here? That is, what about "Clear out global, archiver,
> WAL and SLRU statistics so they start from zero in"?

Thanks, I changed.

> I found "wal statistics" and "wal stats" in some comments in the patch,
> but isn't it better to use "WAL statistics" and "WAL stats", instead,
> if there is no special reason to use lowercase?

OK. I fixed it.

> +    /*
> +     * Read wal stats struct
> +     */
> +    if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
> 
> In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats
> should be declared and be used to store the WAL stats read via fread(),
> instead.

Thanks, I changed it to declare myWalStats.

> +{ oid => '1136', descr => 'statistics: number of WAL writes when the
> wal buffers are full',
> 
> If we change the description of wal_buffers_full column in the document
> as I proposed, we should also use the proposed description here.

OK, I fixed it.

> +{ oid => '1137', descr => 'statistics: last reset for the walwriter',
> 
> "the walwriter" should be "WAL" or "WAL activity", etc?

Thanks, I fixed it.

> + * PgStat_MsgWal            Sent by each backend and background workers to
> update WAL statistics.
> 
> If your intention here is to mention background processes like 
> checkpointer,
> "each backend and background workers" should be "backends and 
> background
> processes"?

Thanks, I fixed it.

> +    PgStat_Counter m_wal_buffers_full;    /* number of WAL write caused by
> full of WAL buffers */
> 
> I don't think this comment is necessary.

OK, I removed.

> +    PgStat_Counter wal_buffers_full;    /* number of WAL write caused by
> full of WAL buffers */
> +    TimestampTz stat_reset_timestamp;    /* last time when the stats reset 
> */
> 
> I don't think these comments are necessary.

OK, I removed

> +/*
> + * WAL writes statistics counter is updated by backend and background 
> workers
> 
> Same as above.

I fixed it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Amit Kapila
Date:
On Mon, Sep 28, 2020 at 7:00 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2020-09-26 19:18, Amit Kapila wrote
>
> > This makes sense to me. I think even if such background processes have
> > to write WAL due to wal_buffers, it will be accounted next time the
> > backend sends the stats.
>
> Thanks for your comments.
>
> IIUC, since each process counts WalStats.m_wal_buffers_full,
> backend can't send the counter which other background processes have to
> write WAL due to wal_buffers.
>

Right, I misunderstood it.

> Although we can't track all WAL activity, the impact on the statistics
> is minimal so we can ignore it.
>

Yeah, that is probably true.

> > One minor point, don't we need to reset the counter
> > WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
> > stats will be accounted multiple times.
>
> Now, the counter is reset in pgstat_send_wal.
> Isn't it enough?
>

That should be enough.

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.


-- 
With Regards,
Amit Kapila.



Re: New statistics for tuning WAL buffer size

From
Kyotaro Horiguchi
Date:
At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> One other thing that occurred to me today is can't we keep this as
> part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
> reset it. It seems to me this is a cluster-wide stats and somewhat
> similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: New statistics for tuning WAL buffer size

From
Amit Kapila
Date:
On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > One other thing that occurred to me today is can't we keep this as
> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
> > reset it. It seems to me this is a cluster-wide stats and somewhat
> > similar to some of the other stats we maintain there.
>
> I like that direction, but PgStat_GlobalStats is actually
> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

-- 
With Regards,
Amit Kapila.



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-28 12:43, Amit Kapila wrote:
> On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>> 
>> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila 
>> <amit.kapila16@gmail.com> wrote in
>> > One other thing that occurred to me today is can't we keep this as
>> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
>> > reset it. It seems to me this is a cluster-wide stats and somewhat
>> > similar to some of the other stats we maintain there.
>> 
>> I like that direction, but PgStat_GlobalStats is actually
>> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>> 
> 
> Yeah, I think if we want to pursue this direction then we probably
> need to have a separate message to set/reset WAL-related stuff. I
> guess we probably need to have a separate reset timestamp for WAL. I
> think the difference would be that we can have one structure to refer
> to global_stats instead of referring to multiple structures and we
> don't need to issue separate read/write calls but OTOH I don't see
> many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

Since this is refactoring, I think it's better to make another patch
after the current patch is merged.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Amit Kapila
Date:
On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2020-09-28 12:43, Amit Kapila wrote:
> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> >>
> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
> >> <amit.kapila16@gmail.com> wrote in
> >> > One other thing that occurred to me today is can't we keep this as
> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
> >> > reset it. It seems to me this is a cluster-wide stats and somewhat
> >> > similar to some of the other stats we maintain there.
> >>
> >> I like that direction, but PgStat_GlobalStats is actually
> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
> >>
> >
> > Yeah, I think if we want to pursue this direction then we probably
> > need to have a separate message to set/reset WAL-related stuff. I
> > guess we probably need to have a separate reset timestamp for WAL. I
> > think the difference would be that we can have one structure to refer
> > to global_stats instead of referring to multiple structures and we
> > don't need to issue separate read/write calls but OTOH I don't see
> > many disadvantages of the current approach as well.
>
> IIUC, if we keep wal stats as part of PgStat_GlobalStats,
> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
> to PgStat_GlobalStats too?
>

I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.

-- 
With Regards,
Amit Kapila.



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-09-29 11:43, Amit Kapila wrote:
> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda 
> <ikedamsh@oss.nttdata.com> wrote:
>> 
>> On 2020-09-28 12:43, Amit Kapila wrote:
>> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
>> > <horikyota.ntt@gmail.com> wrote:
>> >>
>> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
>> >> <amit.kapila16@gmail.com> wrote in
>> >> > One other thing that occurred to me today is can't we keep this as
>> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
>> >> > reset it. It seems to me this is a cluster-wide stats and somewhat
>> >> > similar to some of the other stats we maintain there.
>> >>
>> >> I like that direction, but PgStat_GlobalStats is actually
>> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>> >>
>> >
>> > Yeah, I think if we want to pursue this direction then we probably
>> > need to have a separate message to set/reset WAL-related stuff. I
>> > guess we probably need to have a separate reset timestamp for WAL. I
>> > think the difference would be that we can have one structure to refer
>> > to global_stats instead of referring to multiple structures and we
>> > don't need to issue separate read/write calls but OTOH I don't see
>> > many disadvantages of the current approach as well.
>> 
>> IIUC, if we keep wal stats as part of PgStat_GlobalStats,
>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
>> to PgStat_GlobalStats too?
>> 
> 
> I have given the idea for wal_stats because there is just one counter
> in that. I think you can just try to evaluate the merits of each
> approach and choose whichever you feel is good. This is just a
> suggestion, if you don't like it feel free to proceed with the current
> approach.

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one 
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes, records, 
fpi),
I think that the current approach is good.

-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/29 11:51, Masahiro Ikeda wrote:
> On 2020-09-29 11:43, Amit Kapila wrote:
>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>>
>>> On 2020-09-28 12:43, Amit Kapila wrote:
>>> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
>>> > <horikyota.ntt@gmail.com> wrote:
>>> >>
>>> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
>>> >> <amit.kapila16@gmail.com> wrote in
>>> >> > One other thing that occurred to me today is can't we keep this as
>>> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
>>> >> > reset it. It seems to me this is a cluster-wide stats and somewhat
>>> >> > similar to some of the other stats we maintain there.
>>> >>
>>> >> I like that direction, but PgStat_GlobalStats is actually
>>> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>>> >>
>>> >
>>> > Yeah, I think if we want to pursue this direction then we probably
>>> > need to have a separate message to set/reset WAL-related stuff. I
>>> > guess we probably need to have a separate reset timestamp for WAL. I
>>> > think the difference would be that we can have one structure to refer
>>> > to global_stats instead of referring to multiple structures and we
>>> > don't need to issue separate read/write calls but OTOH I don't see
>>> > many disadvantages of the current approach as well.
>>>
>>> IIUC, if we keep wal stats as part of PgStat_GlobalStats,
>>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
>>> to PgStat_GlobalStats too?
>>>
>>
>> I have given the idea for wal_stats because there is just one counter
>> in that. I think you can just try to evaluate the merits of each
>> approach and choose whichever you feel is good. This is just a
>> suggestion, if you don't like it feel free to proceed with the current
>> approach.
> 
> Thanks for your suggestion.
> I understood that the point is that WAL-related stats have just one counter now.
> 
> Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
> I think that the current approach is good.

+1

I marked this patch as ready for committer.
Barring any objection, I will commit the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Amit Kapila
Date:
On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/09/29 11:51, Masahiro Ikeda wrote:
> > On 2020-09-29 11:43, Amit Kapila wrote:
> >> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> >>>
> >>> On 2020-09-28 12:43, Amit Kapila wrote:
> >>> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
> >>> > <horikyota.ntt@gmail.com> wrote:
> >>> >>
> >>> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
> >>> >> <amit.kapila16@gmail.com> wrote in
> >>> >> > One other thing that occurred to me today is can't we keep this as
> >>> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
> >>> >> > reset it. It seems to me this is a cluster-wide stats and somewhat
> >>> >> > similar to some of the other stats we maintain there.
> >>> >>
> >>> >> I like that direction, but PgStat_GlobalStats is actually
> >>> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
> >>> >>
> >>> >
> >>> > Yeah, I think if we want to pursue this direction then we probably
> >>> > need to have a separate message to set/reset WAL-related stuff. I
> >>> > guess we probably need to have a separate reset timestamp for WAL. I
> >>> > think the difference would be that we can have one structure to refer
> >>> > to global_stats instead of referring to multiple structures and we
> >>> > don't need to issue separate read/write calls but OTOH I don't see
> >>> > many disadvantages of the current approach as well.
> >>>
> >>> IIUC, if we keep wal stats as part of PgStat_GlobalStats,
> >>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
> >>> to PgStat_GlobalStats too?
> >>>
> >>
> >> I have given the idea for wal_stats because there is just one counter
> >> in that. I think you can just try to evaluate the merits of each
> >> approach and choose whichever you feel is good. This is just a
> >> suggestion, if you don't like it feel free to proceed with the current
> >> approach.
> >
> > Thanks for your suggestion.
> > I understood that the point is that WAL-related stats have just one counter now.
> >
> > Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
> > I think that the current approach is good.
>
> +1
>

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?


-- 
With Regards,
Amit Kapila.



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/09/30 20:21, Amit Kapila wrote:
> On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> On 2020/09/29 11:51, Masahiro Ikeda wrote:
>>> On 2020-09-29 11:43, Amit Kapila wrote:
>>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>>>>
>>>>> On 2020-09-28 12:43, Amit Kapila wrote:
>>>>>> On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
>>>>>> <horikyota.ntt@gmail.com> wrote:
>>>>>>>
>>>>>>> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
>>>>>>> <amit.kapila16@gmail.com> wrote in
>>>>>>>> One other thing that occurred to me today is can't we keep this as
>>>>>>>> part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
>>>>>>>> reset it. It seems to me this is a cluster-wide stats and somewhat
>>>>>>>> similar to some of the other stats we maintain there.
>>>>>>>
>>>>>>> I like that direction, but PgStat_GlobalStats is actually
>>>>>>> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>>>>>>>
>>>>>>
>>>>>> Yeah, I think if we want to pursue this direction then we probably
>>>>>> need to have a separate message to set/reset WAL-related stuff. I
>>>>>> guess we probably need to have a separate reset timestamp for WAL. I
>>>>>> think the difference would be that we can have one structure to refer
>>>>>> to global_stats instead of referring to multiple structures and we
>>>>>> don't need to issue separate read/write calls but OTOH I don't see
>>>>>> many disadvantages of the current approach as well.
>>>>>
>>>>> IIUC, if we keep wal stats as part of PgStat_GlobalStats,
>>>>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
>>>>> to PgStat_GlobalStats too?
>>>>>
>>>>
>>>> I have given the idea for wal_stats because there is just one counter
>>>> in that. I think you can just try to evaluate the merits of each
>>>> approach and choose whichever you feel is good. This is just a
>>>> suggestion, if you don't like it feel free to proceed with the current
>>>> approach.
>>>
>>> Thanks for your suggestion.
>>> I understood that the point is that WAL-related stats have just one counter now.
>>>
>>> Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
>>> I think that the current approach is good.
>>
>> +1
>>
> 
> Okay, it makes sense to keep it in the current form if we have a plan
> to extend this view with additional stats. However, why don't we
> expose it with a function similar to pg_stat_get_archiver() instead of
> providing individual functions like pg_stat_get_wal_buffers_full() and
> pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Kyotaro Horiguchi
Date:
At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/09/30 20:21, Amit Kapila wrote:
> > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >>
> >> On 2020/09/29 11:51, Masahiro Ikeda wrote:
> >>> On 2020-09-29 11:43, Amit Kapila wrote:
> >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
> >>>> <ikedamsh@oss.nttdata.com> wrote:
> >>> Thanks for your suggestion.
> >>> I understood that the point is that WAL-related stats have just one
> >>> counter now.
> >>>
> >>> Since we may add some WAL-related stats like pgWalUsage.(bytes,
> >>> records, fpi),
> >>> I think that the current approach is good.
> >>
> >> +1
> >>
> > Okay, it makes sense to keep it in the current form if we have a plan
> > to extend this view with additional stats. However, why don't we
> > expose it with a function similar to pg_stat_get_archiver() instead of
> > providing individual functions like pg_stat_get_wal_buffers_full() and
> > pg_stat_get_wal_stat_reset_time?
> 
> We can adopt either of those approaches for pg_stat_wal. I think that
> the former is a bit more flexible because we can collect only one of
> WAL information even when pg_stat_wal will contain many information
> in the future, by using the function. But you thought there are some
> reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions.  One of the reasons is the less oid consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not useful
so much for on-the-fly examination on psql console.  I'm often annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just one
number of wal-stats in a function, I think it is not a major usage and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry.  That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

regrds.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Another reason that I mildly want to object to subdivided functions is
> I was annoyed that a stats view makes many individual calls to
> functions that internally share the same statistics entry.  That
> behavior required me to provide an entry-caching feature to my
> shared-memory statistics patch.

+1
The views for troubleshooting performance problems should be as light as possible.  IIRC, we saw  frequently searching
pg_stat_replicationconsume unexpectedly high CPU power, because it calls pg_stat_get_activity(null) to get all sessions
andjoin them with the walsenders.  At that time, we had hundreds of client sessions.  We expected pg_stat_replication
tobe very lightweight because it provides information about a few walsenders. 


Regards
Takayuki Tsunakawa






Re: New statistics for tuning WAL buffer size

From
Amit Kapila
Date:
On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
> >
> >
> > On 2020/09/30 20:21, Amit Kapila wrote:
> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
> > > <masao.fujii@oss.nttdata.com> wrote:
> > >>
> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote:
> > >>> On 2020-09-29 11:43, Amit Kapila wrote:
> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
> > >>>> <ikedamsh@oss.nttdata.com> wrote:
> > >>> Thanks for your suggestion.
> > >>> I understood that the point is that WAL-related stats have just one
> > >>> counter now.
> > >>>
> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes,
> > >>> records, fpi),
> > >>> I think that the current approach is good.
> > >>
> > >> +1
> > >>
> > > Okay, it makes sense to keep it in the current form if we have a plan
> > > to extend this view with additional stats. However, why don't we
> > > expose it with a function similar to pg_stat_get_archiver() instead of
> > > providing individual functions like pg_stat_get_wal_buffers_full() and
> > > pg_stat_get_wal_stat_reset_time?
> >
> > We can adopt either of those approaches for pg_stat_wal. I think that
> > the former is a bit more flexible because we can collect only one of
> > WAL information even when pg_stat_wal will contain many information
> > in the future, by using the function. But you thought there are some
> > reasons that the latter is better for pg_stat_wal?
>
> FWIW I prefer to expose it by one SRF function rather than by
> subdivided functions.  One of the reasons is the less oid consumption
> and/or reduction of definitions for intrinsic functions.
>
> Another reason is at least for me subdivided functions are not useful
> so much for on-the-fly examination on psql console.  I'm often annoyed
> by realizing I can't recall the exact name of a function, say,
> pg_last_wal_receive_lsn or such but function names cannot be
> auto-completed on psql console. "select proname from pg_proc where
> proname like.. " is one of my friends:p On the other hand "select *
> from pg_stat_wal" requires no detailed memory.
>
> However subdivided functions might be useful if I wanted use just one
> number of wal-stats in a function, I think it is not a major usage and
> we can use a SQL query on the view instead.
>
> Another reason that I mildly want to object to subdivided functions is
> I was annoyed that a stats view makes many individual calls to
> functions that internally share the same statistics entry.  That
> behavior required me to provide an entry-caching feature to my
> shared-memory statistics patch.
>

All these are good reasons to expose it via one function and I think
that is why most of our existing views also use one function approach.

-- 
With Regards,
Amit Kapila.



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-10-01 11:33, Amit Kapila wrote:
> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>> 
>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao 
>> <masao.fujii@oss.nttdata.com> wrote in
>> >
>> >
>> > On 2020/09/30 20:21, Amit Kapila wrote:
>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
>> > > <masao.fujii@oss.nttdata.com> wrote:
>> > >>
>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote:
>> > >>> On 2020-09-29 11:43, Amit Kapila wrote:
>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
>> > >>>> <ikedamsh@oss.nttdata.com> wrote:
>> > >>> Thanks for your suggestion.
>> > >>> I understood that the point is that WAL-related stats have just one
>> > >>> counter now.
>> > >>>
>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes,
>> > >>> records, fpi),
>> > >>> I think that the current approach is good.
>> > >>
>> > >> +1
>> > >>
>> > > Okay, it makes sense to keep it in the current form if we have a plan
>> > > to extend this view with additional stats. However, why don't we
>> > > expose it with a function similar to pg_stat_get_archiver() instead of
>> > > providing individual functions like pg_stat_get_wal_buffers_full() and
>> > > pg_stat_get_wal_stat_reset_time?
>> >
>> > We can adopt either of those approaches for pg_stat_wal. I think that
>> > the former is a bit more flexible because we can collect only one of
>> > WAL information even when pg_stat_wal will contain many information
>> > in the future, by using the function. But you thought there are some
>> > reasons that the latter is better for pg_stat_wal?
>> 
>> FWIW I prefer to expose it by one SRF function rather than by
>> subdivided functions.  One of the reasons is the less oid consumption
>> and/or reduction of definitions for intrinsic functions.
>> 
>> Another reason is at least for me subdivided functions are not useful
>> so much for on-the-fly examination on psql console.  I'm often annoyed
>> by realizing I can't recall the exact name of a function, say,
>> pg_last_wal_receive_lsn or such but function names cannot be
>> auto-completed on psql console. "select proname from pg_proc where
>> proname like.. " is one of my friends:p On the other hand "select *
>> from pg_stat_wal" requires no detailed memory.
>> 
>> However subdivided functions might be useful if I wanted use just one
>> number of wal-stats in a function, I think it is not a major usage and
>> we can use a SQL query on the view instead.
>> 
>> Another reason that I mildly want to object to subdivided functions is
>> I was annoyed that a stats view makes many individual calls to
>> functions that internally share the same statistics entry.  That
>> behavior required me to provide an entry-caching feature to my
>> shared-memory statistics patch.
>> 
> 
> All these are good reasons to expose it via one function and I think
> that is why most of our existing views also use one function approach.

Thanks for your comments.
I didn't notice there are the above disadvantages to provide individual 
functions.

I changed the latest patch to expose it via one function.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/10/01 12:56, Masahiro Ikeda wrote:
> On 2020-10-01 11:33, Amit Kapila wrote:
>> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
>> <horikyota.ntt@gmail.com> wrote:
>>>
>>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>> >
>>> >
>>> > On 2020/09/30 20:21, Amit Kapila wrote:
>>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
>>> > > <masao.fujii@oss.nttdata.com> wrote:
>>> > >>
>>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote:
>>> > >>> On 2020-09-29 11:43, Amit Kapila wrote:
>>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
>>> > >>>> <ikedamsh@oss.nttdata.com> wrote:
>>> > >>> Thanks for your suggestion.
>>> > >>> I understood that the point is that WAL-related stats have just one
>>> > >>> counter now.
>>> > >>>
>>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes,
>>> > >>> records, fpi),
>>> > >>> I think that the current approach is good.
>>> > >>
>>> > >> +1
>>> > >>
>>> > > Okay, it makes sense to keep it in the current form if we have a plan
>>> > > to extend this view with additional stats. However, why don't we
>>> > > expose it with a function similar to pg_stat_get_archiver() instead of
>>> > > providing individual functions like pg_stat_get_wal_buffers_full() and
>>> > > pg_stat_get_wal_stat_reset_time?
>>> >
>>> > We can adopt either of those approaches for pg_stat_wal. I think that
>>> > the former is a bit more flexible because we can collect only one of
>>> > WAL information even when pg_stat_wal will contain many information
>>> > in the future, by using the function. But you thought there are some
>>> > reasons that the latter is better for pg_stat_wal?
>>>
>>> FWIW I prefer to expose it by one SRF function rather than by
>>> subdivided functions.  One of the reasons is the less oid consumption
>>> and/or reduction of definitions for intrinsic functions.
>>>
>>> Another reason is at least for me subdivided functions are not useful
>>> so much for on-the-fly examination on psql console.  I'm often annoyed
>>> by realizing I can't recall the exact name of a function, say,
>>> pg_last_wal_receive_lsn or such but function names cannot be
>>> auto-completed on psql console. "select proname from pg_proc where
>>> proname like.. " is one of my friends:p On the other hand "select *
>>> from pg_stat_wal" requires no detailed memory.
>>>
>>> However subdivided functions might be useful if I wanted use just one
>>> number of wal-stats in a function, I think it is not a major usage and
>>> we can use a SQL query on the view instead.
>>>
>>> Another reason that I mildly want to object to subdivided functions is
>>> I was annoyed that a stats view makes many individual calls to
>>> functions that internally share the same statistics entry.  That
>>> behavior required me to provide an entry-caching feature to my
>>> shared-memory statistics patch.
>>>
>>
>> All these are good reasons to expose it via one function and I think

Understood. +1 to expose it as one function.


>> that is why most of our existing views also use one function approach.
> 
> Thanks for your comments.
> I didn't notice there are the above disadvantages to provide individual functions.
> 
> I changed the latest patch to expose it via one function.

Thanks for updating the patch! LGTM.
Barring any other objection, I will commit it.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/10/01 10:50, tsunakawa.takay@fujitsu.com wrote:
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
>> Another reason that I mildly want to object to subdivided functions is
>> I was annoyed that a stats view makes many individual calls to
>> functions that internally share the same statistics entry.  That
>> behavior required me to provide an entry-caching feature to my
>> shared-memory statistics patch.
> 
> +1
> The views for troubleshooting performance problems should be as light as possible.  IIRC, we saw  frequently
searchingpg_stat_replication consume unexpectedly high CPU power, because it calls pg_stat_get_activity(null) to get
allsessions and join them with the walsenders.  At that time, we had hundreds of client sessions.  We expected
pg_stat_replicationto be very lightweight because it provides information about a few walsenders.
 

I think that we can improve that, for example, by storing backend id
into WalSndCtl and making pg_stat_get_wal_senders() directly
get the walsender's LocalPgBackendStatus with the backend id,
rather than joining pg_stat_get_activity() and pg_stat_get_wal_senders().

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



RE: New statistics for tuning WAL buffer size

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Fujii Masao <masao.fujii@oss.nttdata.com>
> I think that we can improve that, for example, by storing backend id
> into WalSndCtl and making pg_stat_get_wal_senders() directly
> get the walsender's LocalPgBackendStatus with the backend id,
> rather than joining pg_stat_get_activity() and pg_stat_get_wal_senders().

Yeah, I had something like that in mind.  I think I'll take note of this as my private homework.  (Of course, anyone
cando it.) 


Regards
Takayuki Tsunakawa






Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/10/01 13:35, Fujii Masao wrote:
> 
> 
> On 2020/10/01 12:56, Masahiro Ikeda wrote:
>> On 2020-10-01 11:33, Amit Kapila wrote:
>>> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
>>> <horikyota.ntt@gmail.com> wrote:
>>>>
>>>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>> >
>>>> >
>>>> > On 2020/09/30 20:21, Amit Kapila wrote:
>>>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
>>>> > > <masao.fujii@oss.nttdata.com> wrote:
>>>> > >>
>>>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote:
>>>> > >>> On 2020-09-29 11:43, Amit Kapila wrote:
>>>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
>>>> > >>>> <ikedamsh@oss.nttdata.com> wrote:
>>>> > >>> Thanks for your suggestion.
>>>> > >>> I understood that the point is that WAL-related stats have just one
>>>> > >>> counter now.
>>>> > >>>
>>>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes,
>>>> > >>> records, fpi),
>>>> > >>> I think that the current approach is good.
>>>> > >>
>>>> > >> +1
>>>> > >>
>>>> > > Okay, it makes sense to keep it in the current form if we have a plan
>>>> > > to extend this view with additional stats. However, why don't we
>>>> > > expose it with a function similar to pg_stat_get_archiver() instead of
>>>> > > providing individual functions like pg_stat_get_wal_buffers_full() and
>>>> > > pg_stat_get_wal_stat_reset_time?
>>>> >
>>>> > We can adopt either of those approaches for pg_stat_wal. I think that
>>>> > the former is a bit more flexible because we can collect only one of
>>>> > WAL information even when pg_stat_wal will contain many information
>>>> > in the future, by using the function. But you thought there are some
>>>> > reasons that the latter is better for pg_stat_wal?
>>>>
>>>> FWIW I prefer to expose it by one SRF function rather than by
>>>> subdivided functions.  One of the reasons is the less oid consumption
>>>> and/or reduction of definitions for intrinsic functions.
>>>>
>>>> Another reason is at least for me subdivided functions are not useful
>>>> so much for on-the-fly examination on psql console.  I'm often annoyed
>>>> by realizing I can't recall the exact name of a function, say,
>>>> pg_last_wal_receive_lsn or such but function names cannot be
>>>> auto-completed on psql console. "select proname from pg_proc where
>>>> proname like.. " is one of my friends:p On the other hand "select *
>>>> from pg_stat_wal" requires no detailed memory.
>>>>
>>>> However subdivided functions might be useful if I wanted use just one
>>>> number of wal-stats in a function, I think it is not a major usage and
>>>> we can use a SQL query on the view instead.
>>>>
>>>> Another reason that I mildly want to object to subdivided functions is
>>>> I was annoyed that a stats view makes many individual calls to
>>>> functions that internally share the same statistics entry.  That
>>>> behavior required me to provide an entry-caching feature to my
>>>> shared-memory statistics patch.
>>>>
>>>
>>> All these are good reasons to expose it via one function and I think
> 
> Understood. +1 to expose it as one function.
> 
> 
>>> that is why most of our existing views also use one function approach.
>>
>> Thanks for your comments.
>> I didn't notice there are the above disadvantages to provide individual functions.
>>
>> I changed the latest patch to expose it via one function.
> 
> Thanks for updating the patch! LGTM.
> Barring any other objection, I will commit it.

I updated typedefs.list and pushed the patch. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-10-02 10:21, Fujii Masao wrote:
> On 2020/10/01 13:35, Fujii Masao wrote:
>> 
>> 
>> On 2020/10/01 12:56, Masahiro Ikeda wrote:
>>> On 2020-10-01 11:33, Amit Kapila wrote:
>>>> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
>>>> <horikyota.ntt@gmail.com> wrote:
>>>>> 
>>>>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao 
>>>>> <masao.fujii@oss.nttdata.com> wrote in
>>>>> >
>>>>> >
>>>>> > On 2020/09/30 20:21, Amit Kapila wrote:
>>>>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
>>>>> > > <masao.fujii@oss.nttdata.com> wrote:
>>>>> > >>
>>>>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote:
>>>>> > >>> On 2020-09-29 11:43, Amit Kapila wrote:
>>>>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
>>>>> > >>>> <ikedamsh@oss.nttdata.com> wrote:
>>>>> > >>> Thanks for your suggestion.
>>>>> > >>> I understood that the point is that WAL-related stats have just one
>>>>> > >>> counter now.
>>>>> > >>>
>>>>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes,
>>>>> > >>> records, fpi),
>>>>> > >>> I think that the current approach is good.
>>>>> > >>
>>>>> > >> +1
>>>>> > >>
>>>>> > > Okay, it makes sense to keep it in the current form if we have a plan
>>>>> > > to extend this view with additional stats. However, why don't we
>>>>> > > expose it with a function similar to pg_stat_get_archiver() instead of
>>>>> > > providing individual functions like pg_stat_get_wal_buffers_full() and
>>>>> > > pg_stat_get_wal_stat_reset_time?
>>>>> >
>>>>> > We can adopt either of those approaches for pg_stat_wal. I think that
>>>>> > the former is a bit more flexible because we can collect only one of
>>>>> > WAL information even when pg_stat_wal will contain many information
>>>>> > in the future, by using the function. But you thought there are some
>>>>> > reasons that the latter is better for pg_stat_wal?
>>>>> 
>>>>> FWIW I prefer to expose it by one SRF function rather than by
>>>>> subdivided functions.  One of the reasons is the less oid 
>>>>> consumption
>>>>> and/or reduction of definitions for intrinsic functions.
>>>>> 
>>>>> Another reason is at least for me subdivided functions are not 
>>>>> useful
>>>>> so much for on-the-fly examination on psql console.  I'm often 
>>>>> annoyed
>>>>> by realizing I can't recall the exact name of a function, say,
>>>>> pg_last_wal_receive_lsn or such but function names cannot be
>>>>> auto-completed on psql console. "select proname from pg_proc where
>>>>> proname like.. " is one of my friends:p On the other hand "select *
>>>>> from pg_stat_wal" requires no detailed memory.
>>>>> 
>>>>> However subdivided functions might be useful if I wanted use just 
>>>>> one
>>>>> number of wal-stats in a function, I think it is not a major usage 
>>>>> and
>>>>> we can use a SQL query on the view instead.
>>>>> 
>>>>> Another reason that I mildly want to object to subdivided functions 
>>>>> is
>>>>> I was annoyed that a stats view makes many individual calls to
>>>>> functions that internally share the same statistics entry.  That
>>>>> behavior required me to provide an entry-caching feature to my
>>>>> shared-memory statistics patch.
>>>>> 
>>>> 
>>>> All these are good reasons to expose it via one function and I think
>> 
>> Understood. +1 to expose it as one function.
>> 
>> 
>>>> that is why most of our existing views also use one function 
>>>> approach.
>>> 
>>> Thanks for your comments.
>>> I didn't notice there are the above disadvantages to provide 
>>> individual functions.
>>> 
>>> I changed the latest patch to expose it via one function.
>> 
>> Thanks for updating the patch! LGTM.
>> Barring any other objection, I will commit it.
> 
> I updated typedefs.list and pushed the patch. Thanks!

Thanks to all reviewers!

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
Hi,

I think it's better to add other WAL statistics to the pg_stat_wal view.
I'm thinking to add the following statistics. Please let me know your 
thoughts.

1.  Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance 
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune 
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN, auto_explain, 
autovacuum
and pg_stat_statements now, if users want to see the performance trends 
for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal 
view?


2.  Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

> Just idea; it may be worth exposing the number of when new WAL file is 
> created and zero-filled. This initialization may have impact on the 
> performance of write-heavy workload generating lots of WAL. If this 
> number is reported high, to reduce the number of this initialization, 
> we can tune WAL-related parameters so that more "recycled" WAL files 
> can be hold.


3.  Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is 
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high relative to the startup time, "wal_segment_size" 
must be bigger.


4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay" 
for query executions.
If the number of WAL is flushed is high, users can know 
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and 
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.


5.  Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher, users can detect the possibility of 
disk failure.

Since users can see how much flash time occupies of the query execution 
time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this 
for backends and wal writer.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-10-06 15:57, Masahiro Ikeda wrote:
> Hi,
> 
> I think it's better to add other WAL statistics to the pg_stat_wal 
> view.
> I'm thinking to add the following statistics. Please let me know your 
> thoughts.
> 
> 1.  Basic wal statistics
> 
> * wal_records: Total number of WAL records generated
> * wal_fpi: Total number of WAL full page images generated
> * wal_bytes: Total amount of WAL bytes generated
> 
> To understand DB's performance, first, we will check the performance
> trends for the entire database instance.
> For example, if the number of wal_fpi becomes higher, users may tune
> "wal_compression", "checkpoint_timeout" and so on.
> 
> Although users can check the above statistics via EXPLAIN,
> auto_explain, autovacuum
> and pg_stat_statements now, if users want to see the performance
> trends for the entire database,
> they must preprocess the statistics.
> 
> Is it useful to add the sum of the above statistics to the pg_stat_wal 
> view?
> 
> 
> 2.  Number of when new WAL file is created and zero-filled.
> 
> As Fujii-san already commented, I think it's good for tuning.
> 
>> Just idea; it may be worth exposing the number of when new WAL file is 
>> created and zero-filled. This initialization may have impact on the 
>> performance of write-heavy workload generating lots of WAL. If this 
>> number is reported high, to reduce the number of this initialization, 
>> we can tune WAL-related parameters so that more "recycled" WAL files 
>> can be hold.
> 
> 
> 3.  Number of when to switch the WAL logfile segment.
> 
> This is similar to 2, but this counts the number of when WAL file is
> recylcled too.
> I think it's useful for tuning "wal_segment_size"
> if the number is high relative to the startup time, "wal_segment_size"
> must be bigger.
> 
> 
> 4. Number of when WAL is flushed
> 
> I think it's useful for tuning "synchronous_commit" and "commit_delay"
> for query executions.
> If the number of WAL is flushed is high, users can know
> "synchronous_commit" is useful for the workload.
> 
> Also, it's useful for tuning "wal_writer_delay" and
> "wal_writer_flush_after" for wal writer.
> If the number is high, users can change the parameter for performance.
> 
> I think it's better to separate this for backends and wal writer.
> 
> 
> 5.  Wait time when WAL is flushed.
> 
> This is the accumulated time when wal is flushed.
> If the time becomes much higher, users can detect the possibility of
> disk failure.
> 
> Since users can see how much flash time occupies of the query execution 
> time,
> it may lead to query tuning and so on.
> 
> Since there is the above reason, I think it's better to separate this
> for backends and wal writer.

I made a patch for collecting the above statistics.
If you have any comments, please let me know.

I think it's better to separate some statistics for backend and 
backgrounds because
tuning target parameters like "synchronous_commit", "wal_writer_delay" 
and so on are different.
But first, I want to get a consensus to collect them.

Best regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

Re: New statistics for tuning WAL buffer size

From
Fujii Masao
Date:

On 2020/10/13 11:57, Masahiro Ikeda wrote:
> On 2020-10-06 15:57, Masahiro Ikeda wrote:
>> Hi,
>>
>> I think it's better to add other WAL statistics to the pg_stat_wal view.
>> I'm thinking to add the following statistics. Please let me know your thoughts.
>>
>> 1.  Basic wal statistics
>>
>> * wal_records: Total number of WAL records generated
>> * wal_fpi: Total number of WAL full page images generated
>> * wal_bytes: Total amount of WAL bytes generated

+1

>>
>> To understand DB's performance, first, we will check the performance
>> trends for the entire database instance.
>> For example, if the number of wal_fpi becomes higher, users may tune
>> "wal_compression", "checkpoint_timeout" and so on.
>>
>> Although users can check the above statistics via EXPLAIN,
>> auto_explain, autovacuum
>> and pg_stat_statements now, if users want to see the performance
>> trends for the entire database,
>> they must preprocess the statistics.
>>
>> Is it useful to add the sum of the above statistics to the pg_stat_wal view?
>>
>>
>> 2.  Number of when new WAL file is created and zero-filled.
>>
>> As Fujii-san already commented, I think it's good for tuning.
>>
>>> Just idea; it may be worth exposing the number of when new WAL file is created and zero-filled. This initialization
mayhave impact on the performance of write-heavy workload generating lots of WAL. If this number is reported high, to
reducethe number of this initialization, we can tune WAL-related parameters so that more "recycled" WAL files can be
hold.

+1

But it might be better to track the number of when new WAL file is
created whether it's zero-filled or not, if file creation and sync itself
takes time.

>>
>>
>> 3.  Number of when to switch the WAL logfile segment.
>>
>> This is similar to 2, but this counts the number of when WAL file is
>> recylcled too.
>> I think it's useful for tuning "wal_segment_size"
>> if the number is high relative to the startup time, "wal_segment_size"
>> must be bigger.

You're thinking to count all the WAL file switch? That number is equal
to the number of WAL files generated since the last reset of pg_stat_wal?

>>
>>
>> 4. Number of when WAL is flushed
>>
>> I think it's useful for tuning "synchronous_commit" and "commit_delay"
>> for query executions.
>> If the number of WAL is flushed is high, users can know
>> "synchronous_commit" is useful for the workload.
>>
>> Also, it's useful for tuning "wal_writer_delay" and
>> "wal_writer_flush_after" for wal writer.
>> If the number is high, users can change the parameter for performance.
>>
>> I think it's better to separate this for backends and wal writer.

+1

>>
>>
>> 5.  Wait time when WAL is flushed.
>>
>> This is the accumulated time when wal is flushed.
>> If the time becomes much higher, users can detect the possibility of
>> disk failure.

This should be tracked, e.g., only when track_io_timing is enabled?
Otherwise, tracking that may cause performance overhead.

>>
>> Since users can see how much flash time occupies of the query execution time,
>> it may lead to query tuning and so on.
>>
>> Since there is the above reason, I think it's better to separate this
>> for backends and wal writer.


I'm afraid that this counter for a backend may be a bit confusing. Because
when the counter indicates small time, we may think that walwriter almost
write WAL data and a backend doesn't take time to write WAL. But a backend
may be just waiting for walwriter to write WAL.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: New statistics for tuning WAL buffer size

From
Masahiro Ikeda
Date:
On 2020-10-15 19:49, Fujii Masao wrote:
> On 2020/10/13 11:57, Masahiro Ikeda wrote:
>> On 2020-10-06 15:57, Masahiro Ikeda wrote:
>>> 2.  Number of when new WAL file is created and zero-filled.
>>> 
>>> As Fujii-san already commented, I think it's good for tuning.
>>> 
>>>> Just idea; it may be worth exposing the number of when new WAL file 
>>>> is created and zero-filled. This initialization may have impact on 
>>>> the performance of write-heavy workload generating lots of WAL. If 
>>>> this number is reported high, to reduce the number of this 
>>>> initialization, we can tune WAL-related parameters so that more 
>>>> "recycled" WAL files can be hold.
> 
> +1
> 
> But it might be better to track the number of when new WAL file is
> created whether it's zero-filled or not, if file creation and sync 
> itself
> takes time.

OK, I changed to track the number of when new WAL file is created.

>>> 
>>> 
>>> 3.  Number of when to switch the WAL logfile segment.
>>> 
>>> This is similar to 2, but this counts the number of when WAL file is
>>> recylcled too.
>>> I think it's useful for tuning "wal_segment_size"
>>> if the number is high relative to the startup time, 
>>> "wal_segment_size"
>>> must be bigger.
> 
> You're thinking to count all the WAL file switch? That number is equal
> to the number of WAL files generated since the last reset of 
> pg_stat_wal?

Yes. I think it might be better to count it because I think the ratio in 
which a new WAL file is created is important.
To calculate it, we need the count all the WAL file switch.


>>> 4. Number of when WAL is flushed
>>> 
>>> I think it's useful for tuning "synchronous_commit" and 
>>> "commit_delay"
>>> for query executions.
>>> If the number of WAL is flushed is high, users can know
>>> "synchronous_commit" is useful for the workload.
>>> 
>>> Also, it's useful for tuning "wal_writer_delay" and
>>> "wal_writer_flush_after" for wal writer.
>>> If the number is high, users can change the parameter for 
>>> performance.
>>> 
>>> I think it's better to separate this for backends and wal writer.
> 
> +1

Thanks, I separated the statistics for backends and wal writer.
When checkpointer process flushes the WAL, the statistics for backends 
are counted now.
Although I think its impact is not big, is it better to make statistics 
for checkpointer?


>>> 5.  Wait time when WAL is flushed.
>>> 
>>> This is the accumulated time when wal is flushed.
>>> If the time becomes much higher, users can detect the possibility of
>>> disk failure.
> 
> This should be tracked, e.g., only when track_io_timing is enabled?
> Otherwise, tracking that may cause performance overhead.

OK,  I changed the implementation.


>>> Since users can see how much flash time occupies of the query 
>>> execution time,
>>> it may lead to query tuning and so on.
>>> 
>>> Since there is the above reason, I think it's better to separate this
>>> for backends and wal writer.
> 
> 
> I'm afraid that this counter for a backend may be a bit confusing. 
> Because
> when the counter indicates small time, we may think that walwriter 
> almost
> write WAL data and a backend doesn't take time to write WAL. But a 
> backend
> may be just waiting for walwriter to write WAL.

Thanks for your comments. I agreed.



Now, the following is the view implemented in the attached patch.
If you have any other comments, please let me know.

```
postgres=# SELECT * FROM pg_stat_wal;
-[ RECORD 1 ]-------+------------------------------
wal_records         | 1000128       # Total number of WAL records 
generated
wal_fpi             | 1             # Total number of WAL full page 
images generated
wal_bytes           | 124013682     #Total amount of WAL bytes generated
wal_buffers_full    | 7952       #Total number of WAL data written to 
the disk because WAL buffers got full
wal_file            | 14   #Total number of WAL file segment created or 
opened a pre-existing one
wal_init_file       | 7  #Total number of WAL file segment created
wal_write_backend   | 7956    #Total number of WAL data written to the 
disk by backends
wal_write_walwriter | 27     #Total number of WAL data written to the 
disk by walwriter
wal_write_time      | 40    # Total amount of time that has been spent 
in the portion of WAL data was written to disk by backend and walwriter, 
in milliseconds
wal_sync_backend    | 1   # Total number of WAL data synced to the disk 
by backends
wal_sync_walwriter  | 6   #Total number of WAL data synced to the disk 
by walwriter
wal_sync_time       | 0  # Total amount of time that has been spent in 
the portion of WAL data was synced to disk by backend and walwriter, in 
milliseconds
stats_reset         | 2020-10-16 19:41:01.892272+09
```

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment