Thread: New statistics for tuning WAL buffer size
Hi, It's important to provide the metrics for tuning the size of WAL buffers. For now, it's lack of the statistics how often processes wait to write WAL because WAL buffer is full. If those situation are often occurred, WAL buffer is too small for the workload. DBAs must to tune the WAL buffer size for performance improvement. There are related threads, but those are not merged. https://www.postgresql.org/message-id/4FF824F3.5090407@uptime.jp https://www.postgresql.org/message-id/flat/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com What do you think? If we can have a consensus, I will make a PoC patch. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
From: Masahiro Ikeda <ikedamsh@oss.nttdata.com> > It's important to provide the metrics for tuning the size of WAL buffers. > For now, it's lack of the statistics how often processes wait to write WAL > because WAL buffer is full. > > If those situation are often occurred, WAL buffer is too small for the workload. > DBAs must to tune the WAL buffer size for performance improvement. Yes, it's helpful to know if we need to enlarge the WAL buffer. That's why our colleague HariBabu proposed the patch. We'dbe happy if it could be committed in some form. > There are related threads, but those are not merged. > https://www.postgresql.org/message-id/4FF824F3.5090407@uptime.jp > https://www.postgresql.org/message-id/flat/CAJrrPGc6APFUGYNcPe4qcNx > pL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com What's the difference between those patches? What blocked them from being committed? Regards Takayuki Tsunakawa
On 2020-08-18 16:35, tsunakawa.takay@fujitsu.com wrote: > From: Masahiro Ikeda <ikedamsh@oss.nttdata.com> >> It's important to provide the metrics for tuning the size of WAL >> buffers. >> For now, it's lack of the statistics how often processes wait to write >> WAL >> because WAL buffer is full. >> >> If those situation are often occurred, WAL buffer is too small for the >> workload. >> DBAs must to tune the WAL buffer size for performance improvement. > > Yes, it's helpful to know if we need to enlarge the WAL buffer. > That's why our colleague HariBabu proposed the patch. We'd be happy > if it could be committed in some form. > >> There are related threads, but those are not merged. >> https://www.postgresql.org/message-id/4FF824F3.5090407@uptime.jp >> https://www.postgresql.org/message-id/flat/CAJrrPGc6APFUGYNcPe4qcNx >> pL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com > > What's the difference between those patches? What blocked them from > being committed? Thanks for replying. Since the above threads are not active now and those patches can't be applied HEAD, I made this thread. If it is better to reply the above thread, I will do so. If my understanding is correct, we have to measure the performance impact first. Do you know HariBabu is now trying to solve it? If not, I will try to modify patches to apply HEAD. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
From: Masahiro Ikeda <ikedamsh@oss.nttdata.com> > If my understanding is correct, we have to measure the performance > impact first. > Do you know HariBabu is now trying to solve it? If not, I will try to > modify patches to apply HEAD. No, he's not doing it anymore. It'd be great if you could resume it. However, I recommend sharing your understanding aboutwhat were the issues with those two threads and how you're trying to solve them. Was the performance overhead the blockerin both of the threads? Regards Takayuki Tsunakawa
On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote: > From: Masahiro Ikeda <ikedamsh@oss.nttdata.com> >> If my understanding is correct, we have to measure the performance >> impact first. >> Do you know HariBabu is now trying to solve it? If not, I will try to >> modify patches to apply HEAD. > > No, he's not doing it anymore. It'd be great if you could resume it. OK, thanks. > However, I recommend sharing your understanding about what were the > issues with those two threads and how you're trying to solve them. > Was the performance overhead the blocker in both of the threads? In my understanding, some comments are not solved in both of the threads. I think the following works are remained. 1) Modify patches to apply HEAD 2) Get consensus what metrics we collect and how to use them for tuning. 3) Measure performance impact and if it leads poor performance, we solve it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On 2020/08/19 14:10, Masahiro Ikeda wrote: > On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote: >> From: Masahiro Ikeda <ikedamsh@oss.nttdata.com> >>> If my understanding is correct, we have to measure the performance >>> impact first. >>> Do you know HariBabu is now trying to solve it? If not, I will try to >>> modify patches to apply HEAD. >> >> No, he's not doing it anymore. It'd be great if you could resume it. > > OK, thanks. > >> However, I recommend sharing your understanding about what were the >> issues with those two threads and how you're trying to solve them. >> Was the performance overhead the blocker in both of the threads? > > In my understanding, some comments are not solved in both of the threads. > I think the following works are remained. > > 1) Modify patches to apply HEAD > 2) Get consensus what metrics we collect and how to use them for tuning. I agree to expose the number of WAL write caused by full of WAL buffers. It's helpful when tuning wal_buffers size. Haribabu separated that number into two fields in his patch; one is the number of WAL write by backend, and another is by background processes and workers. But I'm not sure how useful such separation is. I'm ok with just one field for that number. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/08/20 20:01, Fujii Masao wrote: > > > On 2020/08/19 14:10, Masahiro Ikeda wrote: >> On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote: >>> From: Masahiro Ikeda <ikedamsh@oss.nttdata.com> >>>> If my understanding is correct, we have to measure the performance >>>> impact first. >>>> Do you know HariBabu is now trying to solve it? If not, I will try to >>>> modify patches to apply HEAD. >>> >>> No, he's not doing it anymore. It'd be great if you could resume it. >> >> OK, thanks. >> >>> However, I recommend sharing your understanding about what were the >>> issues with those two threads and how you're trying to solve them. >>> Was the performance overhead the blocker in both of the threads? >> >> In my understanding, some comments are not solved in both of the threads. >> I think the following works are remained. >> >> 1) Modify patches to apply HEAD >> 2) Get consensus what metrics we collect and how to use them for tuning. > > I agree to expose the number of WAL write caused by full of WAL buffers. > It's helpful when tuning wal_buffers size. Haribabu separated that number > into two fields in his patch; one is the number of WAL write by backend, > and another is by background processes and workers. But I'm not sure > how useful such separation is. I'm ok with just one field for that number. Just idea; it may be worth exposing the number of when new WAL file is created and zero-filled. This initialization may have impact on the performance of write-heavy workload generating lots of WAL. If this number is reported high, to reduce the number of this initialization, we can tune WAL-related parameters so that more "recycled" WAL files can be hold. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
From: Fujii Masao <masao.fujii@oss.nttdata.com> > I agree to expose the number of WAL write caused by full of WAL buffers. > It's helpful when tuning wal_buffers size. Haribabu separated that number > into two fields in his patch; one is the number of WAL write by backend, > and another is by background processes and workers. But I'm not sure > how useful such separation is. I'm ok with just one field for that number. I agree with you. I don't think we need to separate the numbers for foreground processes and background ones. WAL bufferis a single resource. So "Writes due to full WAL buffer are happening. We may be able to boost performance by increasingwal_buffers" would be enough. Regards Takayuki Tsunakawa
From: Fujii Masao <masao.fujii@oss.nttdata.com> > Just idea; it may be worth exposing the number of when new WAL file is > created and zero-filled. This initialization may have impact on > the performance of write-heavy workload generating lots of WAL. If this > number is reported high, to reduce the number of this initialization, > we can tune WAL-related parameters so that more "recycled" WAL files > can be hold. Sounds good. Actually, I want to know how much those zeroing affected the transaction response times, but it may be thetarget of the wait event statistics that Imai-san is addressing. (I wonder how the fallocate() patch went that tries to minimize the zeroing time.) Regards Takayuki Tsunakawa
On 2020/08/21 12:08, tsunakawa.takay@fujitsu.com wrote: > From: Fujii Masao <masao.fujii@oss.nttdata.com> >> Just idea; it may be worth exposing the number of when new WAL file is >> created and zero-filled. This initialization may have impact on >> the performance of write-heavy workload generating lots of WAL. If this >> number is reported high, to reduce the number of this initialization, >> we can tune WAL-related parameters so that more "recycled" WAL files >> can be hold. > > Sounds good. Actually, I want to know how much those zeroing affected the transaction response times, but it may be thetarget of the wait event statistics that Imai-san is addressing. Maybe, so I'm ok if the first pg_stat_walwriter patch doesn't expose this number. We can extend it to include that later after we confirm that number is really useful. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hi, thanks for useful comments. >> I agree to expose the number of WAL write caused by full of WAL >> buffers. >> It's helpful when tuning wal_buffers size. Haribabu separated that >> number >> into two fields in his patch; one is the number of WAL write by >> backend, >> and another is by background processes and workers. But I'm not sure >> how useful such separation is. I'm ok with just one field for that >> number. > I agree with you. I don't think we need to separate the numbers for > foreground processes and background ones. WAL buffer is a single > resource. So "Writes due to full WAL buffer are happening. We may be > able to boost performance by increasing wal_buffers" would be enough. I made a patch to expose the number of WAL write caused by full of WAL buffers. I'm going to submit this patch to commitfests. As Fujii-san and Tsunakawa-san said, it expose the total number since I agreed that we don't need to separate the numbers for foreground processes and background ones. By the way, do we need to add another metrics related to WAL? For example, is the total number of WAL writes to the buffers useful to calculate the dirty WAL write ratio? Is it enough as a first step? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020-08-24 20:45, Masahiro Ikeda wrote: > Hi, thanks for useful comments. > >>> I agree to expose the number of WAL write caused by full of WAL >>> buffers. >>> It's helpful when tuning wal_buffers size. Haribabu separated that >>> number >>> into two fields in his patch; one is the number of WAL write by >>> backend, >>> and another is by background processes and workers. But I'm not sure >>> how useful such separation is. I'm ok with just one field for that >>> number. >> I agree with you. I don't think we need to separate the numbers for >> foreground processes and background ones. WAL buffer is a single >> resource. So "Writes due to full WAL buffer are happening. We may be >> able to boost performance by increasing wal_buffers" would be enough. > > I made a patch to expose the number of WAL write caused by full of WAL > buffers. > I'm going to submit this patch to commitfests. > > As Fujii-san and Tsunakawa-san said, it expose the total number > since I agreed that we don't need to separate the numbers for > foreground processes and background ones. > > By the way, do we need to add another metrics related to WAL? > For example, is the total number of WAL writes to the buffers useful > to calculate the dirty WAL write ratio? > > Is it enough as a first step? I forgot to rebase the current master. I've attached the rebased patch. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/08/24 21:00, Masahiro Ikeda wrote: > On 2020-08-24 20:45, Masahiro Ikeda wrote: >> Hi, thanks for useful comments. >> >>>> I agree to expose the number of WAL write caused by full of WAL buffers. >>>> It's helpful when tuning wal_buffers size. Haribabu separated that number >>>> into two fields in his patch; one is the number of WAL write by backend, >>>> and another is by background processes and workers. But I'm not sure >>>> how useful such separation is. I'm ok with just one field for that number. >>> I agree with you. I don't think we need to separate the numbers for foreground processes and background ones. WAL bufferis a single resource. So "Writes due to full WAL buffer are happening. We may be able to boost performance by increasingwal_buffers" would be enough. >> >> I made a patch to expose the number of WAL write caused by full of WAL buffers. >> I'm going to submit this patch to commitfests. >> >> As Fujii-san and Tsunakawa-san said, it expose the total number >> since I agreed that we don't need to separate the numbers for >> foreground processes and background ones. >> >> By the way, do we need to add another metrics related to WAL? >> For example, is the total number of WAL writes to the buffers useful >> to calculate the dirty WAL write ratio? >> >> Is it enough as a first step? > > I forgot to rebase the current master. > I've attached the rebased patch. Thanks for the patch! +/* ---------- + * Backend types + * ---------- You seem to forget to add "*/" into the above comment. This issue could cause the following compiler warning. ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment] The contents of pg_stat_walwrites are reset when the server is restarted. Isn't this problematic? IMO since pg_stat_walwrites is a collected statistics view, basically its contents should be kept even in the case of server restart. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
> +/* ---------- > + * Backend types > + * ---------- > > You seem to forget to add "*/" into the above comment. > This issue could cause the following compiler warning. > ../../src/include/pgstat.h:761:1: warning: '/*' within block comment > [-Wcomment] Thanks for the comment. I fixed. > The contents of pg_stat_walwrites are reset when the server > is restarted. Isn't this problematic? IMO since pg_stat_walwrites > is a collected statistics view, basically its contents should be > kept even in the case of server restart. I agree your opinion. I modified to use the statistics collector and persist the wal statistics. I changed the view name from pg_stat_walwrites to pg_stat_walwriter. I think it is better to match naming scheme with other views like pg_stat_bgwriter, which is for bgwriter statistics but it has the statistics related to backend. The pg_stat_walwriter is not security restricted now, so ordinary users can access it. I has the same security level as pg_stat_archiver.If you have any comments, please let me know. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/09/02 18:56, Masahiro Ikeda wrote: >> +/* ---------- >> + * Backend types >> + * ---------- >> >> You seem to forget to add "*/" into the above comment. >> This issue could cause the following compiler warning. >> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment] > > Thanks for the comment. I fixed. Thanks for the fix! But why are those comments necessary? > >> The contents of pg_stat_walwrites are reset when the server >> is restarted. Isn't this problematic? IMO since pg_stat_walwrites >> is a collected statistics view, basically its contents should be >> kept even in the case of server restart. > > I agree your opinion. > I modified to use the statistics collector and persist the wal statistics. > > > I changed the view name from pg_stat_walwrites to pg_stat_walwriter. > I think it is better to match naming scheme with other views like pg_stat_bgwriter, > which is for bgwriter statistics but it has the statistics related to backend. I prefer the view name pg_stat_walwriter for the consistency with other view names. But we also have pg_stat_wal_receiver. Which makes me think that maybe pg_stat_wal_writer is better for the consistency. Thought? IMO either of them works for me. I'd like to hear more opinons about this. > > The pg_stat_walwriter is not security restricted now, so ordinary users can access it. > I has the same security level as pg_stat_archiver.If you have any comments, please let me know. + <structfield>dirty_writes</structfield> <type>bigint</type> I guess that the column name "dirty_writes" derived from the DTrace probe name. Isn't this name confusing? We should rename it to "wal_buffers_full" or something? +/* ---------- + * PgStat_MsgWalWriter Sent by the walwriter to update statistics. This comment seems not accurate because backends also send it. +/* + * WAL writes statistics counter is updated in XLogWrite function + */ +extern PgStat_MsgWalWriter WalWriterStats; This comment seems not right because the counter is not updated in XLogWrite(). +-- There will surely and maximum one record +select count(*) = 1 as ok from pg_stat_walwriter; What about changing this comment to "There must be only one record"? + WalWriterStats.m_xlog_dirty_writes++; LWLockRelease(WALWriteLock); Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected with WALWriteLock, isn't it better to increment that after releasing the lock? +CREATE VIEW pg_stat_walwriter AS + SELECT + pg_stat_get_xlog_dirty_writes() AS dirty_writes, + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; + CREATE VIEW pg_stat_progress_vacuum AS In system_views.sql, the definition of pg_stat_walwriter should be placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze. } - /* * We found an existing collector stats file. Read it and put all the You seem to accidentally have removed the empty line here. - errhint("Target must be \"archiver\" or \"bgwriter\"."))); + errhint("Target must be \"archiver\" or \"bgwriter\" or \"walwriter\"."))); There are two "or" in the message, but the former should be replaced with ","? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
From: Fujii Masao <masao.fujii@oss.nttdata.com> > > I changed the view name from pg_stat_walwrites to pg_stat_walwriter. > > I think it is better to match naming scheme with other views like > pg_stat_bgwriter, > > which is for bgwriter statistics but it has the statistics related to backend. > > I prefer the view name pg_stat_walwriter for the consistency with > other view names. But we also have pg_stat_wal_receiver. Which > makes me think that maybe pg_stat_wal_writer is better for > the consistency. Thought? IMO either of them works for me. > I'd like to hear more opinons about this. I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity. Likewise, pg_stat_walwriter leadsto misunderstanding because its information is not limited to WAL writer. How about simply pg_stat_wal? In the future, we may want to include WAL reads in this view, e.g. reading undo logs in zheap. Regards Takayuki Tsunakawa
On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: > From: Fujii Masao <masao.fujii@oss.nttdata.com> >>> I changed the view name from pg_stat_walwrites to pg_stat_walwriter. >>> I think it is better to match naming scheme with other views like >> pg_stat_bgwriter, >>> which is for bgwriter statistics but it has the statistics related to backend. >> >> I prefer the view name pg_stat_walwriter for the consistency with >> other view names. But we also have pg_stat_wal_receiver. Which >> makes me think that maybe pg_stat_wal_writer is better for >> the consistency. Thought? IMO either of them works for me. >> I'd like to hear more opinons about this. > > I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity. Likewise, pg_stat_walwriter leadsto misunderstanding because its information is not limited to WAL writer. > > How about simply pg_stat_wal? In the future, we may want to include WAL reads in this view, e.g. reading undo logs inzheap. Sounds reasonable. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>> I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
>>> I think it is better to match naming scheme with other views like
>> pg_stat_bgwriter,
>>> which is for bgwriter statistics but it has the statistics related to backend.
>>
>> I prefer the view name pg_stat_walwriter for the consistency with
>> other view names. But we also have pg_stat_wal_receiver. Which
>> makes me think that maybe pg_stat_wal_writer is better for
>> the consistency. Thought? IMO either of them works for me.
>> I'd like to hear more opinons about this.
>
> I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity. Likewise, pg_stat_walwriter leads to misunderstanding because its information is not limited to WAL writer.
>
> How about simply pg_stat_wal? In the future, we may want to include WAL reads in this view, e.g. reading undo logs in zheap.
Sounds reasonable.
+1.
pg_stat_bgwriter has had the "wrong name" for quite some time now -- it became even more apparent when the checkpointer was split out to it's own process, and that's not exactly a recent change. And it had allocs in it from day one...
I think naming it for what the data in it is ("wal") rather than which process deals with it ("walwriter") is correct, unless the statistics can be known to only *ever* affect one type of process. (And then different processes can affect different columns in the view). As a general rule -- and that's from what I can tell exactly what's being proposed.
Thanks for the review and advice! On 2020-09-03 16:05, Fujii Masao wrote: > On 2020/09/02 18:56, Masahiro Ikeda wrote: >>> +/* ---------- >>> + * Backend types >>> + * ---------- >>> >>> You seem to forget to add "*/" into the above comment. >>> This issue could cause the following compiler warning. >>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment >>> [-Wcomment] >> >> Thanks for the comment. I fixed. > > Thanks for the fix! But why are those comments necessary? Sorry about that. This comment is not necessary. I removed it. >> The pg_stat_walwriter is not security restricted now, so ordinary >> users can access it. >> It has the same security level as pg_stat_archiver. If you have any >> comments, please let me know. > > + <structfield>dirty_writes</structfield> <type>bigint</type> > > I guess that the column name "dirty_writes" derived from > the DTrace probe name. Isn't this name confusing? We should > rename it to "wal_buffers_full" or something? I agree and rename it to "wal_buffers_full". > +/* ---------- > + * PgStat_MsgWalWriter Sent by the walwriter to update statistics. > > This comment seems not accurate because backends also send it. > > +/* > + * WAL writes statistics counter is updated in XLogWrite function > + */ > +extern PgStat_MsgWalWriter WalWriterStats; > > This comment seems not right because the counter is not updated in > XLogWrite(). Right. I fixed it to "Sent by each backend and background workers to update WAL statistics." In the future, other statistics will be included so I remove the function's name. > +-- There will surely and maximum one record > +select count(*) = 1 as ok from pg_stat_walwriter; > > What about changing this comment to "There must be only one record"? Thanks, I fixed. > + WalWriterStats.m_xlog_dirty_writes++; > LWLockRelease(WALWriteLock); > > Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected > with WALWriteLock, isn't it better to increment that after releasing > the lock? Thanks, I fixed. > +CREATE VIEW pg_stat_walwriter AS > + SELECT > + pg_stat_get_xlog_dirty_writes() AS dirty_writes, > + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; > + > CREATE VIEW pg_stat_progress_vacuum AS > > In system_views.sql, the definition of pg_stat_walwriter should be > placed just after that of pg_stat_bgwriter not > pg_stat_progress_analyze. OK, I fixed it. > } > - > /* > * We found an existing collector stats file. Read it and put all the > > You seem to accidentally have removed the empty line here. Sorry about that. I fixed it. > - errhint("Target must be \"archiver\" or \"bgwriter\"."))); > + errhint("Target must be \"archiver\" or \"bgwriter\" or > \"walwriter\"."))); > > There are two "or" in the message, but the former should be replaced > with ","? Thanks, I fixed. On 2020-09-05 18:40, Magnus Hagander wrote: > On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: > >> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>> I changed the view name from pg_stat_walwrites to >> pg_stat_walwriter. >>>>> I think it is better to match naming scheme with other views >> like >>>> pg_stat_bgwriter, >>>>> which is for bgwriter statistics but it has the statistics >> related to backend. >>>> >>>> I prefer the view name pg_stat_walwriter for the consistency with >>>> other view names. But we also have pg_stat_wal_receiver. Which >>>> makes me think that maybe pg_stat_wal_writer is better for >>>> the consistency. Thought? IMO either of them works for me. >>>> I'd like to hear more opinons about this. >>> >>> I think pg_stat_bgwriter is now a misnomer, because it contains >> the backends' activity. Likewise, pg_stat_walwriter leads to >> misunderstanding because its information is not limited to WAL >> writer. >>> >>> How about simply pg_stat_wal? In the future, we may want to >> include WAL reads in this view, e.g. reading undo logs in zheap. >> >> Sounds reasonable. > > +1. > > pg_stat_bgwriter has had the "wrong name" for quite some time now -- > it became even more apparent when the checkpointer was split out to > it's own process, and that's not exactly a recent change. And it had > allocs in it from day one... > > I think naming it for what the data in it is ("wal") rather than which > process deals with it ("walwriter") is correct, unless the statistics > can be known to only *ever* affect one type of process. (And then > different processes can affect different columns in the view). As a > general rule -- and that's from what I can tell exactly what's being > proposed. Thanks for your comments. I agree with your opinions. I changed the view name to "pg_stat_wal". I fixed the code to send the WAL statistics from not only backend and walwriter but also checkpointer, walsender and autovacuum worker. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/09/07 9:58, Masahiro Ikeda wrote: > Thanks for the review and advice! > > On 2020-09-03 16:05, Fujii Masao wrote: >> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>> +/* ---------- >>>> + * Backend types >>>> + * ---------- >>>> >>>> You seem to forget to add "*/" into the above comment. >>>> This issue could cause the following compiler warning. >>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment] >>> >>> Thanks for the comment. I fixed. >> >> Thanks for the fix! But why are those comments necessary? > > Sorry about that. This comment is not necessary. > I removed it. > >>> The pg_stat_walwriter is not security restricted now, so ordinary users can access it. >>> It has the same security level as pg_stat_archiver. If you have any comments, please let me know. >> >> + <structfield>dirty_writes</structfield> <type>bigint</type> >> >> I guess that the column name "dirty_writes" derived from >> the DTrace probe name. Isn't this name confusing? We should >> rename it to "wal_buffers_full" or something? > > I agree and rename it to "wal_buffers_full". > >> +/* ---------- >> + * PgStat_MsgWalWriter Sent by the walwriter to update statistics. >> >> This comment seems not accurate because backends also send it. >> >> +/* >> + * WAL writes statistics counter is updated in XLogWrite function >> + */ >> +extern PgStat_MsgWalWriter WalWriterStats; >> >> This comment seems not right because the counter is not updated in XLogWrite(). > > Right. I fixed it to "Sent by each backend and background workers to update WAL statistics." > In the future, other statistics will be included so I remove the function's name. > > >> +-- There will surely and maximum one record >> +select count(*) = 1 as ok from pg_stat_walwriter; >> >> What about changing this comment to "There must be only one record"? > > Thanks, I fixed. > >> + WalWriterStats.m_xlog_dirty_writes++; >> LWLockRelease(WALWriteLock); >> >> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected >> with WALWriteLock, isn't it better to increment that after releasing the lock? > > Thanks, I fixed. > >> +CREATE VIEW pg_stat_walwriter AS >> + SELECT >> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >> + >> CREATE VIEW pg_stat_progress_vacuum AS >> >> In system_views.sql, the definition of pg_stat_walwriter should be >> placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze. > > OK, I fixed it. > >> } >> - >> /* >> * We found an existing collector stats file. Read it and put all the >> >> You seem to accidentally have removed the empty line here. > > Sorry about that. I fixed it. > >> - errhint("Target must be \"archiver\" or \"bgwriter\"."))); >> + errhint("Target must be \"archiver\" or \"bgwriter\" or >> \"walwriter\"."))); >> >> There are two "or" in the message, but the former should be replaced with ","? > > Thanks, I fixed. > > > On 2020-09-05 18:40, Magnus Hagander wrote: >> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >> <masao.fujii@oss.nttdata.com> wrote: >> >>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>> I changed the view name from pg_stat_walwrites to >>> pg_stat_walwriter. >>>>>> I think it is better to match naming scheme with other views >>> like >>>>> pg_stat_bgwriter, >>>>>> which is for bgwriter statistics but it has the statistics >>> related to backend. >>>>> >>>>> I prefer the view name pg_stat_walwriter for the consistency with >>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>> the consistency. Thought? IMO either of them works for me. >>>>> I'd like to hear more opinons about this. >>>> >>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>> the backends' activity. Likewise, pg_stat_walwriter leads to >>> misunderstanding because its information is not limited to WAL >>> writer. >>>> >>>> How about simply pg_stat_wal? In the future, we may want to >>> include WAL reads in this view, e.g. reading undo logs in zheap. >>> >>> Sounds reasonable. >> >> +1. >> >> pg_stat_bgwriter has had the "wrong name" for quite some time now -- >> it became even more apparent when the checkpointer was split out to >> it's own process, and that's not exactly a recent change. And it had >> allocs in it from day one... >> >> I think naming it for what the data in it is ("wal") rather than which >> process deals with it ("walwriter") is correct, unless the statistics >> can be known to only *ever* affect one type of process. (And then >> different processes can affect different columns in the view). As a >> general rule -- and that's from what I can tell exactly what's being >> proposed. > > Thanks for your comments. I agree with your opinions. > I changed the view name to "pg_stat_wal". > > > I fixed the code to send the WAL statistics from not only backend and walwriter > but also checkpointer, walsender and autovacuum worker. Good point! Thanks for updating the patch! @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, onerel->rd_rel->relisshared, Max(new_live_tuples, 0), vacrelstats->new_dead_tuples); + pgstat_send_wal(); I guess that you changed heap_vacuum_rel() as above so that autovacuum workers can send WAL stats. But heap_vacuum_rel() can be called by the processes (e.g., backends) other than autovacuum workers? Also what happens if autovacuum workers just do ANALYZE only? In that case, heap_vacuum_rel() may not be called. Currently autovacuum worker reports the stats at the exit via pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker is not the process that basically keeps running during the service. It exits after it does vacuum or analyze. So ISTM that it's not bad to report the stats only at the exit, in autovacuum worker case. There is no need to add extra code for WAL stats report by autovacuum worker. Thought? @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) else RecentFlushPtr = GetXLogReplayRecPtr(NULL); + /* Send wal statistics */ + pgstat_send_wal(); AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData() and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal() into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best for that purpose. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-09-07 16:19, Fujii Masao wrote: > On 2020/09/07 9:58, Masahiro Ikeda wrote: >> Thanks for the review and advice! >> >> On 2020-09-03 16:05, Fujii Masao wrote: >>> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>>> +/* ---------- >>>>> + * Backend types >>>>> + * ---------- >>>>> >>>>> You seem to forget to add "*/" into the above comment. >>>>> This issue could cause the following compiler warning. >>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block >>>>> comment [-Wcomment] >>>> >>>> Thanks for the comment. I fixed. >>> >>> Thanks for the fix! But why are those comments necessary? >> >> Sorry about that. This comment is not necessary. >> I removed it. >> >>>> The pg_stat_walwriter is not security restricted now, so ordinary >>>> users can access it. >>>> It has the same security level as pg_stat_archiver. If you have any >>>> comments, please let me know. >>> >>> + <structfield>dirty_writes</structfield> <type>bigint</type> >>> >>> I guess that the column name "dirty_writes" derived from >>> the DTrace probe name. Isn't this name confusing? We should >>> rename it to "wal_buffers_full" or something? >> >> I agree and rename it to "wal_buffers_full". >> >>> +/* ---------- >>> + * PgStat_MsgWalWriter Sent by the walwriter to update >>> statistics. >>> >>> This comment seems not accurate because backends also send it. >>> >>> +/* >>> + * WAL writes statistics counter is updated in XLogWrite function >>> + */ >>> +extern PgStat_MsgWalWriter WalWriterStats; >>> >>> This comment seems not right because the counter is not updated in >>> XLogWrite(). >> >> Right. I fixed it to "Sent by each backend and background workers to >> update WAL statistics." >> In the future, other statistics will be included so I remove the >> function's name. >> >> >>> +-- There will surely and maximum one record >>> +select count(*) = 1 as ok from pg_stat_walwriter; >>> >>> What about changing this comment to "There must be only one record"? >> >> Thanks, I fixed. >> >>> + WalWriterStats.m_xlog_dirty_writes++; >>> LWLockRelease(WALWriteLock); >>> >>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected >>> with WALWriteLock, isn't it better to increment that after releasing >>> the lock? >> >> Thanks, I fixed. >> >>> +CREATE VIEW pg_stat_walwriter AS >>> + SELECT >>> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >>> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >>> + >>> CREATE VIEW pg_stat_progress_vacuum AS >>> >>> In system_views.sql, the definition of pg_stat_walwriter should be >>> placed just after that of pg_stat_bgwriter not >>> pg_stat_progress_analyze. >> >> OK, I fixed it. >> >>> } >>> - >>> /* >>> * We found an existing collector stats file. Read it and put >>> all the >>> >>> You seem to accidentally have removed the empty line here. >> >> Sorry about that. I fixed it. >> >>> - errhint("Target must be \"archiver\" or >>> \"bgwriter\"."))); >>> + errhint("Target must be \"archiver\" or >>> \"bgwriter\" or >>> \"walwriter\"."))); >>> >>> There are two "or" in the message, but the former should be replaced >>> with ","? >> >> Thanks, I fixed. >> >> >> On 2020-09-05 18:40, Magnus Hagander wrote: >>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >>> <masao.fujii@oss.nttdata.com> wrote: >>> >>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>>> I changed the view name from pg_stat_walwrites to >>>> pg_stat_walwriter. >>>>>>> I think it is better to match naming scheme with other views >>>> like >>>>>> pg_stat_bgwriter, >>>>>>> which is for bgwriter statistics but it has the statistics >>>> related to backend. >>>>>> >>>>>> I prefer the view name pg_stat_walwriter for the consistency with >>>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>>> the consistency. Thought? IMO either of them works for me. >>>>>> I'd like to hear more opinons about this. >>>>> >>>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>>> the backends' activity. Likewise, pg_stat_walwriter leads to >>>> misunderstanding because its information is not limited to WAL >>>> writer. >>>>> >>>>> How about simply pg_stat_wal? In the future, we may want to >>>> include WAL reads in this view, e.g. reading undo logs in zheap. >>>> >>>> Sounds reasonable. >>> >>> +1. >>> >>> pg_stat_bgwriter has had the "wrong name" for quite some time now -- >>> it became even more apparent when the checkpointer was split out to >>> it's own process, and that's not exactly a recent change. And it had >>> allocs in it from day one... >>> >>> I think naming it for what the data in it is ("wal") rather than >>> which >>> process deals with it ("walwriter") is correct, unless the statistics >>> can be known to only *ever* affect one type of process. (And then >>> different processes can affect different columns in the view). As a >>> general rule -- and that's from what I can tell exactly what's being >>> proposed. >> >> Thanks for your comments. I agree with your opinions. >> I changed the view name to "pg_stat_wal". >> >> >> I fixed the code to send the WAL statistics from not only backend and >> walwriter >> but also checkpointer, walsender and autovacuum worker. > > Good point! Thanks for updating the patch! > > > @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams > *params, > onerel->rd_rel->relisshared, > Max(new_live_tuples, 0), > vacrelstats->new_dead_tuples); > + pgstat_send_wal(); > > I guess that you changed heap_vacuum_rel() as above so that autovacuum > workers can send WAL stats. But heap_vacuum_rel() can be called by > the processes (e.g., backends) other than autovacuum workers? Also > what happens if autovacuum workers just do ANALYZE only? In that case, > heap_vacuum_rel() may not be called. > > Currently autovacuum worker reports the stats at the exit via > pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker > is not the process that basically keeps running during the service. It > exits > after it does vacuum or analyze. So ISTM that it's not bad to report > the stats > only at the exit, in autovacuum worker case. There is no need to add > extra > code for WAL stats report by autovacuum worker. Thought? Thanks, I understood. I removed this code. > > @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); > + /* Send wal statistics */ > + pgstat_send_wal(); > > AFAIR logical walsender uses three loops in WalSndLoop(), > WalSndWriteData() > and WalSndWaitForWal(). But could you tell me why added > pgstat_send_wal() > into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the > best > for that purpose. I checked what function calls XLogBackgroundFlush() which calls AdvanceXLInsertBuffer() to increment m_wal_buffers_full. I found that WalSndWaitForWal() calls it, so I added it. Is it better to move it in WalSndLoop() like the attached patch? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/09/09 13:57, Masahiro Ikeda wrote: > On 2020-09-07 16:19, Fujii Masao wrote: >> On 2020/09/07 9:58, Masahiro Ikeda wrote: >>> Thanks for the review and advice! >>> >>> On 2020-09-03 16:05, Fujii Masao wrote: >>>> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>>>> +/* ---------- >>>>>> + * Backend types >>>>>> + * ---------- >>>>>> >>>>>> You seem to forget to add "*/" into the above comment. >>>>>> This issue could cause the following compiler warning. >>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment] >>>>> >>>>> Thanks for the comment. I fixed. >>>> >>>> Thanks for the fix! But why are those comments necessary? >>> >>> Sorry about that. This comment is not necessary. >>> I removed it. >>> >>>>> The pg_stat_walwriter is not security restricted now, so ordinary users can access it. >>>>> It has the same security level as pg_stat_archiver. If you have any comments, please let me know. >>>> >>>> + <structfield>dirty_writes</structfield> <type>bigint</type> >>>> >>>> I guess that the column name "dirty_writes" derived from >>>> the DTrace probe name. Isn't this name confusing? We should >>>> rename it to "wal_buffers_full" or something? >>> >>> I agree and rename it to "wal_buffers_full". >>> >>>> +/* ---------- >>>> + * PgStat_MsgWalWriter Sent by the walwriter to update statistics. >>>> >>>> This comment seems not accurate because backends also send it. >>>> >>>> +/* >>>> + * WAL writes statistics counter is updated in XLogWrite function >>>> + */ >>>> +extern PgStat_MsgWalWriter WalWriterStats; >>>> >>>> This comment seems not right because the counter is not updated in XLogWrite(). >>> >>> Right. I fixed it to "Sent by each backend and background workers to update WAL statistics." >>> In the future, other statistics will be included so I remove the function's name. >>> >>> >>>> +-- There will surely and maximum one record >>>> +select count(*) = 1 as ok from pg_stat_walwriter; >>>> >>>> What about changing this comment to "There must be only one record"? >>> >>> Thanks, I fixed. >>> >>>> + WalWriterStats.m_xlog_dirty_writes++; >>>> LWLockRelease(WALWriteLock); >>>> >>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected >>>> with WALWriteLock, isn't it better to increment that after releasing the lock? >>> >>> Thanks, I fixed. >>> >>>> +CREATE VIEW pg_stat_walwriter AS >>>> + SELECT >>>> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >>>> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >>>> + >>>> CREATE VIEW pg_stat_progress_vacuum AS >>>> >>>> In system_views.sql, the definition of pg_stat_walwriter should be >>>> placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze. >>> >>> OK, I fixed it. >>> >>>> } >>>> - >>>> /* >>>> * We found an existing collector stats file. Read it and put all the >>>> >>>> You seem to accidentally have removed the empty line here. >>> >>> Sorry about that. I fixed it. >>> >>>> - errhint("Target must be \"archiver\" or \"bgwriter\"."))); >>>> + errhint("Target must be \"archiver\" or \"bgwriter\" or >>>> \"walwriter\"."))); >>>> >>>> There are two "or" in the message, but the former should be replaced with ","? >>> >>> Thanks, I fixed. >>> >>> >>> On 2020-09-05 18:40, Magnus Hagander wrote: >>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >>>> <masao.fujii@oss.nttdata.com> wrote: >>>> >>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>>>> I changed the view name from pg_stat_walwrites to >>>>> pg_stat_walwriter. >>>>>>>> I think it is better to match naming scheme with other views >>>>> like >>>>>>> pg_stat_bgwriter, >>>>>>>> which is for bgwriter statistics but it has the statistics >>>>> related to backend. >>>>>>> >>>>>>> I prefer the view name pg_stat_walwriter for the consistency with >>>>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>>>> the consistency. Thought? IMO either of them works for me. >>>>>>> I'd like to hear more opinons about this. >>>>>> >>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>>>> the backends' activity. Likewise, pg_stat_walwriter leads to >>>>> misunderstanding because its information is not limited to WAL >>>>> writer. >>>>>> >>>>>> How about simply pg_stat_wal? In the future, we may want to >>>>> include WAL reads in this view, e.g. reading undo logs in zheap. >>>>> >>>>> Sounds reasonable. >>>> >>>> +1. >>>> >>>> pg_stat_bgwriter has had the "wrong name" for quite some time now -- >>>> it became even more apparent when the checkpointer was split out to >>>> it's own process, and that's not exactly a recent change. And it had >>>> allocs in it from day one... >>>> >>>> I think naming it for what the data in it is ("wal") rather than which >>>> process deals with it ("walwriter") is correct, unless the statistics >>>> can be known to only *ever* affect one type of process. (And then >>>> different processes can affect different columns in the view). As a >>>> general rule -- and that's from what I can tell exactly what's being >>>> proposed. >>> >>> Thanks for your comments. I agree with your opinions. >>> I changed the view name to "pg_stat_wal". >>> >>> >>> I fixed the code to send the WAL statistics from not only backend and walwriter >>> but also checkpointer, walsender and autovacuum worker. >> >> Good point! Thanks for updating the patch! >> >> >> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, >> onerel->rd_rel->relisshared, >> Max(new_live_tuples, 0), >> vacrelstats->new_dead_tuples); >> + pgstat_send_wal(); >> >> I guess that you changed heap_vacuum_rel() as above so that autovacuum >> workers can send WAL stats. But heap_vacuum_rel() can be called by >> the processes (e.g., backends) other than autovacuum workers? Also >> what happens if autovacuum workers just do ANALYZE only? In that case, >> heap_vacuum_rel() may not be called. >> >> Currently autovacuum worker reports the stats at the exit via >> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker >> is not the process that basically keeps running during the service. It exits >> after it does vacuum or analyze. So ISTM that it's not bad to report the stats >> only at the exit, in autovacuum worker case. There is no need to add extra >> code for WAL stats report by autovacuum worker. Thought? > > Thanks, I understood. I removed this code. > >> >> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) >> else >> RecentFlushPtr = GetXLogReplayRecPtr(NULL); >> + /* Send wal statistics */ >> + pgstat_send_wal(); >> >> AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData() >> and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal() >> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best >> for that purpose. > > I checked what function calls XLogBackgroundFlush() which calls > AdvanceXLInsertBuffer() to increment m_wal_buffers_full. > > I found that WalSndWaitForWal() calls it, so I added it. Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the second argument opportunistic=true, so in this case WALwrite by wal_buffers full seems to never happen. Right? If this understanding is right, WalSndWaitForWal() doesn't needto call pgstat_send_wal(). Probably also walwriter doesn't need to do that. The logical rep walsender can generate WAL and call AdvanceXLInsertBuffer() when it executes the replication commands likeCREATE_REPLICATION_SLOT. But this case is already covered by pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),with your patch. So no more calls to pgstat_send_wal() seems necessary for logical rep walsender. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hello. At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in > I checked what function calls XLogBackgroundFlush() which calls > AdvanceXLInsertBuffer() to increment m_wal_buffers_full. > > I found that WalSndWaitForWal() calls it, so I added it. > Is it better to move it in WalSndLoop() like the attached patch? By the way, we are counting some wal-related numbers in pgWalUsage.(bytes, records, fpi). Since now that we are going to have a new view related to WAL statistics, wouln't it be more useful to show them together in the view? (Another reason to propose this is that a substantially one-column table may look not-great..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/09/11 12:17, Kyotaro Horiguchi wrote: > Hello. > > At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in >> I checked what function calls XLogBackgroundFlush() which calls >> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >> >> I found that WalSndWaitForWal() calls it, so I added it. >> Is it better to move it in WalSndLoop() like the attached patch? > > By the way, we are counting some wal-related numbers in > pgWalUsage.(bytes, records, fpi). Since now that we are going to have > a new view related to WAL statistics, wouln't it be more useful to > show them together in the view? Probably yes. But IMO it's better to commit the current patch first, and then add those stats into the view after confirmingexposing them is useful. BTW, to expose the total WAL bytes, I think it's better to just save the LSN at when pg_stat_wal is reset rather than countingpgWalUsage.bytes. If we do that, we can easily total WAL bytes by subtracting that LSN from the latest LSN. Alsosaving the LSN at the reset timing causes obviously less overhead than counting pgWalUsage.bytes. > (Another reason to propose this is that a substantially one-column > table may look not-great..) I'm ok with such "small" view. But if this is really problem, I'm ok to expose only functions pg_stat_get_wal_buffers_full()and pg_stat_get_wal_stat_reset_time(), without the view, at first. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/09/11 12:17, Kyotaro Horiguchi wrote: > > Hello. > > At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda > > <ikedamsh@oss.nttdata.com> wrote in > >> I checked what function calls XLogBackgroundFlush() which calls > >> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. > >> > >> I found that WalSndWaitForWal() calls it, so I added it. > >> Is it better to move it in WalSndLoop() like the attached patch? > > By the way, we are counting some wal-related numbers in > > pgWalUsage.(bytes, records, fpi). Since now that we are going to have > > a new view related to WAL statistics, wouln't it be more useful to > > show them together in the view? > > Probably yes. But IMO it's better to commit the current patch first, > and then add those stats into the view after confirming exposing them > is useful. I'm fine with that. > BTW, to expose the total WAL bytes, I think it's better to just save > the LSN at when pg_stat_wal is reset rather than counting > pgWalUsage.bytes. If we do that, we can easily total WAL bytes by > subtracting that LSN from the latest LSN. Also saving the LSN at the > reset timing causes obviously less overhead than counting > pgWalUsage.bytes. pgWalUsage is always counting so it doesn't add any overhead. But since it cannot be reset, the value needs to be saved at reset time like LSN. I don't mind either way we take from performance perspective. > > (Another reason to propose this is that a substantially one-column > > table may look not-great..) > > I'm ok with such "small" view. But if this is really problem, I'm ok > to expose only functions pg_stat_get_wal_buffers_full() and > pg_stat_get_wal_stat_reset_time(), without the view, at first. I don't mind that we have such small views as far as it is promised to grow up:p regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/09/11 16:54, Kyotaro Horiguchi wrote: > At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> >> >> On 2020/09/11 12:17, Kyotaro Horiguchi wrote: >>> Hello. >>> At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda >>> <ikedamsh@oss.nttdata.com> wrote in >>>> I checked what function calls XLogBackgroundFlush() which calls >>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >>>> >>>> I found that WalSndWaitForWal() calls it, so I added it. >>>> Is it better to move it in WalSndLoop() like the attached patch? >>> By the way, we are counting some wal-related numbers in >>> pgWalUsage.(bytes, records, fpi). Since now that we are going to have >>> a new view related to WAL statistics, wouln't it be more useful to >>> show them together in the view? >> >> Probably yes. But IMO it's better to commit the current patch first, >> and then add those stats into the view after confirming exposing them >> is useful. > > I'm fine with that. > >> BTW, to expose the total WAL bytes, I think it's better to just save >> the LSN at when pg_stat_wal is reset rather than counting >> pgWalUsage.bytes. If we do that, we can easily total WAL bytes by >> subtracting that LSN from the latest LSN. Also saving the LSN at the >> reset timing causes obviously less overhead than counting >> pgWalUsage.bytes. > > pgWalUsage is always counting so it doesn't add any overhead. Yes. And I'm a bit concerned about the overhead by frequent message sent for WAL bytes to the stats collector. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-09-11 01:40, Fujii Masao wrote: > On 2020/09/09 13:57, Masahiro Ikeda wrote: >> On 2020-09-07 16:19, Fujii Masao wrote: >>> On 2020/09/07 9:58, Masahiro Ikeda wrote: >>>> Thanks for the review and advice! >>>> >>>> On 2020-09-03 16:05, Fujii Masao wrote: >>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>>>>> +/* ---------- >>>>>>> + * Backend types >>>>>>> + * ---------- >>>>>>> >>>>>>> You seem to forget to add "*/" into the above comment. >>>>>>> This issue could cause the following compiler warning. >>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block >>>>>>> comment [-Wcomment] >>>>>> >>>>>> Thanks for the comment. I fixed. >>>>> >>>>> Thanks for the fix! But why are those comments necessary? >>>> >>>> Sorry about that. This comment is not necessary. >>>> I removed it. >>>> >>>>>> The pg_stat_walwriter is not security restricted now, so ordinary >>>>>> users can access it. >>>>>> It has the same security level as pg_stat_archiver. If you have >>>>>> any comments, please let me know. >>>>> >>>>> + <structfield>dirty_writes</structfield> <type>bigint</type> >>>>> >>>>> I guess that the column name "dirty_writes" derived from >>>>> the DTrace probe name. Isn't this name confusing? We should >>>>> rename it to "wal_buffers_full" or something? >>>> >>>> I agree and rename it to "wal_buffers_full". >>>> >>>>> +/* ---------- >>>>> + * PgStat_MsgWalWriter Sent by the walwriter to update >>>>> statistics. >>>>> >>>>> This comment seems not accurate because backends also send it. >>>>> >>>>> +/* >>>>> + * WAL writes statistics counter is updated in XLogWrite function >>>>> + */ >>>>> +extern PgStat_MsgWalWriter WalWriterStats; >>>>> >>>>> This comment seems not right because the counter is not updated in >>>>> XLogWrite(). >>>> >>>> Right. I fixed it to "Sent by each backend and background workers to >>>> update WAL statistics." >>>> In the future, other statistics will be included so I remove the >>>> function's name. >>>> >>>> >>>>> +-- There will surely and maximum one record >>>>> +select count(*) = 1 as ok from pg_stat_walwriter; >>>>> >>>>> What about changing this comment to "There must be only one >>>>> record"? >>>> >>>> Thanks, I fixed. >>>> >>>>> + WalWriterStats.m_xlog_dirty_writes++; >>>>> LWLockRelease(WALWriteLock); >>>>> >>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be >>>>> protected >>>>> with WALWriteLock, isn't it better to increment that after >>>>> releasing the lock? >>>> >>>> Thanks, I fixed. >>>> >>>>> +CREATE VIEW pg_stat_walwriter AS >>>>> + SELECT >>>>> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >>>>> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >>>>> + >>>>> CREATE VIEW pg_stat_progress_vacuum AS >>>>> >>>>> In system_views.sql, the definition of pg_stat_walwriter should be >>>>> placed just after that of pg_stat_bgwriter not >>>>> pg_stat_progress_analyze. >>>> >>>> OK, I fixed it. >>>> >>>>> } >>>>> - >>>>> /* >>>>> * We found an existing collector stats file. Read it and put >>>>> all the >>>>> >>>>> You seem to accidentally have removed the empty line here. >>>> >>>> Sorry about that. I fixed it. >>>> >>>>> - errhint("Target must be \"archiver\" or >>>>> \"bgwriter\"."))); >>>>> + errhint("Target must be \"archiver\" or >>>>> \"bgwriter\" or >>>>> \"walwriter\"."))); >>>>> >>>>> There are two "or" in the message, but the former should be >>>>> replaced with ","? >>>> >>>> Thanks, I fixed. >>>> >>>> >>>> On 2020-09-05 18:40, Magnus Hagander wrote: >>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>> >>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>>>>> I changed the view name from pg_stat_walwrites to >>>>>> pg_stat_walwriter. >>>>>>>>> I think it is better to match naming scheme with other views >>>>>> like >>>>>>>> pg_stat_bgwriter, >>>>>>>>> which is for bgwriter statistics but it has the statistics >>>>>> related to backend. >>>>>>>> >>>>>>>> I prefer the view name pg_stat_walwriter for the consistency >>>>>>>> with >>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>>>>> the consistency. Thought? IMO either of them works for me. >>>>>>>> I'd like to hear more opinons about this. >>>>>>> >>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>>>>> the backends' activity. Likewise, pg_stat_walwriter leads to >>>>>> misunderstanding because its information is not limited to WAL >>>>>> writer. >>>>>>> >>>>>>> How about simply pg_stat_wal? In the future, we may want to >>>>>> include WAL reads in this view, e.g. reading undo logs in zheap. >>>>>> >>>>>> Sounds reasonable. >>>>> >>>>> +1. >>>>> >>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now >>>>> -- >>>>> it became even more apparent when the checkpointer was split out to >>>>> it's own process, and that's not exactly a recent change. And it >>>>> had >>>>> allocs in it from day one... >>>>> >>>>> I think naming it for what the data in it is ("wal") rather than >>>>> which >>>>> process deals with it ("walwriter") is correct, unless the >>>>> statistics >>>>> can be known to only *ever* affect one type of process. (And then >>>>> different processes can affect different columns in the view). As a >>>>> general rule -- and that's from what I can tell exactly what's >>>>> being >>>>> proposed. >>>> >>>> Thanks for your comments. I agree with your opinions. >>>> I changed the view name to "pg_stat_wal". >>>> >>>> >>>> I fixed the code to send the WAL statistics from not only backend >>>> and walwriter >>>> but also checkpointer, walsender and autovacuum worker. >>> >>> Good point! Thanks for updating the patch! >>> >>> >>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams >>> *params, >>> onerel->rd_rel->relisshared, >>> Max(new_live_tuples, 0), >>> vacrelstats->new_dead_tuples); >>> + pgstat_send_wal(); >>> >>> I guess that you changed heap_vacuum_rel() as above so that >>> autovacuum >>> workers can send WAL stats. But heap_vacuum_rel() can be called by >>> the processes (e.g., backends) other than autovacuum workers? Also >>> what happens if autovacuum workers just do ANALYZE only? In that >>> case, >>> heap_vacuum_rel() may not be called. >>> >>> Currently autovacuum worker reports the stats at the exit via >>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker >>> is not the process that basically keeps running during the service. >>> It exits >>> after it does vacuum or analyze. So ISTM that it's not bad to report >>> the stats >>> only at the exit, in autovacuum worker case. There is no need to add >>> extra >>> code for WAL stats report by autovacuum worker. Thought? >> >> Thanks, I understood. I removed this code. >> >>> >>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) >>> else >>> RecentFlushPtr = GetXLogReplayRecPtr(NULL); >>> + /* Send wal statistics */ >>> + pgstat_send_wal(); >>> >>> AFAIR logical walsender uses three loops in WalSndLoop(), >>> WalSndWriteData() >>> and WalSndWaitForWal(). But could you tell me why added >>> pgstat_send_wal() >>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is >>> the best >>> for that purpose. >> >> I checked what function calls XLogBackgroundFlush() which calls >> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >> >> I found that WalSndWaitForWal() calls it, so I added it. > > Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the > second argument opportunistic=true, so in this case WAL write by > wal_buffers full seems to never happen. Right? If this understanding > is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal(). > Probably also walwriter doesn't need to do that. > > The logical rep walsender can generate WAL and call > AdvanceXLInsertBuffer() when it executes the replication commands like > CREATE_REPLICATION_SLOT. But this case is already covered by > pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(), > with your patch. So no more calls to pgstat_send_wal() seems necessary > for logical rep walsender. Thanks for your reviews. I didn't notice that. I updated the patches. On 2020-09-11 17:13, Fujii Masao wrote: > On 2020/09/11 16:54, Kyotaro Horiguchi wrote: >> At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao >> <masao.fujii@oss.nttdata.com> wrote in >>> >>> >>> On 2020/09/11 12:17, Kyotaro Horiguchi wrote: >>>> Hello. >>>> At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda >>>> <ikedamsh@oss.nttdata.com> wrote in >>>>> I checked what function calls XLogBackgroundFlush() which calls >>>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >>>>> >>>>> I found that WalSndWaitForWal() calls it, so I added it. >>>>> Is it better to move it in WalSndLoop() like the attached patch? >>>> By the way, we are counting some wal-related numbers in >>>> pgWalUsage.(bytes, records, fpi). Since now that we are going to >>>> have >>>> a new view related to WAL statistics, wouln't it be more useful to >>>> show them together in the view? >>> >>> Probably yes. But IMO it's better to commit the current patch first, >>> and then add those stats into the view after confirming exposing them >>> is useful. >> >> I'm fine with that. >> >>> BTW, to expose the total WAL bytes, I think it's better to just save >>> the LSN at when pg_stat_wal is reset rather than counting >>> pgWalUsage.bytes. If we do that, we can easily total WAL bytes by >>> subtracting that LSN from the latest LSN. Also saving the LSN at the >>> reset timing causes obviously less overhead than counting >>> pgWalUsage.bytes. >> >> pgWalUsage is always counting so it doesn't add any overhead. > > Yes. And I'm a bit concerned about the overhead by frequent message > sent for WAL bytes to the stats collector. Thanks for the comments. I agree that we need to add more wal-related statistics after this patch is committed. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/09/15 15:52, Masahiro Ikeda wrote: > On 2020-09-11 01:40, Fujii Masao wrote: >> On 2020/09/09 13:57, Masahiro Ikeda wrote: >>> On 2020-09-07 16:19, Fujii Masao wrote: >>>> On 2020/09/07 9:58, Masahiro Ikeda wrote: >>>>> Thanks for the review and advice! >>>>> >>>>> On 2020-09-03 16:05, Fujii Masao wrote: >>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>>>>>> +/* ---------- >>>>>>>> + * Backend types >>>>>>>> + * ---------- >>>>>>>> >>>>>>>> You seem to forget to add "*/" into the above comment. >>>>>>>> This issue could cause the following compiler warning. >>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment] >>>>>>> >>>>>>> Thanks for the comment. I fixed. >>>>>> >>>>>> Thanks for the fix! But why are those comments necessary? >>>>> >>>>> Sorry about that. This comment is not necessary. >>>>> I removed it. >>>>> >>>>>>> The pg_stat_walwriter is not security restricted now, so ordinary users can access it. >>>>>>> It has the same security level as pg_stat_archiver. If you have any comments, please let me know. >>>>>> >>>>>> + <structfield>dirty_writes</structfield> <type>bigint</type> >>>>>> >>>>>> I guess that the column name "dirty_writes" derived from >>>>>> the DTrace probe name. Isn't this name confusing? We should >>>>>> rename it to "wal_buffers_full" or something? >>>>> >>>>> I agree and rename it to "wal_buffers_full". >>>>> >>>>>> +/* ---------- >>>>>> + * PgStat_MsgWalWriter Sent by the walwriter to update statistics. >>>>>> >>>>>> This comment seems not accurate because backends also send it. >>>>>> >>>>>> +/* >>>>>> + * WAL writes statistics counter is updated in XLogWrite function >>>>>> + */ >>>>>> +extern PgStat_MsgWalWriter WalWriterStats; >>>>>> >>>>>> This comment seems not right because the counter is not updated in XLogWrite(). >>>>> >>>>> Right. I fixed it to "Sent by each backend and background workers to update WAL statistics." >>>>> In the future, other statistics will be included so I remove the function's name. >>>>> >>>>> >>>>>> +-- There will surely and maximum one record >>>>>> +select count(*) = 1 as ok from pg_stat_walwriter; >>>>>> >>>>>> What about changing this comment to "There must be only one record"? >>>>> >>>>> Thanks, I fixed. >>>>> >>>>>> + WalWriterStats.m_xlog_dirty_writes++; >>>>>> LWLockRelease(WALWriteLock); >>>>>> >>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected >>>>>> with WALWriteLock, isn't it better to increment that after releasing the lock? >>>>> >>>>> Thanks, I fixed. >>>>> >>>>>> +CREATE VIEW pg_stat_walwriter AS >>>>>> + SELECT >>>>>> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >>>>>> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >>>>>> + >>>>>> CREATE VIEW pg_stat_progress_vacuum AS >>>>>> >>>>>> In system_views.sql, the definition of pg_stat_walwriter should be >>>>>> placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze. >>>>> >>>>> OK, I fixed it. >>>>> >>>>>> } >>>>>> - >>>>>> /* >>>>>> * We found an existing collector stats file. Read it and put all the >>>>>> >>>>>> You seem to accidentally have removed the empty line here. >>>>> >>>>> Sorry about that. I fixed it. >>>>> >>>>>> - errhint("Target must be \"archiver\" or \"bgwriter\"."))); >>>>>> + errhint("Target must be \"archiver\" or \"bgwriter\" or >>>>>> \"walwriter\"."))); >>>>>> >>>>>> There are two "or" in the message, but the former should be replaced with ","? >>>>> >>>>> Thanks, I fixed. >>>>> >>>>> >>>>> On 2020-09-05 18:40, Magnus Hagander wrote: >>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >>>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>>>>>> I changed the view name from pg_stat_walwrites to >>>>>>> pg_stat_walwriter. >>>>>>>>>> I think it is better to match naming scheme with other views >>>>>>> like >>>>>>>>> pg_stat_bgwriter, >>>>>>>>>> which is for bgwriter statistics but it has the statistics >>>>>>> related to backend. >>>>>>>>> >>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency with >>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>>>>>> the consistency. Thought? IMO either of them works for me. >>>>>>>>> I'd like to hear more opinons about this. >>>>>>>> >>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>>>>>> the backends' activity. Likewise, pg_stat_walwriter leads to >>>>>>> misunderstanding because its information is not limited to WAL >>>>>>> writer. >>>>>>>> >>>>>>>> How about simply pg_stat_wal? In the future, we may want to >>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap. >>>>>>> >>>>>>> Sounds reasonable. >>>>>> >>>>>> +1. >>>>>> >>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now -- >>>>>> it became even more apparent when the checkpointer was split out to >>>>>> it's own process, and that's not exactly a recent change. And it had >>>>>> allocs in it from day one... >>>>>> >>>>>> I think naming it for what the data in it is ("wal") rather than which >>>>>> process deals with it ("walwriter") is correct, unless the statistics >>>>>> can be known to only *ever* affect one type of process. (And then >>>>>> different processes can affect different columns in the view). As a >>>>>> general rule -- and that's from what I can tell exactly what's being >>>>>> proposed. >>>>> >>>>> Thanks for your comments. I agree with your opinions. >>>>> I changed the view name to "pg_stat_wal". >>>>> >>>>> >>>>> I fixed the code to send the WAL statistics from not only backend and walwriter >>>>> but also checkpointer, walsender and autovacuum worker. >>>> >>>> Good point! Thanks for updating the patch! >>>> >>>> >>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, >>>> onerel->rd_rel->relisshared, >>>> Max(new_live_tuples, 0), >>>> vacrelstats->new_dead_tuples); >>>> + pgstat_send_wal(); >>>> >>>> I guess that you changed heap_vacuum_rel() as above so that autovacuum >>>> workers can send WAL stats. But heap_vacuum_rel() can be called by >>>> the processes (e.g., backends) other than autovacuum workers? Also >>>> what happens if autovacuum workers just do ANALYZE only? In that case, >>>> heap_vacuum_rel() may not be called. >>>> >>>> Currently autovacuum worker reports the stats at the exit via >>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker >>>> is not the process that basically keeps running during the service. It exits >>>> after it does vacuum or analyze. So ISTM that it's not bad to report the stats >>>> only at the exit, in autovacuum worker case. There is no need to add extra >>>> code for WAL stats report by autovacuum worker. Thought? >>> >>> Thanks, I understood. I removed this code. >>> >>>> >>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) >>>> else >>>> RecentFlushPtr = GetXLogReplayRecPtr(NULL); >>>> + /* Send wal statistics */ >>>> + pgstat_send_wal(); >>>> >>>> AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData() >>>> and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal() >>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best >>>> for that purpose. >>> >>> I checked what function calls XLogBackgroundFlush() which calls >>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >>> >>> I found that WalSndWaitForWal() calls it, so I added it. >> >> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the >> second argument opportunistic=true, so in this case WAL write by >> wal_buffers full seems to never happen. Right? If this understanding >> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal(). >> Probably also walwriter doesn't need to do that. Thanks for updating the patch! This patch adds pgstat_send_wal() in walwriter main loop. But isn't this unnecessary because of the above reason? That is, since walwriter calls AdvanceXLInsertBuffer() with the second argument "opportunistic" = true via XLogBackgroundFlush(), the event of full wal_buffers will never happen. No? >> >> The logical rep walsender can generate WAL and call >> AdvanceXLInsertBuffer() when it executes the replication commands like >> CREATE_REPLICATION_SLOT. But this case is already covered by >> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(), >> with your patch. So no more calls to pgstat_send_wal() seems necessary >> for logical rep walsender. > > Thanks for your reviews. I didn't notice that. > I updated the patches. Sorry, the above my analysis might be incorrect. During logical replication, walsender may access to the system table. Which may cause HOT pruning or killing of dead index tuple. Also which can cause WAL and full wal_buffers event. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-09-15 17:10, Fujii Masao wrote: > On 2020/09/15 15:52, Masahiro Ikeda wrote: >> On 2020-09-11 01:40, Fujii Masao wrote: >>> On 2020/09/09 13:57, Masahiro Ikeda wrote: >>>> On 2020-09-07 16:19, Fujii Masao wrote: >>>>> On 2020/09/07 9:58, Masahiro Ikeda wrote: >>>>>> Thanks for the review and advice! >>>>>> >>>>>> On 2020-09-03 16:05, Fujii Masao wrote: >>>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>>>>>>> +/* ---------- >>>>>>>>> + * Backend types >>>>>>>>> + * ---------- >>>>>>>>> >>>>>>>>> You seem to forget to add "*/" into the above comment. >>>>>>>>> This issue could cause the following compiler warning. >>>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block >>>>>>>>> comment [-Wcomment] >>>>>>>> >>>>>>>> Thanks for the comment. I fixed. >>>>>>> >>>>>>> Thanks for the fix! But why are those comments necessary? >>>>>> >>>>>> Sorry about that. This comment is not necessary. >>>>>> I removed it. >>>>>> >>>>>>>> The pg_stat_walwriter is not security restricted now, so >>>>>>>> ordinary users can access it. >>>>>>>> It has the same security level as pg_stat_archiver. If you have >>>>>>>> any comments, please let me know. >>>>>>> >>>>>>> + <structfield>dirty_writes</structfield> >>>>>>> <type>bigint</type> >>>>>>> >>>>>>> I guess that the column name "dirty_writes" derived from >>>>>>> the DTrace probe name. Isn't this name confusing? We should >>>>>>> rename it to "wal_buffers_full" or something? >>>>>> >>>>>> I agree and rename it to "wal_buffers_full". >>>>>> >>>>>>> +/* ---------- >>>>>>> + * PgStat_MsgWalWriter Sent by the walwriter to >>>>>>> update statistics. >>>>>>> >>>>>>> This comment seems not accurate because backends also send it. >>>>>>> >>>>>>> +/* >>>>>>> + * WAL writes statistics counter is updated in XLogWrite >>>>>>> function >>>>>>> + */ >>>>>>> +extern PgStat_MsgWalWriter WalWriterStats; >>>>>>> >>>>>>> This comment seems not right because the counter is not updated >>>>>>> in XLogWrite(). >>>>>> >>>>>> Right. I fixed it to "Sent by each backend and background workers >>>>>> to update WAL statistics." >>>>>> In the future, other statistics will be included so I remove the >>>>>> function's name. >>>>>> >>>>>> >>>>>>> +-- There will surely and maximum one record >>>>>>> +select count(*) = 1 as ok from pg_stat_walwriter; >>>>>>> >>>>>>> What about changing this comment to "There must be only one >>>>>>> record"? >>>>>> >>>>>> Thanks, I fixed. >>>>>> >>>>>>> + WalWriterStats.m_xlog_dirty_writes++; >>>>>>> LWLockRelease(WALWriteLock); >>>>>>> >>>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be >>>>>>> protected >>>>>>> with WALWriteLock, isn't it better to increment that after >>>>>>> releasing the lock? >>>>>> >>>>>> Thanks, I fixed. >>>>>> >>>>>>> +CREATE VIEW pg_stat_walwriter AS >>>>>>> + SELECT >>>>>>> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >>>>>>> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >>>>>>> + >>>>>>> CREATE VIEW pg_stat_progress_vacuum AS >>>>>>> >>>>>>> In system_views.sql, the definition of pg_stat_walwriter should >>>>>>> be >>>>>>> placed just after that of pg_stat_bgwriter not >>>>>>> pg_stat_progress_analyze. >>>>>> >>>>>> OK, I fixed it. >>>>>> >>>>>>> } >>>>>>> - >>>>>>> /* >>>>>>> * We found an existing collector stats file. Read it and >>>>>>> put all the >>>>>>> >>>>>>> You seem to accidentally have removed the empty line here. >>>>>> >>>>>> Sorry about that. I fixed it. >>>>>> >>>>>>> - errhint("Target must be \"archiver\" or >>>>>>> \"bgwriter\"."))); >>>>>>> + errhint("Target must be \"archiver\" or >>>>>>> \"bgwriter\" or >>>>>>> \"walwriter\"."))); >>>>>>> >>>>>>> There are two "or" in the message, but the former should be >>>>>>> replaced with ","? >>>>>> >>>>>> Thanks, I fixed. >>>>>> >>>>>> >>>>>> On 2020-09-05 18:40, Magnus Hagander wrote: >>>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >>>>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>>>>>>> I changed the view name from pg_stat_walwrites to >>>>>>>> pg_stat_walwriter. >>>>>>>>>>> I think it is better to match naming scheme with other views >>>>>>>> like >>>>>>>>>> pg_stat_bgwriter, >>>>>>>>>>> which is for bgwriter statistics but it has the statistics >>>>>>>> related to backend. >>>>>>>>>> >>>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency >>>>>>>>>> with >>>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>>>>>>> the consistency. Thought? IMO either of them works for me. >>>>>>>>>> I'd like to hear more opinons about this. >>>>>>>>> >>>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>>>>>>> the backends' activity. Likewise, pg_stat_walwriter leads to >>>>>>>> misunderstanding because its information is not limited to WAL >>>>>>>> writer. >>>>>>>>> >>>>>>>>> How about simply pg_stat_wal? In the future, we may want to >>>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap. >>>>>>>> >>>>>>>> Sounds reasonable. >>>>>>> >>>>>>> +1. >>>>>>> >>>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now >>>>>>> -- >>>>>>> it became even more apparent when the checkpointer was split out >>>>>>> to >>>>>>> it's own process, and that's not exactly a recent change. And it >>>>>>> had >>>>>>> allocs in it from day one... >>>>>>> >>>>>>> I think naming it for what the data in it is ("wal") rather than >>>>>>> which >>>>>>> process deals with it ("walwriter") is correct, unless the >>>>>>> statistics >>>>>>> can be known to only *ever* affect one type of process. (And then >>>>>>> different processes can affect different columns in the view). As >>>>>>> a >>>>>>> general rule -- and that's from what I can tell exactly what's >>>>>>> being >>>>>>> proposed. >>>>>> >>>>>> Thanks for your comments. I agree with your opinions. >>>>>> I changed the view name to "pg_stat_wal". >>>>>> >>>>>> >>>>>> I fixed the code to send the WAL statistics from not only backend >>>>>> and walwriter >>>>>> but also checkpointer, walsender and autovacuum worker. >>>>> >>>>> Good point! Thanks for updating the patch! >>>>> >>>>> >>>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams >>>>> *params, >>>>> onerel->rd_rel->relisshared, >>>>> Max(new_live_tuples, 0), >>>>> vacrelstats->new_dead_tuples); >>>>> + pgstat_send_wal(); >>>>> >>>>> I guess that you changed heap_vacuum_rel() as above so that >>>>> autovacuum >>>>> workers can send WAL stats. But heap_vacuum_rel() can be called by >>>>> the processes (e.g., backends) other than autovacuum workers? Also >>>>> what happens if autovacuum workers just do ANALYZE only? In that >>>>> case, >>>>> heap_vacuum_rel() may not be called. >>>>> >>>>> Currently autovacuum worker reports the stats at the exit via >>>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker >>>>> is not the process that basically keeps running during the service. >>>>> It exits >>>>> after it does vacuum or analyze. So ISTM that it's not bad to >>>>> report the stats >>>>> only at the exit, in autovacuum worker case. There is no need to >>>>> add extra >>>>> code for WAL stats report by autovacuum worker. Thought? >>>> >>>> Thanks, I understood. I removed this code. >>>> >>>>> >>>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) >>>>> else >>>>> RecentFlushPtr = GetXLogReplayRecPtr(NULL); >>>>> + /* Send wal statistics */ >>>>> + pgstat_send_wal(); >>>>> >>>>> AFAIR logical walsender uses three loops in WalSndLoop(), >>>>> WalSndWriteData() >>>>> and WalSndWaitForWal(). But could you tell me why added >>>>> pgstat_send_wal() >>>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is >>>>> the best >>>>> for that purpose. >>>> >>>> I checked what function calls XLogBackgroundFlush() which calls >>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >>>> >>>> I found that WalSndWaitForWal() calls it, so I added it. >>> >>> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the >>> second argument opportunistic=true, so in this case WAL write by >>> wal_buffers full seems to never happen. Right? If this understanding >>> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal(). >>> Probably also walwriter doesn't need to do that. > > Thanks for updating the patch! This patch adds pgstat_send_wal() in > walwriter main loop. But isn't this unnecessary because of the above > reason? > That is, since walwriter calls AdvanceXLInsertBuffer() with > the second argument "opportunistic" = true via XLogBackgroundFlush(), > the event of full wal_buffers will never happen. No? Right, I fixed it. >>> >>> The logical rep walsender can generate WAL and call >>> AdvanceXLInsertBuffer() when it executes the replication commands >>> like >>> CREATE_REPLICATION_SLOT. But this case is already covered by >>> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(), >>> with your patch. So no more calls to pgstat_send_wal() seems >>> necessary >>> for logical rep walsender. >> >> Thanks for your reviews. I didn't notice that. >> I updated the patches. > > Sorry, the above my analysis might be incorrect. During logical > replication, > walsender may access to the system table. Which may cause HOT pruning > or killing of dead index tuple. Also which can cause WAL and > full wal_buffers event. Thought? Thanks. I confirmed that it causes HOT pruning or killing of dead index tuple if DecodeCommit() is called. As you said, DecodeCommit() may access the system table. WalSndLoop() -> XLogSendLogical() -> LogicalDecodingProcessRecord() -> DecodeXactOp() -> DecodeCommit() -> ReorderBufferCommit() -> ReorderBufferProcessTXN() -> RelidByRelfilenode() -> systable_getnext() The wals are generated only when logical replication is performed. So, I added pgstat_send_wal() in XLogSendLogical(). But, I concerned that it causes poor performance since pgstat_send_wal() is called per wal record, Is it necessary to introduce a mechanism to send in bulk? But I worried about how to implement is best. Is it good to send wal statistics per X recoreds? I think there are other background processes that access the system tables, so I organized which process must send wal metrics and added pgstat_send_wal() to the main loop of some background processes for example, autovacuum launcher, logical replication launcher, and logical replication worker's one. (*) [x]: it needs to send it [ ]: it don't need to send it * [ ] postmaster * [ ] background writer * [x] checkpointer: it generates wal for checkpoint. * [ ] walwriter * [x] autovacuum launcher: it accesses to the system tables to get the database list. * [x] autovacuum worker: it generates wal for vacuum. * [ ] stats collector * [x] backend: it generate wal for query execution. * [ ] startup * [ ] archiver * [x] walsender: it accesses to the system tables if logical replication is performed. * [ ] walreceiver * [x] logical replication launcher: it accesses to the system tables to get the subscription list. * [x] logical replication worker: it accesses to the system tables to get oid from relname. * [x] parallel worker: it generates wal for query execution. If my understanding is wrong, please let me know. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in > Thanks. I confirmed that it causes HOT pruning or killing of > dead index tuple if DecodeCommit() is called. > > As you said, DecodeCommit() may access the system table. ... > The wals are generated only when logical replication is performed. > So, I added pgstat_send_wal() in XLogSendLogical(). > > But, I concerned that it causes poor performance > since pgstat_send_wal() is called per wal record, I think that's too frequent. If we want to send any stats to the collector, it is usually done at commit time using pgstat_report_stat(), and the function avoids sending stats too frequently. For logrep-worker, apply_handle_commit() is calling it. It seems to be the place if we want to send the wal stats. Or it may be better to call pgstat_send_wal() via pgstat_report_stat(), like pg_stat_slru(). Currently logrep-laucher, logrep-worker and autovac-launcher (and some other processes?) don't seem (AFAICS) sending scan stats at all but according to the discussion here, we should let such processes send stats. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-09-18 11:11, Kyotaro Horiguchi wrote: > At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda > <ikedamsh@oss.nttdata.com> wrote in >> Thanks. I confirmed that it causes HOT pruning or killing of >> dead index tuple if DecodeCommit() is called. >> >> As you said, DecodeCommit() may access the system table. > ... >> The wals are generated only when logical replication is performed. >> So, I added pgstat_send_wal() in XLogSendLogical(). >> >> But, I concerned that it causes poor performance >> since pgstat_send_wal() is called per wal record, > > I think that's too frequent. If we want to send any stats to the > collector, it is usually done at commit time using > pgstat_report_stat(), and the function avoids sending stats too > frequently. For logrep-worker, apply_handle_commit() is calling it. It > seems to be the place if we want to send the wal stats. Or it may be > better to call pgstat_send_wal() via pgstat_report_stat(), like > pg_stat_slru(). Thanks for your comments. Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, the frequency to send statistics is not so high. > Currently logrep-laucher, logrep-worker and autovac-launcher (and some > other processes?) don't seem (AFAICS) sending scan stats at all but > according to the discussion here, we should let such processes send > stats. I added pgstat_report_stat() to logrep-laucher and autovac-launcher. As you said, logrep-worker already calls apply_handle_commit() and pgstat_report_stat(). The checkpointer doesn't seem to call pgstat_report_stat() currently, but since there is a possibility to send wal statistics, I added pgstat_report_stat(). Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/09/25 12:06, Masahiro Ikeda wrote: > On 2020-09-18 11:11, Kyotaro Horiguchi wrote: >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda >> <ikedamsh@oss.nttdata.com> wrote in >>> Thanks. I confirmed that it causes HOT pruning or killing of >>> dead index tuple if DecodeCommit() is called. >>> >>> As you said, DecodeCommit() may access the system table. >> ... >>> The wals are generated only when logical replication is performed. >>> So, I added pgstat_send_wal() in XLogSendLogical(). >>> >>> But, I concerned that it causes poor performance >>> since pgstat_send_wal() is called per wal record, >> >> I think that's too frequent. If we want to send any stats to the >> collector, it is usually done at commit time using >> pgstat_report_stat(), and the function avoids sending stats too >> frequently. For logrep-worker, apply_handle_commit() is calling it. It >> seems to be the place if we want to send the wal stats. Or it may be >> better to call pgstat_send_wal() via pgstat_report_stat(), like >> pg_stat_slru(). > > Thanks for your comments. > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, > the frequency to send statistics is not so high. On second thought, it's strange to include this change in pg_stat_wal patch. Because pgstat_report_stat() sends various stats and that change would affect not only pg_stat_wal but also other stats views. That is, if we really want to make some processes call pgstat_report_stat() newly, which should be implemented as a separate patch. But I'm not sure how useful this change is because probably the stats are almost negligibly small in those processes. This thought seems valid for pgstat_send_wal(). I changed the thought and am inclined to be ok not to call pgstat_send_wal() in some background processes that are very unlikely to generate WAL. For example, logical-rep launcher, logical-rep walsender, and autovacuum launcher. Thought? > >> Currently logrep-laucher, logrep-worker and autovac-launcher (and some >> other processes?) don't seem (AFAICS) sending scan stats at all but >> according to the discussion here, we should let such processes send >> stats. > > I added pgstat_report_stat() to logrep-laucher and autovac-launcher. > As you said, logrep-worker already calls apply_handle_commit() and pgstat_report_stat(). Right. > The checkpointer doesn't seem to call pgstat_report_stat() currently, > but since there is a possibility to send wal statistics, I added pgstat_report_stat(). IMO it's better to call pgstat_send_wal() in the checkpointer, instead, because of the above reason. Thanks for updating the patch! I'd like to share my review comments. + <xref linkend="monitoring-pg-stat-wal-view"/> for details. Like the description for pg_stat_bgwriter, <link> tag should be used instead of <xref>. + <para> + Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full + </para></entry> I prefer the following description. Thought? "Number of times WAL data was written to the disk because wal_buffers got full" + the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal> A comma should be just after "view" (not just before "or"). +/* + * WAL global statistics counter. + * This counter is incremented by both each backend and background. + * And then, sent to the stat collector process. + */ +PgStat_MsgWal WalStats; What about merging the comments for BgWriterStats and WalStats into one because they are almost the same? For example, ------------------------------- /* * BgWriter and WAL global statistics counters. * Stored directly in a stats message structure so they can be sent * without needing to copy things around. We assume these init to zeroes. */ PgStat_MsgBgWriter BgWriterStats; PgStat_MsgWal WalStats; ------------------------------- BTW, originally there was the comment "(unused in other processes)" for BgWriterStats. But it seems not true, so I removed it from the above example. + rc = fwrite(&walStats, sizeof(walStats), 1, fpout); + (void) rc; /* we'll check for error with ferror */ Since the patch changes the pgstat file format, PGSTAT_FILE_FORMAT_ID should also be changed? - * Clear out global and archiver statistics so they start from zero in + * Clear out global, archiver and wal statistics so they start from zero in This is not the issue of this patch, but isn't it better to mention also SLRU stats here? That is, what about "Clear out global, archiver, WAL and SLRU statistics so they start from zero in"? I found "wal statistics" and "wal stats" in some comments in the patch, but isn't it better to use "WAL statistics" and "WAL stats", instead, if there is no special reason to use lowercase? + /* + * Read wal stats struct + */ + if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats)) In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats should be declared and be used to store the WAL stats read via fread(), instead. +{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full', If we change the description of wal_buffers_full column in the document as I proposed, we should also use the proposed description here. +{ oid => '1137', descr => 'statistics: last reset for the walwriter', "the walwriter" should be "WAL" or "WAL activity", etc? + * PgStat_MsgWal Sent by each backend and background workers to update WAL statistics. If your intention here is to mention background processes like checkpointer, "each backend and background workers" should be "backends and background processes"? + PgStat_Counter m_wal_buffers_full; /* number of WAL write caused by full of WAL buffers */ I don't think this comment is necessary. + PgStat_Counter wal_buffers_full; /* number of WAL write caused by full of WAL buffers */ + TimestampTz stat_reset_timestamp; /* last time when the stats reset */ I don't think these comments are necessary. +/* + * WAL writes statistics counter is updated by backend and background workers Same as above. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/09/25 12:06, Masahiro Ikeda wrote: > > On 2020-09-18 11:11, Kyotaro Horiguchi wrote: > >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda > >> <ikedamsh@oss.nttdata.com> wrote in > >>> Thanks. I confirmed that it causes HOT pruning or killing of > >>> dead index tuple if DecodeCommit() is called. > >>> > >>> As you said, DecodeCommit() may access the system table. > >> ... > >>> The wals are generated only when logical replication is performed. > >>> So, I added pgstat_send_wal() in XLogSendLogical(). > >>> > >>> But, I concerned that it causes poor performance > >>> since pgstat_send_wal() is called per wal record, > >> > >> I think that's too frequent. If we want to send any stats to the > >> collector, it is usually done at commit time using > >> pgstat_report_stat(), and the function avoids sending stats too > >> frequently. For logrep-worker, apply_handle_commit() is calling it. It > >> seems to be the place if we want to send the wal stats. Or it may be > >> better to call pgstat_send_wal() via pgstat_report_stat(), like > >> pg_stat_slru(). > > > > Thanks for your comments. > > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, > > the frequency to send statistics is not so high. > > On second thought, it's strange to include this change in pg_stat_wal patch. > Because pgstat_report_stat() sends various stats and that change would > affect not only pg_stat_wal but also other stats views. That is, if we really > want to make some processes call pgstat_report_stat() newly, which > should be implemented as a separate patch. But I'm not sure how useful > this change is because probably the stats are almost negligibly small > in those processes. > > This thought seems valid for pgstat_send_wal(). I changed the thought > and am inclined to be ok not to call pgstat_send_wal() in some background > processes that are very unlikely to generate WAL. > This makes sense to me. I think even if such background processes have to write WAL due to wal_buffers, it will be accounted next time the backend sends the stats. One minor point, don't we need to reset the counter WalStats.m_wal_buffers_full once we sent the stats, otherwise the same stats will be accounted multiple times. -- With Regards, Amit Kapila.
At Sat, 26 Sep 2020 15:48:49 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/09/25 12:06, Masahiro Ikeda wrote: > > > On 2020-09-18 11:11, Kyotaro Horiguchi wrote: > > >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda > > >> <ikedamsh@oss.nttdata.com> wrote in > > >>> Thanks. I confirmed that it causes HOT pruning or killing of > > >>> dead index tuple if DecodeCommit() is called. > > >>> > > >>> As you said, DecodeCommit() may access the system table. > > >> ... > > >>> The wals are generated only when logical replication is performed. > > >>> So, I added pgstat_send_wal() in XLogSendLogical(). > > >>> > > >>> But, I concerned that it causes poor performance > > >>> since pgstat_send_wal() is called per wal record, > > >> > > >> I think that's too frequent. If we want to send any stats to the > > >> collector, it is usually done at commit time using > > >> pgstat_report_stat(), and the function avoids sending stats too > > >> frequently. For logrep-worker, apply_handle_commit() is calling it. It > > >> seems to be the place if we want to send the wal stats. Or it may be > > >> better to call pgstat_send_wal() via pgstat_report_stat(), like > > >> pg_stat_slru(). > > > > > > Thanks for your comments. > > > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, > > > the frequency to send statistics is not so high. > > > > On second thought, it's strange to include this change in pg_stat_wal patch. > > Because pgstat_report_stat() sends various stats and that change would > > affect not only pg_stat_wal but also other stats views. That is, if we really > > want to make some processes call pgstat_report_stat() newly, which > > should be implemented as a separate patch. But I'm not sure how useful > > this change is because probably the stats are almost negligibly small > > in those processes. > > > > This thought seems valid for pgstat_send_wal(). I changed the thought > > and am inclined to be ok not to call pgstat_send_wal() in some background > > processes that are very unlikely to generate WAL. > > > > This makes sense to me. I think even if such background processes have +1 > This makes sense to me. I think even if such background processes have > to write WAL due to wal_buffers, it will be accounted next time the > backend sends the stats. Where do they send the stats? (I think it's ok to omit seding stats at all for such low-wal/heap activity processes.) > One minor point, don't we need to reset the counter > WalStats.m_wal_buffers_full once we sent the stats, otherwise the same > stats will be accounted multiple times. Isn't this doing that? + /* + * Clear out the statistics buffer, so it can be re-used. + */ + MemSet(&WalStats, 0, sizeof(WalStats)); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-09-26 19:18, Amit Kapila wrote: > On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/09/25 12:06, Masahiro Ikeda wrote: >> > On 2020-09-18 11:11, Kyotaro Horiguchi wrote: >> >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda >> >> <ikedamsh@oss.nttdata.com> wrote in >> >>> Thanks. I confirmed that it causes HOT pruning or killing of >> >>> dead index tuple if DecodeCommit() is called. >> >>> >> >>> As you said, DecodeCommit() may access the system table. >> >> ... >> >>> The wals are generated only when logical replication is performed. >> >>> So, I added pgstat_send_wal() in XLogSendLogical(). >> >>> >> >>> But, I concerned that it causes poor performance >> >>> since pgstat_send_wal() is called per wal record, >> >> >> >> I think that's too frequent. If we want to send any stats to the >> >> collector, it is usually done at commit time using >> >> pgstat_report_stat(), and the function avoids sending stats too >> >> frequently. For logrep-worker, apply_handle_commit() is calling it. It >> >> seems to be the place if we want to send the wal stats. Or it may be >> >> better to call pgstat_send_wal() via pgstat_report_stat(), like >> >> pg_stat_slru(). >> > >> > Thanks for your comments. >> > Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it, >> > the frequency to send statistics is not so high. >> >> On second thought, it's strange to include this change in pg_stat_wal >> patch. >> Because pgstat_report_stat() sends various stats and that change would >> affect not only pg_stat_wal but also other stats views. That is, if we >> really >> want to make some processes call pgstat_report_stat() newly, which >> should be implemented as a separate patch. But I'm not sure how useful >> this change is because probably the stats are almost negligibly small >> in those processes. >> >> This thought seems valid for pgstat_send_wal(). I changed the thought >> and am inclined to be ok not to call pgstat_send_wal() in some >> background >> processes that are very unlikely to generate WAL. >> OK, I removed to pgstat_report_stat() for autovaccum launcher, logrep-worker and logrep-launcher. > This makes sense to me. I think even if such background processes have > to write WAL due to wal_buffers, it will be accounted next time the > backend sends the stats. Thanks for your comments. IIUC, since each process counts WalStats.m_wal_buffers_full, backend can't send the counter which other background processes have to write WAL due to wal_buffers. Although we can't track all WAL activity, the impact on the statistics is minimal so we can ignore it. > One minor point, don't we need to reset the counter > WalStats.m_wal_buffers_full once we sent the stats, otherwise the same > stats will be accounted multiple times. Now, the counter is reset in pgstat_send_wal. Isn't it enough? >> The checkpointer doesn't seem to call pgstat_report_stat() currently, >> but since there is a possibility to send wal statistics, I added >> pgstat_report_stat(). > > IMO it's better to call pgstat_send_wal() in the checkpointer, instead, > because of the above reason. Ok, I changed. > Thanks for updating the patch! I'd like to share my review comments. > > + <xref linkend="monitoring-pg-stat-wal-view"/> for details. > > Like the description for pg_stat_bgwriter, <link> tag should be used > instead of <xref>. Thanks, fixed. > + <para> > + Number of WAL writes when the <xref linkend="guc-wal-buffers"/> > are full > + </para></entry> > > I prefer the following description. Thought? > > "Number of times WAL data was written to the disk because wal_buffers > got full" Ok, I changed. > + the <structname>pg_stat_archiver</structname> view ,or > <literal>wal</literal> > > A comma should be just after "view" (not just before "or"). Sorry, anyway I think a comma is not necessary. I removed it. > +/* > + * WAL global statistics counter. > + * This counter is incremented by both each backend and background. > + * And then, sent to the stat collector process. > + */ > +PgStat_MsgWal WalStats; > > What about merging the comments for BgWriterStats and WalStats into > one because they are almost the same? For example, > > ------------------------------- > /* > * BgWriter and WAL global statistics counters. > * Stored directly in a stats message structure so they can be sent > * without needing to copy things around. We assume these init to > zeroes. > */ > PgStat_MsgBgWriter BgWriterStats; > PgStat_MsgWal WalStats; > ------------------------------- > > BTW, originally there was the comment "(unused in other processes)" > for BgWriterStats. But it seems not true, so I removed it from > the above example. Thanks, I changed. > + rc = fwrite(&walStats, sizeof(walStats), 1, fpout); > + (void) rc; /* we'll check for error with ferror */ > > Since the patch changes the pgstat file format, > PGSTAT_FILE_FORMAT_ID should also be changed? Sorry about that. I incremented PGSTAT_FILE_FORMAT_ID by +1. > - * Clear out global and archiver statistics so they start from zero > in > + * Clear out global, archiver and wal statistics so they start from > zero in > > This is not the issue of this patch, but isn't it better to mention > also SLRU stats here? That is, what about "Clear out global, archiver, > WAL and SLRU statistics so they start from zero in"? Thanks, I changed. > I found "wal statistics" and "wal stats" in some comments in the patch, > but isn't it better to use "WAL statistics" and "WAL stats", instead, > if there is no special reason to use lowercase? OK. I fixed it. > + /* > + * Read wal stats struct > + */ > + if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats)) > > In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats > should be declared and be used to store the WAL stats read via fread(), > instead. Thanks, I changed it to declare myWalStats. > +{ oid => '1136', descr => 'statistics: number of WAL writes when the > wal buffers are full', > > If we change the description of wal_buffers_full column in the document > as I proposed, we should also use the proposed description here. OK, I fixed it. > +{ oid => '1137', descr => 'statistics: last reset for the walwriter', > > "the walwriter" should be "WAL" or "WAL activity", etc? Thanks, I fixed it. > + * PgStat_MsgWal Sent by each backend and background workers to > update WAL statistics. > > If your intention here is to mention background processes like > checkpointer, > "each backend and background workers" should be "backends and > background > processes"? Thanks, I fixed it. > + PgStat_Counter m_wal_buffers_full; /* number of WAL write caused by > full of WAL buffers */ > > I don't think this comment is necessary. OK, I removed. > + PgStat_Counter wal_buffers_full; /* number of WAL write caused by > full of WAL buffers */ > + TimestampTz stat_reset_timestamp; /* last time when the stats reset > */ > > I don't think these comments are necessary. OK, I removed > +/* > + * WAL writes statistics counter is updated by backend and background > workers > > Same as above. I fixed it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On Mon, Sep 28, 2020 at 7:00 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > On 2020-09-26 19:18, Amit Kapila wrote > > > This makes sense to me. I think even if such background processes have > > to write WAL due to wal_buffers, it will be accounted next time the > > backend sends the stats. > > Thanks for your comments. > > IIUC, since each process counts WalStats.m_wal_buffers_full, > backend can't send the counter which other background processes have to > write WAL due to wal_buffers. > Right, I misunderstood it. > Although we can't track all WAL activity, the impact on the statistics > is minimal so we can ignore it. > Yeah, that is probably true. > > One minor point, don't we need to reset the counter > > WalStats.m_wal_buffers_full once we sent the stats, otherwise the same > > stats will be accounted multiple times. > > Now, the counter is reset in pgstat_send_wal. > Isn't it enough? > That should be enough. One other thing that occurred to me today is can't we keep this as part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to reset it. It seems to me this is a cluster-wide stats and somewhat similar to some of the other stats we maintain there. -- With Regards, Amit Kapila.
At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > One other thing that occurred to me today is can't we keep this as > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to > reset it. It seems to me this is a cluster-wide stats and somewhat > similar to some of the other stats we maintain there. I like that direction, but PgStat_GlobalStats is actually PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > > One other thing that occurred to me today is can't we keep this as > > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to > > reset it. It seems to me this is a cluster-wide stats and somewhat > > similar to some of the other stats we maintain there. > > I like that direction, but PgStat_GlobalStats is actually > PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. > Yeah, I think if we want to pursue this direction then we probably need to have a separate message to set/reset WAL-related stuff. I guess we probably need to have a separate reset timestamp for WAL. I think the difference would be that we can have one structure to refer to global_stats instead of referring to multiple structures and we don't need to issue separate read/write calls but OTOH I don't see many disadvantages of the current approach as well. -- With Regards, Amit Kapila.
On 2020-09-28 12:43, Amit Kapila wrote: > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: >> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila >> <amit.kapila16@gmail.com> wrote in >> > One other thing that occurred to me today is can't we keep this as >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to >> > reset it. It seems to me this is a cluster-wide stats and somewhat >> > similar to some of the other stats we maintain there. >> >> I like that direction, but PgStat_GlobalStats is actually >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. >> > > Yeah, I think if we want to pursue this direction then we probably > need to have a separate message to set/reset WAL-related stuff. I > guess we probably need to have a separate reset timestamp for WAL. I > think the difference would be that we can have one structure to refer > to global_stats instead of referring to multiple structures and we > don't need to issue separate read/write calls but OTOH I don't see > many disadvantages of the current approach as well. IIUC, if we keep wal stats as part of PgStat_GlobalStats, don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats to PgStat_GlobalStats too? Since this is refactoring, I think it's better to make another patch after the current patch is merged. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > On 2020-09-28 12:43, Amit Kapila wrote: > > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > >> > >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila > >> <amit.kapila16@gmail.com> wrote in > >> > One other thing that occurred to me today is can't we keep this as > >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to > >> > reset it. It seems to me this is a cluster-wide stats and somewhat > >> > similar to some of the other stats we maintain there. > >> > >> I like that direction, but PgStat_GlobalStats is actually > >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. > >> > > > > Yeah, I think if we want to pursue this direction then we probably > > need to have a separate message to set/reset WAL-related stuff. I > > guess we probably need to have a separate reset timestamp for WAL. I > > think the difference would be that we can have one structure to refer > > to global_stats instead of referring to multiple structures and we > > don't need to issue separate read/write calls but OTOH I don't see > > many disadvantages of the current approach as well. > > IIUC, if we keep wal stats as part of PgStat_GlobalStats, > don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats > to PgStat_GlobalStats too? > I have given the idea for wal_stats because there is just one counter in that. I think you can just try to evaluate the merits of each approach and choose whichever you feel is good. This is just a suggestion, if you don't like it feel free to proceed with the current approach. -- With Regards, Amit Kapila.
On 2020-09-29 11:43, Amit Kapila wrote: > On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda > <ikedamsh@oss.nttdata.com> wrote: >> >> On 2020-09-28 12:43, Amit Kapila wrote: >> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi >> > <horikyota.ntt@gmail.com> wrote: >> >> >> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila >> >> <amit.kapila16@gmail.com> wrote in >> >> > One other thing that occurred to me today is can't we keep this as >> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to >> >> > reset it. It seems to me this is a cluster-wide stats and somewhat >> >> > similar to some of the other stats we maintain there. >> >> >> >> I like that direction, but PgStat_GlobalStats is actually >> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. >> >> >> > >> > Yeah, I think if we want to pursue this direction then we probably >> > need to have a separate message to set/reset WAL-related stuff. I >> > guess we probably need to have a separate reset timestamp for WAL. I >> > think the difference would be that we can have one structure to refer >> > to global_stats instead of referring to multiple structures and we >> > don't need to issue separate read/write calls but OTOH I don't see >> > many disadvantages of the current approach as well. >> >> IIUC, if we keep wal stats as part of PgStat_GlobalStats, >> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats >> to PgStat_GlobalStats too? >> > > I have given the idea for wal_stats because there is just one counter > in that. I think you can just try to evaluate the merits of each > approach and choose whichever you feel is good. This is just a > suggestion, if you don't like it feel free to proceed with the current > approach. Thanks for your suggestion. I understood that the point is that WAL-related stats have just one counter now. Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi), I think that the current approach is good. -- Masahiro Ikeda NTT DATA CORPORATION
On 2020/09/29 11:51, Masahiro Ikeda wrote: > On 2020-09-29 11:43, Amit Kapila wrote: >> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >>> >>> On 2020-09-28 12:43, Amit Kapila wrote: >>> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi >>> > <horikyota.ntt@gmail.com> wrote: >>> >> >>> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila >>> >> <amit.kapila16@gmail.com> wrote in >>> >> > One other thing that occurred to me today is can't we keep this as >>> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to >>> >> > reset it. It seems to me this is a cluster-wide stats and somewhat >>> >> > similar to some of the other stats we maintain there. >>> >> >>> >> I like that direction, but PgStat_GlobalStats is actually >>> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. >>> >> >>> > >>> > Yeah, I think if we want to pursue this direction then we probably >>> > need to have a separate message to set/reset WAL-related stuff. I >>> > guess we probably need to have a separate reset timestamp for WAL. I >>> > think the difference would be that we can have one structure to refer >>> > to global_stats instead of referring to multiple structures and we >>> > don't need to issue separate read/write calls but OTOH I don't see >>> > many disadvantages of the current approach as well. >>> >>> IIUC, if we keep wal stats as part of PgStat_GlobalStats, >>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats >>> to PgStat_GlobalStats too? >>> >> >> I have given the idea for wal_stats because there is just one counter >> in that. I think you can just try to evaluate the merits of each >> approach and choose whichever you feel is good. This is just a >> suggestion, if you don't like it feel free to proceed with the current >> approach. > > Thanks for your suggestion. > I understood that the point is that WAL-related stats have just one counter now. > > Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi), > I think that the current approach is good. +1 I marked this patch as ready for committer. Barring any objection, I will commit the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/09/29 11:51, Masahiro Ikeda wrote: > > On 2020-09-29 11:43, Amit Kapila wrote: > >> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > >>> > >>> On 2020-09-28 12:43, Amit Kapila wrote: > >>> > On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi > >>> > <horikyota.ntt@gmail.com> wrote: > >>> >> > >>> >> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila > >>> >> <amit.kapila16@gmail.com> wrote in > >>> >> > One other thing that occurred to me today is can't we keep this as > >>> >> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to > >>> >> > reset it. It seems to me this is a cluster-wide stats and somewhat > >>> >> > similar to some of the other stats we maintain there. > >>> >> > >>> >> I like that direction, but PgStat_GlobalStats is actually > >>> >> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. > >>> >> > >>> > > >>> > Yeah, I think if we want to pursue this direction then we probably > >>> > need to have a separate message to set/reset WAL-related stuff. I > >>> > guess we probably need to have a separate reset timestamp for WAL. I > >>> > think the difference would be that we can have one structure to refer > >>> > to global_stats instead of referring to multiple structures and we > >>> > don't need to issue separate read/write calls but OTOH I don't see > >>> > many disadvantages of the current approach as well. > >>> > >>> IIUC, if we keep wal stats as part of PgStat_GlobalStats, > >>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats > >>> to PgStat_GlobalStats too? > >>> > >> > >> I have given the idea for wal_stats because there is just one counter > >> in that. I think you can just try to evaluate the merits of each > >> approach and choose whichever you feel is good. This is just a > >> suggestion, if you don't like it feel free to proceed with the current > >> approach. > > > > Thanks for your suggestion. > > I understood that the point is that WAL-related stats have just one counter now. > > > > Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi), > > I think that the current approach is good. > > +1 > Okay, it makes sense to keep it in the current form if we have a plan to extend this view with additional stats. However, why don't we expose it with a function similar to pg_stat_get_archiver() instead of providing individual functions like pg_stat_get_wal_buffers_full() and pg_stat_get_wal_stat_reset_time? -- With Regards, Amit Kapila.
On 2020/09/30 20:21, Amit Kapila wrote: > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> On 2020/09/29 11:51, Masahiro Ikeda wrote: >>> On 2020-09-29 11:43, Amit Kapila wrote: >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: >>>>> >>>>> On 2020-09-28 12:43, Amit Kapila wrote: >>>>>> On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi >>>>>> <horikyota.ntt@gmail.com> wrote: >>>>>>> >>>>>>> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila >>>>>>> <amit.kapila16@gmail.com> wrote in >>>>>>>> One other thing that occurred to me today is can't we keep this as >>>>>>>> part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to >>>>>>>> reset it. It seems to me this is a cluster-wide stats and somewhat >>>>>>>> similar to some of the other stats we maintain there. >>>>>>> >>>>>>> I like that direction, but PgStat_GlobalStats is actually >>>>>>> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message. >>>>>>> >>>>>> >>>>>> Yeah, I think if we want to pursue this direction then we probably >>>>>> need to have a separate message to set/reset WAL-related stuff. I >>>>>> guess we probably need to have a separate reset timestamp for WAL. I >>>>>> think the difference would be that we can have one structure to refer >>>>>> to global_stats instead of referring to multiple structures and we >>>>>> don't need to issue separate read/write calls but OTOH I don't see >>>>>> many disadvantages of the current approach as well. >>>>> >>>>> IIUC, if we keep wal stats as part of PgStat_GlobalStats, >>>>> don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats >>>>> to PgStat_GlobalStats too? >>>>> >>>> >>>> I have given the idea for wal_stats because there is just one counter >>>> in that. I think you can just try to evaluate the merits of each >>>> approach and choose whichever you feel is good. This is just a >>>> suggestion, if you don't like it feel free to proceed with the current >>>> approach. >>> >>> Thanks for your suggestion. >>> I understood that the point is that WAL-related stats have just one counter now. >>> >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi), >>> I think that the current approach is good. >> >> +1 >> > > Okay, it makes sense to keep it in the current form if we have a plan > to extend this view with additional stats. However, why don't we > expose it with a function similar to pg_stat_get_archiver() instead of > providing individual functions like pg_stat_get_wal_buffers_full() and > pg_stat_get_wal_stat_reset_time? We can adopt either of those approaches for pg_stat_wal. I think that the former is a bit more flexible because we can collect only one of WAL information even when pg_stat_wal will contain many information in the future, by using the function. But you thought there are some reasons that the latter is better for pg_stat_wal? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/09/30 20:21, Amit Kapila wrote: > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao > > <masao.fujii@oss.nttdata.com> wrote: > >> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote: > >>> On 2020-09-29 11:43, Amit Kapila wrote: > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda > >>>> <ikedamsh@oss.nttdata.com> wrote: > >>> Thanks for your suggestion. > >>> I understood that the point is that WAL-related stats have just one > >>> counter now. > >>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, > >>> records, fpi), > >>> I think that the current approach is good. > >> > >> +1 > >> > > Okay, it makes sense to keep it in the current form if we have a plan > > to extend this view with additional stats. However, why don't we > > expose it with a function similar to pg_stat_get_archiver() instead of > > providing individual functions like pg_stat_get_wal_buffers_full() and > > pg_stat_get_wal_stat_reset_time? > > We can adopt either of those approaches for pg_stat_wal. I think that > the former is a bit more flexible because we can collect only one of > WAL information even when pg_stat_wal will contain many information > in the future, by using the function. But you thought there are some > reasons that the latter is better for pg_stat_wal? FWIW I prefer to expose it by one SRF function rather than by subdivided functions. One of the reasons is the less oid consumption and/or reduction of definitions for intrinsic functions. Another reason is at least for me subdivided functions are not useful so much for on-the-fly examination on psql console. I'm often annoyed by realizing I can't recall the exact name of a function, say, pg_last_wal_receive_lsn or such but function names cannot be auto-completed on psql console. "select proname from pg_proc where proname like.. " is one of my friends:p On the other hand "select * from pg_stat_wal" requires no detailed memory. However subdivided functions might be useful if I wanted use just one number of wal-stats in a function, I think it is not a major usage and we can use a SQL query on the view instead. Another reason that I mildly want to object to subdivided functions is I was annoyed that a stats view makes many individual calls to functions that internally share the same statistics entry. That behavior required me to provide an entry-caching feature to my shared-memory statistics patch. regrds. -- Kyotaro Horiguchi NTT Open Source Software Center
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> > Another reason that I mildly want to object to subdivided functions is > I was annoyed that a stats view makes many individual calls to > functions that internally share the same statistics entry. That > behavior required me to provide an entry-caching feature to my > shared-memory statistics patch. +1 The views for troubleshooting performance problems should be as light as possible. IIRC, we saw frequently searching pg_stat_replicationconsume unexpectedly high CPU power, because it calls pg_stat_get_activity(null) to get all sessions andjoin them with the walsenders. At that time, we had hundreds of client sessions. We expected pg_stat_replication tobe very lightweight because it provides information about a few walsenders. Regards Takayuki Tsunakawa
On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > > > > On 2020/09/30 20:21, Amit Kapila wrote: > > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao > > > <masao.fujii@oss.nttdata.com> wrote: > > >> > > >> On 2020/09/29 11:51, Masahiro Ikeda wrote: > > >>> On 2020-09-29 11:43, Amit Kapila wrote: > > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda > > >>>> <ikedamsh@oss.nttdata.com> wrote: > > >>> Thanks for your suggestion. > > >>> I understood that the point is that WAL-related stats have just one > > >>> counter now. > > >>> > > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, > > >>> records, fpi), > > >>> I think that the current approach is good. > > >> > > >> +1 > > >> > > > Okay, it makes sense to keep it in the current form if we have a plan > > > to extend this view with additional stats. However, why don't we > > > expose it with a function similar to pg_stat_get_archiver() instead of > > > providing individual functions like pg_stat_get_wal_buffers_full() and > > > pg_stat_get_wal_stat_reset_time? > > > > We can adopt either of those approaches for pg_stat_wal. I think that > > the former is a bit more flexible because we can collect only one of > > WAL information even when pg_stat_wal will contain many information > > in the future, by using the function. But you thought there are some > > reasons that the latter is better for pg_stat_wal? > > FWIW I prefer to expose it by one SRF function rather than by > subdivided functions. One of the reasons is the less oid consumption > and/or reduction of definitions for intrinsic functions. > > Another reason is at least for me subdivided functions are not useful > so much for on-the-fly examination on psql console. I'm often annoyed > by realizing I can't recall the exact name of a function, say, > pg_last_wal_receive_lsn or such but function names cannot be > auto-completed on psql console. "select proname from pg_proc where > proname like.. " is one of my friends:p On the other hand "select * > from pg_stat_wal" requires no detailed memory. > > However subdivided functions might be useful if I wanted use just one > number of wal-stats in a function, I think it is not a major usage and > we can use a SQL query on the view instead. > > Another reason that I mildly want to object to subdivided functions is > I was annoyed that a stats view makes many individual calls to > functions that internally share the same statistics entry. That > behavior required me to provide an entry-caching feature to my > shared-memory statistics patch. > All these are good reasons to expose it via one function and I think that is why most of our existing views also use one function approach. -- With Regards, Amit Kapila.
On 2020-10-01 11:33, Amit Kapila wrote: > On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: >> >> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao >> <masao.fujii@oss.nttdata.com> wrote in >> > >> > >> > On 2020/09/30 20:21, Amit Kapila wrote: >> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao >> > > <masao.fujii@oss.nttdata.com> wrote: >> > >> >> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote: >> > >>> On 2020-09-29 11:43, Amit Kapila wrote: >> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda >> > >>>> <ikedamsh@oss.nttdata.com> wrote: >> > >>> Thanks for your suggestion. >> > >>> I understood that the point is that WAL-related stats have just one >> > >>> counter now. >> > >>> >> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, >> > >>> records, fpi), >> > >>> I think that the current approach is good. >> > >> >> > >> +1 >> > >> >> > > Okay, it makes sense to keep it in the current form if we have a plan >> > > to extend this view with additional stats. However, why don't we >> > > expose it with a function similar to pg_stat_get_archiver() instead of >> > > providing individual functions like pg_stat_get_wal_buffers_full() and >> > > pg_stat_get_wal_stat_reset_time? >> > >> > We can adopt either of those approaches for pg_stat_wal. I think that >> > the former is a bit more flexible because we can collect only one of >> > WAL information even when pg_stat_wal will contain many information >> > in the future, by using the function. But you thought there are some >> > reasons that the latter is better for pg_stat_wal? >> >> FWIW I prefer to expose it by one SRF function rather than by >> subdivided functions. One of the reasons is the less oid consumption >> and/or reduction of definitions for intrinsic functions. >> >> Another reason is at least for me subdivided functions are not useful >> so much for on-the-fly examination on psql console. I'm often annoyed >> by realizing I can't recall the exact name of a function, say, >> pg_last_wal_receive_lsn or such but function names cannot be >> auto-completed on psql console. "select proname from pg_proc where >> proname like.. " is one of my friends:p On the other hand "select * >> from pg_stat_wal" requires no detailed memory. >> >> However subdivided functions might be useful if I wanted use just one >> number of wal-stats in a function, I think it is not a major usage and >> we can use a SQL query on the view instead. >> >> Another reason that I mildly want to object to subdivided functions is >> I was annoyed that a stats view makes many individual calls to >> functions that internally share the same statistics entry. That >> behavior required me to provide an entry-caching feature to my >> shared-memory statistics patch. >> > > All these are good reasons to expose it via one function and I think > that is why most of our existing views also use one function approach. Thanks for your comments. I didn't notice there are the above disadvantages to provide individual functions. I changed the latest patch to expose it via one function. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/10/01 12:56, Masahiro Ikeda wrote: > On 2020-10-01 11:33, Amit Kapila wrote: >> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi >> <horikyota.ntt@gmail.com> wrote: >>> >>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >>> > >>> > >>> > On 2020/09/30 20:21, Amit Kapila wrote: >>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao >>> > > <masao.fujii@oss.nttdata.com> wrote: >>> > >> >>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote: >>> > >>> On 2020-09-29 11:43, Amit Kapila wrote: >>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda >>> > >>>> <ikedamsh@oss.nttdata.com> wrote: >>> > >>> Thanks for your suggestion. >>> > >>> I understood that the point is that WAL-related stats have just one >>> > >>> counter now. >>> > >>> >>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, >>> > >>> records, fpi), >>> > >>> I think that the current approach is good. >>> > >> >>> > >> +1 >>> > >> >>> > > Okay, it makes sense to keep it in the current form if we have a plan >>> > > to extend this view with additional stats. However, why don't we >>> > > expose it with a function similar to pg_stat_get_archiver() instead of >>> > > providing individual functions like pg_stat_get_wal_buffers_full() and >>> > > pg_stat_get_wal_stat_reset_time? >>> > >>> > We can adopt either of those approaches for pg_stat_wal. I think that >>> > the former is a bit more flexible because we can collect only one of >>> > WAL information even when pg_stat_wal will contain many information >>> > in the future, by using the function. But you thought there are some >>> > reasons that the latter is better for pg_stat_wal? >>> >>> FWIW I prefer to expose it by one SRF function rather than by >>> subdivided functions. One of the reasons is the less oid consumption >>> and/or reduction of definitions for intrinsic functions. >>> >>> Another reason is at least for me subdivided functions are not useful >>> so much for on-the-fly examination on psql console. I'm often annoyed >>> by realizing I can't recall the exact name of a function, say, >>> pg_last_wal_receive_lsn or such but function names cannot be >>> auto-completed on psql console. "select proname from pg_proc where >>> proname like.. " is one of my friends:p On the other hand "select * >>> from pg_stat_wal" requires no detailed memory. >>> >>> However subdivided functions might be useful if I wanted use just one >>> number of wal-stats in a function, I think it is not a major usage and >>> we can use a SQL query on the view instead. >>> >>> Another reason that I mildly want to object to subdivided functions is >>> I was annoyed that a stats view makes many individual calls to >>> functions that internally share the same statistics entry. That >>> behavior required me to provide an entry-caching feature to my >>> shared-memory statistics patch. >>> >> >> All these are good reasons to expose it via one function and I think Understood. +1 to expose it as one function. >> that is why most of our existing views also use one function approach. > > Thanks for your comments. > I didn't notice there are the above disadvantages to provide individual functions. > > I changed the latest patch to expose it via one function. Thanks for updating the patch! LGTM. Barring any other objection, I will commit it. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020/10/01 10:50, tsunakawa.takay@fujitsu.com wrote: > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> >> Another reason that I mildly want to object to subdivided functions is >> I was annoyed that a stats view makes many individual calls to >> functions that internally share the same statistics entry. That >> behavior required me to provide an entry-caching feature to my >> shared-memory statistics patch. > > +1 > The views for troubleshooting performance problems should be as light as possible. IIRC, we saw frequently searchingpg_stat_replication consume unexpectedly high CPU power, because it calls pg_stat_get_activity(null) to get allsessions and join them with the walsenders. At that time, we had hundreds of client sessions. We expected pg_stat_replicationto be very lightweight because it provides information about a few walsenders. I think that we can improve that, for example, by storing backend id into WalSndCtl and making pg_stat_get_wal_senders() directly get the walsender's LocalPgBackendStatus with the backend id, rather than joining pg_stat_get_activity() and pg_stat_get_wal_senders(). Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
From: Fujii Masao <masao.fujii@oss.nttdata.com> > I think that we can improve that, for example, by storing backend id > into WalSndCtl and making pg_stat_get_wal_senders() directly > get the walsender's LocalPgBackendStatus with the backend id, > rather than joining pg_stat_get_activity() and pg_stat_get_wal_senders(). Yeah, I had something like that in mind. I think I'll take note of this as my private homework. (Of course, anyone cando it.) Regards Takayuki Tsunakawa
On 2020/10/01 13:35, Fujii Masao wrote: > > > On 2020/10/01 12:56, Masahiro Ikeda wrote: >> On 2020-10-01 11:33, Amit Kapila wrote: >>> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi >>> <horikyota.ntt@gmail.com> wrote: >>>> >>>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >>>> > >>>> > >>>> > On 2020/09/30 20:21, Amit Kapila wrote: >>>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao >>>> > > <masao.fujii@oss.nttdata.com> wrote: >>>> > >> >>>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote: >>>> > >>> On 2020-09-29 11:43, Amit Kapila wrote: >>>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda >>>> > >>>> <ikedamsh@oss.nttdata.com> wrote: >>>> > >>> Thanks for your suggestion. >>>> > >>> I understood that the point is that WAL-related stats have just one >>>> > >>> counter now. >>>> > >>> >>>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, >>>> > >>> records, fpi), >>>> > >>> I think that the current approach is good. >>>> > >> >>>> > >> +1 >>>> > >> >>>> > > Okay, it makes sense to keep it in the current form if we have a plan >>>> > > to extend this view with additional stats. However, why don't we >>>> > > expose it with a function similar to pg_stat_get_archiver() instead of >>>> > > providing individual functions like pg_stat_get_wal_buffers_full() and >>>> > > pg_stat_get_wal_stat_reset_time? >>>> > >>>> > We can adopt either of those approaches for pg_stat_wal. I think that >>>> > the former is a bit more flexible because we can collect only one of >>>> > WAL information even when pg_stat_wal will contain many information >>>> > in the future, by using the function. But you thought there are some >>>> > reasons that the latter is better for pg_stat_wal? >>>> >>>> FWIW I prefer to expose it by one SRF function rather than by >>>> subdivided functions. One of the reasons is the less oid consumption >>>> and/or reduction of definitions for intrinsic functions. >>>> >>>> Another reason is at least for me subdivided functions are not useful >>>> so much for on-the-fly examination on psql console. I'm often annoyed >>>> by realizing I can't recall the exact name of a function, say, >>>> pg_last_wal_receive_lsn or such but function names cannot be >>>> auto-completed on psql console. "select proname from pg_proc where >>>> proname like.. " is one of my friends:p On the other hand "select * >>>> from pg_stat_wal" requires no detailed memory. >>>> >>>> However subdivided functions might be useful if I wanted use just one >>>> number of wal-stats in a function, I think it is not a major usage and >>>> we can use a SQL query on the view instead. >>>> >>>> Another reason that I mildly want to object to subdivided functions is >>>> I was annoyed that a stats view makes many individual calls to >>>> functions that internally share the same statistics entry. That >>>> behavior required me to provide an entry-caching feature to my >>>> shared-memory statistics patch. >>>> >>> >>> All these are good reasons to expose it via one function and I think > > Understood. +1 to expose it as one function. > > >>> that is why most of our existing views also use one function approach. >> >> Thanks for your comments. >> I didn't notice there are the above disadvantages to provide individual functions. >> >> I changed the latest patch to expose it via one function. > > Thanks for updating the patch! LGTM. > Barring any other objection, I will commit it. I updated typedefs.list and pushed the patch. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-10-02 10:21, Fujii Masao wrote: > On 2020/10/01 13:35, Fujii Masao wrote: >> >> >> On 2020/10/01 12:56, Masahiro Ikeda wrote: >>> On 2020-10-01 11:33, Amit Kapila wrote: >>>> On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi >>>> <horikyota.ntt@gmail.com> wrote: >>>>> >>>>> At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao >>>>> <masao.fujii@oss.nttdata.com> wrote in >>>>> > >>>>> > >>>>> > On 2020/09/30 20:21, Amit Kapila wrote: >>>>> > > On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao >>>>> > > <masao.fujii@oss.nttdata.com> wrote: >>>>> > >> >>>>> > >> On 2020/09/29 11:51, Masahiro Ikeda wrote: >>>>> > >>> On 2020-09-29 11:43, Amit Kapila wrote: >>>>> > >>>> On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda >>>>> > >>>> <ikedamsh@oss.nttdata.com> wrote: >>>>> > >>> Thanks for your suggestion. >>>>> > >>> I understood that the point is that WAL-related stats have just one >>>>> > >>> counter now. >>>>> > >>> >>>>> > >>> Since we may add some WAL-related stats like pgWalUsage.(bytes, >>>>> > >>> records, fpi), >>>>> > >>> I think that the current approach is good. >>>>> > >> >>>>> > >> +1 >>>>> > >> >>>>> > > Okay, it makes sense to keep it in the current form if we have a plan >>>>> > > to extend this view with additional stats. However, why don't we >>>>> > > expose it with a function similar to pg_stat_get_archiver() instead of >>>>> > > providing individual functions like pg_stat_get_wal_buffers_full() and >>>>> > > pg_stat_get_wal_stat_reset_time? >>>>> > >>>>> > We can adopt either of those approaches for pg_stat_wal. I think that >>>>> > the former is a bit more flexible because we can collect only one of >>>>> > WAL information even when pg_stat_wal will contain many information >>>>> > in the future, by using the function. But you thought there are some >>>>> > reasons that the latter is better for pg_stat_wal? >>>>> >>>>> FWIW I prefer to expose it by one SRF function rather than by >>>>> subdivided functions. One of the reasons is the less oid >>>>> consumption >>>>> and/or reduction of definitions for intrinsic functions. >>>>> >>>>> Another reason is at least for me subdivided functions are not >>>>> useful >>>>> so much for on-the-fly examination on psql console. I'm often >>>>> annoyed >>>>> by realizing I can't recall the exact name of a function, say, >>>>> pg_last_wal_receive_lsn or such but function names cannot be >>>>> auto-completed on psql console. "select proname from pg_proc where >>>>> proname like.. " is one of my friends:p On the other hand "select * >>>>> from pg_stat_wal" requires no detailed memory. >>>>> >>>>> However subdivided functions might be useful if I wanted use just >>>>> one >>>>> number of wal-stats in a function, I think it is not a major usage >>>>> and >>>>> we can use a SQL query on the view instead. >>>>> >>>>> Another reason that I mildly want to object to subdivided functions >>>>> is >>>>> I was annoyed that a stats view makes many individual calls to >>>>> functions that internally share the same statistics entry. That >>>>> behavior required me to provide an entry-caching feature to my >>>>> shared-memory statistics patch. >>>>> >>>> >>>> All these are good reasons to expose it via one function and I think >> >> Understood. +1 to expose it as one function. >> >> >>>> that is why most of our existing views also use one function >>>> approach. >>> >>> Thanks for your comments. >>> I didn't notice there are the above disadvantages to provide >>> individual functions. >>> >>> I changed the latest patch to expose it via one function. >> >> Thanks for updating the patch! LGTM. >> Barring any other objection, I will commit it. > > I updated typedefs.list and pushed the patch. Thanks! Thanks to all reviewers! Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Hi, I think it's better to add other WAL statistics to the pg_stat_wal view. I'm thinking to add the following statistics. Please let me know your thoughts. 1. Basic wal statistics * wal_records: Total number of WAL records generated * wal_fpi: Total number of WAL full page images generated * wal_bytes: Total amount of WAL bytes generated To understand DB's performance, first, we will check the performance trends for the entire database instance. For example, if the number of wal_fpi becomes higher, users may tune "wal_compression", "checkpoint_timeout" and so on. Although users can check the above statistics via EXPLAIN, auto_explain, autovacuum and pg_stat_statements now, if users want to see the performance trends for the entire database, they must preprocess the statistics. Is it useful to add the sum of the above statistics to the pg_stat_wal view? 2. Number of when new WAL file is created and zero-filled. As Fujii-san already commented, I think it's good for tuning. > Just idea; it may be worth exposing the number of when new WAL file is > created and zero-filled. This initialization may have impact on the > performance of write-heavy workload generating lots of WAL. If this > number is reported high, to reduce the number of this initialization, > we can tune WAL-related parameters so that more "recycled" WAL files > can be hold. 3. Number of when to switch the WAL logfile segment. This is similar to 2, but this counts the number of when WAL file is recylcled too. I think it's useful for tuning "wal_segment_size" if the number is high relative to the startup time, "wal_segment_size" must be bigger. 4. Number of when WAL is flushed I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions. If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload. Also, it's useful for tuning "wal_writer_delay" and "wal_writer_flush_after" for wal writer. If the number is high, users can change the parameter for performance. I think it's better to separate this for backends and wal writer. 5. Wait time when WAL is flushed. This is the accumulated time when wal is flushed. If the time becomes much higher, users can detect the possibility of disk failure. Since users can see how much flash time occupies of the query execution time, it may lead to query tuning and so on. Since there is the above reason, I think it's better to separate this for backends and wal writer. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On 2020-10-06 15:57, Masahiro Ikeda wrote: > Hi, > > I think it's better to add other WAL statistics to the pg_stat_wal > view. > I'm thinking to add the following statistics. Please let me know your > thoughts. > > 1. Basic wal statistics > > * wal_records: Total number of WAL records generated > * wal_fpi: Total number of WAL full page images generated > * wal_bytes: Total amount of WAL bytes generated > > To understand DB's performance, first, we will check the performance > trends for the entire database instance. > For example, if the number of wal_fpi becomes higher, users may tune > "wal_compression", "checkpoint_timeout" and so on. > > Although users can check the above statistics via EXPLAIN, > auto_explain, autovacuum > and pg_stat_statements now, if users want to see the performance > trends for the entire database, > they must preprocess the statistics. > > Is it useful to add the sum of the above statistics to the pg_stat_wal > view? > > > 2. Number of when new WAL file is created and zero-filled. > > As Fujii-san already commented, I think it's good for tuning. > >> Just idea; it may be worth exposing the number of when new WAL file is >> created and zero-filled. This initialization may have impact on the >> performance of write-heavy workload generating lots of WAL. If this >> number is reported high, to reduce the number of this initialization, >> we can tune WAL-related parameters so that more "recycled" WAL files >> can be hold. > > > 3. Number of when to switch the WAL logfile segment. > > This is similar to 2, but this counts the number of when WAL file is > recylcled too. > I think it's useful for tuning "wal_segment_size" > if the number is high relative to the startup time, "wal_segment_size" > must be bigger. > > > 4. Number of when WAL is flushed > > I think it's useful for tuning "synchronous_commit" and "commit_delay" > for query executions. > If the number of WAL is flushed is high, users can know > "synchronous_commit" is useful for the workload. > > Also, it's useful for tuning "wal_writer_delay" and > "wal_writer_flush_after" for wal writer. > If the number is high, users can change the parameter for performance. > > I think it's better to separate this for backends and wal writer. > > > 5. Wait time when WAL is flushed. > > This is the accumulated time when wal is flushed. > If the time becomes much higher, users can detect the possibility of > disk failure. > > Since users can see how much flash time occupies of the query execution > time, > it may lead to query tuning and so on. > > Since there is the above reason, I think it's better to separate this > for backends and wal writer. I made a patch for collecting the above statistics. If you have any comments, please let me know. I think it's better to separate some statistics for backend and backgrounds because tuning target parameters like "synchronous_commit", "wal_writer_delay" and so on are different. But first, I want to get a consensus to collect them. Best regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2020/10/13 11:57, Masahiro Ikeda wrote: > On 2020-10-06 15:57, Masahiro Ikeda wrote: >> Hi, >> >> I think it's better to add other WAL statistics to the pg_stat_wal view. >> I'm thinking to add the following statistics. Please let me know your thoughts. >> >> 1. Basic wal statistics >> >> * wal_records: Total number of WAL records generated >> * wal_fpi: Total number of WAL full page images generated >> * wal_bytes: Total amount of WAL bytes generated +1 >> >> To understand DB's performance, first, we will check the performance >> trends for the entire database instance. >> For example, if the number of wal_fpi becomes higher, users may tune >> "wal_compression", "checkpoint_timeout" and so on. >> >> Although users can check the above statistics via EXPLAIN, >> auto_explain, autovacuum >> and pg_stat_statements now, if users want to see the performance >> trends for the entire database, >> they must preprocess the statistics. >> >> Is it useful to add the sum of the above statistics to the pg_stat_wal view? >> >> >> 2. Number of when new WAL file is created and zero-filled. >> >> As Fujii-san already commented, I think it's good for tuning. >> >>> Just idea; it may be worth exposing the number of when new WAL file is created and zero-filled. This initialization mayhave impact on the performance of write-heavy workload generating lots of WAL. If this number is reported high, to reducethe number of this initialization, we can tune WAL-related parameters so that more "recycled" WAL files can be hold. +1 But it might be better to track the number of when new WAL file is created whether it's zero-filled or not, if file creation and sync itself takes time. >> >> >> 3. Number of when to switch the WAL logfile segment. >> >> This is similar to 2, but this counts the number of when WAL file is >> recylcled too. >> I think it's useful for tuning "wal_segment_size" >> if the number is high relative to the startup time, "wal_segment_size" >> must be bigger. You're thinking to count all the WAL file switch? That number is equal to the number of WAL files generated since the last reset of pg_stat_wal? >> >> >> 4. Number of when WAL is flushed >> >> I think it's useful for tuning "synchronous_commit" and "commit_delay" >> for query executions. >> If the number of WAL is flushed is high, users can know >> "synchronous_commit" is useful for the workload. >> >> Also, it's useful for tuning "wal_writer_delay" and >> "wal_writer_flush_after" for wal writer. >> If the number is high, users can change the parameter for performance. >> >> I think it's better to separate this for backends and wal writer. +1 >> >> >> 5. Wait time when WAL is flushed. >> >> This is the accumulated time when wal is flushed. >> If the time becomes much higher, users can detect the possibility of >> disk failure. This should be tracked, e.g., only when track_io_timing is enabled? Otherwise, tracking that may cause performance overhead. >> >> Since users can see how much flash time occupies of the query execution time, >> it may lead to query tuning and so on. >> >> Since there is the above reason, I think it's better to separate this >> for backends and wal writer. I'm afraid that this counter for a backend may be a bit confusing. Because when the counter indicates small time, we may think that walwriter almost write WAL data and a backend doesn't take time to write WAL. But a backend may be just waiting for walwriter to write WAL. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-10-15 19:49, Fujii Masao wrote: > On 2020/10/13 11:57, Masahiro Ikeda wrote: >> On 2020-10-06 15:57, Masahiro Ikeda wrote: >>> 2. Number of when new WAL file is created and zero-filled. >>> >>> As Fujii-san already commented, I think it's good for tuning. >>> >>>> Just idea; it may be worth exposing the number of when new WAL file >>>> is created and zero-filled. This initialization may have impact on >>>> the performance of write-heavy workload generating lots of WAL. If >>>> this number is reported high, to reduce the number of this >>>> initialization, we can tune WAL-related parameters so that more >>>> "recycled" WAL files can be hold. > > +1 > > But it might be better to track the number of when new WAL file is > created whether it's zero-filled or not, if file creation and sync > itself > takes time. OK, I changed to track the number of when new WAL file is created. >>> >>> >>> 3. Number of when to switch the WAL logfile segment. >>> >>> This is similar to 2, but this counts the number of when WAL file is >>> recylcled too. >>> I think it's useful for tuning "wal_segment_size" >>> if the number is high relative to the startup time, >>> "wal_segment_size" >>> must be bigger. > > You're thinking to count all the WAL file switch? That number is equal > to the number of WAL files generated since the last reset of > pg_stat_wal? Yes. I think it might be better to count it because I think the ratio in which a new WAL file is created is important. To calculate it, we need the count all the WAL file switch. >>> 4. Number of when WAL is flushed >>> >>> I think it's useful for tuning "synchronous_commit" and >>> "commit_delay" >>> for query executions. >>> If the number of WAL is flushed is high, users can know >>> "synchronous_commit" is useful for the workload. >>> >>> Also, it's useful for tuning "wal_writer_delay" and >>> "wal_writer_flush_after" for wal writer. >>> If the number is high, users can change the parameter for >>> performance. >>> >>> I think it's better to separate this for backends and wal writer. > > +1 Thanks, I separated the statistics for backends and wal writer. When checkpointer process flushes the WAL, the statistics for backends are counted now. Although I think its impact is not big, is it better to make statistics for checkpointer? >>> 5. Wait time when WAL is flushed. >>> >>> This is the accumulated time when wal is flushed. >>> If the time becomes much higher, users can detect the possibility of >>> disk failure. > > This should be tracked, e.g., only when track_io_timing is enabled? > Otherwise, tracking that may cause performance overhead. OK, I changed the implementation. >>> Since users can see how much flash time occupies of the query >>> execution time, >>> it may lead to query tuning and so on. >>> >>> Since there is the above reason, I think it's better to separate this >>> for backends and wal writer. > > > I'm afraid that this counter for a backend may be a bit confusing. > Because > when the counter indicates small time, we may think that walwriter > almost > write WAL data and a backend doesn't take time to write WAL. But a > backend > may be just waiting for walwriter to write WAL. Thanks for your comments. I agreed. Now, the following is the view implemented in the attached patch. If you have any other comments, please let me know. ``` postgres=# SELECT * FROM pg_stat_wal; -[ RECORD 1 ]-------+------------------------------ wal_records | 1000128 # Total number of WAL records generated wal_fpi | 1 # Total number of WAL full page images generated wal_bytes | 124013682 #Total amount of WAL bytes generated wal_buffers_full | 7952 #Total number of WAL data written to the disk because WAL buffers got full wal_file | 14 #Total number of WAL file segment created or opened a pre-existing one wal_init_file | 7 #Total number of WAL file segment created wal_write_backend | 7956 #Total number of WAL data written to the disk by backends wal_write_walwriter | 27 #Total number of WAL data written to the disk by walwriter wal_write_time | 40 # Total amount of time that has been spent in the portion of WAL data was written to disk by backend and walwriter, in milliseconds wal_sync_backend | 1 # Total number of WAL data synced to the disk by backends wal_sync_walwriter | 6 #Total number of WAL data synced to the disk by walwriter wal_sync_time | 0 # Total amount of time that has been spent in the portion of WAL data was synced to disk by backend and walwriter, in milliseconds stats_reset | 2020-10-16 19:41:01.892272+09 ``` Regards, -- Masahiro Ikeda NTT DATA CORPORATION