Thread: New statistics for WAL buffer dirty writes
Hi all,

I've created a new patch to get/reset statistics of WAL buffer writes (flushes) caused by the WAL buffer being full.

This patch provides two new functions, pg_stat_get_xlog_dirty_write() and pg_stat_reset_xlog_dirty_write(), which are designed to help determine an appropriate value for the WAL buffer size.

If this counter keeps increasing in a production environment, it means that the WAL buffer is too small to hold the xlog records generated by the transactions. In that case, you can increase the WAL buffer size to keep xlog records in memory and reduce WAL writes.

I don't think this patch affects WAL write performance, but I'm still paying attention to that.

Any comments or suggestions?

Regards,

-----------------------------------------------------------
[snaga@devvm03 src]$ psql -p 15432 postgres
psql (9.3devel)
Type "help" for help.

postgres=# SELECT pg_stat_get_xlog_dirty_write();
 pg_stat_get_xlog_dirty_write
------------------------------
                            0
(1 row)

postgres=# \q
[snaga@devvm03 src]$ pgbench -p 15432 -s 10 -c 32 -t 1000 postgres
Scale option ignored, using pgbench_branches table count = 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 32
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 32000/32000
tps = 141.937738 (including connections establishing)
tps = 142.123457 (excluding connections establishing)
[snaga@devvm03 src]$ psql -p 15432 postgres
psql (9.3devel)
Type "help" for help.

postgres=# SELECT pg_stat_get_xlog_dirty_write();
 pg_stat_get_xlog_dirty_write
------------------------------
                            0
(1 row)

postgres=# begin;
BEGIN
postgres=# DELETE FROM pgbench_accounts;
DELETE 1000000
postgres=# commit;
COMMIT
postgres=# SELECT pg_stat_get_xlog_dirty_write();
 pg_stat_get_xlog_dirty_write
------------------------------
                        19229
(1 row)

postgres=# SELECT pg_stat_reset_xlog_dirty_write();
 pg_stat_reset_xlog_dirty_write
--------------------------------

(1 row)

postgres=# SELECT pg_stat_get_xlog_dirty_write();
 pg_stat_get_xlog_dirty_write
------------------------------
                            0
(1 row)

postgres=# \q
[snaga@devvm03 src]$
-----------------------------------------------------------

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
On 07-07-2012 09:00, Satoshi Nagayasu wrote:
> I've created new patch to get/reset statistics of WAL buffer
> writes (flushes) caused by WAL buffer full.
>
This new statistic doesn't solve your problem (tune wal_buffers). It doesn't give you the wal_buffers value. It only says "hey, I needed more buffers so I write those dirty ones". It doesn't say how many. I would like to have something that says "hey, you have 1000 buffers available and you are using 100 buffers (10%)". This new statistic is only useful for decreasing the WALWriteLock contention.

--
Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
2012/07/07 22:07, Euler Taveira wrote:
> On 07-07-2012 09:00, Satoshi Nagayasu wrote:
>> I've created new patch to get/reset statistics of WAL buffer
>> writes (flushes) caused by WAL buffer full.
>>
> This new statistic doesn't solve your problem (tune wal_buffers). It doesn't
> give you the wal_buffers value. It only says "hey, I needed more buffers so I
> write those dirty ones". It doesn't say how many. I would like to have
> something that says "hey, you have 1000 buffers available and you are using
> 100 buffers (10%)". This new statistic is only useful for decreasing the
> WALWriteLock contention.

I agree that it would not tell you the exact number for wal_buffers, but it would help the DBA understand what's actually happening around the WAL buffers.

Also, decreasing WALWriteLock contention is obviously important to the DBA in terms of improving database performance. Actually, that's the reason why I'm working on another set of statistics. :)

http://archives.postgresql.org/pgsql-hackers/2012-06/msg01489.php

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
On Jul 7, 2012, at 9:07 AM, Euler Taveira <euler@timbira.com> wrote:
> On 07-07-2012 09:00, Satoshi Nagayasu wrote:
>> I've created new patch to get/reset statistics of WAL buffer
>> writes (flushes) caused by WAL buffer full.
>>
> This new statistic doesn't solve your problem (tune wal_buffers). It doesn't
> give you the wal_buffers value. It only says "hey, I needed more buffers so I
> write those dirty ones". It doesn't say how many. I would like to have
> something that says "hey, you have 1000 buffers available and you are using
> 100 buffers (10%)". This new statistic is only useful for decreasing the
> WALWriteLock contention.

The number of WAL buffers that you are using is going to change so quickly as to be utterly meaningless. I don't really see that there's any statistic we could gather that would tell us how many WAL buffers are needed. This patch seems like it's on the right track, at least telling you how often you're running out.

I'm interested to run some benchmarks with this; I think it could be quite informative.

...Robert
On Sat, Jul 7, 2012 at 3:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Jul 7, 2012, at 9:07 AM, Euler Taveira <euler@timbira.com> wrote:
>> On 07-07-2012 09:00, Satoshi Nagayasu wrote:
>>> I've created new patch to get/reset statistics of WAL buffer
>>> writes (flushes) caused by WAL buffer full.
>>>
>> This new statistic doesn't solve your problem (tune wal_buffers). It doesn't
>> give you the wal_buffers value. It only says "hey, I needed more buffers so I
>> write those dirty ones". It doesn't say how many. I would like to have
>> something that says "hey, you have 1000 buffers available and you are using
>> 100 buffers (10%)". This new statistic is only useful for decreasing the
>> WALWriteLock contention.
>
> The number of WAL buffers that you are using is going to change so quickly as to be utterly meaningless. I don't really see that there's any statistic we could gather that would tell us how many WAL buffers are needed. This patch seems like it's on the right track, at least telling you how often you're running out.

We could keep a high watermark of "what's the largest percentage we've used", perhaps?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
On Jul 7, 2012, at 8:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Sat, Jul 7, 2012 at 3:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Jul 7, 2012, at 9:07 AM, Euler Taveira <euler@timbira.com> wrote:
>>> On 07-07-2012 09:00, Satoshi Nagayasu wrote:
>>>> I've created new patch to get/reset statistics of WAL buffer
>>>> writes (flushes) caused by WAL buffer full.
>>>>
>>> This new statistic doesn't solve your problem (tune wal_buffers). It doesn't
>>> give you the wal_buffers value. It only says "hey, I needed more buffers so I
>>> write those dirty ones". It doesn't say how many. I would like to have
>>> something that says "hey, you have 1000 buffers available and you are using
>>> 100 buffers (10%)". This new statistic is only useful for decreasing the
>>> WALWriteLock contention.
>>
>> The number of WAL buffers that you are using is going to change so quickly as to be utterly meaningless. I don't really see that there's any statistic we could gather that would tell us how many WAL buffers are needed. This patch seems like it's on the right track, at least telling you how often you're running out.
>
> We could keep a high watermark of "what's the largest percentage we've
> used", perhaps?

Sure, but I doubt that would be as informative as this. It's no big deal if you hit 100% every once in a while; what you really want to know is whether it's happening once per second or once per week.

...Robert
On Sat, Jul 7, 2012 at 7:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Jul 7, 2012, at 8:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Sat, Jul 7, 2012 at 3:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Jul 7, 2012, at 9:07 AM, Euler Taveira <euler@timbira.com> wrote:
>>>> On 07-07-2012 09:00, Satoshi Nagayasu wrote:
>>>>> I've created new patch to get/reset statistics of WAL buffer
>>>>> writes (flushes) caused by WAL buffer full.
>>>>>
>>>> This new statistic doesn't solve your problem (tune wal_buffers). It doesn't
>>>> give you the wal_buffers value. It only says "hey, I needed more buffers so I
>>>> write those dirty ones". It doesn't say how many. I would like to have
>>>> something that says "hey, you have 1000 buffers available and you are using
>>>> 100 buffers (10%)". This new statistic is only useful for decreasing the
>>>> WALWriteLock contention.
>>>
>>> The number of WAL buffers that you are using is going to change so quickly as to be utterly meaningless. I don't really see that there's any statistic we could gather that would tell us how many WAL buffers are needed. This patch seems like it's on the right track, at least telling you how often you're running out.
>>
>> We could keep a high watermark of "what's the largest percentage we've
>> used", perhaps?
>
> Sure, but I doubt that would be as informative as this. It's no big deal if you hit 100% every once in a while; what you really want to know is whether it's happening once per second or once per week.

I'm not suggesting one or the other, I'm suggesting that both values might be interesting.

Though in reality, you'd want that high watermark to only count if it was the state for more than <n>, which is a lot more difficult to get. So yeah, maybe that's overkill to even try.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Hi,

Jeff Janes has pointed out that my previous patch counted the dirty writes only within a single local backend, and could not track them across the whole cluster, because the counter was allocated in local process memory.

That's true, and I have fixed it by moving the counter into shared memory, as a member of XLogCtlWrite, to keep the total dirty writes for the cluster.

Regards,

2012/07/07 21:00, Satoshi Nagayasu wrote:
> Hi all,
>
> I've created new patch to get/reset statistics of WAL buffer
> writes (flushes) caused by WAL buffer full.
>
> This patch provides two new functions, pg_stat_get_xlog_dirty_write()
> and pg_stat_reset_xlog_dirty_write(), which have been designed to
> determine an appropriate value for WAL buffer size.
>
> If this counter is increasing in the production environment,
> it would mean that the WAL buffer size is too small to hold
> xlog records generated the transactions. So, you can increase
> your WAL buffer size to keep xlog records and to reduce WAL writes.
>
> I think this patch would not affect to WAL write performance,
> but still paying attention to it.
>
> Any comments or suggestions?
>
> Regards,
>
> -----------------------------------------------------------
> [snaga@devvm03 src]$ psql -p 15432 postgres
> psql (9.3devel)
> Type "help" for help.
>
> postgres=# SELECT pg_stat_get_xlog_dirty_write();
>  pg_stat_get_xlog_dirty_write
> ------------------------------
>                             0
> (1 row)
>
> postgres=# \q
> [snaga@devvm03 src]$ pgbench -p 15432 -s 10 -c 32 -t 1000 postgres
> Scale option ignored, using pgbench_branches table count = 10
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 10
> query mode: simple
> number of clients: 32
> number of threads: 1
> number of transactions per client: 1000
> number of transactions actually processed: 32000/32000
> tps = 141.937738 (including connections establishing)
> tps = 142.123457 (excluding connections establishing)
> [snaga@devvm03 src]$ psql -p 15432 postgres
> psql (9.3devel)
> Type "help" for help.
>
> postgres=# SELECT pg_stat_get_xlog_dirty_write();
>  pg_stat_get_xlog_dirty_write
> ------------------------------
>                             0
> (1 row)
>
> postgres=# begin;
> BEGIN
> postgres=# DELETE FROM pgbench_accounts;
> DELETE 1000000
> postgres=# commit;
> COMMIT
> postgres=# SELECT pg_stat_get_xlog_dirty_write();
>  pg_stat_get_xlog_dirty_write
> ------------------------------
>                         19229
> (1 row)
>
> postgres=# SELECT pg_stat_reset_xlog_dirty_write();
>  pg_stat_reset_xlog_dirty_write
> --------------------------------
>
> (1 row)
>
> postgres=# SELECT pg_stat_get_xlog_dirty_write();
>  pg_stat_get_xlog_dirty_write
> ------------------------------
>                             0
> (1 row)
>
> postgres=# \q
> [snaga@devvm03 src]$
> -----------------------------------------------------------
>

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
On Sat, Jul 7, 2012 at 9:17 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> Hi,
>
> Jeff Janes has pointed out that my previous patch could hold
> a number of the dirty writes only in single local backend, and
> it could not hold all over the cluster, because the counter
> was allocated in the local process memory.
>
> That's true, and I have fixed it with moving the counter into
> the shared memory, as a member of XLogCtlWrite, to keep total
> dirty writes in the cluster.

A concern I have is whether the XLogCtlWrite *Write pointer needs to be declared volatile, to prevent the compiler from pushing operations on it outside of the locks (and so memory barriers) that formally protect it. However, I see that the existing code with Insert also does not use volatile, so maybe my concern is baseless. Perhaps the compiler guarantees not to move operations on pointers over the boundaries of function calls? The pattern elsewhere in the code seems to be to use volatiles for things protected by spin-locks (implemented by macros) but not for things protected by LWLocks.

The comment "XLogCtrlWrite must be protected with WALWriteLock" mis-spells XLogCtlWrite.

The final patch will need to add a section to the documentation.

Cheers,

Jeff
On 7 July 2012 18:06, Robert Haas <robertmhaas@gmail.com> wrote:
> Sure, but I doubt that would be as informative as this. It's no big deal if you hit 100% every once in a while; what you really want to know is whether it's happening once per second or once per week.

Agreed. I can't see an easy way of recording the high water mark % and I'm not sure how we'd use it if we had it.

Let's just track how often we run out of space, because that is when bad things happen, not before.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jul 28, 2012 at 6:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> A concern I have is whether the XLogCtlWrite *Write pointer needs to
> be declared volatile, to prevent the compiler from pushing operations
> on them outside of the locks (and so memory barriers) that formally
> protect them. However I see that existing code with Insert also does
> not use volatile, so maybe my concern is baseless. Perhaps the
> compiler guarantees to not move operations on pointers over the
> boundaries of function calls? The pattern elsewhere in the code seems
> to be to use volatiles for things protected by spin-locks (implemented
> by macros) but not for things protected by LWLocks.

Yes, our code is only correct if we assume that the compiler performs no global optimizations - i.e. no movement of code between functions.

IMHO, the way we have it now is kind of a mess. SpinLockAcquire and SpinLockRelease are required to be CPU barriers, but they are not required to be compiler barriers. If we changed that so that they were required to act as barriers of both flavors, then (1) we wouldn't need volatile in as many places, (2) we would be less prone to bugs caused by the omission of not-obviously-necessary volatile markings, and (3) we would remove one possible source of breakage that might be induced by a globally optimizing compiler. As things stand today, making a previously-global function static could result in working code breaking, because the static function might be inlined where the global function wasn't. Ouch.

Anyway, unless and until we make a definitional change of the sort described above, any pointers used within a spinlock critical section must be volatile; and pray that the compiler doesn't inline anything you weren't expecting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> IMHO, the way we have it now is kind of a mess. SpinLockAcquire and
> SpinLockRelease are required to be CPU barriers, but they are not
> required to be compiler barriers. If we changed that so that they
> were required to act as barriers of both flavors,

Since they are macros, how do you propose to do that exactly?

I agree that volatile-izing everything in the vicinity is a sucky solution, but the last time we looked at this there did not seem to be a better one.

regards, tom lane
On Tue, Jul 31, 2012 at 4:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> IMHO, the way we have it now is kind of a mess. SpinLockAcquire and
>> SpinLockRelease are required to be CPU barriers, but they are not
>> required to be compiler barriers. If we changed that so that they
>> were required to act as barriers of both flavors,
>
> Since they are macros, how do you propose to do that exactly?

Why does it matter that they are macros?

> I agree that volatile-izing everything in the vicinity is a sucky
> solution, but the last time we looked at this there did not seem to
> be a better one.

Well, Linux has a barrier() primitive which is defined as a compiler-barrier, so I don't see why we shouldn't be able to manage the same thing. In fact, we've already got it, though it's presently unused; see storage/barrier.h.

Looking over s_lock.h, it looks like TAS is typically defined using __asm__ __volatile__, and the __asm__ is marked as clobbering memory. As the fine comments say "this prevents gcc from thinking it can cache the values of shared-memory fields across the asm code", which is another way of saying that it's a compiler barrier. However, there's no similar guard in S_UNLOCK, which is simply declared as a volatile store, and therefore compiler ordering is guaranteed only with respect to other volatile pointer references. If we added something of the form __asm__ __volatile__("" : : : "memory") in there, it should serve as a full compiler barrier. That might have to go in a static inline function as we do with TAS, but I think it should work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 31, 2012 at 4:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I agree that volatile-izing everything in the vicinity is a sucky
>> solution, but the last time we looked at this there did not seem to
>> be a better one.

> Well, Linux has a barrier() primitive which is defined as a
> compiler-barrier, so I don't see why we shouldn't be able to manage
> the same thing. In fact, we've already got it, though it's presently
> unused; see storage/barrier.h.

Solving the problem for linux only, or gcc only, isn't going to get us to a place where we can stop volatile-izing call sites. We need to be sure it works for every single case supported by s_lock.h.

I think you may be right that using __asm__ __volatile__ in gcc S_UNLOCK cases would be a big step forward, but it needs more research to see if that's the only fix needed.

regards, tom lane
On Wed, Aug 1, 2012 at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Solving the problem for linux only, or gcc only, isn't going to get us
> to a place where we can stop volatile-izing call sites. We need to be
> sure it works for every single case supported by s_lock.h.

Yep, that's the problem all right.

> I think you may be right that using __asm__ __volatile__ in gcc
> S_UNLOCK cases would be a big step forward, but it needs more research
> to see if that's the only fix needed.

I agree, but I will note that I have done a fair bit of research on this already, and there are definitions in storage/barrier.h for pg_compiler_barrier() that cover gcc, icc, HP's aCC, MSVC, and Borland C. There are probably other wacky compilers out there, though: looking at the build farm, I see Sun Studio and sco cc as cases that would likely need some attention. Are there any compilers not represented in the build farm that we'd mind breaking?

If we can get working pg_compiler_barrier() definitions for all the compilers we care about, the rest is probably mostly a question of going through s_lock.h and inserting compiler barriers anywhere that they aren't already implied by the existing code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jul 28, 2012 at 3:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sat, Jul 7, 2012 at 9:17 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> Hi,
>>
>> Jeff Janes has pointed out that my previous patch could hold
>> a number of the dirty writes only in single local backend, and
>> it could not hold all over the cluster, because the counter
>> was allocated in the local process memory.
>>
>> That's true, and I have fixed it with moving the counter into
>> the shared memory, as a member of XLogCtlWrite, to keep total
>> dirty writes in the cluster.
> ...
> The comment "XLogCtrlWrite must be protected with WALWriteLock"
> mis-spells XLogCtlWrite.
>
> The final patch will need to add a sections to the documentation.

Thanks to Robert and Tom for addressing my concerns about the pointer volatility.

I think there is enough consensus that this is useful without adding more things to it, like histograms or high water marks.

However, I do think we will want to add a way to query for the time of the last reset, as other monitoring features are going that way.

Is it OK that the count is reset upon a server restart? pg_stat_bgwriter, for example, does not do that. Unfortunately I think fixing this in an acceptable way will be harder than the entire rest of the patch was.

The coding looks OK to me; it applies and builds, passes make check, and does what it says. I didn't do performance testing, as it is hard to believe it would have a meaningful effect.

I'll mark it as waiting on author, for the documentation and reset time. I'd ask a more senior hacker to comment on the durability over restarts.

Cheers,

Jeff
On Sat, Aug 11, 2012 at 6:11 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> However, I do think we will want to add a way to query for the time of
> the last reset, as other monitoring features are going that way.

That should be easy to add.

> Is it OK that the count is reset upon a server restart?

I think it's OK. The reason why many of our stats are kept in the stats file is because we have a limited amount of shared memory and therefore can't guarantee (for example) that there's enough to keep stats about EVERY table, since the number of tables is unlimited. However, in cases where the data to be stored is fixed-size, and especially when it's fixed-size and small, there's a lot of sense to keeping the data in shared memory rather than sending stats collector messages. It's a lot less overhead, for one thing. Maybe at some point someone will want to devise a way to hibernate such stats to disk at shutdown (or periodically) and reload them on startup, but it doesn't seem like a must-have to me. Other opinions may vary, of course.

> I'll marked it as waiting on author, for the documentation and reset
> time.

Yeah, we definitely need some documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

2012/08/12 7:11, Jeff Janes wrote:
> On Sat, Jul 28, 2012 at 3:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Sat, Jul 7, 2012 at 9:17 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>> Hi,
>>>
>>> Jeff Janes has pointed out that my previous patch could hold
>>> a number of the dirty writes only in single local backend, and
>>> it could not hold all over the cluster, because the counter
>>> was allocated in the local process memory.
>>>
>>> That's true, and I have fixed it with moving the counter into
>>> the shared memory, as a member of XLogCtlWrite, to keep total
>>> dirty writes in the cluster.
>>
> ...
>
>> The comment "XLogCtrlWrite must be protected with WALWriteLock"
>> mis-spells XLogCtlWrite.
>>
>> The final patch will need to add a sections to the documentation.
>
> Thanks to Robert and Tom for addressing my concerns about the pointer
> volatility.
>
> I think there is enough consensus that this is useful without adding
> more things to it, like histograms or high water marks.
>
> However, I do think we will want to add a way to query for the time of
> the last reset, as other monitoring features are going that way.
>
> Is it OK that the count is reset upon a server restart?
> pg_stat_bgwriter, for example, does not do that. Unfortunately I
> think fixing this in an acceptable way will be harder than the entire
> rest of the patch was.
>
> The coding looks OK to me, it applies and builds, and passes make
> check, and does what it says. I didn't do performance testing, as it
> is hard to believe it would have a meaningful effect.
>
> I'll marked it as waiting on author, for the documentation and reset
> time. I'd ask a more senior hacker to comment on the durability over
> restarts.

I have rewritten the patch to deal with dirty write statistics through the pgstat collector, as bgwriter does. Yeah, it's a bit bigger rewrite.

With this patch, the walwriter process and each backend process sum up dirty writes and send them to the stats collector. So, the value can be saved in the stats file and kept across restarts.

The statistics can be retrieved using the pg_stat_get_xlog_dirty_writes() function, and can be reset by calling pg_stat_reset_shared('walwriter').

Now, I have one concern.

The reset time can be captured in globalStats.stat_reset_timestamp, but this value is shared with the bgwriter one.

So, once pg_stat_reset_shared('walwriter') is called, the stats_reset column in pg_stat_bgwriter represents the reset time for walwriter, not for bgwriter.

How should we handle this? Should we split this value? And should we have a new system view for walwriter?

Of course, I will work on the documentation next.

Regards,

> Cheers,
>
> Jeff
>

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
Satoshi Nagayasu escribió:

> With this patch, walwriter process and each backend process
> would sum up dirty writes, and send it to the stat collector.
> So, the value could be saved in the stat file, and could be
> kept on restarting.
>
> The statistics could be retreive with using
> pg_stat_get_xlog_dirty_writes() function, and could be reset
> with calling pg_stat_reset_shared('walwriter').
>
> Now, I have one concern.
>
> The reset time could be captured in globalStats.stat_reset_timestamp,
> but this value is the same with the bgwriter one.
>
> So, once pg_stat_reset_shared('walwriter') is called,
> stats_reset column in pg_stat_bgwriter does represent
> the reset time for walwriter, not for bgwriter.
>
> How should we handle this? Should we split this value?
> And should we have new system view for walwriter?

I think the answer to the last two questions is yes. It doesn't seem to make sense, to me, to have a single reset timing for what are effectively two separate things.

Please submit an updated patch to the next CF. I'm marking this one returned with feedback. Thanks.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2012/10/24 1:12, Alvaro Herrera wrote:
> Satoshi Nagayasu escribió:
>
>> With this patch, walwriter process and each backend process
>> would sum up dirty writes, and send it to the stat collector.
>> So, the value could be saved in the stat file, and could be
>> kept on restarting.
>>
>> The statistics could be retreive with using
>> pg_stat_get_xlog_dirty_writes() function, and could be reset
>> with calling pg_stat_reset_shared('walwriter').
>>
>> Now, I have one concern.
>>
>> The reset time could be captured in globalStats.stat_reset_timestamp,
>> but this value is the same with the bgwriter one.
>>
>> So, once pg_stat_reset_shared('walwriter') is called,
>> stats_reset column in pg_stat_bgwriter does represent
>> the reset time for walwriter, not for bgwriter.
>>
>> How should we handle this? Should we split this value?
>> And should we have new system view for walwriter?
>
> I think the answer to the two last questions is yes. It doesn't seem to
> make sense, to me, to have a single reset timings for what are
> effectively two separate things.
>
> Please submit an updated patch to next CF. I'm marking this one
> returned with feedback. Thanks.
>

I attached the latest one, which splits the reset_time for bgwriter and walwriter, and provides a new system view, called pg_stat_walwriter, to show the dirty write counter and the reset time.

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
Satoshi Nagayasu escribió:

> I attached the latest one, which splits the reset_time
> for bgwriter and walwriter, and provides new system view,
> called pg_stat_walwriter, to show the dirty write counter
> and the reset time.

Thanks. I gave this a look and I have a couple of comments:

1. The counter is called dirty_write. I imagine that this comes directly from the name of the nearby DTrace probe, TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START. That probe comes from email 494C1565.3060105@sun.com committed in 4ee79fd20d9a. But there wasn't much discussion about the name back then. Maybe that was fine at the time because it was pretty much an internal matter, being so deep in the code. But we're now going to expose it to regular users, so we'd better choose a very good name, because we're going to be stuck with it for a very long time. And the name "dirty" doesn't ring with me too well; what matters here is not that we're writing a buffer that is dirty, but the fact that we're writing while holding the WALInsertLock, so the name should convey the fact that this is a "locked" write or something like that. Or at least that's how I see the issue. Note the documentation wording:

+ <entry><structfield>dirty_writes</></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of dirty writes, which means flushing wal buffers
+ because of its full.</entry>

2. Should we put bgwriter and walwriter data in separate structs? I don't think this is necessary, but maybe someone opines differently?

3.
+/*
+ * WalWriter global statistics counter.
+ * Despite its name, this counter is actually used not only in walwriter,
+ * but also in each backend process to sum up xlog dirty writes.
+ * Those processes would increment this counter in each XLogWrite call,
+ * then send it to the stat collector process.
+ */
+PgStat_MsgWalWriter WalWriterStats;

Maybe we should use a different name for the struct, to avoid having to excuse ourselves for the name being wrong ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 29.10.2012 04:58, Satoshi Nagayasu wrote: > 2012/10/24 1:12, Alvaro Herrera wrote: >> Satoshi Nagayasu escribi�: >> >>> With this patch, walwriter process and each backend process >>> would sum up dirty writes, and send it to the stat collector. >>> So, the value could be saved in the stat file, and could be >>> kept on restarting. >>> >>> The statistics could be retreive with using >>> pg_stat_get_xlog_dirty_writes() function, and could be reset >>> with calling pg_stat_reset_shared('walwriter'). >>> >>> Now, I have one concern. >>> >>> The reset time could be captured in globalStats.stat_reset_timestamp, >>> but this value is the same with the bgwriter one. >>> >>> So, once pg_stat_reset_shared('walwriter') is called, >>> stats_reset column in pg_stat_bgwriter does represent >>> the reset time for walwriter, not for bgwriter. >>> >>> How should we handle this? Should we split this value? >>> And should we have new system view for walwriter? >> >> I think the answer to the two last questions is yes. It doesn't seem to >> make sense, to me, to have a single reset timings for what are >> effectively two separate things. >> >> Please submit an updated patch to next CF. I'm marking this one >> returned with feedback. Thanks. >> > > I attached the latest one, which splits the reset_time > for bgwriter and walwriter, and provides new system view, > called pg_stat_walwriter, to show the dirty write counter > and the reset time. I've done a quick review of the v4 patch: 1) applies fine on HEAD, compiles fine 2) "make installcheck" fails because of a difference in the 'rules' test suite (there's a new view "pg_stat_walwriter" - see the attached patch for a fixed version or expected/rules.out) 3) I do agree with Alvaro that using the same struct for two separate components (bgwriter and walwriter) seems a bit awkward. For example you need to have two separate stat_reset fields, the reset code becomes much more verbose (because you need to list individual fields) etc. 
So I'd vote to either split this into two structures or keep it as a single structure (although with two views on top of it). 4) Are there any other fields that might be interesting? Right now there's just "dirty_writes" but I guess there are other values. E.g. how much data was actually written etc.? Tomas
Attachment
(2012/11/27 7:42), Alvaro Herrera wrote: > Satoshi Nagayasu escribió: > >> I attached the latest one, which splits the reset_time >> for bgwriter and walwriter, and provides new system view, >> called pg_stat_walwriter, to show the dirty write counter >> and the reset time. > > Thanks. I gave this a look and I have a couple of comments: > > 1. The counter is called dirty_write. I imagine that this comes > directly from the name of the nearby DTrace probe, > TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START. That probe comes from > email 494C1565.3060105@sun.com committed in 4ee79fd20d9a. But there > wasn't much discussion about the name back then. Maybe that was fine at > the time because it was pretty much an internal matter, being so deep in > the code. But we're now going to expose it to regular users, so we'd > better choose a very good name because we're going to be stuck with it > for a very long time. And the name "dirty" doesn't ring with me too > well; what matters here is not that we're writing a buffer that is > dirty, but the fact that we're writing while holding the WalInsertLock, > so the name should convey the fact that this is a "locked" write or > something like that. Or at least that's how I see the issue. Note the > documentation wording: > > + <entry><structfield>dirty_writes</></entry> > + <entry><type>bigint</type></entry> > + <entry>Number of dirty writes, which means flushing wal buffers > + because of its full.</entry> Yes, "dirty_writes" came from the probe name, and if it needs to be changed, "buffers_flush" would make sense to me in this situation, because this counter is intended to show WAL writes due to "wal buffer full". > 2. Should we put bgwriter and walwriter data in separate structs? I > don't think this is necessary, but maybe someone opines differently? I tried to minimize the impact of this patch, but if I can change this struct, yes, I'd like to split it into two structs. > 3. > +/* > + * WalWriter global statistics counter. 
> + * Despite its name, this counter is actually used not only in walwriter, > + * but also in each backend process to sum up xlog dirty writes. > + * Those processes would increment this counter in each XLogWrite call, > + * then send it to the stat collector process. > + */ > +PgStat_MsgWalWriter WalWriterStats; > > Maybe we should use a different name for the struct, to avoid having to > excuse ourselves for the name being wrong ... OK. How about WalBufferStats? I think this name works for both the walwriter and the backend processes. -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
(2012/12/10 3:06), Tomas Vondra wrote: > On 29.10.2012 04:58, Satoshi Nagayasu wrote: >> 2012/10/24 1:12, Alvaro Herrera wrote: >>> Satoshi Nagayasu escribió: >>> >>>> With this patch, walwriter process and each backend process >>>> would sum up dirty writes, and send it to the stat collector. >>>> So, the value could be saved in the stat file, and could be >>>> kept on restarting. >>>> >>>> The statistics could be retreive with using >>>> pg_stat_get_xlog_dirty_writes() function, and could be reset >>>> with calling pg_stat_reset_shared('walwriter'). >>>> >>>> Now, I have one concern. >>>> >>>> The reset time could be captured in globalStats.stat_reset_timestamp, >>>> but this value is the same with the bgwriter one. >>>> >>>> So, once pg_stat_reset_shared('walwriter') is called, >>>> stats_reset column in pg_stat_bgwriter does represent >>>> the reset time for walwriter, not for bgwriter. >>>> >>>> How should we handle this? Should we split this value? >>>> And should we have new system view for walwriter? >>> >>> I think the answer to the two last questions is yes. It doesn't seem to >>> make sense, to me, to have a single reset timings for what are >>> effectively two separate things. >>> >>> Please submit an updated patch to next CF. I'm marking this one >>> returned with feedback. Thanks. >>> >> >> I attached the latest one, which splits the reset_time >> for bgwriter and walwriter, and provides new system view, >> called pg_stat_walwriter, to show the dirty write counter >> and the reset time. > > I've done a quick review of the v4 patch: Thanks for the review, and sorry for my delayed response. > 1) applies fine on HEAD, compiles fine > > 2) "make installcheck" fails because of a difference in the 'rules' > test suite (there's a new view "pg_stat_walwriter" - see the > attached patch for a fixed version or expected/rules.out) Ah, I forgot about the regression test. I will fix it. Thanks. 
> 3) I do agree with Alvaro that using the same struct for two separate > components (bgwriter and walwriter) seems a bit awkward. For example > you need to have two separate stat_reset fields, the reset code > becomes much more verbose (because you need to list individual > fields) etc. > > So I'd vote to either split this into two structures or keep it > as a single structure (although with two views on top of it). Ok, I will split it into two structs, PgStat_BgWriterGlobalStats and PgStat_WalWriterGlobalStats, and will modify PgStat_GlobalStats to hold those two structs in the stat collector. > 4) Are there any other fields that might be interesting? Right now > there's just "dirty_writes" but I guess there are other values. E.g. > how much data was actually written etc.? I think those numbers can be obtained by calling pg_current_xlog_insert_location() or pg_current_xlog_location(), but if we need it, I will add it. Regards, > > Tomas > > > > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
What happened to this patch? We were waiting on an updated version from you. Satoshi Nagayasu wrote: > (2012/12/10 3:06), Tomas Vondra wrote: > >On 29.10.2012 04:58, Satoshi Nagayasu wrote: > >>2012/10/24 1:12, Alvaro Herrera wrote: > >>>Satoshi Nagayasu escribió: > >>> > >>>>With this patch, walwriter process and each backend process > >>>>would sum up dirty writes, and send it to the stat collector. > >>>>So, the value could be saved in the stat file, and could be > >>>>kept on restarting. > >>>> > >>>>The statistics could be retreive with using > >>>>pg_stat_get_xlog_dirty_writes() function, and could be reset > >>>>with calling pg_stat_reset_shared('walwriter'). > >>>> > >>>>Now, I have one concern. > >>>> > >>>>The reset time could be captured in globalStats.stat_reset_timestamp, > >>>>but this value is the same with the bgwriter one. > >>>> > >>>>So, once pg_stat_reset_shared('walwriter') is called, > >>>>stats_reset column in pg_stat_bgwriter does represent > >>>>the reset time for walwriter, not for bgwriter. > >>>> > >>>>How should we handle this? Should we split this value? > >>>>And should we have new system view for walwriter? > >>> > >>>I think the answer to the two last questions is yes. It doesn't seem to > >>>make sense, to me, to have a single reset timings for what are > >>>effectively two separate things. > >>> > >>>Please submit an updated patch to next CF. I'm marking this one > >>>returned with feedback. Thanks. > >>> > >> > >>I attached the latest one, which splits the reset_time > >>for bgwriter and walwriter, and provides new system view, > >>called pg_stat_walwriter, to show the dirty write counter > >>and the reset time. > > > >I've done a quick review of the v4 patch: > > Thanks for the review, and sorry for my delayed response. 
> > >1) applies fine on HEAD, compiles fine > > > >2) "make installcheck" fails because of a difference in the 'rules' > > test suite (there's a new view "pg_stat_walwriter" - see the > > attached patch for a fixed version or expected/rules.out) > > Ah, I forgot about the regression test. I will fix it. Thanks. > > >3) I do agree with Alvaro that using the same struct for two separate > > components (bgwriter and walwriter) seems a bit awkward. For example > > you need to have two separate stat_reset fields, the reset code > > becomes much more verbose (because you need to list individual > > fields) etc. > > > > So I'd vote to either split this into two structures or keeping it > > as a single structure (although with two views on top of it). > > Ok, I will split it into two structs, PgStat_BgWriterGlobalStats and > PgStat_WalWriterGlobalStats, and will modify PgStat_GlobalStats to > hold those two structs in the stat collector. > > >4) Are there any other fields that might be interesting? Right now > > there's just "dirty_writes" but I guess there are other values. E.g. > > how much data was actually written etc.? > > AFAIK, I think those numbers can be obtained by calling > pg_current_xlog_insert_location() or pg_current_xlog_location(), > but if we need it, I will add it. > > Regards, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Will revise and resubmit for the next CF. Regards, 2013/07/19 1:06, Alvaro Herrera wrote: > > What happened to this patch? We were waiting on an updated version from > you. > > > Satoshi Nagayasu wrote: >> (2012/12/10 3:06), Tomas Vondra wrote: >>> On 29.10.2012 04:58, Satoshi Nagayasu wrote: >>>> 2012/10/24 1:12, Alvaro Herrera wrote: >>>>> Satoshi Nagayasu escribió: >>>>> >>>>>> With this patch, walwriter process and each backend process >>>>>> would sum up dirty writes, and send it to the stat collector. >>>>>> So, the value could be saved in the stat file, and could be >>>>>> kept on restarting. >>>>>> >>>>>> The statistics could be retreive with using >>>>>> pg_stat_get_xlog_dirty_writes() function, and could be reset >>>>>> with calling pg_stat_reset_shared('walwriter'). >>>>>> >>>>>> Now, I have one concern. >>>>>> >>>>>> The reset time could be captured in globalStats.stat_reset_timestamp, >>>>>> but this value is the same with the bgwriter one. >>>>>> >>>>>> So, once pg_stat_reset_shared('walwriter') is called, >>>>>> stats_reset column in pg_stat_bgwriter does represent >>>>>> the reset time for walwriter, not for bgwriter. >>>>>> >>>>>> How should we handle this? Should we split this value? >>>>>> And should we have new system view for walwriter? >>>>> >>>>> I think the answer to the two last questions is yes. It doesn't seem to >>>>> make sense, to me, to have a single reset timings for what are >>>>> effectively two separate things. >>>>> >>>>> Please submit an updated patch to next CF. I'm marking this one >>>>> returned with feedback. Thanks. >>>>> >>>> >>>> I attached the latest one, which splits the reset_time >>>> for bgwriter and walwriter, and provides new system view, >>>> called pg_stat_walwriter, to show the dirty write counter >>>> and the reset time. >>> >>> I've done a quick review of the v4 patch: >> >> Thanks for the review, and sorry for my delayed response. 
>> >>> 1) applies fine on HEAD, compiles fine >>> >>> 2) "make installcheck" fails because of a difference in the 'rules' >>> test suite (there's a new view "pg_stat_walwriter" - see the >>> attached patch for a fixed version or expected/rules.out) >> >> Ah, I forgot about the regression test. I will fix it. Thanks. >> >>> 3) I do agree with Alvaro that using the same struct for two separate >>> components (bgwriter and walwriter) seems a bit awkward. For example >>> you need to have two separate stat_reset fields, the reset code >>> becomes much more verbose (because you need to list individual >>> fields) etc. >>> >>> So I'd vote to either split this into two structures or keeping it >>> as a single structure (although with two views on top of it). >> >> Ok, I will split it into two structs, PgStat_BgWriterGlobalStats and >> PgStat_WalWriterGlobalStats, and will modify PgStat_GlobalStats to >> hold those two structs in the stat collector. >> >>> 4) Are there any other fields that might be interesting? Right now >>> there's just "dirty_writes" but I guess there are other values. E.g. >>> how much data was actually written etc.? >> >> AFAIK, I think those numbers can be obtained by calling >> pg_current_xlog_insert_location() or pg_current_xlog_location(), >> but if we need it, I will add it. >> >> Regards, > > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Hi, The revised patch for wal buffer statistics is attached. A test script is also attached. Please take a look. Regards, (2013/07/19 7:49), Satoshi Nagayasu wrote: > Will revise and resubmit for the next CF. > > Regards, > > 2013/07/19 1:06, Alvaro Herrera wrote: >> >> What happened to this patch? We were waiting on an updated version from >> you. >> >> >> Satoshi Nagayasu wrote: >>> (2012/12/10 3:06), Tomas Vondra wrote: >>>> On 29.10.2012 04:58, Satoshi Nagayasu wrote: >>>>> 2012/10/24 1:12, Alvaro Herrera wrote: >>>>>> Satoshi Nagayasu escribió: >>>>>> >>>>>>> With this patch, walwriter process and each backend process >>>>>>> would sum up dirty writes, and send it to the stat collector. >>>>>>> So, the value could be saved in the stat file, and could be >>>>>>> kept on restarting. >>>>>>> >>>>>>> The statistics could be retreive with using >>>>>>> pg_stat_get_xlog_dirty_writes() function, and could be reset >>>>>>> with calling pg_stat_reset_shared('walwriter'). >>>>>>> >>>>>>> Now, I have one concern. >>>>>>> >>>>>>> The reset time could be captured in >>>>>>> globalStats.stat_reset_timestamp, >>>>>>> but this value is the same with the bgwriter one. >>>>>>> >>>>>>> So, once pg_stat_reset_shared('walwriter') is called, >>>>>>> stats_reset column in pg_stat_bgwriter does represent >>>>>>> the reset time for walwriter, not for bgwriter. >>>>>>> >>>>>>> How should we handle this? Should we split this value? >>>>>>> And should we have new system view for walwriter? >>>>>> >>>>>> I think the answer to the two last questions is yes. It doesn't >>>>>> seem to >>>>>> make sense, to me, to have a single reset timings for what are >>>>>> effectively two separate things. >>>>>> >>>>>> Please submit an updated patch to next CF. I'm marking this one >>>>>> returned with feedback. Thanks. 
>>>>>> >>>>> >>>>> I attached the latest one, which splits the reset_time >>>>> for bgwriter and walwriter, and provides new system view, >>>>> called pg_stat_walwriter, to show the dirty write counter >>>>> and the reset time. >>>> >>>> I've done a quick review of the v4 patch: >>> >>> Thanks for the review, and sorry for my delayed response. >>> >>>> 1) applies fine on HEAD, compiles fine >>>> >>>> 2) "make installcheck" fails because of a difference in the 'rules' >>>> test suite (there's a new view "pg_stat_walwriter" - see the >>>> attached patch for a fixed version or expected/rules.out) >>> >>> Ah, I forgot about the regression test. I will fix it. Thanks. >>> >>>> 3) I do agree with Alvaro that using the same struct for two separate >>>> components (bgwriter and walwriter) seems a bit awkward. For >>>> example >>>> you need to have two separate stat_reset fields, the reset code >>>> becomes much more verbose (because you need to list individual >>>> fields) etc. >>>> >>>> So I'd vote to either split this into two structures or keeping it >>>> as a single structure (although with two views on top of it). >>> >>> Ok, I will split it into two structs, PgStat_BgWriterGlobalStats and >>> PgStat_WalWriterGlobalStats, and will modify PgStat_GlobalStats to >>> hold those two structs in the stat collector. >>> >>>> 4) Are there any other fields that might be interesting? Right now >>>> there's just "dirty_writes" but I guess there are other values. >>>> E.g. >>>> how much data was actually written etc.? >>> >>> AFAIK, I think those numbers can be obtained by calling >>> pg_current_xlog_insert_location() or pg_current_xlog_location(), >>> but if we need it, I will add it. >>> >>> Regards, >> >> > > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Attachment
On 9/6/13 11:32 PM, Satoshi Nagayasu wrote: > The revised patch for wal buffer statistics is attached. > A test script is also attached. Please take a look. You have duplicate OIDs. Run the script duplicate_oids to find them.
On Mon, Sep 9, 2013 at 2:43 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > You have duplicate OIDs. Run the script duplicate_oids to find them. Are you considering picking up the script that Andrew wrote to automate that as part of the build? I wonder why that didn't end up going anywhere. -- Peter Geoghegan
On Mon, 2013-09-09 at 14:51 -0700, Peter Geoghegan wrote: > On Mon, Sep 9, 2013 at 2:43 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > You have duplicate OIDs. Run the script duplicate_oids to find them. > > Are you considering picking up the script that Andrew wrote to > automate that as part of the build? I wonder why that didn't end up > going anywhere. It is automated. Andrew's rewrite is still worth considering, and I had planned to do that, but it doesn't provide any functionality we don't already have.
On Mon, Sep 9, 2013 at 6:05 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > It is automated. Oh, yeah. I see that the maintainer-check target does that. I should probably get into the habit of using targets other than check/installcheck, as you recently demonstrated. -- Peter Geoghegan
Thanks for checking. Revised one attached. (2013/09/10 6:43), Peter Eisentraut wrote: > On 9/6/13 11:32 PM, Satoshi Nagayasu wrote: >> The revised patch for wal buffer statistics is attached. >> A test script is also attached. Please take a look. > > You have duplicate OIDs. Run the script duplicate_oids to find them. > -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Attachment
On 9/10/13 3:37 AM, Satoshi Nagayasu wrote: > Thanks for checking. Revised one attached. Please fix compiler warning: walwriter.c: In function ‘WalWriterMain’: walwriter.c:293:3: warning: implicit declaration of function ‘pgstat_send_walwriter’ [-Wimplicit-function-declaration]
(2013/09/10 22:48), Peter Eisentraut wrote: > On 9/10/13 3:37 AM, Satoshi Nagayasu wrote: >> Thanks for checking. Revised one attached. > > Please fix compiler warning: > > walwriter.c: In function ‘WalWriterMain’: > walwriter.c:293:3: warning: implicit declaration of function > ‘pgstat_send_walwriter’ [-Wimplicit-function-declaration] Thanks. Fixed. -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Attachment
On Wed, Sep 11, 2013 at 12:43 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > (2013/09/10 22:48), Peter Eisentraut wrote: >> On 9/10/13 3:37 AM, Satoshi Nagayasu wrote: >>> Thanks for checking. Revised one attached. >> >> Please fix compiler warning: >> >> walwriter.c: In function ‘WalWriterMain’: >> walwriter.c:293:3: warning: implicit declaration of function >> ‘pgstat_send_walwriter’ [-Wimplicit-function-declaration] > > Thanks. Fixed. The patch looks good to me. I have some comments: The description of pg_stat_reset_shared() should mention pg_stat_walwriter in the documentation. Should we implement something like pg_stat_reset_shared('all') so that we can easily reset all cluster-wide statistics counters to zero? Some background workers may write WAL because the WAL buffer is full, so you will likely need to change those processes so that they can also increment the xlog_dirty_writes counter. Regards, -- Fujii Masao