Thread: Reduced power consumption in WAL Writer process
Attached is a patch for the WAL writer that removes its tight polling loop (which probably doesn't get hit often in practice, as we just sleep if wal_writer_delay is under a second), and, at least potentially, reduces power consumption when idle by using a latch. I will break all remaining power consumption work down into per-auxiliary process patches. I think that this is appropriate - if we hit a snag on one of the processes, there is no need to have that hold up everything. I've commented that we handle all expected signals, and therefore we shouldn't worry about having the timeout invalidated by signals, just as with the archiver. Previously, we didn't even worry about Postmaster death within the tight polling loop, presumably because wal_writer_delay is typically small enough to avoid that being a problem. I thought that WL_POSTMASTER_DEATH might be superfluous here, but then again there is a codepath specifically for the case where wal_writer_delay exceeds one second, so it is included in this initial version. Comments? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
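For illustration, the core loop of such a latch-based WAL writer might look roughly like the following. This is a minimal sketch, not the attached patch: the latch variable, its placement, and the use of XLogBackgroundFlush() as the per-iteration work are assumptions made for the example.

    #include "postgres.h"
    #include "access/xlog.h"
    #include "postmaster/walwriter.h"
    #include "storage/ipc.h"
    #include "storage/latch.h"

    static Latch walwriterLatch;        /* hypothetical process-local latch */

    static void
    WalWriterLoop(void)
    {
        InitLatch(&walwriterLatch);

        for (;;)
        {
            int         rc;

            /* Clear any wakeup that arrived while we were working */
            ResetLatch(&walwriterLatch);

            /* Write/flush whatever WAL is ready to go out */
            XLogBackgroundFlush();

            /*
             * Sleep until the latch is set, the timeout expires, or the
             * postmaster dies.  Because all expected signals set the
             * latch, a signal cannot silently invalidate the timeout,
             * and there is no polling while idle.
             */
            rc = WaitLatch(&walwriterLatch,
                           WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                           WalWriterDelay);     /* milliseconds */

            if (rc & WL_POSTMASTER_DEATH)
                proc_exit(1);
        }
    }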
On Wed, Jul 13, 2011 at 10:56 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Attached is a patch for the WAL writer that removes its tight polling > loop (which probably doesn't get hit often in practice, as we just > sleep if wal_writer_delay is under a second), and, at least > potentially, reduces power consumption when idle by using a latch. > > I will break all remaining power consumption work down into > per-auxiliary process patches. I think that this is appropriate - if > we hit a snag on one of the processes, there is no need to have that > hold up everything. > > I've commented that we handle all expected signals, and therefore we > shouldn't worry about having the timeout invalidated by signals, just as > with the archiver. Previously, we didn't even worry about Postmaster > death within the tight polling loop, presumably because > wal_writer_delay is typically small enough to avoid that being a > problem. I thought that WL_POSTMASTER_DEATH might be superfluous here, > but then again there is a codepath specifically for the case where > wal_writer_delay exceeds one second, so it is included in this initial > version. > > Comments? ISTM that this in itself isn't enough to reduce power consumption. Currently the only people that use WALWriter are asynchronous commits, so we should include within RecordTransactionCommit() a SetLatch() command for the WALWriter. That way we can have WALWriter sleep until it's needed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
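Concretely, the suggestion amounts to a wakeup in the asynchronous-commit path, along the lines of the sketch below. The XLogCtl->WalWriterLatch field is an assumed location for a shared latch, named here only for illustration.

    /*
     * Sketch: at the end of RecordTransactionCommit(), when the commit
     * record was written but not flushed by us (asynchronous commit),
     * nudge the WAL writer so it can perform the flush promptly.
     */
    if (synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
        SetLatch(&XLogCtl->WalWriterLatch);     /* assumed latch field */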
On Thu, Jul 14, 2011 at 5:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Jul 13, 2011 at 10:56 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> Attached is a patch for the WAL writer that removes its tight polling >> loop (which probably doesn't get hit often in practice, as we just >> sleep if wal_writer_delay is under a second), and, at least >> potentially, reduces power consumption when idle by using a latch. >> >> I will break all remaining power consumption work down into >> per-auxiliary process patches. I think that this is appropriate - if >> we hit a snag on one of the processes, there is no need to have that >> hold up everything. >> >> I've commented that we handle all expected signals, and therefore we >> shouldn't worry about having the timeout invalidated by signals, just as >> with the archiver. Previously, we didn't even worry about Postmaster >> death within the tight polling loop, presumably because >> wal_writer_delay is typically small enough to avoid that being a >> problem. I thought that WL_POSTMASTER_DEATH might be superfluous here, >> but then again there is a codepath specifically for the case where >> wal_writer_delay exceeds one second, so it is included in this initial >> version. >> >> Comments? > > ISTM that this in itself isn't enough to reduce power consumption. > > Currently the only people that use WALWriter are asynchronous commits, > so we should include within RecordTransactionCommit() a SetLatch() > command for the WALWriter. > > That way we can have WALWriter sleep until it's needed. +1 Currently walwriter might write out the WAL before a transaction commits. IOW, walwriter tries to write out the WAL in wal_buffers at every wakeup. This might be useful for a long transaction which generates lots of WAL records before commit. So should we call SetLatch() in XLogInsert() instead of RecordTransactionCommit()? Though I'm not sure how much walwriter improves the performance of the synchronous commit case.. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Currently walwriter might write out the WAL before a transaction commits. > IOW, walwriter tries to write out the WAL in wal_buffers at every wakeup. > This might be useful for a long transaction which generates lots of WAL > records before commit. So should we call SetLatch() in XLogInsert() instead > of RecordTransactionCommit()? Though I'm not sure how much walwriter > improves the performance of the synchronous commit case.. Yeh, we did previously have a heuristic to write out the WAL when it was more than half full. Not sure I want to put exactly that code back into such a busy code path. I suggest that we set the latch every time the wal buffers wrap. So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then SetLatch on the WALWriter. That's a simple test and we only check it if we're switching WAL buffer pages. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
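In code, that proposal would be something like the following sketch (again with an assumed shared latch field):

    /*
     * Sketch: at the bottom of AdvanceXLInsertBuffer().  nextidx wraps
     * back to 0 once per cycle through wal_buffers, so this wakes the
     * WAL writer only once per wal_buffers pages of WAL inserted.
     */
    if (nextidx == 0)
        SetLatch(&XLogCtl->WalWriterLatch);     /* assumed latch field */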
On 14.07.2011 12:42, Simon Riggs wrote: > On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> Currently walwriter might write out the WAL before a transaction commits. >> IOW, walwriter tries to write out the WAL in wal_buffers at every wakeup. >> This might be useful for a long transaction which generates lots of WAL >> records before commit. So should we call SetLatch() in XLogInsert() instead >> of RecordTransactionCommit()? Though I'm not sure how much walwriter >> improves the performance of the synchronous commit case.. > > Yeh, we did previously have a heuristic to write out the WAL when it > was more than half full. Not sure I want to put exactly that code back > into such a busy code path. > > I suggest that we set the latch every time the wal buffers wrap. > > So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then > SetLatch on the WALWriter. > > That's a simple test and we only check it if we're switching WAL buffer pages. That was my first thought too - but I wonder if that's too aggressive? A backend that does for example a large bulk load will cycle through the buffers real quick. It seems like a bad idea to wake up walwriter between each buffer in that case. Then again, setting a latch that's already set is cheap, so maybe it works fine in practice. Maybe it would be better to do it less frequently, say, every time you switch to a new WAL segment. Or every 10 buffers or something like that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jul 14, 2011 at 10:53 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 14.07.2011 12:42, Simon Riggs wrote: >> >> On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masao <masao.fujii@gmail.com> >> wrote: >> >>> Currently walwriter might write out the WAL before a transaction commits. >>> IOW, walwriter tries to write out the WAL in wal_buffers at every >>> wakeup. >>> This might be useful for a long transaction which generates lots of WAL >>> records before commit. So should we call SetLatch() in XLogInsert() >>> instead >>> of RecordTransactionCommit()? Though I'm not sure how much walwriter >>> improves the performance of the synchronous commit case.. >> >> Yeh, we did previously have a heuristic to write out the WAL when it >> was more than half full. Not sure I want to put exactly that code back >> into such a busy code path. >> >> I suggest that we set the latch every time the wal buffers wrap. >> >> So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then >> SetLatch on the WALWriter. >> >> That's a simple test and we only check it if we're switching WAL buffer pages. > > That was my first thought too - but I wonder if that's too aggressive? A > backend that does for example a large bulk load will cycle through the > buffers real quick. It seems like a bad idea to wake up walwriter between > each buffer in that case. Then again, setting a latch that's already set is > cheap, so maybe it works fine in practice. > > Maybe it would be better to do it less frequently, say, every time you > switch to a new WAL segment. Or every 10 buffers or something like that. Yes, that's roughly what I'm saying. nextidx == 0 happens just after we've wrapped wal_buffers, i.e. we only wake up the WALWriter every wal_buffers pages. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Jul 14, 2011, at 4:42 AM, Simon Riggs <simon@2ndQuadrant.com> wrote: > On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > >> Currently walwriter might write out the WAL before a transaction commits. >> IOW, walwriter tries to write out the WAL in wal_buffers at every wakeup. >> This might be useful for a long transaction which generates lots of WAL >> records before commit. So should we call SetLatch() in XLogInsert() instead >> of RecordTransactionCommit()? Though I'm not sure how much walwriter >> improves the performance of the synchronous commit case.. > > Yeh, we did previously have a heuristic to write out the WAL when it > was more than half full. Not sure I want to put exactly that code back > into such a busy code path. > > I suggest that we set the latch every time the wal buffers wrap. > > So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then > SetLatch on the WALWriter. > > That's a simple test and we only check it if we're switching WAL buffer pages. Seems reasonable at first blush, but I worry that changing the algorithm here could change performance in somewhat unpredictable ways. Some of those might be improvements, but I think some careful measurement would be in order. If the primary goal here is to reduce power consumption, another option would be to keep the regular wake-ups most of the time but have some mechanism for putting the process to sleep until wakened when no activity happens for a certain period of time - say, 10 cycles. I'm not at all sure that's better, but it would be less of a change to the existing behavior. ...Robert
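The idle-cycle mechanism described here might look roughly like this inside the WAL writer's loop, reusing the names from the earlier loop sketch. The counter, flag, and threshold are all assumed for illustration:

    /*
     * Hibernation heuristic (sketch): keep the periodic timeout while
     * work is arriving, but block on the latch indefinitely once ~10
     * consecutive wake-ups have found nothing to do.
     */
    if (wrote_something)
        idle_cycles = 0;
    else
        idle_cycles++;

    if (idle_cycles >= 10)
        rc = WaitLatch(&walwriterLatch,
                       WL_LATCH_SET | WL_POSTMASTER_DEATH,
                       -1L);        /* no timeout: sleep until wakened */
    else
        rc = WaitLatch(&walwriterLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       WalWriterDelay);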
My instrumentation wasn't that good. I was using powertop 1.13, which apparently goes to great lengths to group processes by various criteria (including process group), but doesn't actually offer the option of seeing that instrumentation per process. I'm using version 1.98 beta 1 as of now, which provides per-process instrumentation - it only works with kernel versions 2.6.36+ though. The per-process breakdown of wake-ups per second is:

    248.3 us/s    5.0     Process   postgres: wal writer process
    281.0 us/s    4.9     Process   postgres: writer process
     82.3 us/s    1.1     Process   postgres: autovacuum launcher process
     82.3 us/s    0.4     Process   postgres: stats collector process
    442.8 us/s    0.15    Process   postgres
     23.6 us/s    0.15    Process   postgres -d1

The second column from the left is wake-ups per second. As previously reported and as you see here, there are about 11.5 idle wake-ups per second per cluster. Note that archiving was running when this instrumentation was performed, but the recently-improved archiver wasn't listed at all, presumably because powertop didn't detect any wake-ups in the period of instrumentation, which was 10 or 20 seconds. As you'd expect, the WAL writer is woken up (1 second / wal_writer_delay milliseconds) times per second. On 14 July 2011 11:00, Simon Riggs <simon@2ndquadrant.com> wrote: >> That was my first thought too - but I wonder if that's too aggressive? A >> backend that does for example a large bulk load will cycle through the >> buffers real quick. It seems like a bad idea to wake up walwriter between >> each buffer in that case. Then again, setting a latch that's already set is >> cheap, so maybe it works fine in practice. >> >> Maybe it would be better to do it less frequently, say, every time you >> switch to a new WAL segment. Or every 10 buffers or something like that. > > Yes, that's roughly what I'm saying. nextidx == 0 happens just after we've > wrapped wal_buffers, i.e. we only wake up the WALWriter every > wal_buffers pages. I'll work towards that implementation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jul 15, 2011 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > If the primary goal here is to reduce power consumption, another option > would be to keep the regular wake-ups most of the time but have some > mechanism for putting the process to sleep until wakened when no activity > happens for a certain period of time - say, 10 cycles. I'm not at all sure > that's better, but it would be less of a change to the existing behavior. Now that we have them, latches seem the best approach because they (mostly) avoid heuristics. This proposal works the same or better for async transactions. The only difference is how bulk write operations are handled. As long as we wake WALWriter before wal_buffers fills then we'll be good. Wakeup once per wal buffer is too much. I agree we should measure this to check how frequently wakeups are required for bulk ops. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Excerpts from Simon Riggs's message of vie jul 15 09:55:40 -0400 2011: > On Fri, Jul 15, 2011 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > > If the primary goal here is to reduce power consumption, another option > > would be to keep the regular wake-ups most of the time but have some > > mechanism for putting the process to sleep until wakened when no activity > > happens for a certain period of time - say, 10 cycles. I'm not at all sure > > that's better, but it would be less of a change to the existing behavior. > > Now that we have them, latches seem the best approach because they (mostly) > avoid heuristics. Yeah, there's no reason for "less of a change" to be a criterion to determine the best way forward. The new tech is clearly a better solution overall, so let's just get rid of the cruft. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jul 15, 2011, at 8:55 AM, Simon Riggs <simon@2ndQuadrant.com> wrote: > On Fri, Jul 15, 2011 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> If the primary goal here is to reduce power consumption, another option >> would be to keep the regular wake-ups most of the time but have some >> mechanism for putting the process to sleep until wakened when no activity >> happens for a certain period of time - say, 10 cycles. I'm not at all sure >> that's better, but it would be less of a change to the existing behavior. > > Now that we have them, latches seem the best approach because they (mostly) > avoid heuristics. That's my feeling as well. > This proposal works the same or better for async transactions. Right. I would say probably better. The potential for a reduction in latency here is very appealing. > The only difference is how bulk write operations are handled. As long > as we wake WALWriter before wal_buffers fills then we'll be good. > Wakeup once per wal buffer is too much. I agree we should measure this > to check how frequently wakeups are required for bulk ops. Yeah. The trick is to get the wake-ups to be frequent enough without adding too much latency to the backends that have to perform them. Off-hand, I don't have a good feeling for how hard that will be. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Jul 15, 2011, at 8:55 AM, Simon Riggs <simon@2ndQuadrant.com> wrote: >> The only difference is how bulk write operations are handled. As long >> as we wake WALWriter before wal_buffers fills then we'll be good. >> Wakeup once per wal buffer is too much. I agree we should measure this >> to check how frequently wakeups are required for bulk ops. > Yeah. The trick is to get the wake-ups to be frequent enough without adding too much latency to the backends that have to perform them. Off-hand, I don't have a good feeling for how hard that will be. I'd say send the signal when wal buffers are more than X% full (maybe half). The suggestion to send it when wrapping around at the end of the array is not quite right, because that's an arbitrary condition that's not related to how much work there is for the walwriter to do. It should be cheap to check for this while advancing to a new wal buffer. regards, tom lane
On Fri, Jul 15, 2011 at 6:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Jul 15, 2011, at 8:55 AM, Simon Riggs <simon@2ndQuadrant.com> wrote: >>> The only difference is how bulk write operations are handled. As long >>> as we wake WALWriter before wal_buffers fills then we'll be good. >>> Wakeup once per wal buffer is too much. I agree we should measure this >>> to check how frequently wakeups are required for bulk ops. > >> Yeah. The trick is to get the wake-ups to be frequent enough without >> adding too much latency to the backends that have to perform them. Off-hand, >> I don't have a good feeling for how hard that will be. > > I'd say send the signal when wal buffers are more than X% full (maybe > half). The suggestion to send it when wrapping around at the end of the > array is not quite right, because that's an arbitrary condition that's > not related to how much work there is for the walwriter to do. It > should be cheap to check for this while advancing to a new wal buffer. Yes, I was trying to go too simple. I think we need to put the calculation and SetLatch() after we release WALInsertLock, so as to avoid adding contention. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Fri, Jul 15, 2011 at 6:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd say send the signal when wal buffers are more than X% full (maybe >> half). The suggestion to send it when wrapping around at the end of the >> array is not quite right, because that's an arbitrary condition that's >> not related to how much work there is for the walwriter to do. It >> should be cheap to check for this while advancing to a new wal buffer. > I think we need to put the calculation and SetLatch() after we release > WALInsertLock, so as to avoid adding contention. Yeah, I agree with putting the actual SetLatch call after we release the lock ... but doesn't the calculation need to be done while we're still holding it? Otherwise it'd be using potentially-inconsistent values. regards, tom lane
This is a bit of a detour, but probably a useful one. Attached is a patch that replaces a tight PostmasterIsAlive() polling loop in the AV launcher with a latch, making use of the new WL_POSTMASTER_DEATH functionality. It's similar to what we've already done for the archiver. It is relatively straightforward as these auxiliary process polling loop elimination patches go (certainly compared to what we're contemplating with the WAL writer), but it raises some questions that we were lucky to have been able to avoid when I worked on the archiver. Obviously, this patch isn't finished. We register various generic signal handlers for the AVLauncher, including StatementCancelHandler(). Of course, signals that are handled generically have the same potential to invalidate the WaitLatch() timeout as any other type of signal. We should be mindful of this. ISTM that these generic handlers ought to be handling this generically, and that there should be a Latch for just this purpose for each process within PGPROC. We already have this Latch in PGPROC: Latch waitLatch; /* allow us to wait for sync rep */ Maybe its purpose should be expanded to "current process Latch"? Another concern is, what happens when we receive a signal, generically handled or otherwise, and have to SetLatch() to avoid time-out invalidation? Should we just live with a spurious AutoVacLauncherMain() iteration, or should we do something like check if the return value of WaitLatch indicates that we woke up due to a SetLatch() call, which must have been within a signal handler, and that we should therefore goto just before WaitLatch() and elide the spurious iteration? Given that we can expect some signals to occur relatively frequently, spurious iterations could be a real concern. Incidentally, should I worry about the long timeout argument for WaitLatch() overflowing? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
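Concretely, handling this generically would mean each such signal handler ends by setting the per-process latch, along these lines. This is a sketch of the idea only: the flag is hypothetical, and reusing PGPROC's waitLatch as a "current process" latch is the proposal above, not current code.

    #include "postgres.h"
    #include "storage/latch.h"
    #include "storage/proc.h"

    static volatile sig_atomic_t got_SIGHUP = false;    /* hypothetical flag */

    /*
     * Sketch of a signal handler that cannot silently invalidate a
     * WaitLatch() timeout: it records the event and sets the process's
     * latch, forcing WaitLatch() to return promptly.
     */
    static void
    avl_sighup_handler(SIGNAL_ARGS)
    {
        int         save_errno = errno;

        got_SIGHUP = true;
        SetLatch(&MyProc->waitLatch);   /* proposed "current process" latch */

        errno = save_errno;
    }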
On Sun, Jul 17, 2011 at 8:20 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > This is a bit of a detour, but probably a useful one. Attached is a > patch that replaces a tight PostmasterIsAlive() polling loop in the AV > launcher with a latch, making use of the new WL_POSTMASTER_DEATH > functionality. It's similar to what we've already done for the > archiver. It is relatively straightforward as these auxiliary process > polling loop elimination patches go (certainly compared to what we're > contemplating with the WAL writer), but it raises some questions that > we were lucky to have been able to avoid when I worked on the > archiver. Obviously, this patch isn't finished. Might be good to start a new thread for each auxiliary process, or we may get confused. > We register various generic signal handlers for the AVLauncher, > including StatementCancelHandler(). Of course, signals that are > handled generically have the same potential to invalidate the WaitLatch() > timeout as any other type of signal. We should be mindful of this. Right. > ISTM that these generic handlers ought to be handling this > generically, and that there should be a Latch for just this purpose > for each process within PGPROC. We already have this Latch in PGPROC: > > Latch waitLatch; /* allow us to wait for sync rep */ > > Maybe its purpose should be expanded to "current process Latch"? I think that could be a really good idea, though I haven't looked at it closely enough to be sure. > Another concern is, what happens when we receive a signal, generically > handled or otherwise, and have to SetLatch() to avoid time-out > invalidation? Should we just live with a spurious > AutoVacLauncherMain() iteration, or should we do something like check > if the return value of WaitLatch indicates that we woke up due to a > SetLatch() call, which must have been within a signal handler, and > that we should therefore goto just before WaitLatch() and elide the > spurious iteration? Given that we can expect some signals to occur > relatively frequently, spurious iterations could be a real concern. Really? I suspect that it doesn't much matter exactly how many machine language instructions we execute on each wake-up, within reasonable bounds, of course. Maybe some testing is in order? On another note, I might be inclined to write something like: if ((return_value_of_waitlatch & WL_POSTMASTER_DEATH) && !PostmasterIsAlive()) proc_exit(1); ...so as to avoid calling that function unnecessarily on every iteration. > Incidentally, should I worry about the long timeout argument for WaitLatch() > overflowing? How would that happen? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jul 17, 2011 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Fri, Jul 15, 2011 at 6:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'd say send the signal when wal buffers are more than X% full (maybe >>> half). The suggestion to send it when wrapping around at the end of the >>> array is not quite right, because that's an arbitrary condition that's >>> not related to how much work there is for the walwriter to do. It >>> should be cheap to check for this while advancing to a new wal buffer. > >> I think we need to put the calculation and SetLatch() after we release >> WALInsertLock, so as to avoid adding contention. > > Yeah, I agree with putting the actual SetLatch call after we release the > lock ... but doesn't the calculation need to be done while we're still > holding it? Otherwise it'd be using potentially-inconsistent values. The calculation is just a heuristic, so doesn't need to be exactly accurate. We need the latest values derived while holding the lock, but we don't need to ensure they don't change while we make the calc. The calc needs to say "if we are the ones who make the array more than half full, then wake up the WALWriter". It might already be awake, it might already be more than half full or it might even be less than half full now - none of those cases are very important. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 18, 2011 at 1:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Might be good to start a new thread for each auxiliary process, or we > may get confused. +1 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Sun, Jul 17, 2011 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I agree with putting the actual SetLatch call after we release the >> lock ... but doesn't the calculation need to be done while we're still >> holding it? Otherwise it'd be using potentially-inconsistent values. > The calculation is just a heuristic, so doesn't need to be exactly > accurate. We need the latest values derived while holding the lock, > but we don't need to ensure they don't change while we make the calc. > The calc needs to say "if we are the ones who make the array more than > half full, then wake up the WALWriter". It might already be awake, it > might already be more than half full or it might even be less than > half full now - none of those cases are very important. Nonetheless, I think it'll be better to make the calculation and set a local bool variable (to tell whether to do SetLatch later) while we're in the midst of the advance-to-next-WAL-buffer code. Even if it's true that we would generally get an OK answer by reading the variables again after releasing the lock, that's highly unlikely to be a performance win, because of cache line contention effects (ie, having to wrest the line back from the next guy doing WALInsert, and then give it back to him afterwards). Once you release the lock, you should stop touching shared memory, period. regards, tom lane
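Putting those points together, the insert-side logic would take roughly this shape - a sketch with assumed names (CurrentBuffersUsed() and WalWriterLatch are invented for illustration; XLOGbuffers is the backend's internal name for wal_buffers):

    /*
     * Sketch: inside the advance-to-next-WAL-buffer code, while still
     * holding WALInsertLock, decide from consistent values whether the
     * buffers just became more than half full; only touch the latch
     * after the lock is released.
     */
    bool        wake_walwriter;

    /* ... advance to the next buffer page, still holding WALInsertLock ... */

    wake_walwriter = (CurrentBuffersUsed() >= XLOGbuffers / 2);

    LWLockRelease(WALInsertLock);

    /* No further reads of the insert state past this point - just the wakeup */
    if (wake_walwriter)
        SetLatch(&XLogCtl->WalWriterLatch);     /* assumed latch field */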