Thread: Uh, I change my mind about commit_delay + commit_siblings (sort of)
The attached very simple patch moves the commit_delay + commit_siblings sleep into XLogFlush, where the leader alone sleeps. This appears to be a much more effective site for a delay. Benchmark results, with and without a delay of 3000ms (commit_siblings is 0 in both cases) are available from: http://leadercommitdelay.staticloud.com The only way this differed from my usual setup for these benchmarks was that, as it happened, shared_buffers was set to 256MB, which would of course have affected wal_buffers actual value, which was 8MB. I surmise that the reason this works so well is that it increases the rate of batching at lower client counts, without needlessly delaying each and every follower beyond when their transaction is likely to have committed. A more detailed analysis will have to wait for tomorrow. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)
From
Heikki Linnakangas
Date:
On 29.05.2012 04:18, Peter Geoghegan wrote: > The attached very simple patch moves the commit_delay + > commit_siblings sleep into XLogFlush, where the leader alone sleeps. > This appears to be a much more effective site for a delay. > > Benchmark results, with and without a delay of 3000ms (commit_siblings > is 0 in both cases) are available from: > > http://leadercommitdelay.staticloud.com Can you elaborate how to read those results? What do "insert delay test" and "patch, commit_delay=0" mean? Which graph should I be looking at? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 29 May 2012 07:16, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 29.05.2012 04:18, Peter Geoghegan wrote: >> Benchmark results, with and without a delay of 3000ms (commit_siblings >> is 0 in both cases) are available from: >> >> http://leadercommitdelay.staticloud.com Sorry, I do of course mean a commit_delay of 3000 (microseconds, not milliseconds). > Can you elaborate how to read those results? What do "insert delay test" and > "patch, commit_delay=0" mean? Which graph should I be looking at? It's very simple. It's an insert.sql based workload, for the patch, with and without commit_delay set to 3000 (and a commit_delay of 0 makes the commit_delay code inert, just as before). Setting commit_delay appears to be far more effective with the patch. Here is a new benchmark: http://leadermastercommitdelay.staticloud.com/ This is more or less my usual group commit benchmark - median of 3 runs of insert.sql across a wide range of client counts. The pgbench-tools config is attached for your reference. On this occasion, I set shared_buffers to 1GB once more (and consequently, wal_buffers was 16MB). The green line should look familiar to you. With commit_delay = 0, it's equivalent to commit_delay=0 on master. In other words, it's very similar to a number of benchmarks that have already been performed. I believe that the numbers will line up pretty well with my original group commit benchmark from back in January, for example, as that was also performed on my Thinkpad. The effect that Jeff Janes observed on master is apparent here at low client counts - master with commit_delay = 3000 is almost as effective at 8 clients as it is for the patch. After that, the effect is almost entirely eliminated, which is consistent with my earlier observations about commit_delay post new group commit. There obviously appears to be a further significant increase in transaction throughput with this configuration and patch. Even the average latency of the patch with commit_delay=3000 is a bit better than either master with group_commit = 3000 or baseline (as I've said baseline could have equally well been on master or with master + the patch). I suppose my main beef with commit_delay before, apart from the simple fact that it wasn't actually useful to any significant degree, was that the code didn't usefully interact with the group commit improvements, and we didn't revisit commit_delay for 9.2 even in light of its assumptions not necessarily continuing to hold. Perhaps we should revisit commit_delay for 9.2 now - we never adequately answered the question of how useful it was in light of new group commit, so revisiting that question can be justified as closing an open item, rather than adding new functionality. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
On Mon, May 28, 2012 at 9:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > The attached very simple patch moves the commit_delay + > commit_siblings sleep into XLogFlush, where the leader alone sleeps. > This appears to be a much more effective site for a delay. This is a clever idea, but I think it needs some fine-tuning: as written, this will sleep for any flush, not just a flush of a commit record. One idea might be to add a flag to XLogFlush indicating whether a commit-delay sleep is permissible. Callers other than RecordTransactionCommit() could pass false; RecordTransactionCommit() could pass true. The comments need some updating, too. Like Heikki, I find the test results that you posted pretty hard to understand. This is sort of a general beef I have with pgbench-tools: there are no explanations anywhere about what the graphs actually mean. The first three graphs seem fairly useless, and the third one doesn't even appear to contain any data. The fourth and fifth graphs seem like the most useful part of the output, but the lack of an explanation of what's being graphed really hinders understanding. Apparently, the fourth graph is TPS vs. scale factor (at an unspecified number of clients) and the fifth graph is TPS vs. clients (at an unspecified scale factor). Maybe we're just averaging across the unspecified parameter in each case, but if so that doesn't seem very useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 May 2012 17:10, Robert Haas <robertmhaas@gmail.com> wrote: > This is a clever idea, Thanks. > but I think it needs some fine-tuning: as > written, this will sleep for any flush, not just a flush of a commit > record. One idea might be to add a flag to XLogFlush indicating > whether a commit-delay sleep is permissible. Callers other than > RecordTransactionCommit() could pass false; RecordTransactionCommit() > could pass true. > > The comments need some updating, too. Uh, yeah, I posted that at 2am, which was about an hour after I initially had the idea. Attached patch revises the comments and documentation in a way that I think is appropriate, though it is still essentially the same patch. I did consider the fact that the delay might occur from one of a number of XLogFlush() callsites that did not previously delay. However, even if I do what you suggest, it is just dumb luck as to whether or not that makes any difference, since those other sites may well be exactly as delayed, since they're still going to have to wait behind the WALWriteLock, which the leader now holds while sleeping anyway. I imagined that the fact that those callsites were excluded before had something to do with ameliorating historic commit_delay problems, but those problems have already been significantly reduced. Why do you think that doing this for all XLogFlush() callsites might be problematic? > Like Heikki, I find the test results that you posted pretty hard to > understand. This is sort of a general beef I have with pgbench-tools: > there are no explanations anywhere about what the graphs actually > mean. The first three graphs seem fairly useless, and the third one > doesn't even appear to contain any data. The fourth and fifth graphs > seem like the most useful part of the output, but the lack of an > explanation of what's being graphed really hinders understanding. > Apparently, the fourth graph is TPS vs. scale factor (at an > unspecified number of clients) and the fifth graph is TPS vs. clients > (at an unspecified scale factor). Maybe we're just averaging across > the unspecified parameter in each case, but if so that doesn't seem > very useful. I'm sure that Greg will be happy to consider a github pull request to improve the tool in these areas. I myself am used to interpreting pgbench-tools output, but I suppose it is a little confusing in various ways. As you probably realise, I was directing your attention towards clients-set.png . I am in the habit of hacking pgbench-tools to output much bigger gnu-plot images. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Why do you think that doing this for all XLogFlush() callsites might > be problematic? Well, consider the one in the background writer, for example. That's just a periodic flush, so I see no benefit in having it acquire the lock and then wait some more. It already did wait. And what about the case where we're flushing while holding WALInsertLock because the buffer's full? Clearly waiting is useless in that case - nobody can join the group commit for exactly the same reason that we're doing the flush in the first place: no buffer space. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 May 2012 17:58, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> Why do you think that doing this for all XLogFlush() callsites might >> be problematic? > > Well, consider the one in the background writer, for example. That's > just a periodic flush, so I see no benefit in having it acquire the > lock and then wait some more. It already did wait. No benefit for the BGWriter, no, but a benefit for the cluster as a whole, which can see not only an improvement in transaction throughput, but also of average and worst-case latency. The whole idea of a group commit feature is that one backend altruistically takes other backends with it. Besides, for the background writer or checkpointer, that operate in terms of minutes, seconds and milliseconds, having their transaction block an extra 2 milliseconds is hardly a real problem. Both the BGWriter and checkpointer do an amount of work that is adaptive per cycle anyway. > And what about the case where we're flushing while holding WALInsertLock because the > buffer's full? Clearly waiting is useless in that case - nobody can > join the group commit for exactly the same reason that we're doing the > flush in the first place: no buffer space. Maybe you could have the leader check that condition, and not wait accordingly. If the buffer is full, chances are very high that there already is a leader, either sleeping or following through with an XLogWrite(), whose work is well underway. The general solution that you've proposed - adding an actually_wait bool parameter to XLogFlush() - won't work, unless, for example, the "buffer is full" call happens to become the leader, which is generally quite unlikely. Otherwise it's stuck waiting behind WALWriteLock along with everyone else, on average for a duration of half (CommitDelay + avg flushtime), while the leader sleeps and then works. If something is only effective a small minority of the time, why bother? I doubt that you can reasonably have the BGWriter or whatever cut in line ahead of an ever-growing queue of backends calling XLogFlush(), if that's what you're thinking - now all those backends have to wait an additional period of up to however long the flush takes. In any case, the standard for changing the exact behaviour of commit_delay/commit_siblings ought to be quite low, because those settings are already so heavily laden with problems that it is somewhat doubtful that they've ever been used by someone in production sensibly. Obviously having a delay that turns out to be in some way suboptimal is obviously something I'd hope to avoid entirely, but I've done a good job of considerably improving that situation. If we get this into 9.2, perhaps a more adaptive implementation can follow in 9.3. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 29 May 2012 17:58, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> Why do you think that doing this for all XLogFlush() callsites might >> be problematic? > > Well, consider the one in the background writer, for example. That's > just a periodic flush, so I see no benefit in having it acquire the > lock and then wait some more. It already did wait. And what about > the case where we're flushing while holding WALInsertLock because the > buffer's full? Clearly waiting is useless in that case - nobody can > join the group commit for exactly the same reason that we're doing the > flush in the first place: no buffer space. When I read this the first time, I was in full agreement. On closer inspection neither point is valid, though both points were worth considering. > Well, consider the one in the background writer, for example. That's > just a periodic flush, so I see no benefit in having it acquire the > lock and then wait some more. It already did wait. We use XLogBackgroundFlush() not XLogFlush() from background processes. > And what about > the case where we're flushing while holding WALInsertLock because the > buffer's full? Clearly waiting is useless in that case - nobody can > join the group commit for exactly the same reason that we're doing the > flush in the first place: no buffer space. We don't flush WAL in that case, we just write it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 30, 2012 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > When I read this the first time, I was in full agreement. > > On closer inspection neither point is valid, though both points were > worth considering. > >> Well, consider the one in the background writer, for example. That's >> just a periodic flush, so I see no benefit in having it acquire the >> lock and then wait some more. It already did wait. > > We use XLogBackgroundFlush() not XLogFlush() from background processes. > >> And what about >> the case where we're flushing while holding WALInsertLock because the >> buffer's full? Clearly waiting is useless in that case - nobody can >> join the group commit for exactly the same reason that we're doing the >> flush in the first place: no buffer space. > > We don't flush WAL in that case, we just write it. OK, but there are a lot of places where we call XLogFlush(), and it's far from obvious that it's a win to do this in all of those cases. At least, nobody's done that analysis. XLogFlush is called from: - WriteTruncateXlogRec(), which is called from TruncateCLOG() - SlruPhysicalWritePage() - EndPrepare() - RecordTransactionCommitPrepared() - RecordTransactionCommit() - xact_redo_commit_internal() - CreateCheckPoint() - RelationTruncate() - FlushBuffer() - write_relmap_file() Most of those actually do look like reasonable places to try to get grouped flushing behavior, but: 1. It seems wrong to do it in xact_redo_commit_internal(). It won't matter if commit_siblings>0 since there won't be any other backends with transaction IDs anyway, but if commit_siblings==0 then we'll sleep for no possible benefit. 2. Doing it in FlushBuffer() seems slightly iffy since we might be sitting on a buffer lock. But maybe it's a win anyway, or just not worth worrying about. Another thing to think about is that if we do this across the board rather than just for commits, then commit_delay and commit_siblings will really be totally misnamed - they really ought to be flush_delay and flush_siblings at that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 30 May 2012 13:24, Robert Haas <robertmhaas@gmail.com> wrote: > Most of those actually do look like reasonable places to try to get > grouped flushing behavior, but: > > 1. It seems wrong to do it in xact_redo_commit_internal(). It won't > matter if commit_siblings>0 since there won't be any other backends > with transaction IDs anyway, but if commit_siblings==0 then we'll > sleep for no possible benefit. > > 2. Doing it in FlushBuffer() seems slightly iffy since we might be > sitting on a buffer lock. But maybe it's a win anyway, or just not > worth worrying about. Typical values of commit_delay are usually a fraction of spinning rust latency, so I don't think that FlushBuffer() has any business really caring. These problems seem rather minor compared to the existing problems with the settings. As I've already outlined, I doubt it's possible to really remove the delays here without an over-engineered solution. In short, I suspect it isn't worth it. We must trust DBAs to set commit_siblings appropriately if they've set commit_delay. > Another thing to think about is that if we do this across the board > rather than just for commits, then commit_delay and commit_siblings > will really be totally misnamed - they really ought to be flush_delay > and flush_siblings at that point. Seems reasonable. It would also have the advantage of avoiding having the new implementation tarred with the same brush as commit_delay. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 30 May 2012 13:24, Robert Haas <robertmhaas@gmail.com> wrote: > OK, but there are a lot of places where we call XLogFlush(), and it's > far from obvious that it's a win to do this in all of those cases. At > least, nobody's done that analysis. XLogFlush is called from: > > - WriteTruncateXlogRec(), which is called from TruncateCLOG() > - SlruPhysicalWritePage() > - EndPrepare() > - RecordTransactionCommitPrepared() > - RecordTransactionCommit() > - xact_redo_commit_internal() > - CreateCheckPoint() > - RelationTruncate() > - FlushBuffer() > - write_relmap_file() > > Most of those actually do look like reasonable places to try to get > grouped flushing behavior, but: > > 1. It seems wrong to do it in xact_redo_commit_internal(). It won't > matter if commit_siblings>0 since there won't be any other backends > with transaction IDs anyway, but if commit_siblings==0 then we'll > sleep for no possible benefit. Agreed > 2. Doing it in FlushBuffer() seems slightly iffy since we might be > sitting on a buffer lock. But maybe it's a win anyway, or just not > worth worrying about. Agreed. The remaining cases aren't worth worrying about, apart from SlruPhysicalWritePage() which happens during visibility checks and needs to happen as quickly as possible also. I would say the additional contention from waiting outweighs the benefit of the wait in those 3 places, so skipping the wait is wise. Should be implemented as #define XLogFlush(lsn) XLogFlushInternal(lsn, true) #define XLogFlushNoWait(lsn) XLogFlushInternal(lsn, false) so we can drop the no wait version in without touching the other call points > Another thing to think about is that if we do this across the board > rather than just for commits, then commit_delay and commit_siblings > will really be totally misnamed - they really ought to be flush_delay > and flush_siblings at that point. I think if we were to rename the parameters, then they should be called group_commit_delay group_commit_siblings Since "Group Commit" is the accepted term for this behaviour, even though, as you point out that the behaviour isn't just about commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 30 May 2012 15:25, Simon Riggs <simon@2ndquadrant.com> wrote: >> 1. It seems wrong to do it in xact_redo_commit_internal(). It won't >> matter if commit_siblings>0 since there won't be any other backends >> with transaction IDs anyway, but if commit_siblings==0 then we'll >> sleep for no possible benefit. > > Agreed > >> 2. Doing it in FlushBuffer() seems slightly iffy since we might be >> sitting on a buffer lock. But maybe it's a win anyway, or just not >> worth worrying about. > > Agreed. As I've pointed out, we cannot meaningfully skip the wait for auxiliary processes alone (except perhaps by having commit_siblings set sufficiently high). > The remaining cases aren't worth worrying about, apart from > SlruPhysicalWritePage() which happens during visibility checks and > needs to happen as quickly as possible also. I'm not so sure. It says in that function: /* * We must determine the largest async-commit LSN for the page. This * is a bit tedious, but since this entirefunction is a slow path * anyway, it seems better to do this here than to maintain a per-page * LSN variable(which'd need an extra comparison in the * transaction-commit path). */ > I would say the additional contention from waiting outweighs the > benefit of the wait in those 3 places, so skipping the wait is wise. MinimumActiveBackends() reports the "count backends (other than myself) that are in active transactions", so unnecessary calls will have to occur when we have active transactions >= CommitSiblings, not connections >= CommitSiblings as was previously the case. What if we were to skip the wait during recovery only, by specially setting CommitDelay to 0 in the start-up process? Would that satisfy everyone's concerns about unhelpful delays? I'm not sure how this might interact with hot standby. The only concrete way I can see of avoiding a possibly useless wait (setting commit_siblings prudently alone is a very big help) would be semantics that are not generally acceptable for XLogFlush(): a way of calling XLogFlush() such that we either become the leader and don't wait OR join the queue and return immediately under the assumption that the leader will successfully flush up to our LSN. It is true that part of the premise of having a leader backend was that "XLogFlush is pretty much all critical section, so if it fails we're going to PANIC anyway". That said, at least some of the problematic XLogFlush() call sites named require that we verify that we've flushed up to their LSN before control returns to them. FlushBuffer() does for sure. > I think if we were to rename the parameters, then they should be called > > group_commit_delay > group_commit_siblings > > Since "Group Commit" is the accepted term for this behaviour, even > though, as you point out that the behaviour isn't just about commit. +1. I understand that that is the precise definition of group commit in Oracle, for example, where the feature is explicitly framed as a trade-off between throughput and latency (any of the other things that we might have called group commit at various times are not). The fact that group commit implies that non-commit flushes will also be batched seems rather academic, on reflection. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 30 May 2012 17:19, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 30 May 2012 15:25, Simon Riggs <simon@2ndquadrant.com> wrote: >>> 1. It seems wrong to do it in xact_redo_commit_internal(). It won't >>> matter if commit_siblings>0 since there won't be any other backends >>> with transaction IDs anyway, but if commit_siblings==0 then we'll >>> sleep for no possible benefit. >> >> Agreed I've looked at this more closely now and I can see that the call to XLogFlush() that is made from xact_redo_commit_internal() doesn't ever actually flush WAL, so whether we delay or not is completely irrelevant. So un-agreed. No change required to patch there. >>> 2. Doing it in FlushBuffer() seems slightly iffy since we might be >>> sitting on a buffer lock. But maybe it's a win anyway, or just not >>> worth worrying about. >> >> Agreed. > > As I've pointed out, we cannot meaningfully skip the wait for > auxiliary processes alone (except perhaps by having commit_siblings > set sufficiently high). > >> The remaining cases aren't worth worrying about, apart from >> SlruPhysicalWritePage() which happens during visibility checks and >> needs to happen as quickly as possible also. > > I'm not so sure. It says in that function: > > /* > * We must determine the largest async-commit LSN for the page. This > * is a bit tedious, but since this entire function is a slow path > * anyway, it seems better to do this here than to maintain a per-page > * LSN variable (which'd need an extra comparison in the > * transaction-commit path). > */ > >> I would say the additional contention from waiting outweighs the >> benefit of the wait in those 3 places, so skipping the wait is wise. > > MinimumActiveBackends() reports the "count backends (other than > myself) that are in active transactions", so unnecessary calls will > have to occur when we have active transactions >= CommitSiblings, not > connections >= CommitSiblings as was previously the case. > > What if we were to skip the wait during recovery only, by specially > setting CommitDelay to 0 in the start-up process? Would that satisfy > everyone's concerns about unhelpful delays? I'm not sure how this > might interact with hot standby. Hmm, that was a good idea, but as of my comments above, that isn't required or useful. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 31 May 2012 11:19, Simon Riggs <simon@2ndquadrant.com> wrote: > I've looked at this more closely now and I can see that the call to > XLogFlush() that is made from xact_redo_commit_internal() doesn't ever > actually flush WAL, so whether we delay or not is completely > irrelevant. > > So un-agreed. No change required to patch there. So, does that clear up the question of it being acceptable to add a delay to every existing XLogFlush() call site? I think so. Aside from the outstanding question of what to rename commit_delay/commit_siblings to, and how we might want to reframe those settings in the docs, I think that's everything. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I've looked at this more closely now and I can see that the call to > XLogFlush() that is made from xact_redo_commit_internal() doesn't ever > actually flush WAL, so whether we delay or not is completely > irrelevant. > > So un-agreed. No change required to patch there. I think Peter's suggestion of forcibly setting the delay to 0 in the startup process is a good one, though. It's one line of code, and if it isn't strictly necessary today, it still seems like good future-proofing. I am not very happy about the idea of renaming commit_* to group_commit_*. It's basically a cosmetic renaming, and breaking existing configuration files for cosmetic purposes does not seem warranted to me, especially when the old and new names are so close. I certainly don't think we can do that in 9.2, now that beta1 has already shipped. Modifying the default contents of postgresql.conf after we've shipped beta has been a historical no-no for reasons that escape me at the moment, but IIRC they're not stupid reasons. Frankly, I think this whole thing should be pushed to 9.3. The commit_delay and commit_siblings knobs suck, but they've sucked for a long time, and it won't kill anybody to wait another release cycle to fix them. We have plenty of more important things queued up for 9.3 already, and I don't believe there's any compelling reason to think that this particular thing needs preferential treatment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I've looked at this more closely now and I can see that the call to >> XLogFlush() that is made from xact_redo_commit_internal() doesn't ever >> actually flush WAL, so whether we delay or not is completely >> irrelevant. >> >> So un-agreed. No change required to patch there. > > I think Peter's suggestion of forcibly setting the delay to 0 in the > startup process is a good one, though. It's one line of code, and if > it isn't strictly necessary today, it still seems like good > future-proofing. Adding a line that does nothing is not a good idea. The Startup process flushes very, very few WAL messages, so the setting is irrelevant. > I am not very happy about the idea of renaming commit_* to > group_commit_*. It's basically a cosmetic renaming, and breaking > existing configuration files for cosmetic purposes does not seem > warranted to me, especially when the old and new names are so close. > I certainly don't think we can do that in 9.2, now that beta1 has > already shipped. Modifying the default contents of postgresql.conf > after we've shipped beta has been a historical no-no for reasons that > escape me at the moment, but IIRC they're not stupid reasons. > > Frankly, I think this whole thing should be pushed to 9.3. The > commit_delay and commit_siblings knobs suck, but they've sucked for a > long time, and it won't kill anybody to wait another release cycle to > fix them. We have plenty of more important things queued up for 9.3 > already, and I don't believe there's any compelling reason to think > that this particular thing needs preferential treatment. No problem with pushing a variable rename through to 9.3. To be honest, I don't care whether we rename them or not. What matters is that we have a patch that provides a massive performance gain in write performance in just a few lines of code, and that should be committed to 9.2. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Frankly, I think this whole thing should be pushed to 9.3. The > commit_delay and commit_siblings knobs suck, but they've sucked for a > long time, and it won't kill anybody to wait another release cycle to > fix them. We have plenty of more important things queued up for 9.3 > already, and I don't believe there's any compelling reason to think > that this particular thing needs preferential treatment. Why do you think that? Those knobs are now quite ineffective, though we never even considered that when the group commit delay patch was committed. The entire body of research and commentary that exists on commit_delay has been invalidated for 9.2. If that isn't something that needs to be addressed before release, I don't know what is. The fact that the patch can sometimes double transaction throughput for an absolutely trivial change, moving 2 lines of code, is also a good reason to not bump this for another year. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, May 31, 2012 at 8:38 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Frankly, I think this whole thing should be pushed to 9.3. The >> commit_delay and commit_siblings knobs suck, but they've sucked for a >> long time, and it won't kill anybody to wait another release cycle to >> fix them. We have plenty of more important things queued up for 9.3 >> already, and I don't believe there's any compelling reason to think >> that this particular thing needs preferential treatment. > > Why do you think that? Those knobs are now quite ineffective, though > we never even considered that when the group commit delay patch was > committed. The entire body of research and commentary that exists on > commit_delay has been invalidated for 9.2. If that isn't something > that needs to be addressed before release, I don't know what is. The > fact that the patch can sometimes double transaction throughput for an > absolutely trivial change, moving 2 lines of code, is also a good > reason to not bump this for another year. Fixing regressions before release is essential; improving performance is not - especially when the improvement relates to a little-used feature that you were proposing to get rid of two weeks ago. It can't simultaneously be so unimportant that we should remove it altogether and so important that it's must-fix-before-release, and if one test can completely overturn your view of which category this falls into, that seems like a reason for taking some more time to think it over and, perhaps, run more tests. We don't have a lot of latitude to maneuver at this point - anything we do now is going to go straight out into the wild. Caution is appropriate. However, rather than arguing about it, let's see if anyone else has an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes: > On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote: >> Frankly, I think this whole thing should be pushed to 9.3. > What matters is that we have a patch that provides a massive > performance gain in write performance in just a few lines of code, and > that should be committed to 9.2. I agree with Robert on this. This patch hasn't had *nearly* enough testing to justify cramming it into 9.2 at this point. AFAIK the claim of "massive performance gain" is based on a single test case run by a single person, which doesn't even give me any confidence that it doesn't break anything, much less that it's a win across the board. If we want to finish the beta cycle in a reasonable time period and get back to actual development, we have to refrain from adding more possibly-destabilizing development work to 9.2. And that is what this is. Add it to the upcoming CF, please. regards, tom lane
On 31 May 2012 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote: >>> Frankly, I think this whole thing should be pushed to 9.3. > >> What matters is that we have a patch that provides a massive >> performance gain in write performance in just a few lines of code, and >> that should be committed to 9.2. > > I agree with Robert on this. This patch hasn't had *nearly* enough > testing to justify cramming it into 9.2 at this point. AFAIK the > claim of "massive performance gain" is based on a single test case run > by a single person, which doesn't even give me any confidence that it > doesn't break anything, much less that it's a win across the board. I agree with you. You would be mistaken if you thought that I think Peter's laptop was sufficient proof for anyone to commit something and I've already said exactly that to him. My description of "massive performance gain" is appropriate based on the measurements so far. > If we want to finish the beta cycle in a reasonable time period and get > back to actual development, we have to refrain from adding more > possibly-destabilizing development work to 9.2. And that is what > this is. In what way is it possibly destabilising? I see nothing in the patch to merit that claim, so presumably you haven't read the patch yet? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 31 May 2012 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If we want to finish the beta cycle in a reasonable time period and get >> back to actual development, we have to refrain from adding more >> possibly-destabilizing development work to 9.2. �And that is what >> this is. > In what way is it possibly destabilising? I'm prepared to believe that it only affects performance, but it could be destabilizing to that. It needs proper review and testing, and the next CF is the right environment for that to happen. regards, tom lane
On 31 May 2012 14:58, Robert Haas <robertmhaas@gmail.com> wrote: > Fixing regressions before release is essential; improving performance > is not - especially when the improvement relates to a little-used > feature that you were proposing to get rid of two weeks ago. Yes, the fact that I wanted to get rid of commit_delay is well established - I called for its deprecation in a dedicated thread, and during my talk at pgCon. Bruce's confusion as to how that interacted with what I've been calling "new group commit" was actually what crystallised my position here: it is trying, for the most part, to do the same thing as new group commit, but in an entirely orthogonal way. Bruce's confusion actually reflected the confusion of the code. So I'm in a sense removing the overlap between commit_delay used to do but now but shouldn't try to do anymore (make commits coincide, giving good benchmark results) and what new group commit now does, while preserving commit_delay's ability to trade off latency for throughput. I didn't have an answer to the question of how we might continue to offer a throughput/latency trade-off to users before, but knew that with 9.2, commit_delay was totally ineffective anyway. The realisation that it could be made effective by working with rather than against new group commit changed my mind. > It can't simultaneously be so unimportant that we should remove it altogether > and so important that it's must-fix-before-release, and if one test > can completely overturn your view of which category this falls into, > that seems like a reason for taking some more time to think it over > and, perhaps, run more tests. We don't have a lot of latitude to > maneuver at this point - anything we do now is going to go straight > out into the wild. Caution is appropriate. The patch can be justified as a way of removing the tension between new group commit and commit_delay. Removing commit_delay would also do this, but then there'd be no way to make the aforementioned trade-off that we previously offered. I suspect that if it restored the peaks and valleys of commit_delay's changes to throughput in 9.1, over and above a new group commit baseline, this would be more readily accepted. I hope the patch isn't being punished for being effective. Yes, it does offer a large boost to performance, but that happens to be incidental, unlikely though that sounds. You've called this a clever idea. I actually don't agree. I was fairly surprised that no one noticed this earlier. It is rather obviously the case that a delay that hopes to maximise the batching of commits at the expense of latency should occur only in a single leader backend that will proceed with the flush for the batch, and not within each and every backend as it commits. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 31 May 2012 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> In what way is it possibly destabilising? > > I'm prepared to believe that it only affects performance, but it could > be destabilizing to that. It needs proper review and testing, and the > next CF is the right environment for that to happen. It couldn't possibly be as destabilising to performance as commit_delay was in 9.1. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 31 May 2012 16:26, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 31 May 2012 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Simon Riggs <simon@2ndQuadrant.com> writes: >>> In what way is it possibly destabilising? >> >> I'm prepared to believe that it only affects performance, but it could >> be destabilizing to that. It needs proper review and testing, and the >> next CF is the right environment for that to happen. > > It couldn't possibly be as destabilising to performance as > commit_delay was in 9.1. Furthermore, it couldn't possibly affect performance in any way unless commit_delay is set. I've just moved the delay site so that its only executed by the group commit leader. The leader would execute the code anyway, but now the followers don't. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
This is a benchmark I performed on the same hardware, for tpc-b.sql, with a commit_delay of 0 and 4000 for the patch: http://tpcbdelay.staticloud.com/ There is a rather large improvement in throughput here. Robert previously complained that our group commit implementation didn't do very well on that benchmark. This seems to make all the difference. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services