Thread: Doc patch making firm recommendation for setting the value of commit_delay
Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
Some of you will be aware that I've tried to formulate a good general recommendation for setting the value of commit_delay, in light of the improvements made in 9.3. I previously asked for help finding a methodology for optimising commit_delay [1]. The documentation related to commit_delay still says that we don't know what might work well, though I don't think that's true any more.

I found the time to do some benchmarks of my own - Greg Smith made available a server that he frequently uses for benchmarks. It was previously my observation that "half of raw single-page sync time as reported by pg_test_fsync for your wal_sync_method" worked best for commit_delay. I went so far as to modify pg_test_fsync to output average time per op in microseconds for each operation with commit_delay in mind, which is a patch that has already been committed [2]. This general approach worked really well on my laptop, which has a slowish fsync of about 8 milliseconds (8,000 microseconds), for which a commit_delay setting of 4,000 (microseconds) seemed to clearly work best, on both insert and tpc-b benchmarks [3].

This server has an Intel 320 series SSD. Greg has already written quite a bit about this drive [4] for unrelated reasons. For the SSD, results of an insert-based pgbench-tools run are shown, with commit_delay at 0, 20 and 59 (determined by following my general advice above):

http://commit-delay-results-ssd-insert.staticloud.com

I also looked at a 3-disk RAID0 of 7200RPM drives connected through a 512MB battery-backed write cache (Areca controller), again using insert.sql:

http://commit-delay-stripe-insert.staticloud.com

In addition to a tpc-b benchmark with the same data directory on the same 3-disk stripe:

http://commit-delay-results-stripe-tpcb.staticloud.com

I used the same postgresql.conf in all cases, which you can see for each report, and did the usual median-of-three thing in all cases (though each run lasted 120 seconds, not the default 60 seconds). Settings used for one particular pgbench run can be viewed here (though they're all exactly the same anyway):

http://commit-delay-results-ssd-insert.staticloud.com/19/pg_settings.txt

Attached is a doc-patch that makes recommendations that are consistent with my observations about what works best here. I'd like to see us making *some* recommendation - for sympathetic cases, setting commit_delay appropriately can make a very large difference to transaction throughput. Such sympathetic cases - many small write transactions - are something that tends to be seen relatively frequently with web applications, which disproportionately use cloud hosting. It isn't at all uncommon for these cases to be highly bound by their commit rate, and so it is compelling to try to amortize the cost of a flush as effectively as possible there. It would be unfortunate if no one was even aware that commit_delay is now useful for these cases, since the setting allows cloud hosting providers to help these cases quite a bit, without having to do something like compromise durability, which in general isn't acceptable.

The point of all these benchmarks isn't to show how effective commit_delay now is, or can be - we already had that discussion months ago, before the alteration to its behaviour was committed. The point is to put my proposed doc changes in the appropriate context, so that I can build confidence that they're balanced and helpful, by showing cases that are not so sympathetic.

Thoughts?
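For concreteness, the procedure described above boils down to something like this (a sketch only; the run length, the reported figure, and the resulting settings are illustrative, not prescriptive):

    # Report average time per sync op, in microseconds, for each method
    pg_test_fsync --secs-per-test=5

    # If the configured wal_sync_method reports ~8000 usecs/op for a single
    # 8kB write, try half of that in postgresql.conf:
    #
    #   commit_delay = 4000        # microseconds
    #   commit_siblings = 5        # default; higher values suit fast storage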
[1] http://archives.postgresql.org/pgsql-performance/2012-08/msg00003.php
[2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=82e429794b348cd80c1d1b011e21ffac98bc6e88
[3] http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html
[4] http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Attachment
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
"David Rowley"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Peter Geoghegan
> Sent: 15 November 2012 09:44
> To: PG Hackers
> Subject: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
>
> Some of you will be aware that I've tried to formulate a good general
> recommendation for setting the value of commit_delay, in light of the
> improvements made in 9.3. I previously asked for help finding a methodology
> for optimising commit_delay [1]. The documentation related to
> commit_delay still says that we don't know what might work well, though I
> don't think that's true any more.
>
> I found the time to do some benchmarks of my own - Greg Smith made
> available a server that he frequently uses for benchmarks. It was previously
> my observation that "half of raw single-page sync time as reported by
> pg_test_fsync for your wal_sync_method" worked best for commit_delay. I
> went so far as to modify pg_test_fsync to output average time per op in
> microseconds for each operation with commit_delay in mind, which is a patch
> that has already been committed [2]. This general approach worked really
> well on my laptop, which has a slowish fsync of about 8 milliseconds (8,000
> microseconds), for which a commit_delay setting of 4,000 (microseconds)
> seemed to clearly work best, on both insert and tpc-b benchmarks [3].

<snip>

> Thoughts?

I think for sure, since the GUC kept its original name, that the docs need to be updated to let people know the background behaviour has changed and may now be far more useful.

I've read through the patch. The only thing I see out of place is a small typo:

! values of <varname>commit_siblings</varname> should be used is such cases,

Should probably read:

! values of <varname>commit_siblings</varname> should be used *in* such cases,

Thanks for doing all the benchmarks too. Good to see such a range of different hardware tested.

Regards

David Rowley
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
"Albe Laurenz"
Date:
Peter Geoghegan wrote:
> Some of you will be aware that I've tried to formulate a good general
> recommendation for setting the value of commit_delay, in light of the
> improvements made in 9.3. I previously asked for help finding a
> methodology for optimising commit_delay [1]. The documentation related
> to commit_delay still says that we don't know what might work well,
> though I don't think that's true any more.
>
> I found the time to do some benchmarks of my own - Greg Smith made
> available a server that he frequently uses for benchmarks. It was
> previously my observation that "half of raw single-page sync time as
> reported by pg_test_fsync for your wal_sync_method" worked best for
> commit_delay. I went so far as to modify pg_test_fsync to output
> average time per op in microseconds for each operation with
> commit_delay in mind, which is a patch that has already been committed
> [2].

[...]

> Attached is a doc-patch that makes recommendations that are consistent
> with my observations about what works best here. I'd like to see us
> making *some* recommendation - for sympathetic cases, setting
> commit_delay appropriately can make a very large difference to
> transaction throughput. Such sympathetic cases - many small write
> transactions - are something that tends to be seen relatively
> frequently with web applications, which disproportionately use cloud
> hosting. It isn't at all uncommon for these cases to be highly bound
> by their commit rate, and so it is compelling to try to amortize the
> cost of a flush as effectively as possible there. It would be
> unfortunate if no one was even aware that commit_delay is now useful
> for these cases, since the setting allows cloud hosting providers to
> help these cases quite a bit, without having to do something like
> compromise durability, which in general isn't acceptable.
>
> The point of all these benchmarks isn't to show how effective
> commit_delay now is, or can be - we already had that discussion months
> ago, before the alteration to its behaviour was committed. The point
> is to put my proposed doc changes in the appropriate context, so that
> I can build confidence that they're balanced and helpful, by showing
> cases that are not so sympathetic.
>
> Thoughts?

If there is an agreement that half the sync time as reported by pg_test_fsync is a good value, would it make sense to have initdb test sync time and preset commit_delay?

Yours,
Laurenz Albe
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Greg Smith
Date:
On 11/15/12 12:19 AM, Albe Laurenz wrote:
> If there is an agreement that half the sync time as reported by
> pg_test_fsync is a good value, would it make sense to have initdb test
> sync time and preset commit_delay?

Peter has validated this against a good range of systems, but it would be optimistic to say it's a universal idea.

My main concern with this would be the relatively common practice of moving the pg_xlog directory after initdb time. Sometimes people don't know about the option to have initdb move it. Sometimes the drive that will hold pg_xlog isn't yet available when the database is originally created. And the camp I fall into (which admittedly is the group who can take care of this on their own) will move pg_xlog manually and symlink it (sketched below), because that's what we're used to.

I would rather see this just turn into one of the things a more general tuning tool knew how to do, executing against a fully set-up system. Having a useful implementation of commit_delay and useful docs on it seems like enough of a jump forward for one release. Moving fully into auto-tuning before getting more field feedback on how that works out is pretty aggressive.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
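The relocate-and-symlink procedure mentioned above typically looks something like this (a sketch; the paths are illustrative, and the cluster must be shut down first):

    # Move pg_xlog to a dedicated device after initdb (illustrative paths)
    pg_ctl -D /var/lib/pgsql/data stop
    mv /var/lib/pgsql/data/pg_xlog /mnt/fastdisk/pg_xlog
    ln -s /mnt/fastdisk/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl -D /var/lib/pgsql/data start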
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Craig Ringer
Date:
On 11/15/2012 04:56 PM, Greg Smith wrote:
>
> I would rather see this just turn into one of the things a more
> general tuning tool knew how to do, executing against a fully set-up
> system. Having a useful implementation of commit_delay and useful docs
> on it seems like enough of a jump forward for one release. Moving
> fully into auto-tuning before getting more field feedback on how that
> works out is pretty aggressive.

It'll also potentially make it harder to get reproducible results in benchmarking and testing across repeated runs, cause confusion when someone relocates a DB or changes hardware, and slow down initdb (and thus testing).

I'd be all for making it part of a "test my hardware and tune my DB" tool, but not such a fan of doing it at initdb time. Making initdb less predictable and more complicated sounds like asking for trouble.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Magnus Hagander
Date:
On Thu, Nov 15, 2012 at 9:56 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 11/15/12 12:19 AM, Albe Laurenz wrote:
>>
>> If there is an agreement that half the sync time as reported by
>> pg_test_fsync is a good value, would it make sense to have initdb test
>> sync time and preset commit_delay?
>
> Peter has validated this against a good range of systems, but it would be
> optimistic to say it's a universal idea.
>
> My main concern with this would be the relatively common practice of moving
> the pg_xlog directory after initdb time. Sometimes people don't know about
> the option to have initdb move it. Sometimes the drive that will hold pg_xlog
> isn't yet available when the database is originally created. And the camp I
> fall into (which admittedly is the group who can take care of this on their
> own) will move pg_xlog manually and symlink it on their own, because that's
> what we're used to.

An even more common use case for this, I think, is "I installed from a package that ran initdb for me". I still think manual moving of pg_xlog is a lot more common than using the initdb option in general.

> I would rather see this just turn into one of the things a more general
> tuning tool knew how to do, executing against a fully set-up system. Having a
> useful implementation of commit_delay and useful docs on it seems like
> enough of a jump forward for one release. Moving fully into auto-tuning
> before getting more field feedback on how that works out is pretty
> aggressive.

+1.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 15 November 2012 10:04, Magnus Hagander <magnus@hagander.net> wrote:
>> I would rather see this just turn into one of the things a more general
>> tuning tool knew how to do, executing against a fully set-up system. Having a
>> useful implementation of commit_delay and useful docs on it seems like
>> enough of a jump forward for one release. Moving fully into auto-tuning
>> before getting more field feedback on how that works out is pretty
>> aggressive.
>
> +1.

I am inclined to agree. I did attempt to use the instrumentation macros to have commit_delay set adaptively at runtime, which would have at least addressed this concern, but that just didn't work as well as this.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 15 November 2012 08:56, Greg Smith <greg@2ndquadrant.com> wrote:
> My main concern with this would be the relatively common practice of moving
> the pg_xlog directory after initdb time.

I probably should have increased the default number of seconds that pg_test_fsync runs for, in light of the fact that it can make an appreciable difference when following this method. I'd suggest that a value of 5 be used there. I think we should just change that, since everyone will just use the default anyway (or, if anything, the more cautious ones will use a higher value than the default).

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Stephen Frost
Date:
* Peter Geoghegan (peter@2ndquadrant.com) wrote:
> I am inclined to agree. I did attempt to use the instrumentation
> macros to have commit_delay set adaptively at runtime, which would
> have at least addressed this concern, but that just didn't work as
> well as this.

Setting it at run-time was actually my first thought on this. I'm disappointed to hear that it didn't work out well.

Thanks,

Stephen
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 16 November 2012 01:20, Stephen Frost <sfrost@snowman.net> wrote:
> Setting it at run-time was actually my first thought on this. I'm
> disappointed to hear that it didn't work out well.

I actually did quite a lot of research on it; it was interesting, but ultimately didn't lead to any breakthroughs. There were a couple of research papers written on this subject in the late 80s and early 90s that are cited in the transaction processing literature relatively frequently, but the models involved aren't terribly applicable to modern hardware.

This is the main paper:

http://www.hpl.hp.com/techreports/tandem/TR-88.1.pdf

This later 1991 paper reads kind of like a critique of the 1988 paper:

http://www.linux-mips.org/pub/linux/mips/people/macro/DEC/DTJ/DTJ107/DTJ107PF.PDF

There doesn't appear to be much of anything in the public domain after this. I suppose that I could have spent more time pursuing this, but I very much had the sense of pursuing diminishing returns. If the whole idea of making the thing adaptive is to make it more accessible to users, well, they'll still have to turn it on themselves (presumably it will never be on by default), which is only slightly less effort than following my advice here.

I suspect my tools - in particular, pgbench - just aren't sophisticated enough to evaluate the effectiveness of any particular model. I might have used something like Tsung, which models workloads stochastically, instead. However, I just didn't have the time.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
I've added this to the open commitfest. I think that a formal review is probably required here.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Noah Misch
Date:
On Wed, Nov 14, 2012 at 08:44:26PM +0000, Peter Geoghegan wrote:
> http://commit-delay-results-ssd-insert.staticloud.com
> http://commit-delay-stripe-insert.staticloud.com
> http://commit-delay-results-stripe-tpcb.staticloud.com
> http://commit-delay-results-ssd-insert.staticloud.com/19/pg_settings.txt

staticloud.com seems to be gone. Would you repost these?
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 19 January 2013 20:38, Noah Misch <noah@leadboat.com> wrote:
> staticloud.com seems to be gone. Would you repost these?

I've pushed these to a git repo, hosted on github:

https://github.com/petergeoghegan/commit_delay_benchmarks

I'm sorry that I didn't take the time to make the html benchmarks easily viewable within a browser on this occasion.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Noah Misch
Date:
On Mon, Jan 21, 2013 at 12:23:21AM +0000, Peter Geoghegan wrote:
> On 19 January 2013 20:38, Noah Misch <noah@leadboat.com> wrote:
> > staticloud.com seems to be gone. Would you repost these?
>
> I've pushed these to a git repo, hosted on github:
>
> https://github.com/petergeoghegan/commit_delay_benchmarks
>
> I'm sorry that I didn't take the time to make the html benchmarks
> easily viewable within a browser on this occasion.

That's plenty convenient; thanks.

What filesystem did you use for testing? Would you also provide /proc/cpuinfo or a rough description of the system's CPUs?
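For what it's worth, the rough description being asked for here can be captured with one-liners along these lines (Linux-specific; the data directory path is an assumption):

    # CPU model and logical core count
    grep -m1 'model name' /proc/cpuinfo
    grep -c '^processor' /proc/cpuinfo
    # Filesystem type backing the data directory
    df -T /var/lib/pgsql/data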
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 21 January 2013 13:10, Noah Misch <noah@leadboat.com> wrote:
> What filesystem did you use for testing? Would you also provide /proc/cpuinfo
> or a rough description of the system's CPUs?

Unfortunately, I don't have access to that server at the moment; it's under Greg Smith's control. I believe you yourself had an account on this server at one point. I do know that the CPU is an Intel Core i7-870:

http://ark.intel.com/products/41315/Intel-Core-i7-870-Processor-8M-Cache-2_93-GHz

I am pretty sure that the filesystem the various block devices were mounted with was ext4 (without LVM), but it might have been XFS; I don't recall. The operating system was Debian Lenny.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Noah Misch
Date:
Hi Peter,

I took a look at this patch and the benchmarks you've furnished:

https://github.com/petergeoghegan/commit_delay_benchmarks
http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html

On Wed, Nov 14, 2012 at 08:44:26PM +0000, Peter Geoghegan wrote:
> Attached is a doc-patch that makes recommendations that are consistent
> with my observations about what works best here. I'd like to see us
> making *some* recommendation - for sympathetic cases, setting
> commit_delay appropriately can make a very large difference to
> transaction throughput. Such sympathetic cases - many small write
> transactions - are something that tends to be seen relatively
> frequently with web applications, which disproportionately use cloud
> hosting. It isn't at all uncommon for these cases to be highly bound
> by their commit rate, and so it is compelling to try to amortize the
> cost of a flush as effectively as possible there. It would be
> unfortunate if no one was even aware that commit_delay is now useful
> for these cases, since the setting allows cloud hosting providers to
> help these cases quite a bit, without having to do something like
> compromise durability, which in general isn't acceptable.

Your fast-fsync (SSD, BBWC) benchmarks show a small loss up to 8 clients and a 10-20% improvement at 32 clients. That's on a 4-core/8-thread CPU, assuming HT was left enabled. Your slow-fsync (laptop) benchmarks show a 40-100% improvement in the 16-64 client range.

I did a few more benchmarks along the spectrum. First, I used a Mac, also 4-core/8-thread, with fsync_writethrough; half of fsync time gave commit_delay = 35000. I used pgbench, scale factor 100, 4-minute runs, three trials each:

-clients-  -tps@commit_delay=0-  -tps@commit_delay=35000-
 8          51,55,63              82,84,86
 16         98,100,107            130,134,143
 32         137,148,157           192,200,201
 64         199,201,214           249,256,258

So that's a nice 27-53% improvement, fairly similar to the pattern for your laptop pgbench numbers.

Next, based on your comment about the possible value for cloud-hosted applications, I tried a cc2.8xlarge (16 core, 32 thread), GNU/Linux EC2 instance with a data directory on a standard EBS volume, ext4 filesystem. Several 15s pg_test_fsync runs could not agree on an fsync time; I saw results from 694us to 1904us. Ultimately I settled on trying commit_delay=500, scale factor 300:

-clients-  -tps@commit_delay=0-  -tps@commit_delay=500-
 32         1224,1391,1584        1175,1229,1394
 64         1553,1647,1673        1544,1546,1632
 128        1717,1833,1900        1621,1720,1951
 256        1664,1717,1918        1734,1832,1918

The numbers are all over the place, but there's more loss than gain. Amit Kapila also measured small losses in tps@-c16:

http://www.postgresql.org/message-id/000701cd6ff0$013a6210$03af2630$@kapila@huawei.com

I was curious about the cost of the MinimumActiveBackends() call when relying on commit_siblings to skip the delay. I ran a similar test with an extra 500 idle backends, clients=8, commit_siblings=20 (so the delay would never be used), and either a zero or nonzero commit_delay. There was no appreciable performance advantage from setting commit_delay=0 as opposed to relying on commit_siblings to suppress the delay. That's good news.

On the GNU/Linux VM, pg_sleep() achieves precision on the order of 10us. However, the sleep was consistently around 70us longer than requested. A 300us request yielded a 370us sleep, and a 3000us request gave a 3080us sleep. Mac OS X was similarly precise for short sleeps, but it could oversleep a full 1000us on a 35000us sleep.
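For anyone wanting to reproduce runs like those above, the pgbench invocations were presumably along these lines (a sketch; the thread count and database name are assumptions, not taken from the message):

    # Initialize at scale factor 100, then run 4-minute tpc-b style tests,
    # repeating three trials per (clients, commit_delay) combination
    pgbench -i -s 100 bench
    pgbench -c 32 -j 4 -T 240 bench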
> diff doc/src/sgml/wal.sgml
> index fc5c3b2..92619dd
> *** a/doc/src/sgml/wal.sgml
> --- b/doc/src/sgml/wal.sgml
> ***************
> *** 375,382 ****
>   just before a synchronous commit attempts to flush
>   <acronym>WAL</acronym> to disk, in the hope that a single flush
>   executed by one such transaction can also serve other transactions
> ! committing at about the same time. Setting <varname>commit_delay</varname>
> ! can only help when there are many concurrently committing transactions.
>   </para>
>
>   </sect1>
> --- 375,397 ----

The beginning of this paragraph still says "commit_delay causes a delay just before a synchronous commit attempts to flush WAL to disk". Since it now applies to every WAL flush, that should be updated.

> ***************
> *** 560,570 ****
>   is not enabled, or if fewer than <xref linkend="guc-commit-siblings">
>   other sessions are currently in active transactions; this avoids
>   sleeping when it's unlikely that any other session will commit soon.
> ! Note that on most platforms, the resolution of a sleep request is
>   ten milliseconds, so that any nonzero <varname>commit_delay</varname>
>   setting between 1 and 10000 microseconds would have the same effect.
> ! Good values for these parameters are not yet clear; experimentation
> ! is encouraged.
>   </para>
>
>   <para>
> --- 575,607 ----

There's a similar problem at the beginning of this paragraph; it says specifically, "The commit_delay parameter defines for how many microseconds the server process will sleep after writing a commit record to the log with LogInsert but before performing a LogFlush."

>   is not enabled, or if fewer than <xref linkend="guc-commit-siblings">
>   other sessions are currently in active transactions; this avoids
>   sleeping when it's unlikely that any other session will commit soon.
> ! Note that on some platforms, the resolution of a sleep request is
>   ten milliseconds, so that any nonzero <varname>commit_delay</varname>
>   setting between 1 and 10000 microseconds would have the same effect.
> ! </para>

As a side note, if we're ever going to recommend a fire-and-forget method for setting commit_delay, it may be worth detecting whether the host sleep granularity is limited like this. Setting commit_delay = 20 for your SSD and silently getting commit_delay = 10000 would make for an unpleasant surprise.

> !
> ! <para>
> ! Since the purpose of <varname>commit_delay</varname> is to allow
> ! the cost of each flush operation to be more effectively amortized
> ! across concurrently committing transactions (potentially at the
> ! expense of transaction latency), it is necessary to quantify that
> ! cost when altering the setting. The higher that cost is, the more
> ! effective <varname>commit_delay</varname> is expected to be in
> ! increasing transaction throughput.

That's true for spinning disks, but I suspect it does not hold for storage with internal parallelism, notably virtualized storage. Consider an iSCSI configuration with high bandwidth and high latency. When network latency is the limiting factor, will sending larger requests less often still help?

> ! The <xref linkend="pgtestfsync"> module can be used to measure the
> ! average time in microseconds that a single WAL flush operation
> ! takes. A value of half of the average time the module reports it
> ! takes to flush after a single 8kB write operation is often the most
> ! effective setting for <varname>commit_delay</varname>. The benefit
> ! of tuning <varname>commit_delay</varname> can even be pronounced on
> ! storage media with very fast sync times, such as solid-state drives
> ! or RAID arrays with a battery-backed write cache. However, higher
> ! values of <varname>commit_siblings</varname> should be used is such

Typo: "is such" -> "in such"

> ! cases, whereas smaller <varname>commit_siblings</varname> values
> ! can be helpful on higher latency media. Note that it is quite
> ! possible that a setting of <varname>commit_delay</varname> that is
> ! too high can increase transaction latency by so much that total
> ! transaction throughput suffers.
>   </para>

One would be foolish to run a performance-sensitive workload like those in question, including the choice to have synchronous_commit=on, on spinning disks with no battery-backed write cache. A cloud environment is more credible, but my benchmark showed no gain there. Overall, I still won't personally recommend changing commit_delay without measuring the performance change for one's particular workload and storage environment. commit_delay can now bring some impressive gains in the right situations, but I doubt those are common enough for a fire-and-forget setting to do more good than harm.

I suggest having the documentation recommend half of the fsync time as a starting point for benchmarking different commit_delay settings against your own workload. Indicate that it's more likely to help for direct use of spinning disks than for BBWC/solid state/virtualized storage. Not sure what else can be credibly given as general advice for PostgreSQL DBAs.

Thanks,
nm
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
Hi Noah,

On 27 January 2013 02:31, Noah Misch <noah@leadboat.com> wrote:
> I did a few more benchmarks along the spectrum.
>
> So that's a nice 27-53% improvement, fairly similar to the pattern for your
> laptop pgbench numbers.

I presume that this applies to a tpc-b benchmark (the pgbench default). Note that the really compelling numbers that I reported in that blog post (where there is an increase of over 80% in transaction throughput at lower client counts) occur with an insert-based benchmark (i.e. a maximally commit-bound workload).

> Next, based on your comment about the possible value
> for cloud-hosted applications
>
> -clients-  -tps@commit_delay=0-  -tps@commit_delay=500-
>  32         1224,1391,1584        1175,1229,1394
>  64         1553,1647,1673        1544,1546,1632
>  128        1717,1833,1900        1621,1720,1951
>  256        1664,1717,1918        1734,1832,1918
>
> The numbers are all over the place, but there's more loss than gain.

I suspected that the latency of cloud storage might be relatively poor. Since that is evidently not actually the case with Amazon EBS, it makes sense that commit_delay isn't compelling there. I am not disputing whether or not Amazon EBS should be considered representative of such systems in general - I'm sure that it should be.

> There was no appreciable
> performance advantage from setting commit_delay=0 as opposed to relying on
> commit_siblings to suppress the delay. That's good news.

Thank you for doing that research; I had myself investigated that the fastpath in MinimumActiveBackends() works well, but it's useful to have my findings verified.

> On the GNU/Linux VM, pg_sleep() achieves precision on the order of 10us.
> However, the sleep was consistently around 70us longer than requested. A
> 300us request yielded a 370us sleep, and a 3000us request gave a 3080us sleep.
> Mac OS X was similarly precise for short sleeps, but it could oversleep a full
> 1000us on a 35000us sleep.

Ugh.

> The beginning of this paragraph still says "commit_delay causes a delay just
> before a synchronous commit attempts to flush WAL to disk". Since it now
> applies to every WAL flush, that should be updated.

Agreed.

> There's a similar problem at the beginning of this paragraph; it says
> specifically, "The commit_delay parameter defines for how many microseconds
> the server process will sleep after writing a commit record to the log with
> LogInsert but before performing a LogFlush."

Right.

> As a side note, if we're ever going to recommend a fire-and-forget method for
> setting commit_delay, it may be worth detecting whether the host sleep
> granularity is limited like this. Setting commit_delay = 20 for your SSD and
> silently getting commit_delay = 10000 would make for an unpleasant surprise.

Yes, it would. Note on possible oversleeping added.

>> ! <para>
>> ! Since the purpose of <varname>commit_delay</varname> is to allow
>> ! the cost of each flush operation to be more effectively amortized
>> ! across concurrently committing transactions (potentially at the
>> ! expense of transaction latency), it is necessary to quantify that
>> ! cost when altering the setting. The higher that cost is, the more
>> ! effective <varname>commit_delay</varname> is expected to be in
>> ! increasing transaction throughput.
>
> That's true for spinning disks, but I suspect it does not hold for storage
> with internal parallelism, notably virtualized storage. Consider an iSCSI
> configuration with high bandwidth and high latency. When network latency is
> the limiting factor, will sending larger requests less often still help?

Well, I don't like to speculate about things like that, because it's just too easy to be wrong. That said, it doesn't immediately occur to me why the statement that you've highlighted wouldn't be true of virtualised storage that has the characteristics you describe. Any kind of latency at flush time means that clients idle, which means that the CPU is potentially not kept fully busy for a greater amount of wall time, where it might otherwise be kept more busy.

> One would be foolish to run a performance-sensitive workload like those in
> question, including the choice to have synchronous_commit=on, on spinning
> disks with no battery-backed write cache. A cloud environment is more
> credible, but my benchmark showed no gain there.

In an everyday sense you are correct. It would typically be fairly senseless to run an application that was severely limited by transaction throughput like this, when a battery-backed cache could be used at the cost of a couple of hundred dollars. However, it's quite possible to imagine a scenario in which the economics favoured using commit_delay instead. For example, I am aware that at Facebook, a similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs) is used. Furthermore, it might not be obvious that fsync speed is an issue in practice. Setting commit_delay to 4,000 has seemingly no downside on my laptop - it *positively* affects both average and worst-case transaction latency - so with spinning disks, it probably would actually be sensible to set it and forget it, regardless of workload.

When Robert committed this feature, he added an additional check when WALWriteLock is acquired, which could see the lock acquired in a way that turned out to be needless, but also prevented a flush that was technically needless from the group commit leader/lock holder backend's own selfish perspective. I never got around to satisfying myself that that change helped more than it hurt, if in fact it had any measurable impact either way. Perhaps I should. The benchmark that appears on my blog was actually produced with the slightly different, original version.

> Overall, I still won't
> personally recommend changing commit_delay without measuring the performance
> change for one's particular workload and storage environment. commit_delay
> can now bring some impressive gains in the right situations, but I doubt those
> are common enough for a fire-and-forget setting to do more good than harm.

I agree.

> I suggest having the documentation recommend half of the fsync time as a
> starting point for benchmarking different commit_delay settings against your
> own workload. Indicate that it's more likely to help for direct use of
> spinning disks than for BBWC/solid state/virtualized storage. Not sure what
> else can be credibly given as general advice for PostgreSQL DBAs.

That all seems reasonable. The really important thing is that we don't state that we don't have a clue what helps - that inspires no confidence, could turn someone off what would be a really useful feature for them, and just isn't accurate. I also think it's important that we don't say "Setting commit_delay can only help when there are many concurrently committing transactions", because roughly the opposite is true. With many connections, there are already enough committing transactions to effectively amortize the cost of a flush, and commit_delay is then only very slightly helpful. Lower client counts are where commit_delay actually helps (at least with slow fsync times, which are the compelling case).

I attach a revision that I think addresses your concerns. I've polished it a bit further too - in particular, my elaborations about commit_delay have been concentrated at the end of wal.sgml, where they belong. I've also removed the reference to XLogInsert, because, since all XLogFlush call sites are now covered by commit_delay, XLogInsert isn't particularly relevant.

I have also increased the default time that pg_test_fsync runs - I think that the kind of variability commonly seen in its output, which you yourself have reported, justifies doing so in passing.

--
Regards,
Peter Geoghegan
Attachment
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Noah Misch
Date:
On Mon, Jan 28, 2013 at 12:16:24AM +0000, Peter Geoghegan wrote:
> On 27 January 2013 02:31, Noah Misch <noah@leadboat.com> wrote:
> > I did a few more benchmarks along the spectrum.
> >
> > So that's a nice 27-53% improvement, fairly similar to the pattern for your
> > laptop pgbench numbers.
>
> I presume that this applies to a tpc-b benchmark (the pgbench
> default). Note that the really compelling numbers that I reported in
> that blog post (where there is an increase of over 80% in transaction
> throughput at lower client counts) occur with an insert-based
> benchmark (i.e. a maximally commit-bound workload).

Correct. The pgbench default workload is already rather friendly toward commit_delay, so I wanted to stay away from even-friendlier tests.

Would you commit to the same git repository the pgbench-tools data for the graphs appearing in that blog post? I couldn't readily tell what was happening below 16 clients due to the graphed data points blending together.

> >> ! <para>
> >> ! Since the purpose of <varname>commit_delay</varname> is to allow
> >> ! the cost of each flush operation to be more effectively amortized
> >> ! across concurrently committing transactions (potentially at the
> >> ! expense of transaction latency), it is necessary to quantify that
> >> ! cost when altering the setting. The higher that cost is, the more
> >> ! effective <varname>commit_delay</varname> is expected to be in
> >> ! increasing transaction throughput.
> >
> > That's true for spinning disks, but I suspect it does not hold for storage
> > with internal parallelism, notably virtualized storage. Consider an iSCSI
> > configuration with high bandwidth and high latency. When network latency is
> > the limiting factor, will sending larger requests less often still help?
>
> Well, I don't like to speculate about things like that, because it's
> just too easy to be wrong. That said, it doesn't immediately occur to
> me why the statement that you've highlighted wouldn't be true of
> virtualised storage that has the characteristics you describe. Any
> kind of latency at flush time means that clients idle, which means
> that the CPU is potentially not kept fully busy for a greater amount
> of wall time, where it might otherwise be kept more busy.

On further reflection, I retract the comment. Regardless of internal parallelism of the storage, PostgreSQL issues WAL fsyncs serially.

> > One would be foolish to run a performance-sensitive workload like those in
> > question, including the choice to have synchronous_commit=on, on spinning
> > disks with no battery-backed write cache. A cloud environment is more
> > credible, but my benchmark showed no gain there.
>
> In an everyday sense you are correct. It would typically be fairly
> senseless to run an application that was severely limited by
> transaction throughput like this, when a battery-backed cache could be
> used at the cost of a couple of hundred dollars. However, it's quite
> possible to imagine a scenario in which the economics favoured using
> commit_delay instead. For example, I am aware that at Facebook, a
> similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs)
> is used. Furthermore, it might not be obvious that fsync speed is an
> issue in practice. Setting commit_delay to 4,000 has seemingly no
> downside on my laptop - it *positively* affects both average and
> worst-case transaction latency - so with spinning disks, it probably
> would actually be sensible to set it and forget it, regardless of
> workload.

I agree that commit_delay is looking like a safe bet for spinning disks.

> I attach a revision that I think addresses your concerns. I've
> polished it a bit further too - in particular, my elaborations about
> commit_delay have been concentrated at the end of wal.sgml, where they
> belong. I've also removed the reference to XLogInsert, because, since
> all XLogFlush call sites are now covered by commit_delay, XLogInsert
> isn't particularly relevant.

I'm happy with this formulation.

> I have also increased the default time that pg_test_fsync runs - I
> think that the kind of variability commonly seen in its output, which
> you yourself have reported, justifies doing so in passing.

On the EBS configuration with volatile fsync timings, the variability didn't go away with 15s runs. On systems with stable fsync times, 15s was no better than 2s. Absent some particular reason to believe 5s is better than 2s, I would leave it alone.

I'm marking this patch Ready for Committer, qualified with a recommendation to adopt only the wal.sgml changes.

Thanks,
nm
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 28 January 2013 03:34, Noah Misch <noah@leadboat.com> wrote:
> On the EBS configuration with volatile fsync timings, the variability didn't
> go away with 15s runs. On systems with stable fsync times, 15s was no better
> than 2s. Absent some particular reason to believe 5s is better than 2s, I
> would leave it alone.

I'm not recommending doing so because I thought you'd be likely to get better numbers on EBS; obviously the variability you saw there likely had a lot to do with the fact that the underlying physical machines have multiple tenants. It has just been my observation that more consistent figures can be obtained (on my laptop) by using a pg_test_fsync --secs-per-test of about 5. That being the case, why take the chance with 2 seconds? It isn't as if people run pg_test_fsync every day, or that they cannot set --secs-per-test to whatever they like themselves. On the other hand, the cost of setting it too low could be quite high now, because the absolute values (and not just how different wal_sync_methods compare) are now important.

--
Regards,
Peter Geoghegan
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Peter Geoghegan
Date:
On 28 January 2013 03:34, Noah Misch <noah@leadboat.com> wrote:
> Would you commit to the same git repository the pgbench-tools data for the
> graphs appearing in that blog post? I couldn't readily tell what was
> happening below 16 clients due to the graphed data points blending together.

I'm afraid that I no longer have that data. Of course, I could fairly easily recreate it, but I don't think I'll have time tomorrow. Is it important? Are you interested in both the insert and tpc-b cases?

--
Regards,
Peter Geoghegan
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Noah Misch
Date:
On Mon, Jan 28, 2013 at 04:48:56AM +0000, Peter Geoghegan wrote:
> On 28 January 2013 03:34, Noah Misch <noah@leadboat.com> wrote:
> > Would you commit to the same git repository the pgbench-tools data for the
> > graphs appearing in that blog post? I couldn't readily tell what was
> > happening below 16 clients due to the graphed data points blending together.
>
> I'm afraid that I no longer have that data. Of course, I could fairly
> easily recreate it, but I don't think I'll have time tomorrow. Is it
> important? Are you interested in both the insert and tpc-b cases?

No need, then.
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Noah Misch
Date:
On Mon, Jan 28, 2013 at 04:29:12AM +0000, Peter Geoghegan wrote:
> On 28 January 2013 03:34, Noah Misch <noah@leadboat.com> wrote:
> > On the EBS configuration with volatile fsync timings, the variability didn't
> > go away with 15s runs. On systems with stable fsync times, 15s was no better
> > than 2s. Absent some particular reason to believe 5s is better than 2s, I
> > would leave it alone.
>
> I'm not recommending doing so because I thought you'd be likely to get
> better numbers on EBS; obviously the variability you saw there likely
> had a lot to do with the fact that the underlying physical machines
> have multiple tenants. It has just been my observation that more
> consistent figures can be obtained (on my laptop) by using a
> pg_test_fsync --secs-per-test of about 5. That being the case, why
> take the chance with 2 seconds?

I can't get too excited about it either way.

> It isn't as if people run
> pg_test_fsync every day, or that they cannot set --secs-per-test to
> whatever they like themselves. On the other hand, the cost of setting
> it too low could be quite high now, because the absolute values (and
> not just how different wal_sync_methods compare) are now important.

True. You'd actually want to run the tool with a short interval to select a wal_sync_method, then test the chosen method for a longer period to get an accurate reading for commit_delay.
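That two-step approach might look like the following (run lengths are illustrative):

    # Step 1: short runs suffice to rank the wal_sync_method candidates
    pg_test_fsync --secs-per-test=2
    # Step 2: a longer run gives a more stable absolute figure for the chosen
    # method; commit_delay then starts at roughly half the reported usecs/op
    pg_test_fsync --secs-per-test=15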
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> I'm marking this patch Ready for Committer, qualified with a recommendation to
> adopt only the wal.sgml changes.

I've committed this along with some further wordsmithing. I kept Peter's change to pg_test_fsync's default -s value; I've always felt that 2 seconds was laughably small. It might be all right for very quick-and-dirty tests, but as a default value, it seems like a poor choice, because it's at the very bottom of the credible range of choices.

regards, tom lane
Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
From
Bruce Momjian
Date:
On Fri, Mar 15, 2013 at 05:47:30PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I'm marking this patch Ready for Committer, qualified with a recommendation to
> > adopt only the wal.sgml changes.
>
> I've committed this along with some further wordsmithing. I kept
> Peter's change to pg_test_fsync's default -s value; I've always felt
> that 2 seconds was laughably small. It might be all right for very
> quick-and-dirty tests, but as a default value, it seems like a poor
> choice, because it's at the very bottom of the credible range of
> choices.

Agreed, 2 seconds was at the bottom. The old behavior was very slow, so I went low. Now that we are using it, 5 secs makes sense.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +