Thread: dbt-2 tuning results with postgresql-8.3.5
Hi all, So after a long hiatus after running this OLTP workload at the OSDL, many of you know the community has had some equipment donated by HP: a DL380 G5 and an MSA70 disk array. We are currently using the hardware to do some tuning exercises to show the effects of various GUC parameters. I wanted to share what I've started with for input for what is realistic to tune an OLTP database on a single large LUN. The initial goal is to show how much can (or can't) be tuned on an OLTP type workload with just database and kernel parameters before physically partitioning the database. I hope this is actually a useful exercise (it was certainly helped get the kit updated a little bit.) To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We are using a 1000 warehouse database, which amounts to about 100GB of raw text data. The DL380 G5 is an 8 core Xeon E5405 with 32GB of memory. The MSA70 is a 25-disk 15,000 RPM SAS array, currently configured as a 25-disk RAID-0 array. More specific hardware details can be found here: http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide#Hardware_Details So first task is to show the confidence of the results, here are a link to a few repeated runs using all default GUC values except the number of connections (250): http://pugs.postgresql.org/node/502 Here are links to how the throughput changes when increasing shared_buffers: http://pugs.postgresql.org/node/505 And another series of tests to show how throughput changes when checkpoint_segments are increased: http://pugs.postgresql.org/node/503 The links go to a graphical summary and raw data. Note that the maximum theoretical throughput at this scale factor is approximately 12000 notpm. My first glance takes tells me that the system performance is quite erratic when increasing the shared_buffers. I'm also not what to gather from increasing the checkpoint_segments. Is it simply that the more checkpoint segments you have, the more time the database spends fsyncing when at a checkpoint? Moving forward, what other parameters (or combinations of) do people feel would be valuable to illustrate with this workload? Regards, Mark
"Mark Wong" <markwkm@gmail.com> writes: > To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We > are using a 1000 warehouse database, which amounts to about 100GB of > raw text data. Really? Do you get conforming results with 1,000 warehouses? What's the 95th percentile response time? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote: > "Mark Wong" <markwkm@gmail.com> writes: > >> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We >> are using a 1000 warehouse database, which amounts to about 100GB of >> raw text data. > > Really? Do you get conforming results with 1,000 warehouses? What's > the 95th > percentile response time? No, the results are not conforming. You and others have pointed that out already. The 95th percentile response time are calculated on each page of the previous links. I find your questions a little odd for the input I'm asking for. Are you under the impression we are trying to publish benchmarking results? Perhaps this is a simple misunderstanding? Regards, Mark
Mark Wong <markwkm@gmail.com> writes: > On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote: > >> "Mark Wong" <markwkm@gmail.com> writes: >> >>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We >>> are using a 1000 warehouse database, which amounts to about 100GB of >>> raw text data. >> >> Really? Do you get conforming results with 1,000 warehouses? What's the 95th >> percentile response time? > > No, the results are not conforming. You and others have pointed that out > already. The 95th percentile response time are calculated on each page of the > previous links. Where exactly? Maybe I'm blind but I don't see them. > > I find your questions a little odd for the input I'm asking for. Are you > under the impression we are trying to publish benchmarking results? Perhaps > this is a simple misunderstanding? Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and larger scale factors until you find out the largest scale factor you can get a conformant result at. In other words the scale factor is an output, not an input variable. You're using TPC-C just as an example workload and looking to see how to maximize the TPM for a given scale factor. I guess there's nothing wrong with that as long as everyone realizes it's not a TPC-C benchmark. Except that if the 95th percentile response times are well above a second I have to wonder whether the situation reflects an actual production OLTP system well. It implies there are so many concurrent sessions that any given query is being context switched out for seconds at a time. I have to imagine that a real production system would consider the system overloaded as soon as queries start taking significantly longer than they take on an unloaded system. People monitor the service wait times and queue depths for i/o systems closely and having several seconds of wait time is a highly abnormal situation. I'm not sure how bad that is for the benchmarks. The only effect that comes to mind is that it might exaggerate the effects of some i/o intensive operations that under normal conditions might not cause any noticeable impact like wal log file switches or even checkpoints. If you have a good i/o controller it might confuse your results a bit when you're comparing random and sequential i/o because the controller might be able to sort requests by physical position better than in a typical oltp environment where the wait queues are too short to effectively do that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Sun, Dec 21, 2008 at 10:56 PM, Gregory Stark <stark@enterprisedb.com> wrote: > Mark Wong <markwkm@gmail.com> writes: > >> On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote: >> >>> "Mark Wong" <markwkm@gmail.com> writes: >>> >>>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We >>>> are using a 1000 warehouse database, which amounts to about 100GB of >>>> raw text data. >>> >>> Really? Do you get conforming results with 1,000 warehouses? What's the 95th >>> percentile response time? >> >> No, the results are not conforming. You and others have pointed that out >> already. The 95th percentile response time are calculated on each page of the >> previous links. > > Where exactly? Maybe I'm blind but I don't see them. Here's an example: http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/ The links on the blog entries should be pointing to their respective reports. I spot checked a few and it seems I got some right. I probably didn't make it clear you needed to click on the results to see the reports. >> I find your questions a little odd for the input I'm asking for. Are you >> under the impression we are trying to publish benchmarking results? Perhaps >> this is a simple misunderstanding? > > Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and > larger scale factors until you find out the largest scale factor you can get a > conformant result at. In other words the scale factor is an output, not an > input variable. > > You're using TPC-C just as an example workload and looking to see how to > maximize the TPM for a given scale factor. I guess there's nothing wrong with > that as long as everyone realizes it's not a TPC-C benchmark. Perhaps, but we're not trying to run a TPC-C benchmark. We're trying to illustrate how performance changes with an understood OLTP workload. The purpose is to show how the system bahaves more so than what the maximum transactions are. We try to advertise the kit the and work for self learning, we never try to pass dbt-2 off as a benchmarking kit. > Except that if the 95th percentile response times are well above a second I > have to wonder whether the situation reflects an actual production OLTP system > well. It implies there are so many concurrent sessions that any given query is > being context switched out for seconds at a time. > > I have to imagine that a real production system would consider the system > overloaded as soon as queries start taking significantly longer than they take > on an unloaded system. People monitor the service wait times and queue depths > for i/o systems closely and having several seconds of wait time is a highly > abnormal situation. We attempt to illustrate the response times on the reports. For example, there is a histogram (drawn as a scatter plot) illustrating the number of transactions vs. the response time for each transaction. This is for the New Order transaction: http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/dist_n.png We also plot the response time for a transaction vs the elapsed time (also as a scatter plot). Again, this is for the New Order transaction: http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/rt_n.png > I'm not sure how bad that is for the benchmarks. The only effect that comes to > mind is that it might exaggerate the effects of some i/o intensive operations > that under normal conditions might not cause any noticeable impact like wal > log file switches or even checkpoints. I'm not sure I'm following. Is this something than can be shown by any stats collection or profiling? This vaguely reminds me of the the significant spikes in system time (and dips everywhere else) when the operating system is fsyncing during a checkpoint that we've always observed when running this in the past. > If you have a good i/o controller it might confuse your results a bit when > you're comparing random and sequential i/o because the controller might be > able to sort requests by physical position better than in a typical oltp > environment where the wait queues are too short to effectively do that. Thanks for the input. Regards, Mark
On Sat, 20 Dec 2008, Mark Wong wrote: > Here are links to how the throughput changes when increasing > shared_buffers: http://pugs.postgresql.org/node/505 My first glance > takes tells me that the system performance is quite erratic when > increasing the shared_buffers. If you smooth that curve out a bit, you have to throw out the 22528MB figure as meaningless--particularly since it's way too close to the cliff where performance dives hard. The sweet spot looks to me like 11264MB to 17408MB. I'd say 14336MB is the best performing setting that's in the middle of a stable area. > And another series of tests to show how throughput changes when > checkpoint_segments are increased: http://pugs.postgresql.org/node/503 > I'm also not what to gather from increasing the checkpoint_segments. What was shared_buffers set to here? Those two settings are not completely independent, for example at a tiny buffer size it's not as obvious there's a win in spreading the checkpoints out more. It's actually a 3-D graph, with shared_buffers and checkpoint_segments as two axes and the throughput as the Z value. Since that's quite time consuming to map out in its entirety, the way I'd suggest navigating the territory more efficiently is to ignore the defaults altogether. Start with a configuration that someone familiar with tuning the database would pick for this hardware: 8192MB for shared_buffers and 100 checkpoint segments would be a reasonable base point. Run the same tests you did here, but with the value you're not changing set to those much larger values rather than the database defaults, and then I think you'd end with something more interesting. Also, I think the checkpoint_segments values >500 are a bit much, given what level of recovery time would come with a crash at that setting. Smaller steps from a smaller range would be better there I think. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
"Mark Wong" <markwkm@gmail.com> writes: >> I'm not sure how bad that is for the benchmarks. The only effect that comes to >> mind is that it might exaggerate the effects of some i/o intensive operations >> that under normal conditions might not cause any noticeable impact like wal >> log file switches or even checkpoints. > > I'm not sure I'm following. All I'm saying is that the performance characteristics won't be the same when the service wait times are 1-10 seconds rather than the 20-30ms at which alarm bells would start to ring on a real production system. I'm not exactly sure what changes it might make though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
"Mark Wong" <markwkm@gmail.com> writes: > Thanks for the input. In a more constructive vein: 1) autovacuum doesn't seem to be properly tracked. It looks like you're just tracking the autovacuum process and not the actual vacuum subprocesses which it spawns. 2) The response time graphs would be more informative if you excluded the ramp-up portion of the test. As it is there are big spikes at the low end but it's not clear whether they're really part of the curve or due to ramp-up. This is especially visible in the stock-level graph where it throws off the whole y scale. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
>>> "Mark Wong" <markwkm@gmail.com> wrote: > The DL380 G5 is an 8 core Xeon E5405 with 32GB of > memory. The MSA70 is a 25-disk 15,000 RPM SAS array, currently > configured as a 25-disk RAID-0 array. > number of connections (250): > Moving forward, what other parameters (or combinations of) do people > feel would be valuable to illustrate with this workload? To configure PostgreSQL for OLTP on that hardware, I would strongly recommend the use of a connection pool which queues requests above some limit on concurrent queries. My guess is that you'll see best results with a limit somewhere aound 40, based on my tests indicating that performance drops off above (cpucount * 2) + spindlecount. I wouldn't consider tests of the other parameters as being very useful before tuning this. This is more or less equivalent to the "engines" configuration in Sybase, for example. -Kevin
On Mon, Dec 22, 2008 at 12:59 AM, Greg Smith <gsmith@gregsmith.com> wrote: > On Sat, 20 Dec 2008, Mark Wong wrote: > >> Here are links to how the throughput changes when increasing >> shared_buffers: http://pugs.postgresql.org/node/505 My first glance takes >> tells me that the system performance is quite erratic when increasing the >> shared_buffers. > > If you smooth that curve out a bit, you have to throw out the 22528MB figure > as meaningless--particularly since it's way too close to the cliff where > performance dives hard. The sweet spot looks to me like 11264MB to 17408MB. > I'd say 14336MB is the best performing setting that's in the middle of a > stable area. > >> And another series of tests to show how throughput changes when >> checkpoint_segments are increased: http://pugs.postgresql.org/node/503 I'm >> also not what to gather from increasing the checkpoint_segments. > > What was shared_buffers set to here? Those two settings are not completely > independent, for example at a tiny buffer size it's not as obvious there's a > win in spreading the checkpoints out more. It's actually a 3-D graph, with > shared_buffers and checkpoint_segments as two axes and the throughput as the > Z value. The shared_buffers are the default, 24MB. The database parameters are saved, probably unclearly, here's an example link: http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/db/param.out > Since that's quite time consuming to map out in its entirety, the way I'd > suggest navigating the territory more efficiently is to ignore the defaults > altogether. Start with a configuration that someone familiar with tuning > the database would pick for this hardware: 8192MB for shared_buffers and > 100 checkpoint segments would be a reasonable base point. Run the same > tests you did here, but with the value you're not changing set to those much > larger values rather than the database defaults, and then I think you'd end > with something more interesting. Also, I think the checkpoint_segments > values >500 are a bit much, given what level of recovery time would come > with a crash at that setting. Smaller steps from a smaller range would be > better there I think. I should probably run your pgtune script, huh? Regards, Mark
On Mon, Dec 22, 2008 at 2:56 AM, Gregory Stark <stark@enterprisedb.com> wrote: > "Mark Wong" <markwkm@gmail.com> writes: > >> Thanks for the input. > > In a more constructive vein: > > 1) autovacuum doesn't seem to be properly tracked. It looks like you're just > tracking the autovacuum process and not the actual vacuum subprocesses > which it spawns. Hrm, tracking just the launcher process certainly doesn't help. Are the spawned processed short lived? I take a snapshot of /proc/<pid>/io data every 60 seconds. The only thing I see named autovacuum is the launcher process. Or perhaps I can't read? Here is the raw data of the /proc/<pid>/io captures: http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/db/iopp.out > 2) The response time graphs would be more informative if you excluded the > ramp-up portion of the test. As it is there are big spikes at the low end > but it's not clear whether they're really part of the curve or due to > ramp-up. This is especially visible in the stock-level graph where it > throws off the whole y scale. Ok, we'll take note and see what we can do. Regards, Mark
On Mon, Dec 22, 2008 at 7:27 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >>>> "Mark Wong" <markwkm@gmail.com> wrote: > >> The DL380 G5 is an 8 core Xeon E5405 with 32GB of >> memory. The MSA70 is a 25-disk 15,000 RPM SAS array, currently >> configured as a 25-disk RAID-0 array. > >> number of connections (250): > >> Moving forward, what other parameters (or combinations of) do people >> feel would be valuable to illustrate with this workload? > > To configure PostgreSQL for OLTP on that hardware, I would strongly > recommend the use of a connection pool which queues requests above > some limit on concurrent queries. My guess is that you'll see best > results with a limit somewhere aound 40, based on my tests indicating > that performance drops off above (cpucount * 2) + spindlecount. Yeah, we are using a homegrown connection concentrator as part of the test kit, but it's not very intelligent. > I wouldn't consider tests of the other parameters as being very useful > before tuning this. This is more or less equivalent to the "engines" > configuration in Sybase, for example. Right, I have the database configured for 250 connections but I'm using 200 of them. I'm pretty sure for this scale factor 200 is more than enough. Nevertheless I should go through the exercise. Regards, Mark
On Mon, 22 Dec 2008, Mark Wong wrote: > The shared_buffers are the default, 24MB. The database parameters are > saved, probably unclearly, here's an example link: > > http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/db/param.out That's a bit painful to slog through to find what was changed from the defaults. How about saving the output from this query instead, or in addition to the version sorted by name: select name,setting,source,short_desc from pg_settings order by source,name; Makes it easier to ignore everything that isn't set. > I should probably run your pgtune script, huh? That's basically where the suggestions for center points I made came from. The only other thing that does that might be interesting to examine is that it bumps up checkpoint_completion_target to 0.9 once you've got a large number of checkpoint_segments. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Mark Wong escribió: > Hrm, tracking just the launcher process certainly doesn't help. Are > the spawned processed short lived? I take a snapshot of > /proc/<pid>/io data every 60 seconds. The worker processes can be short-lived, but if they are, obviously they are not vacuuming the large tables. If you want to track all autovacuum actions, change autovacuum_log_min_messages to 0. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi Mark, Good to see you producing results again. On Sat, 2008-12-20 at 16:54 -0800, Mark Wong wrote: > Here are links to how the throughput changes when increasing shared_buffers: > > http://pugs.postgresql.org/node/505 Only starnge thing here is the result at 22528MB. It's the only normal one there. Seems to be a freeze occurring on most tests around the 30 minute mark, which delays many backends and reduces writes. Reduction in performance as shared_buffers increases looks normal. Increase wal_buffers, but look for something else as well. Try to get a backtrace from when the lock up happens. It may not be Postgres? > And another series of tests to show how throughput changes when > checkpoint_segments are increased: > > http://pugs.postgresql.org/node/503 > > The links go to a graphical summary and raw data. Note that the > maximum theoretical throughput at this scale factor is approximately > 12000 notpm. > > My first glance takes tells me that the system performance is quite > erratic when increasing the shared_buffers. I'm also not what to > gather from increasing the checkpoint_segments. Is it simply that the > more checkpoint segments you have, the more time the database spends > fsyncing when at a checkpoint? I would ignore the checkpoint_segment tests because you aren't using a realistic value of shared_buffers. I doubt any such effect is noticeable when you use a realistic value determined from set of tests 505. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Mon, Dec 22, 2008 at 7:27 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >>>> "Mark Wong" <markwkm@gmail.com> wrote: > >> The DL380 G5 is an 8 core Xeon E5405 with 32GB of >> memory. The MSA70 is a 25-disk 15,000 RPM SAS array, currently >> configured as a 25-disk RAID-0 array. > >> number of connections (250): > >> Moving forward, what other parameters (or combinations of) do people >> feel would be valuable to illustrate with this workload? > > To configure PostgreSQL for OLTP on that hardware, I would strongly > recommend the use of a connection pool which queues requests above > some limit on concurrent queries. My guess is that you'll see best > results with a limit somewhere aound 40, based on my tests indicating > that performance drops off above (cpucount * 2) + spindlecount. It appears to peak around 220 database connections: http://pugs.postgresql.org/node/514 Of course the system still isn't really tuned all that much... I wouldn't be surprised if the workload peaked at a different number of connections as it is tuned more. > I wouldn't consider tests of the other parameters as being very useful > before tuning this. This is more or less equivalent to the "engines" > configuration in Sybase, for example. Regards, Mark
>>> "Mark Wong" <markwkm@gmail.com> wrote: > It appears to peak around 220 database connections: > > http://pugs.postgresql.org/node/514 Interesting. What did you use for connection pooling? My tests have never stayed that flat as the connections in use climbed. I'm curious why we're seeing such different results. -Kevin
On Tue, Jan 13, 2009 at 7:40 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >>>> "Mark Wong" <markwkm@gmail.com> wrote: > >> It appears to peak around 220 database connections: >> >> http://pugs.postgresql.org/node/514 > > Interesting. What did you use for connection pooling? It's a fairly dumb but custom built C program for the test kit: http://git.postgresql.org/?p=~markwkm/dbt2.git;a=summary I think the bulk of the logic is in src/client.c, src/db_threadpool.c, and src/transaction_queue.c. > My tests have never stayed that flat as the connections in use > climbed. I'm curious why we're seeing such different results. I'm sure the difference in workloads makes a difference. Like you implied earlier, I think we have to figure out what works best in for our own workloads. Regards, Mark
On Mon, Dec 22, 2008 at 12:59 AM, Greg Smith <gsmith@gregsmith.com> wrote: > On Sat, 20 Dec 2008, Mark Wong wrote: > >> Here are links to how the throughput changes when increasing >> shared_buffers: http://pugs.postgresql.org/node/505 My first glance takes >> tells me that the system performance is quite erratic when increasing the >> shared_buffers. > > If you smooth that curve out a bit, you have to throw out the 22528MB figure > as meaningless--particularly since it's way too close to the cliff where > performance dives hard. The sweet spot looks to me like 11264MB to 17408MB. > I'd say 14336MB is the best performing setting that's in the middle of a > stable area. > >> And another series of tests to show how throughput changes when >> checkpoint_segments are increased: http://pugs.postgresql.org/node/503 I'm >> also not what to gather from increasing the checkpoint_segments. > > What was shared_buffers set to here? Those two settings are not completely > independent, for example at a tiny buffer size it's not as obvious there's a > win in spreading the checkpoints out more. It's actually a 3-D graph, with > shared_buffers and checkpoint_segments as two axes and the throughput as the > Z value. > > Since that's quite time consuming to map out in its entirety, the way I'd > suggest navigating the territory more efficiently is to ignore the defaults > altogether. Start with a configuration that someone familiar with tuning > the database would pick for this hardware: 8192MB for shared_buffers and > 100 checkpoint segments would be a reasonable base point. Run the same > tests you did here, but with the value you're not changing set to those much > larger values rather than the database defaults, and then I think you'd end > with something more interesting. Also, I think the checkpoint_segments > values >500 are a bit much, given what level of recovery time would come > with a crash at that setting. Smaller steps from a smaller range would be > better there I think. Sorry for the long delay. I have a trio of results (that I actually ran about four weeks ago) setting the shared_buffers to 7680MB (I don't know remember why it wasn't set to 8192MB :( ) and checkpoint_segments to 100: http://pugs.postgresql.org/node/517 I'm also capturing the PostgreSQL parameters as suggested so we can see what's set in the config file, default, command line etc. It's the "Settings" link in the "System Summary" section on the report web page. So about a 7% change for this particular workload: http://pugs.postgresql.org/node/502 We're re-running some filesystem tests for an upcoming conference, so we'll get back to it shortly... Regards, Mark
On Thu, 22 Jan 2009, Mark Wong wrote: > I'm also capturing the PostgreSQL parameters as suggested so we can > see what's set in the config file, default, command line etc. It's > the "Settings" link in the "System Summary" section on the report web > page. Those look good, much easier to pick out the stuff that's been customized. I note that the Linux "Settings" links seems to be broken though. To recap a summary here, what you had before were: shared_buffers=24MB checkpoint_segments=100 notpm=7527 shared_buffers=8192MB checkpoint_segments=3 notpm=7996 And the new spots show: shared_buffers=7680MB checkpoint_segments=100 notpm=9178 What's neat about your graphs now is that I think you can see the checkpoints happening in the response time graphs. For example, if you look at http://207.173.203.223/~markwkm/community6/dbt2/pgtune.1000.100.1/report/rt_d.png and you focus on what happens just before each 10 minute mark, I'm guessing that response time spike is the fsync phase at the end of the checkpoint. That's followed by a period where response time is really fast. That's because those writes are all pooling into the now cleared out Linux buffer cache, but pdflush isn't really being aggressive about writing them out yet. On your server that can absorb quite a few writes before clients start blocking on them, which is when response time climbs back up. A particularly interesting bit is to compare against the result with the peak notpm you had in your earlier tests, where shared_buffers=15360MB: http://207.173.203.223/~markwkm/community6/dbt2/shared_buffers/shared_buffers.15360MB/report/rt_d.png While the average speed was faster on that one, the worst-case response time was much worse. You can really see this by comparing the response time distribution. Big shared buffers but low checkpoint_segments: http://207.173.203.223/~markwkm/community6/dbt2/shared_buffers/shared_buffers.15360MB/report/dist_d.png Medium shared buffers and medium checkpoint_segments: http://207.173.203.223/~markwkm/community6/dbt2/pgtune.1000.100.1/report/dist_d.png The checkpoint spreading logic is making a lot more transactions suffer moderate write delays in order to get a big improvement in worst-case behavior. The next fine-tuning bit I'd normally apply in this situation is to see if increasing checkpoint_completion_target from the default (0.5) to 0.9 does anything to flatten out that response time graph. I've seen a modest increase in wal_buffers (from the default to, say, 1MB) help smooth out the rough spots too. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, Jan 22, 2009 at 7:44 PM, Greg Smith <gsmith@gregsmith.com> wrote: > On Thu, 22 Jan 2009, Mark Wong wrote: > >> I'm also capturing the PostgreSQL parameters as suggested so we can >> see what's set in the config file, default, command line etc. It's >> the "Settings" link in the "System Summary" section on the report web >> page. > > Those look good, much easier to pick out the stuff that's been customized. I > note that the Linux "Settings" links seems to be broken though. Oh fudge, I think I see where my scripts are broken. We're running with a different Linux kernel now than before so I don't want to grab the parameters yet. I'll switch to the previous kernel to get the parameters after the current testing is done, and fix the scripts in the meantime. Regards, Mark
On Thu, Jan 22, 2009 at 10:10 PM, Mark Wong <markwkm@gmail.com> wrote: > On Thu, Jan 22, 2009 at 7:44 PM, Greg Smith <gsmith@gregsmith.com> wrote: >> On Thu, 22 Jan 2009, Mark Wong wrote: >> >>> I'm also capturing the PostgreSQL parameters as suggested so we can >>> see what's set in the config file, default, command line etc. It's >>> the "Settings" link in the "System Summary" section on the report web >>> page. >> >> Those look good, much easier to pick out the stuff that's been customized. I >> note that the Linux "Settings" links seems to be broken though. > > Oh fudge, I think I see where my scripts are broken. We're running > with a different Linux kernel now than before so I don't want to grab > the parameters yet. I'll switch to the previous kernel to get the > parameters after the current testing is done, and fix the scripts in > the meantime. Sorry for the continuing delays. I have to make more time to spend on this part. One of the problems is that my scripts are listing the OS of the main driver system, as opposed to the db system. Mean while, I've attached the sysctl output from kernel running on the database system, 2.6.27-gentoo-r2.. Regards, Mark
Attachment
On Thu, 5 Feb 2009, Mark Wong wrote: > One of the problems is that my scripts are listing the OS of the main > driver system, as opposed to the db system. That's not a fun problem to deal with. Last time I ran into it, I ended up writing a little PL/PerlU function that gathered all the kernel-level info I wanted, and then just pulled the info over by running that on the driver system. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, Jan 22, 2009 at 7:44 PM, Greg Smith <gsmith@gregsmith.com> wrote: > The next fine-tuning bit I'd normally apply in this situation is to see if > increasing checkpoint_completion_target from the default (0.5) to 0.9 does > anything to flatten out that response time graph. I've seen a modest > increase in wal_buffers (from the default to, say, 1MB) help smooth out the > rough spots too. Hi all, After yet another delay, I have .6 to .9 (I forgot .5. :(): http://pugs.postgresql.org/node/526 I don't think the effects of the checkpoint_completion_target are significant, and I sort of feel it's because the entire database is on a single device. I've started doing some runs with the database log on a separate device, so I'll be trying some of these parameters again. Regards, Mark