Thread: PostgreSQL settings for running on an SSD drive
All;

I'm working with a client running PostgreSQL on a Fusion-IO drive. They have a
PostgreSQL setup guide from Fusion recommending the following settings:

    effective_io_concurrency=0
    bgwriter_lru_maxpages=0
    random_page_cost=0.1
    sequential_page_cost=0.1

These seem odd to me: they effectively turn the background writer off, and they
set both random_page_cost and sequential_page_cost to the same (very low)
value...

Thoughts?

Thanks in advance
On 06/20/2013 02:56 PM, CS DBA wrote:
> They have a PostgreSQL setup guide from Fusion recommending the
> following settings:
> effective_io_concurrency=0
> bgwriter_lru_maxpages=0
> random_page_cost=0.1
> sequential_page_cost=0.1

Well, since FusionIO drives have a limited write cycle (5PB?), I can somewhat
see why they would recommend turning off the background writer. We were a bit
more conservative in our settings, though:

    seq_page_cost = 1.0     # Default
    random_page_cost = 1.0  # Reduced to match seq_page_cost

Yep. That's it. Just the one setting. FusionIO drives are fast, but they're
not infinitely fast. My tests (and others') show they're about 1/2 the speed
of memory in terms of IOPS. And while they can serve very aggressive
sequential reads, they're not orders of magnitude faster than spindles in
anything but IOPS.

Knowing that, we reduced the cost of random page fetches to match sequential
page fetches. This has served our heavy OLTP system (and its FusionIO
underpinnings) very well so far.

But like I said, these are pretty conservative. I'd start at 1 and reduce in
0.2 increments, running tests to see if there's a beneficial change.

If it helps, here are our system stats, some only relevant during financial
hours:

* A billion queries per day
* Sustained 500+ write queries per second
* Average 7,000-ish transactions per second
* Average 35,000-ish queries per second
* pg_xlog and pgdata on the same FusionIO device
* 2 years, 3 months in operation
* 1.29PB written
* 1.75PB read

The load on our system right now is 3.7 on a 24-CPU box while serving 4100 TPS
after active trading hours. The FusionIO drive is basically the only reason we
can do all of that without a lot of excessive contortions.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to
this email
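[Editor's note: to see why lowering random_page_cost relative to seq_page_cost
steers the planner toward index scans, here is a deliberately simplified toy
cost model. The function names and the numbers are illustrative only; the real
planner also accounts for CPU costs, index correlation, and caching, so this
sketch captures just the page-fetch tradeoff being discussed.]

```python
# Toy sketch of the planner's page-cost tradeoff. PostgreSQL's actual
# costing is far more involved; this only models the seq vs. random
# page-fetch terms that random_page_cost / seq_page_cost control.

def seq_scan_cost(table_pages, seq_page_cost=1.0):
    # A sequential scan reads every page of the table, in order.
    return table_pages * seq_page_cost

def index_scan_cost(pages_fetched, random_page_cost=4.0):
    # An index scan touches fewer pages, but in random order, so each
    # page is charged at random_page_cost.
    return pages_fetched * random_page_cost

table_pages = 10_000   # total heap pages in the table
pages_fetched = 3_000  # heap pages the index scan would actually touch

# With the spinning-disk default (random_page_cost = 4.0), the
# sequential scan looks cheaper, so the planner would pick it:
print(seq_scan_cost(table_pages))           # 10000.0
print(index_scan_cost(pages_fetched, 4.0))  # 12000.0

# With random_page_cost lowered to 1.0 for flash, the index scan wins:
print(index_scan_cost(pages_fetched, 1.0))  # 3000.0
```

In other words, on storage where random reads really do cost about the same as
sequential ones, leaving random_page_cost at its spinning-disk default makes
the planner reject index scans it should be choosing.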
Folks,

First, cc'ing Greg Smith to see if he can address this with the Fusion folks
so that they stop giving out a bad guide.

On 06/20/2013 01:13 PM, Shaun Thomas wrote:
> On 06/20/2013 02:56 PM, CS DBA wrote:
>
>> They have a PostgreSQL setup guide from Fusion recommending the
>> following settings:
>> effective_io_concurrency=0
>> bgwriter_lru_maxpages=0
>> random_page_cost=0.1
>> sequential_page_cost=0.1
>
> Well, since FusionIO drives have a limited write cycle (5PB?), I can
> somewhat see why they would recommend turning off the background writer.
> We were a bit more conservative in our settings, though:
>
> seq_page_cost = 1.0     # Default
> random_page_cost = 1.0  # Reduced to match seq_page_cost
>
> Yep. That's it. Just the one setting. FusionIO drives are fast, but
> they're not infinitely fast. My tests (and others') show they're about
> 1/2 the speed of memory in terms of IOPS. And while they can serve very
> aggressive sequential reads, they're not orders of magnitude faster than
> spindles in anything but IOPS.
>
> Knowing that, we reduced the cost of random page fetches to match
> sequential page fetches. This has served our heavy OLTP system (and its
> FusionIO underpinnings) very well so far.

Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something else
just slightly higher than SPC?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 06/20/2013 03:32 PM, Josh Berkus wrote:
> Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something
> else just slightly higher than SPC?

Yes, actually. My favored setting when we were on 8.3 was 1.5. But something
in the planner changed pretty drastically when we went to 9.1, and we were
getting some really bad query plans unless we *strongly* suggested RPC was
cheap. I was afraid I'd have to go lower, but 1 seemed to do the trick.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
On 06/20/2013 05:23 PM, Shaun Thomas wrote:
> On 06/20/2013 03:32 PM, Josh Berkus wrote:
>
>> Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something
>> else just slightly higher than SPC?
>
> Yes, actually. My favored setting when we were on 8.3 was 1.5. But
> something in the planner changed pretty drastically when we went to
> 9.1, and we were getting some really bad query plans unless we
> *strongly* suggested RPC was cheap. I was afraid I'd have to go lower,
> but 1 seemed to do the trick.

That would be perverse, surely; but on Fusion-IO, RPC = SPC seems to make
sense unless you assume that cache misses will be higher for random reads than
for sequential reads.

cheers

andrew
On 6/20/13 4:32 PM, Josh Berkus wrote:
> First, cc'ing Greg Smith to see if he can address this with the Fusion
> folks so that they stop giving out a bad guide.

I'm working on a complete replacement of that guide, one that actually gives
out a full set of advice. Right now I'm between their product cycles; I'm
expecting new hardware again here soon.

The main thing that no one has done a good guide to is how to reduce SSD flash
cell wear in a PostgreSQL database. If there's anyone out there who is burning
through enough FusionIO cells at your site that a) you care about wear, and
b) you can spare a drive for testing at your site, please contact me off list
if you'd like to talk about that. I have a project aiming at increased
lifetimes that's specific to FusionIO hardware, and it could use more testers.
This involves a modified PostgreSQL, though. It's not for the squeamish or for
a production system. If crashing the kernel on the test server isn't something
you can live with, move along; this will not help you for a while yet.

>>> They have a PostgreSQL setup guide from Fusion recommending the
>>> following settings:
>>> effective_io_concurrency=0
>>> bgwriter_lru_maxpages=0

I finally tracked down where this all came from. As a general advisory on what
their current guide addresses, validation of its settings included things like
testing with pgbench. It's a little-known property of the built-in pgbench
test that the best TPS *throughput* numbers come from turning the background
writer off. That's why their guide suggests doing that. The problem with that
approach is that the background writer is intended to improve *latency*, so
measuring its impact on throughput isn't the right approach.

Enabling or disabling the background writer doesn't have a large impact on
flash wear. That wasn't why turning it off was recommended; it came out of the
throughput improvement.

Similarly, effective_io_concurrency is untested by pgbench, because its
queries aren't complicated enough.
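[Editor's note: the throughput-vs-latency point above can be checked
empirically by running pgbench with per-transaction logging (the -l flag) with
the background writer enabled and disabled, then comparing latency percentiles
rather than just the final TPS number. Below is a small sketch that summarizes
such a log. The field layout assumed here (third field = transaction latency
in microseconds) and the sample lines are illustrative; verify the exact log
format against the pgbench documentation for your version.]

```python
# Summarize transaction latencies from a pgbench -l per-transaction log.
# Assumed field layout per line (check your pgbench version's docs):
#   client_id transaction_no latency_us file_no time_epoch time_us

def latency_percentiles(lines, percentiles=(50, 90, 99)):
    latencies = sorted(int(line.split()[2]) for line in lines if line.strip())
    results = {}
    for p in percentiles:
        # Nearest-rank percentile: crude, but fine for an A/B comparison
        # of bgwriter on vs. off.
        idx = min(len(latencies) - 1, int(len(latencies) * p / 100))
        results[p] = latencies[idx]
    return results

# A few fabricated sample log lines for illustration:
sample_log = [
    "0 1 1432 0 1371761223 886213",
    "0 2 1201 0 1371761223 887519",
    "1 1 9870 0 1371761223 888012",  # one slow outlier transaction
    "1 2 1388 0 1371761223 889400",
]

print(latency_percentiles(sample_log))
```

A run whose p99 latency collapses when the background writer is enabled can
easily show *lower* total TPS, which is exactly how a throughput-only
validation ends up recommending bgwriter_lru_maxpages=0.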
I would consider both of these settings worse than the defaults, and far from
optimal for their hardware.

>>> random_page_cost=0.1
>>> sequential_page_cost=0.1

As Shaun already commented on a bit--and I agree with almost everything he
suggested--the situation with random vs. sequential I/O on this hardware is
not as simple as it's sometimes made out to be. Random I/O certainly is not
the same speed as sequential, even on their hardware. Sequential I/O is not
always 10X as fast as traditional drives. Using values closer to 1.0, as he
suggested, is a lot more sensible to me.

I also don't think random_page_cost = seq_page_cost is the best setting, even
though it did work out for Shaun. The hardware is fast enough that you can
make a lot of mistakes without really paying for them, though, and any query
tuning like that is going to be workload dependent. It's impossible to make a
single recommendation here.

>> FusionIO drives are fast, but
>> they're not infinitely fast. My tests (and others') show they're about
>> 1/2 the speed of memory, regarding IOPS.

This part needs a disclaimer on it. Memory speed varies significantly between
servers. The range just on new servers right now goes from 5GB/s to 40GB/s.
There are a good number of single- and dual-socket Intel systems where "1/2
the speed of memory" is about right. There are systems where the ratio will be
closer to 1:1 or 1:4, though.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
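[Editor's note: the point that "half the speed of memory" depends entirely on
the server is easy to sanity-check on your own hardware. This hypothetical
sketch times a single-threaded memory copy to get a ballpark bandwidth figure
to compare against your drive's measured sequential throughput. It is nowhere
near a proper STREAM-style benchmark, and the number it prints will understate
what multi-channel, multi-threaded access can achieve.]

```python
import time

def approx_memory_bandwidth_gbs(size_mb=256, rounds=5):
    """Very rough single-threaded memory-copy bandwidth estimate, in GB/s.

    Takes the best of several rounds to reduce noise from other load.
    """
    buf = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(rounds):
        t0 = time.perf_counter()
        copy = bytes(buf)  # one full read pass + one full write pass
        elapsed = time.perf_counter() - t0
        best = min(best, elapsed)
        del copy
    # Bytes moved ~ 2 * buffer size (read the source, write the copy).
    return (2 * size_mb / 1024) / best

print(f"~{approx_memory_bandwidth_gbs():.1f} GB/s (single-threaded copy)")
```

Comparing that figure against a card's measured sequential read rate (e.g. the
800MB/s gen-1 number mentioned below) shows why the memory-to-flash ratio can
land anywhere from roughly 1:1 to 1:4 or worse depending on the server.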
On 07/17/2013 09:04 PM, Greg Smith wrote:
> I'm working on a complete replacement of that guide, one that actually
> gives out a full set of advice. Right now I'm between their product
> cycles; I'm expecting new hardware again here soon.

Me too. It's interesting that they seem to be focusing more on using the cards
as a caching layer instead of a directly readable device. I still need to test
that use case.

> This involves a modified PostgreSQL, though. It's not for the
> squeamish or for a production system.

I'd volunteer, but we're actually using EDB. Unless you can convince EDB to
supply binaries similar to yours, I can't run equivalent tests. :(

> I also don't think random_page_cost = seq_page_cost is the best setting,
> even though it did work out for Shaun. The hardware is fast enough that
> you can make a lot of mistakes without really paying for them, though,
> and any query tuning like that is going to be workload dependent. It's
> impossible to make a single recommendation here.

Very, very true. I actually prefer using different values, and before 9.1 we
had random at 1.5 and sequential at 1.0. Some of our query plans were being
adversely affected, and didn't go back to normal until I reduced the random
cost to 1.0. I can't explain why that would happen, but it's not too
surprising given that we jumped from 8.2 to 9.1.

Since we're mainly focused on stability right now, getting the old performance
back was the main concern. I haven't revisited the setting since that initial
upgrade and correction, so it's hard to know what the "right" setting really
is. Like you said, there is a lot of room for tuning based on system usage.

> There are a good number of single- and dual-socket Intel systems where
> "1/2 the speed of memory" is about right. There are systems where
> the ratio will be closer to 1:1 or 1:4, though.

Doh, yeah. It also depends on the FusionIO generation and tier you're working
with.
Some of their newer/bigger cards with more controller chips can (purportedly)
push upwards of 6GB/s, which is a tad faster than the 800MB/s (measured) of
our ancient gen-1 cards.

Too many variables. -_-

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com