Thread: PostgreSQL settings for running on an SSD drive
All;

I'm working with a client running PostgreSQL on a Fusion-IO drive. They have a
PostgreSQL setup guide from Fusion recommending the following settings:

    effective_io_concurrency=0
    bgwriter_lru_maxpages=0
    random_page_cost=0.1
    sequential_page_cost=0.1

These seem odd to me: they effectively turn the background writer off, and they
set both random_page_cost and sequential_page_cost to the same (very low)
value...

Thoughts?

Thanks in advance
On 06/20/2013 02:56 PM, CS DBA wrote:
> They have a PostgreSQL setup guide from Fusion recommending the
> following settings:
> effective_io_concurrency=0
> bgwriter_lru_maxpages=0
> random_page_cost=0.1
> sequential_page_cost=0.1

Well, since FusionIO drives have a limited write cycle (5PB?), I can somewhat
see why they would recommend turning off the background writer. We were a bit
more conservative in our settings, though:

    seq_page_cost = 1.0     # Default
    random_page_cost = 1.0  # Reduced to match seq_page_cost

Yep. That's it. Just the one setting. FusionIO drives are fast, but they're
not infinitely fast. My tests (and others') show they're about 1/2 the speed
of memory in terms of IOPS. And while they can serve very aggressive
sequential reads, they're not orders of magnitude faster than spindles in
anything but IOPS.

Knowing that, we reduced the cost of random page fetches to match sequential
page fetches. This has served our heavy OLTP system (and its FusionIO
underpinnings) very well so far.

But like I said, these are pretty conservative. I'd start at 1 and reduce in
0.2 increments, running tests to see if there's a beneficial change.

If it helps, here are our system stats, some only relevant during financial
hours:

* A billion queries per day
* Sustained 500+ write queries per second
* Average 7,000-ish transactions per second
* Average 35,000-ish queries per second
* pg_xlog and pgdata on the same FusionIO device
* 2 years, 3 months in operation
* 1.29PB written
* 1.75PB read

The load on our system right now is 3.7 on a 24-CPU box while serving 4100 TPS
after active trading hours. The FusionIO drive is basically the only reason we
can do all of that without a lot of excessive contortions.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to
this email
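[Editor's note: to see why lowering random_page_cost relative to seq_page_cost
steers the planner toward index scans, here is a deliberately simplified toy
cost model. The function names and the numbers are illustrative only; the real
planner also accounts for CPU costs, index correlation, and caching, so this
sketch captures just the page-fetch tradeoff being discussed.]

```python
# Toy sketch of the planner's page-cost tradeoff. PostgreSQL's actual
# costing is far more involved; this only models the seq vs. random
# page-fetch terms that random_page_cost / seq_page_cost control.

def seq_scan_cost(table_pages, seq_page_cost=1.0):
    # A sequential scan reads every page of the table, in order.
    return table_pages * seq_page_cost

def index_scan_cost(pages_fetched, random_page_cost=4.0):
    # An index scan touches fewer pages, but in random order, so each
    # page is charged at random_page_cost.
    return pages_fetched * random_page_cost

table_pages = 10_000   # total heap pages in the table
pages_fetched = 3_000  # heap pages the index scan would actually touch

# With the spinning-disk default (random_page_cost = 4.0), the
# sequential scan looks cheaper, so the planner would pick it:
print(seq_scan_cost(table_pages))           # 10000.0
print(index_scan_cost(pages_fetched, 4.0))  # 12000.0

# With random_page_cost lowered to 1.0 for flash, the index scan wins:
print(index_scan_cost(pages_fetched, 1.0))  # 3000.0
```

In other words, on storage where random reads really do cost about the same as
sequential ones, leaving random_page_cost at its spinning-disk default makes
the planner reject index scans it should be choosing.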
Folks,

First, cc'ing Greg Smith to see if he can address this with the Fusion folks
so that they stop giving out a bad guide.

On 06/20/2013 01:13 PM, Shaun Thomas wrote:
> On 06/20/2013 02:56 PM, CS DBA wrote:
>
>> They have a PostgreSQL setup guide from Fusion recommending the
>> following settings:
>> effective_io_concurrency=0
>> bgwriter_lru_maxpages=0
>> random_page_cost=0.1
>> sequential_page_cost=0.1
>
> Well, since FusionIO drives have a limited write cycle (5PB?), I can
> somewhat see why they would recommend turning off the background writer.
> We were a bit more conservative in our settings, though:
>
> seq_page_cost = 1.0     # Default
> random_page_cost = 1.0  # Reduced to match seq_page_cost
>
> Yep. That's it. Just the one setting. FusionIO drives are fast, but
> they're not infinitely fast. My tests (and others') show they're about
> 1/2 the speed of memory in terms of IOPS. And while they can serve very
> aggressive sequential reads, they're not orders of magnitude faster than
> spindles in anything but IOPS.
>
> Knowing that, we reduced the cost of random page fetches to match
> sequential page fetches. This has served our heavy OLTP system (and its
> FusionIO underpinnings) very well so far.

Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something else
just slightly higher than SPC?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 06/20/2013 03:32 PM, Josh Berkus wrote:
> Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something
> else just slightly higher than SPC?

Yes, actually. My favored setting when we were on 8.3 was 1.5. But something
in the planner changed pretty drastically when we went to 9.1, and we were
getting some really bad query plans unless we *strongly* suggested RPC was
cheap. I was afraid I'd have to go lower, but 1 seemed to do the trick.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
On 06/20/2013 05:23 PM, Shaun Thomas wrote:
> On 06/20/2013 03:32 PM, Josh Berkus wrote:
>
>> Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something
>> else just slightly higher than SPC?
>
> Yes, actually. My favored setting when we were on 8.3 was 1.5. But
> something in the planner changed pretty drastically when we went to
> 9.1, and we were getting some really bad query plans unless we
> *strongly* suggested RPC was cheap. I was afraid I'd have to go lower,
> but 1 seemed to do the trick.

That would be perverse, surely; but on Fusion-IO, RPC = SPC seems to make
sense unless you assume that cache misses will be higher for random reads than
for sequential reads.

cheers

andrew
On 6/20/13 4:32 PM, Josh Berkus wrote:
> First, cc'ing Greg Smith to see if he can address this with the Fusion
> folks so that they stop giving out a bad guide.

I'm working on a complete replacement of that guide, one that actually gives
out a full set of advice. Right now I'm between their product cycles; I'm
expecting new hardware again here soon.

The main thing that no one has done a good guide to is how to reduce SSD flash
cell wear in a PostgreSQL database. If there's anyone out there who is burning
through enough FusionIO cells at your site that a) you care about wear, and
b) you can spare a drive for testing at your site, please contact me off list
if you'd like to talk about that. I have a project aiming at increased
lifetimes that's specific to FusionIO hardware, and it could use more testers.
This involves a modified PostgreSQL, though. It's not for the squeamish or for
a production system. If crashing the kernel on the test server isn't something
you can live with, move along; this will not help you for a while yet.

>>> They have a PostgreSQL setup guide from Fusion recommending the
>>> following settings:
>>> effective_io_concurrency=0
>>> bgwriter_lru_maxpages=0

I finally tracked down where this all came from. As a general advisory on what
their current guide addresses, validation of its settings included things like
testing with pgbench. It's a little-known property of the built-in pgbench
test that the best TPS *throughput* numbers come from turning the background
writer off. That's why their guide suggests doing that. The problem with that
approach is that the background writer is intended to improve *latency*, so
measuring its impact on throughput isn't the right approach.

Enabling or disabling the background writer doesn't have a large impact on
flash wear. That wasn't why turning it off was recommended; it came out of the
throughput improvement.

Similarly, effective_io_concurrency is untested by pgbench, because its
queries aren't complicated enough.
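[Editor's note: the throughput-vs-latency point above can be checked
empirically by running pgbench with per-transaction logging (the -l flag) with
the background writer enabled and disabled, then comparing latency percentiles
rather than just the final TPS number. Below is a small sketch that summarizes
such a log. The field layout assumed here (third field = transaction latency
in microseconds) and the sample lines are illustrative; verify the exact log
format against the pgbench documentation for your version.]

```python
# Summarize transaction latencies from a pgbench -l per-transaction log.
# Assumed field layout per line (check your pgbench version's docs):
#   client_id transaction_no latency_us file_no time_epoch time_us

def latency_percentiles(lines, percentiles=(50, 90, 99)):
    latencies = sorted(int(line.split()[2]) for line in lines if line.strip())
    results = {}
    for p in percentiles:
        # Nearest-rank percentile: crude, but fine for an A/B comparison
        # of bgwriter on vs. off.
        idx = min(len(latencies) - 1, int(len(latencies) * p / 100))
        results[p] = latencies[idx]
    return results

# A few fabricated sample log lines for illustration:
sample_log = [
    "0 1 1432 0 1371761223 886213",
    "0 2 1201 0 1371761223 887519",
    "1 1 9870 0 1371761223 888012",  # one slow outlier transaction
    "1 2 1388 0 1371761223 889400",
]

print(latency_percentiles(sample_log))
```

A run whose p99 latency collapses when the background writer is enabled can
easily show *lower* total TPS, which is exactly how a throughput-only
validation ends up recommending bgwriter_lru_maxpages=0.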
I would consider both of these settings worse than the defaults, and far from
optimal for their hardware.

>>> random_page_cost=0.1
>>> sequential_page_cost=0.1

As Shaun already commented on a bit--and I agree with almost everything he
suggested--the situation with random vs. sequential I/O on this hardware is
not as simple as it's sometimes made out to be. Random I/O certainly is not
the same speed as sequential, even on their hardware. Sequential I/O is not
always 10X as fast as traditional drives. Using values closer to 1.0, as he
suggested, is a lot more sensible to me.

I also don't think random_page_cost = seq_page_cost is the best setting, even
though it did work out for Shaun. The hardware is fast enough that you can
make a lot of mistakes without really paying for them, though, and any query
tuning like that is going to be workload dependent. It's impossible to make a
single recommendation here.

>> FusionIO drives are fast, but
>> they're not infinitely fast. My tests (and others') show they're about
>> 1/2 the speed of memory, regarding IOPS.

This part needs a disclaimer on it. Memory speed varies significantly between
servers. The range just on new servers right now goes from 5GB/s to 40GB/s.
There are a good number of single- and dual-socket Intel systems where "1/2
the speed of memory" is about right. There are systems where the ratio will be
closer to 1:1 or 1:4, though.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
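[Editor's note: the point that "half the speed of memory" depends entirely on
the server is easy to sanity-check on your own hardware. This hypothetical
sketch times a single-threaded memory copy to get a ballpark bandwidth figure
to compare against your drive's measured sequential throughput. It is nowhere
near a proper STREAM-style benchmark, and the number it prints will understate
what multi-channel, multi-threaded access can achieve.]

```python
import time

def approx_memory_bandwidth_gbs(size_mb=256, rounds=5):
    """Very rough single-threaded memory-copy bandwidth estimate, in GB/s.

    Takes the best of several rounds to reduce noise from other load.
    """
    buf = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(rounds):
        t0 = time.perf_counter()
        copy = bytes(buf)  # one full read pass + one full write pass
        elapsed = time.perf_counter() - t0
        best = min(best, elapsed)
        del copy
    # Bytes moved ~ 2 * buffer size (read the source, write the copy).
    return (2 * size_mb / 1024) / best

print(f"~{approx_memory_bandwidth_gbs():.1f} GB/s (single-threaded copy)")
```

Comparing that figure against a card's measured sequential read rate (e.g. the
800MB/s gen-1 number mentioned below) shows why the memory-to-flash ratio can
land anywhere from roughly 1:1 to 1:4 or worse depending on the server.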
On 07/17/2013 09:04 PM, Greg Smith wrote:
> I'm working on a complete replacement of that guide, one that actually
> gives out a full set of advice. Right now I'm between their product
> cycles; I'm expecting new hardware again here soon.

Me too. It's interesting that they seem to be focusing more on using the cards
as a caching layer instead of a directly readable device. I still need to test
that use case.

> This involves a modified PostgreSQL, though. It's not for the
> squeamish or for a production system.

I'd volunteer, but we're actually using EDB. Unless you can convince EDB to
supply binaries similar to yours, I can't run equivalent tests. :(

> I also don't think random_page_cost = seq_page_cost is the best setting,
> even though it did work out for Shaun. The hardware is fast enough that
> you can make a lot of mistakes without really paying for them, though,
> and any query tuning like that is going to be workload dependent. It's
> impossible to make a single recommendation here.

Very, very true. I actually prefer using different values, and before 9.1 we
had random at 1.5 and sequential at 1.0. Some of our query plans were being
adversely affected, and didn't go back to normal until I reduced the random
cost to 1.0. I can't explain why that would happen, but it's not too
surprising given that we jumped from 8.2 to 9.1.

Since we're mainly focused on stability right now, getting the old performance
back was the main concern. I haven't revisited the setting since that initial
upgrade and correction, so it's hard to know what the "right" setting really
is. Like you said, there is a lot of room for tuning based on system usage.

> There are a good number of single- and dual-socket Intel systems where
> "1/2 the speed of memory" is about right. There are systems where
> the ratio will be closer to 1:1 or 1:4, though.

Doh, yeah. It also depends on the FusionIO generation and tier you're working
with.
Some of their newer/bigger cards with more controller chips can (purportedly)
push upwards of 6GB/s, which is a tad faster than the 800MB/s (measured) of
our ancient gen-1 cards.

Too many variables. -_-

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com