Thread: best use of an EMC SAN

best use of an EMC SAN

From: Dave Cramer
Assuming we have 24 73G drives, is it better to make one big metalun
and carve it up, letting the SAN manage where everything is, or is it
better to specify which spindles are where?

Currently we would require 3 separate disk arrays.

one for the main database, a second for the WAL, and a third for the
most active table.

The problem with dedicating spindles to each array is that we end up
wasting space. Are SANs smart enough to do a better job if I create
one large metalun and cut it up?

Dave

Re: best use of an EMC SAN

From: Dan Gorman
We do something similar here. We use NetApp, and I carve one aggregate
per data volume. I generally keep pg_xlog on the same "data" LUN, but I
don't mix other databases on the same aggregate.

In the NetApp world, because they use RAID-DP (dual parity), you waste
more drives; however, you are guaranteed that an erroneous query won't
clobber the I/O of another database.

In my experience, NetApp has utilities that set "IO priority", but
they're not granular enough; it's more like using "renice" in Unix. It
doesn't really make that big of a difference.

My recommendation: each database gets its own aggregate unless its IO
footprint is very low.

Let me know if you need more details.

Regards,
Dan Gorman




Re: best use of an EMC SAN

From: Gregory Stark
"Dave Cramer" <pg@fastcrypt.com> writes:

> Assuming we have 24 73G drives, is it better to make one big metalun
> and carve it up, letting the SAN manage where everything is, or is it
> better to specify which spindles are where?

This is quite a controversial question with proponents of both strategies.

I would suggest having one RAID-1 array for the WAL and throwing the
rest of the drives at a single big array for the data files. That wastes
space, since the WAL isn't big, but the benefit is big.

If you have a battery-backed cache you might not even need that; just
throwing them all into one big RAID might work just as well.
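
Purely as a sketch of the shape of that layout, using Linux software
RAID with hypothetical device names (on an EMC array you would do the
equivalent carving with the array's own tools):

    mdadm --create /dev/md0 --level=1  --raid-devices=2  /dev/sd[bc]   # WAL mirror
    mdadm --create /dev/md1 --level=10 --raid-devices=22 /dev/sd[d-y]  # data array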

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: best use of an EMC SAN

From: Dave Cramer
On 11-Jul-07, at 10:05 AM, Gregory Stark wrote:

> "Dave Cramer" <pg@fastcrypt.com> writes:
>
>> Assuming we have 24 73G drives, is it better to make one big metalun
>> and carve it up, letting the SAN manage where everything is, or is it
>> better to specify which spindles are where?
>
> This is quite a controversial question with proponents of both
> strategies.
>
> I would suggest having one RAID-1 array for the WAL and throwing the
> rest of the drives at a single big array for the data files.

This is quite unexpected. Since the WAL is almost all writes, isn't
RAID 1 the slowest of all for writing?
> That wastes space, since the WAL isn't big, but the benefit is big.
>
> If you have a battery-backed cache you might not even need that; just
> throwing them all into one big RAID might work just as well.
Any ideas on how to test this before we install the database ?


Re: best use of an EMC SAN

From: Cott Lang
In my sporadic benchmark testing, the only consistent 'trick' I found
was that the best thing I could do for sequential performance was
allocating a bunch of mirrored-pair LUNs and striping them with software
RAID. This made a huge difference (~2X) in sequential performance, and
gave a little boost in random I/O, at least in FLARE 19.
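
In other words, something along these lines, with hypothetical
PowerPath device names for four mirrored-pair LUNs presented by the
array:

    # Stripe array-side mirrored pairs with host-side RAID-0,
    # giving RAID 1+0 overall
    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
          /dev/emcpowera /dev/emcpowerb /dev/emcpowerc /dev/emcpowerd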

On our CX-500s, FLARE failed to fully utilize the secondary drives in
RAID 1+0 configurations. FWIW, after several months of inquiries, EMC
eventually explained that this is deliberate: they ease the usage, and
thus the wear, on the secondaries in order to reduce the likelihood of
both drives in a mirrored pair failing.

We've never observed a difference using separate WAL LUNs, presumably
due to the write cache. That said, we continue to use them, figuring
it's "cheap" insurance against running out of space, as well as against
performance problems under conditions we didn't see while testing.

We also ended up using single large LUNs for data, but I must admit I
wanted more time to benchmark splitting off heavily hit tables.

My advice would be to read the EMC performance white papers, remain
skeptical, and then test everything yourself. :D
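
As a rough first pass, something like the following (paths are
hypothetical, and the pgbench numbers only mean anything when compared
across candidate layouts):

    # Raw sequential throughput of a candidate data LUN
    dd if=/dev/zero of=/mnt/data/ddtest bs=8k count=1000000 oflag=direct

    # WAL-style synchronous writes; a battery-backed write cache should
    # make these far faster than bare disks
    dd if=/dev/zero of=/mnt/wal/ddtest bs=8k count=10000 oflag=dsync

    # PostgreSQL-level comparison, repeated once per candidate layout
    pgbench -i -s 1000 bench
    pgbench -c 32 -t 10000 bench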





Re: best use of an EMC SAN

From: Andrew Sullivan
On Wed, Jul 11, 2007 at 09:03:27AM -0400, Dave Cramer wrote:
> Problem with dedicating the spindles to each array is that we end up
> wasting space. Are the SAN's smart enough to do a better job if I
> create one large metalun and cut it up ?

In my experience, this largely depends on your SAN, its hardware and
firmware, and its ability to interact with the OS.  I think the best
answer is "sometimes yes".

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
However important originality may be in some fields, restraint and
adherence to procedure emerge as the more significant virtues in a
great many others.   --Alain de Botton

Re: best use of an EMC SAN

From: Chris Browne
pg@fastcrypt.com (Dave Cramer) writes:
> On 11-Jul-07, at 10:05 AM, Gregory Stark wrote:
>
>> "Dave Cramer" <pg@fastcrypt.com> writes:
>>
>>> Assuming we have 24 73G drives, is it better to make one big metalun
>>> and carve it up, letting the SAN manage where everything is, or is it
>>> better to specify which spindles are where?
>>
>> This is quite a controversial question with proponents of both
>> strategies.
>>
>> I would suggest having one RAID-1 array for the WAL and throwing the
>> rest of the drives at a single big array for the data files.
>
> This is quite unexpected. Since the WAL is almost all writes, isn't
> RAID 1 the slowest of all for writing?

The thing is, the disk array caches this LIKE CRAZY.  I'm not quite
sure how many batteries are in there to back things up; there seem to
be multiple levels of them, which means that as far as fsync() is
concerned, the data is committed very quickly even if it takes a while
to physically hit disk.

One piece of the controversy is that the disk being used for WAL is
certain to be written to as heavily and as continuously as your load
dictates.  A fallout of this is that those disks are likely to be
worked harder than the disks used for storing "plain old data," with
the result that if you devote disks to WAL, you'll likely burn through
replacement drives faster there than you do for the "POD" disks.

It is not certain whether it is more desirable to:
a) Spread that wear and tear across the whole array, or
b) Target certain disks for that wear and tear, and expect to need to
   replace them somewhat more frequently.

At some point, I'd like to run a test on a decent disk array comparing
multiple configurations.  Assuming 24 drives:

 - Use all 24 to make "one big filesystem" as the base case
 - Split off a set (6?) for WAL
 - Split off a set (6?  9?) for a second tablespace, and shift indexes
   there (a sketch of this follows below)
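
The third configuration would look something like this (names are
hypothetical):

    psql -d mydb <<'EOF'
    CREATE TABLESPACE idxspace LOCATION '/mnt/idx/pg';
    ALTER INDEX some_hot_index SET TABLESPACE idxspace;
    EOF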

My suspicion is that the "use all 24 for one big filesystem" scenario
is likely to be fastest by some small margin, and that the other cases
will lose a very little bit in comparison.  Andrew Sullivan had a
somewhat similar finding a few years ago on some old Solaris hardware
that unfortunately isn't at all relevant today.  He basically found
that moving WAL off to separate disk didn't affect performance
materially.

What's quite regrettable is that it is almost sure to be difficult to
construct a test that, on a well-appointed modern disk array, won't
basically stay in cache.
--
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/nonrdbms.html
16-inch Rotary Debugger: A highly effective tool for locating problems
in  computer   software.   Available   for  delivery  in   most  major
metropolitan areas.  Anchovies contribute to poor coding style.

Re: best use of an EMC SAN

From: Jim Nasby
On Jul 11, 2007, at 12:39 PM, Chris Browne wrote:
>  - Split off a set (6?) for WAL

In my limited testing, 6 drives for WAL would be complete overkill in
almost any case. The only example I've ever seen where WAL was able to
swamp 2 drives was the DBT testing that Mark Wong was doing at OSDL,
and the only reason that was the case is that he had somewhere around
70 data drives. I suppose an entirely in-memory database might be able
to swamp a 2 drive WAL as well.
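
A quick way to gauge how much WAL a workload actually generates
(database name is hypothetical; the function is
pg_current_xlog_location() in the 8.x/9.x releases and
pg_current_wal_lsn() from PostgreSQL 10 on):

    # Sample the WAL insert location a minute apart; the delta between
    # the two locations is the WAL write volume for that minute
    psql -d mydb -Atc "SELECT pg_current_xlog_location();"
    sleep 60
    psql -d mydb -Atc "SELECT pg_current_xlog_location();"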
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: best use of an EMC SAN

From: Andrew Sullivan
On Wed, Jul 11, 2007 at 01:39:39PM -0400, Chris Browne wrote:
> load causes.  A fallout of this is that those disks are likely to be
> worked harder than the disk used for storing "plain old data," with
> the result that if you devote disk to WAL, you'll likely burn thru
> replacement drives faster there than you do for the "POD" disk.

This is true, and in operation it can really burn you when you start to
blow out disks.  In particular, remember to factor the cost of a RAID
rebuild into your RAID plans, because you're going to be doing it, and
if your WAL is near its I/O limits, the only way you're going to get
your redundancy back is to go noticeably slower :-(

> will lose a very little bit in comparison.  Andrew Sullivan had a
> somewhat similar finding a few years ago on some old Solaris hardware
> that unfortunately isn't at all relevant today.  He basically found
> that moving WAL off to separate disk didn't affect performance
> materially.

Right, but it's not only the hardware that isn't relevant there.  It
was also using either 7.1 or 7.2, which means that the I/O pattern was
completely different.  More recently, ISTR, we did analysis for at
least one workload that told us to use separate LUNs for WAL, with
separate I/O paths.  This was with at least one kind of array supported
by Awful Inda eXtreme.  Other tests, IIRC, came out differently -- the
experience with one largish EMC array was, I think, a dead heat between
various strategies (so the additional flexibility of doing everything
on the array was worth any cost we were able to measure).  But the last
time I had to be responsible for that sort of test was, again, a couple
of years ago.  On the whole, though, my feeling is that you can't make
general recommendations on this topic: the advances in storage are
happening too fast to make generalisations, particularly in the top
classes of hardware.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
        --Roger Brinner

Re: best use of an EMC SAN

From: Greg Smith
On Wed, 11 Jul 2007, Jim Nasby wrote:

> I suppose an entirely in-memory database might be able to swamp a 2
> drive WAL as well.

You can really generate a whole lot of WAL volume on an EMC SAN if you're
doing UPDATEs fast enough on data that is mostly in-memory.  Takes a
fairly specific type of application to do that though, and whether you'll
ever find it outside of a benchmark is hard to say.

The main thing I would add as a consideration here is that you can
configure PostgreSQL to write WAL data using the O_DIRECT path,
bypassing the OS buffer cache, which can greatly improve performance on
SAN-grade hardware like this.  That can be a big win if you're doing
writes that dirty lots of WAL, and the benefit is straightforward to
measure if the WAL is a dedicated section of disk (just change
wal_sync_method and benchmark each setting).  If the WAL is just
another section on an array, how well those synchronous writes will
mesh with the rest of the activity on the system is not as
straightforward to predict.  Having the WAL split out provides a
logical separation that makes figuring all this out easier.
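
A minimal sketch of that benchmark loop, assuming a scratch cluster
(when a parameter appears more than once in postgresql.conf, the last
entry wins):

    # The open_* methods are the ones that can use O_DIRECT for WAL
    for m in open_datasync open_sync fdatasync fsync; do
        echo "wal_sync_method = $m" >> $PGDATA/postgresql.conf
        pg_ctl -D $PGDATA restart -w
        pgbench -c 16 -t 10000 bench    # compare tps for each method
    done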

Just to throw out a slightly different spin on the suggestions going by
here: consider keeping the WAL separate, starting as a RAID-1 volume,
but keep 2 disks in reserve so that you could easily upgrade to a 0+1
set if you end up discovering you need to double the write bandwidth.
Since there's never much actual data on the WAL disks, that would be a
fairly short downtime operation.  If you don't hit that wall, the extra
drives might serve as spares to help mitigate concerns about the WAL
drives burning out faster than average because of their high write
volume.
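
Since pg_xlog stays small, that upgrade can be a short stop/copy/restart
along these lines (devices and paths are hypothetical; md's raid10 is
the nearest software-RAID equivalent of a 0+1 set):

    pg_ctl -D $PGDATA stop -m fast
    cp -a /mnt/wal/pg_xlog /tmp/pg_xlog.save      # stash the WAL segments
    mdadm --stop /dev/md0                         # retire the 2-disk mirror
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[b-e]
    mkfs.ext3 /dev/md0 && mount /dev/md0 /mnt/wal
    cp -a /tmp/pg_xlog.save /mnt/wal/pg_xlog
    pg_ctl -D $PGDATA start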

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: best use of an EMC SAN

From: Dave Cramer
On 11-Jul-07, at 2:35 PM, Greg Smith wrote:

> On Wed, 11 Jul 2007, Jim Nasby wrote:
>
>> I suppose an entirely in-memory database might be able to swamp a
>> 2 drive WAL as well.
>
> You can really generate a whole lot of WAL volume on an EMC SAN if
> you're doing UPDATEs fast enough on data that is mostly in-memory.
> Takes a fairly specific type of application to do that though, and
> whether you'll ever find it outside of a benchmark is hard to say.
>
Well, this is such an application. The DB fits entirely in memory, and
the site is doing over 12M page views a day (I'm not exactly sure what
that translates to in transactions).