Thread: Anyone using a SAN?

Anyone using a SAN?

From: "Peter Koczan"
Hi all,

We're considering setting up a SAN where I work. Is there anyone using
a SAN, for postgres or other purposes? If so I have a few questions
for you.

- Are there any vendors to avoid or ones that are particularly good?

- What performance or reliability implications exist when using SANs?

- Are there any killer features with SANs compared to local storage?

Any other comments are certainly welcome.

Peter

Re: Anyone using a SAN?

From: Kenneth Marshall
On Wed, Feb 13, 2008 at 10:56:54AM -0600, Peter Koczan wrote:
> Hi all,
>
> We're considering setting up a SAN where I work. Is there anyone using
> a SAN, for postgres or other purposes? If so I have a few questions
> for you.
>
> - Are there any vendors to avoid or ones that are particularly good?
>
> - What performance or reliability implications exist when using SANs?
>
> - Are there any killer features with SANs compared to local storage?
>
> Any other comments are certainly welcome.
>
> Peter
>

Peter,

The key is to understand your usage patterns, both I/O and query.
SANs can easily become bandwidth-limited, which can tank your database
performance. There have been several threads on this mailing list
about performance problems caused by the use of a SAN for storage.
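
If you want a first-order check of your own volume, a crude sketch like
this can show whether it is latency-bound or bandwidth-bound. A minimal
sketch, assuming a test file bigger than RAM (so the OS cache doesn't
hide the storage); the file path is hypothetical:

import os, random, sys, time

PATH = sys.argv[1]      # e.g. a big scratch file on the volume under test
BLOCK = 8192            # PostgreSQL's page size
SAMPLES = 1000

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size

# Random 8kB reads: seek-dominated, roughly what OLTP traffic looks like.
start = time.time()
for _ in range(SAMPLES):
    os.lseek(fd, random.randrange(size - BLOCK), os.SEEK_SET)
    os.read(fd, BLOCK)
print("random: %.2f ms/read" % ((time.time() - start) * 1000.0 / SAMPLES))

# Sequential 1MB reads: bandwidth-dominated, roughly a sequential scan.
os.lseek(fd, 0, os.SEEK_SET)
start, total = time.time(), 0
while total < 256 * 1024 * 1024:
    chunk = os.read(fd, 1024 * 1024)
    if not chunk:
        break
    total += len(chunk)
print("sequential: %.1f MB/s" % (total / 1048576.0 / (time.time() - start)))
os.close(fd)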

Cheers,
Ken

Re: Anyone using a SAN?

From: "Alex Deucher"
On Feb 13, 2008 12:46 PM, Kenneth Marshall <ktm@rice.edu> wrote:
> On Wed, Feb 13, 2008 at 10:56:54AM -0600, Peter Koczan wrote:
> > [snip]
>
> Peter,
>
> The key is to understand your usage patterns, both I/O and query.
> SANs can easily become bandwidth-limited, which can tank your database
> performance. There have been several threads on this mailing list
> about performance problems caused by the use of a SAN for storage.

It's critical that you set up the SAN with a database in mind;
otherwise the performance will be bad.  I tested a DB on a SAN
designed to maximize storage space, and performance was terrible.  I
never had the time or resources to reconfigure the SAN to test a more
suitable spindle setup, since the SAN was in heavy production use for
file archiving.

Alex

Re: Anyone using a SAN?

From: Tobias Brox
[Peter Koczan - Wed at 10:56:54AM -0600]
> We're considering setting up a SAN where I work. Is there anyone using
> a SAN, for postgres or other purposes? If so I have a few questions
> for you.

Some time ago, my boss was planning to order more hardware - including a
SAN - and coincidentally, SANs were being discussed on this list as well.
The consensus on this list seemed to be that running postgres on a SAN is
not cost-efficient - one gets better performance for a lower cost
if the database host is connected directly to the disks - and also
that buying the wrong SAN can cause quite a few problems.

My boss (with good help from the local SAN-pusher) decided that the
arguments against the SAN solution on this list were not really valid for
an "enterprise" user.  The SAN-pusher insisted that with a
state-of-the-art SAN it should theoretically be possible to achieve far
better bandwidth as well as lower latency to the disks.  Personally, I
don't have a clue, but all my colleagues believe him, so I guess he
is right ;-)  What I'm told is that a state-of-the-art SAN allows for
an "insane amount" of hard disks to be installed, many more than would
fit into any decent database server.  We ended up buying a SAN; the
physical installation was done last week, and in a few months I should
be able to tell whether it was a good idea after all, or not.


Re: Anyone using a SAN?

From: Arjen van der Meijden
On 13-2-2008 22:06 Tobias Brox wrote:
> What I'm told is that a state-of-the-art SAN allows for
> an "insane amount" of hard disks to be installed, many more than would
> fit into any decent database server.  We ended up buying a SAN; the
> physical installation was done last week, and in a few months I should
> be able to tell whether it was a good idea after all, or not.

Your SAN-pusher should have a look at the HP submissions for TPC-C...
The recent Xeon systems are all without SANs and still able to connect
hundreds of SAS disks.

This one has 2+28+600 hard drives connected to it:
http://tpc.org/results/individual_results/HP/hp_ml370g5_2p_X5460_tpcc_080107_es.pdf

Long story short, using SAS you can theoretically connect up to 64k
disks to a single system. And in the HP example they connected 26
external enclosures (MSA70) to 8 internal ones via external SAS ports,
i.e. they ended up with 28+600 hard drives spread out over 16 external
4-port SAS connectors with a bandwidth of 12Gbit per connector...
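
A back-of-the-envelope check of that aggregate bandwidth - a sketch,
assuming 12Gbit per connector means 4 lanes of 3Gbit SAS:

connectors = 16
gbit_each = 12                       # 4-port wide SAS at 3Gbit per lane
total_gbit = connectors * gbit_each
print("%d Gbit/s total, ~%.0f GB/s" % (total_gbit, total_gbit / 8.0))
# -> 192 Gbit/s total, ~24 GB/s; compare that to a single 4Gbit FC link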

Obviously it's a bit difficult to share those 628 hard drives amongst
several systems, but the argument your colleagues have for a SAN isn't a
very good one. All major hardware vendors nowadays have external
SAS enclosures which can hold 12-25 hard drives (and can often
be stacked two or three enclosures deep) and can be connected to normal
internal PCI-e SAS RAID cards. Those controllers commonly have two
external ports and can be used with other controllers in the system to
combine all the connected enclosures into one or more virtual volumes, or
you can run software LVM/RAID on top of those controllers.

Anyway, the common physical limit of 6-16 disks in a single
server enclosure isn't a very relevant argument in favour of a SAN anymore.

Best regards,

Arjen

Re: Anyone using a SAN?

From: Greg Smith
On Wed, 13 Feb 2008, Tobias Brox wrote:

> What I'm told is that a state-of-the-art SAN allows for an "insane
> amount" of hard disks to be installed, many more than would fit
> into any decent database server.

You can attach a surprisingly large number of drives to a server nowadays,
but in general it's easier to manage larger numbers of them on a SAN.
Also, there are significant redundancy improvements using a SAN that are
worth quite a bit in some enterprise environments.  Connecting all the
drives, no matter how many, to two or more machines at once is typically
much easier to set up on a SAN than with more direct storage.

Basically the performance breaks down like this:

1) Going through the SAN interface (Fibre Channel, etc.) introduces some
latency and a potential write bottleneck compared with direct storage,
everything else being equal.  This can really be a problem if you've got a
poor SAN vendor or interface issues you can't sort out.

2) It can be easier to manage a large number of disks in the SAN, so for
situations where aggregate disk throughput is the limiting factor the SAN
solution might make sense.

3) At the high end, you can get SANs with more cache than any direct
controller I'm aware of, which for some applications can give them a
measurable lead over direct storage.  It's easy (albeit expensive) to get
an EMC array with 16GB of memory for caching, for example (and with 480
drives).  And since they've got a more robust power setup than a typical
server, you can even usefully enable all the individual drive caches
(those are 16-32MB each nowadays, so at say 100 disks you've potentially
got another 1.6GB of cache right there).  If you've got a typical server,
you can end up needing to turn off the write caches on individual
direct-attached drives, because they may not survive a power cycle even
with a UPS, and you have to rely on the controller write cache alone.
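
To make that cache arithmetic concrete - a sketch, with drive count and
cache sizes as assumed above:

drives = 100
drive_cache_mb = 16        # per-disk write cache, 16-32MB typical today
array_cache_gb = 16        # e.g. the big EMC controller cache
extra_gb = drives * drive_cache_mb / 1024.0
print("%.1f GB of drive cache on top of %d GB array cache"
      % (extra_gb, array_cache_gb))
# -> 1.6 GB of drive cache on top of 16 GB array cache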

There's no universal advantage on either side here, just a different set
of trade-offs.  Certainly you'll never come close to the performance/$
direct storage gets you if you buy that in SAN form instead, but at higher
budgets or feature requirements they may make sense anyway.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Anyone using a SAN?

From: Tobias Brox
[Arjen van der Meijden]
> Your SAN-pusher should have a look at the HP submissions for TPC-C...
> The recent Xeon systems are all without SANs and still able to connect
> hundreds of SAS disks.

Yes, I had a feeling that the various alternative solutions for "direct
connection" hadn't been investigated fully.  I was pushing for it, but
hardware is not my thing.  Anyway, most likely the only harm done by
choosing a SAN is that it's more expensive than an equivalent solution
with directly connected disks.  Well, not my money anyway. ;-)

> Obviously it's a bit difficult to share those 628 hard drives amongst
> several systems, but the argument your colleagues have for a SAN isn't a
> very good one.

As far as I've heard, you cannot really benefit much from this with
postgres; one cannot have two postgres servers on two hosts sharing the
same data (i.e. using one for failover or for CPU/memory-bound read
queries).

Having the SAN connected to several hosts gives us two benefits: if the
database host goes down but the SAN does not, it will be quite fast to
start up a new postgres instance on a different host - and it will also
be possible to take real-time backups from the SAN without much of a
performance hit.  Anyway, with a warm standby server as described at
http://www.postgresql.org/docs/current/interactive/warm-standby.html one
can achieve pretty much the same thing without a SAN.
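
For reference, that warm-standby approach boils down to shipping WAL
segments from the primary to the standby. A minimal sketch of an
archive_command helper - the script name and archive path are
hypothetical - might look like this:

#!/usr/bin/env python
# Called by the primary for each finished WAL segment, e.g. in
# postgresql.conf:
#   archive_command = '/usr/local/bin/wal_ship.py %p %f'
# The standby replays segments from ARCHIVE_DIR via its restore_command.
import os, shutil, sys

ARCHIVE_DIR = "/mnt/standby/wal_archive"    # hypothetical shared path

src, fname = sys.argv[1], sys.argv[2]       # %p = full path, %f = file name
dest = os.path.join(ARCHIVE_DIR, fname)
if os.path.exists(dest):
    sys.exit(1)                  # refuse to overwrite an archived segment
shutil.copy(src, dest + ".tmp")  # copy first, rename after: no torn files
os.rename(dest + ".tmp", dest)
sys.exit(0)                      # zero exit tells postgres the copy is safe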


Re: Anyone using a SAN?

From: "Scott Marlowe"
On Feb 13, 2008 5:02 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Wed, 13 Feb 2008, Tobias Brox wrote:
>
> > What I'm told is that a state-of-the-art SAN allows for an "insane
> > amount" of hard disks to be installed, many more than would fit
> > into any decent database server.
>
> You can attach a surprisingly large number of drives to a server nowadays,
> but in general it's easier to manage larger numbers of them on a SAN.
> Also, there are significant redundancy improvements using a SAN that are
> worth quite a bit in some enterprise environments.  Connecting all the
> drives, no matter how many, to two or more machines at once is typically
> much easier to set up on a SAN than with more direct storage.

SNIP

> There's no universal advantage on either side here, just a different set
> of trade-offs.  Certainly you'll never come close to the performance/$
> direct storage gets you if you buy that in SAN form instead, but at higher
> budgets or feature requirements they may make sense anyway.

I agree with everything you've said here, and you've said it far more
clearly than I could have.

I'd like to add that it may still be feasible to have a SAN and a db
with locally attached storage.  Talk the boss into a 4-port caching
SAS controller and four very fast hard drives or something else on the
server so that you can run tests comparing the performance of a
rather limited onboard RAID set to the big SAN.  For certain kinds of
things, like loading tables, it will still be a very good idea to have
local drives for caching and transforming data and such.
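
Even a crude write test along those lines will catch a badly configured
SAN before the database does - a minimal sketch, with hypothetical mount
points:

import os, time

PATHS = ["/local_raid/testfile", "/san_vol/testfile"]   # assumed mounts
MB = 512
block = b"x" * (1024 * 1024)

for path in PATHS:
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.time()
    for _ in range(MB):
        os.write(fd, block)
    os.fsync(fd)               # force it to the storage, not just the cache
    rate = MB / (time.time() - start)
    os.close(fd)
    os.unlink(path)
    print("%s: %.1f MB/s" % (path, rate))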

Going further, the argument for putting the db onto the SAN may be
weakened if the amount of data on the db server doesn't and likely won't
require a lot of space.  A lot of back-office dbs are running in
the sub-gigabyte range and will never grow to the size of the Social
Security database.  Even with dozens of apps, an in-house db server
might be using no more than a few dozen gigabytes of storage.  Given
the cost and performance of large SAS and SATA drives, it's not at all
unlikely that you can fit everything you need for the next five years
on a single set of disks on a server that's twice as powerful as most
internal db servers need to be.

You can hide the cost of the extra drives in the shadow of the receipt
for the SAN.

Re: Anyone using a SAN?

From: Bruce Momjian
Should this be summarized somewhere in our docs - just a few lines with
the trade-offs: direct storage = cheaper, faster; SAN = more configurable?

---------------------------------------------------------------------------

Scott Marlowe wrote:
> [snip]
>
> You can hide the cost of the extra drives in the shadow of the receipt
> for the SAN.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Anyone using a SAN?

From: "Peter Koczan"
Thanks for all your input; it is very helpful. A SAN for our postgres
deployment is probably sufficient in terms of performance, because we
just don't have that much data. I'm a little concerned about the needs
of user and research databases, but if a project needs a big, fast
database, it might be wise to have them shell out for DAS.

My co-workers and I are meeting with a vendor in two weeks (3Par,
specifically), and I think I have a better idea of what I should be
looking at. I'll keep you all posted on the situation. Keep the ideas
coming, as I still would like to know of any other important factors.

Thanks again.

Peter

Re: Anyone using a SAN?

From: "Greg Stark"
Tobias Brox wrote:
> The consensus on this list seemed to be that running postgres on a SAN is
> not cost-efficient - one gets better performance for a lower cost
> if the database host is connected directly to the disks - and also
> that buying the wrong SAN can cause quite a few problems.
>
>
That's true about SANs in general. You don't buy a SAN because it'll
cost less than just buying the disks and a controller. You buy a SAN
because it makes managing the storage easier. The break-even point has
more to do with how many servers you're able to put on the SAN and how
often you need to do tricky backup and upgrade procedures than it does
with the hardware.

Re: Anyone using a SAN?

From: Greg Smith
On Wed, 13 Feb 2008, Bruce Momjian wrote:

> Should this be summarized somewhere in our docs - just a few lines with
> the trade-offs: direct storage = cheaper, faster; SAN = more configurable?

I think it's kind of stretching the PostgreSQL documentation to be covering
that.  It's hard to generalize here without giving a fair amount of
background and caveats--that last message was about as compact a
commentary on this as I could come up with.  One of the things I was
hoping to push into the community documentation one day was a larger look
at disk layout that covers this, RAID issues, and related topics (this got
started at http://www.postgresql.org/docs/techdocs.64 but stalled).

What's nice about putting it into a web-only format is that it's easy to
hyperlink heavily into the archives to recommend specific discussion
threads of the issues for reference, which isn't practical in the manual.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Anyone using a SAN?

From: "Peter Koczan"
> That's true about SANs in general. You don't buy a SAN because it'll
> cost less than just buying the disks and a controller. You buy a SAN
> because it makes managing the storage easier. The break-even point has
> more to do with how many servers you're able to put on the SAN and how
> often you need to do tricky backup and upgrade procedures than it does
> with the hardware.

One big reason we're really looking into a SAN option is that we have
a lot of unused disk space. A typical disk usage scheme for us is 6 GB
for a clean Linux install, and 20 GB for a Windows install. Our disks
are typically 80GB, and even after decent amounts of usage we're not
even approaching half that. We install a lot of software in AFS, our
networked file system, and users' home directories and project
directories are in AFS as well. Local disk space is relegated to the
OS and vendor software, servers that need it, and seldom-used scratch
space. There might very well be a break-even point for us in terms of
cost.

One of the other things I was interested in was the "hidden costs" of
a SAN. For instance, we'd probably have to invest in more UPS capacity
to protect our data. Are there any other similar points that people
don't initially consider regarding a SAN?

Again, thanks for all your help.

Peter

Re: Anyone using a SAN?

From: Sven Geisler
Hi Peter,
Peter Koczan wrote:
> One of the other things I was interested in was the "hidden costs" of
> a SAN. For instance, we'd probably have to invest in more UPS capacity
> to protect our data. Are there any other similar points that people
> don't initially consider regarding a SAN?

There are "hidden costs". The set up of a local disk system is easy. You
need only a few decisions.
This is totally different when it comes to SAN.
At the end of the day you need a guy who has the knowledge to design and
configure such system.
That's why you should buy a SAN and the knowledge  from a brand or a
specialist company.

BTW: You can do things with a SAN that you can't do with local disks:
- mirroring to another location (room)
- mounting snapshots on another server

Sven.

--
Sven Geisler <sgeisler@aeccom.com>   Tel +49.30.921017.81  Fax .50
Senior Developer, think project! Solutions GmbH & Co. KG,  Germany


Re: Anyone using a SAN?

From: Matthew
On Mon, 18 Feb 2008, Peter Koczan wrote:
> One of the other things I was interested in was the "hidden costs" of
> a SAN. For instance, we'd probably have to invest in more UPS capacity
> to protect our data. Are there any other similar points that people
> don't initially consider regarding a SAN?

You may well find that the hardware required in each machine to access
the SAN (Fibre Channel cards, etc.) and the switches are way more
expensive than just shoving a cheap hard drive in each machine. Hard
drives are mass-produced, and remarkably cheap for what they do. SAN
hardware is specialist, and expensive.

Matthew

--
Nog:     Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog:     Frightening, isn't it?

Re: Anyone using a SAN?

From: "C." Bergström
On Wed, 2008-02-20 at 13:41 +0000, Matthew wrote:
> On Mon, 18 Feb 2008, Peter Koczan wrote:
> > One of the other things I was interested in was the "hidden costs" of
> > a SAN. For instance, we'd probably have to invest in more UPS capacity
> > to protect our data. Are there any other similar points that people
> > don't initially consider regarding a SAN?
>
> You may well find that the hardware required in each machine to access
> the SAN (Fibre Channel cards, etc.) and the switches are way more
> expensive than just shoving a cheap hard drive in each machine. Hard
> drives are mass-produced, and remarkably cheap for what they do. SAN
> hardware is specialist, and expensive.

It can be, but may I point to a recent posting on the Beowulf mailing
list [1] and the article it references [2], showing that the per-node
price of SDR InfiniBand has come down far enough to compete with GigE in
some cases.  YMMV, but I'm in the planning phase for a massive storage
system and it's something we're looking into.  Just thought I'd share.


Success!

./C

[1]
http://www.mirrorservice.org/sites/www.beowulf.org/archive/2008-January/020538.html

[2] http://www.clustermonkey.net/content/view/222/1/

Re: Anyone using a SAN?

From: Michael Stone
On Mon, Feb 18, 2008 at 03:44:40PM -0600, Peter Koczan wrote:
>One big reason we're really looking into a SAN option is that we have
>a lot of unused disk space.

The cost of the SAN interfaces probably exceeds the cost of the wasted
space, and the performance will probably be lower for a lot of
workloads. There are good reasons to have SANs, but increasing
utilization of disk drives probably isn't one of them.

>A typical disk usage scheme for us is 6 GB
>for a clean Linux install, and 20 GB for a Windows install. Our disks
>are typically 80GB, and even after decent amounts of usage we're not
>even approaching half that.

I typically partition systems to use a small fraction of the disk space,
and don't even acknowledge that the rest exists unless there's an actual
reason to use it. But the disks are basically free, so there's no point
in trying to buy small ones to save space.

Mike Stone

Re: Anyone using a SAN?

From: Michael Stone
On Wed, Feb 20, 2008 at 02:52:42PM +0100, C. Bergström wrote:
>It can be, but may I point to a recent posting on the Beowulf mailing
>list [1] and the article it references [2], showing that the per-node
>price of SDR InfiniBand has come down far enough to compete with GigE in
>some cases.  YMMV, but I'm in the planning phase for a massive storage
>system and it's something we're looking into.  Just thought I'd share.

For HPC, maybe. For other sectors, it's hard to compete with the free
GBE that comes with the machine, and that low price doesn't reflect the
cost of extending an oddball network infrastructure outside of a cluster.

Mike Stone

Re: Anyone using a SAN?

From: "Peter Koczan"
Hi all,

I had a few meetings with SAN vendors and I thought I'd give you some
follow-up on points of potential interest.

- Dell/EMC
The representative was like the Dell dude grown up. The sales pitch
mentioned "price point" about twenty times (to the point where it was
annoying), and the pitch ultimately boiled down to "Dude, you're
getting a SAN." My apologies in advance for bringing back repressed
memories of the Dell dude. As far as technical stuff goes, it's about
what you'd expect from a low-end SAN. The cost was in the
$2-3 per GB range if you went with the cheap option...not terrible,
but not great either, especially since you'd have to buy lots of GB.
Performance numbers weren't bad, but they weren't great either.

- 3par
The sales pitch was more focused on technical aspects and only
mentioned "price point" twice...which is a win in my book, at least
compared to Dell. Where they really shone was the technical side.
Whereas Dell just wanted to sell you a storage system that you
put on a network, 3par wanted to sell you a storage system
specifically designed for a network, and to change the very way you
think about storage. They had a bunch of cool management concepts, and
very advanced failover, power-outage, and backup techniques and tools.
Performance wasn't shabby either; for instance, a RAID 5 set could get
about 90% of the IOPS and transfer rate that a RAID 10 set could. How
exactly this compares to DAS they didn't say. The main stumbling block
with 3par is price. While they didn't give any specific numbers, best
estimates put a SAN in the $5-7 per GB range. The extra features just
might be worth it, though.

- Lefthand
This meeting is still upcoming, so I don't have as informed an opinion.
Looking at their website, they seem closer to the Dell end in terms of
price and functionality. They seem good for entry-level SANs, though.
I'll keep you posted as I have more info.

Luckily, almost everything here works with Linux (at least the major
distros), including the management tools, in case people were worried
about that. One of the key points to consider going forward is that
the competition between iSCSI and Fibre Channel will likely bring
prices down in the future. While SANs are certainly more expensive than
their DAS counterparts, the gap appears to be closing.

However, to paraphrase a discussion between a few of my co-workers,
you can buy toilet paper or kitty litter in huge quantities because
you know you'll eventually use it...and it doesn't change in
performance or basic functionality. Storage is just something that you
don't always want to buy a lot of in one go. It will get bigger, and
cheaper, and probably faster in a relatively short amount of time. The
other thing is that you can't really get a small SAN. The minimum is
usually in the multiple TB range (and usually >10 TB). I'd love to be
able to put together a proof of concept and a test using 3par's
technology and commodity 80GB slow disks, but I really can't. You're
stuck with going all-in right away, and enough people have had
problems being married to specific techs or vendors that it's really
hard to break that uneasiness.

Thanks for reading, hopefully you found it slightly informative.

Peter

Re: Anyone using a SAN?

From: "Peter Koczan"
>  Dell acquired Equallogic last November/December.
>
>  I noticed your Dell meeting was a Dell/EMC meeting. Have you talked to them
> or anyone else about Equallogic?

Now that you mention it, I do recall a bit about EqualLogic in the Dell
pitch. It didn't really stand out, and a lot of the
technical details were similar enough to the EMC details that they
just melded in my mind.

>  When I was looking at iSCSI solutions, the Equallogic was really slick. Of
> course, I needed high-end performance, which of course came at a steep
> price, and the project got canned. Oh well. Still, the EL solution claimed
> near linear scalability when additional capacity/shelves were added. And,
> they have a lot of really nice technologies for managing the system.

If you think EqualLogic is slick, check out 3par. They've got a lot of
very cool features and concepts. Unfortunately, this comes at a higher
price. To each his own, I guess.

Our meetings didn't focus a lot on scalability of capacity, as we just
didn't think to ask. I think the basic pitch was "it scales well"
without any real hard data.

Peter