Thread: Asking for assistance in determining storage requirements

Asking for assistance in determining storage requirements

From
Chris Barnes
Date:
Your assistance is appreciated.



I have a question regarding disk storage for Postgres servers.

 

We are thinking long term about scalable storage and performance and would like some advice
or feedback about what other people are using.

 

We would like to get as much performance from our file systems as possible.

 

We use an IBM 3650 quad-processor server with an onboard SAS controller (3 Gb/s) and 15,000 RPM drives.

We use RAID 1 for the CentOS operating system and the WAL archive logs.

The Postgres database is on 5 drives configured as RAID 5 with a global hot spare.

 

We are curious about using a SAN with a Fibre Channel HBA, and whether anyone else uses this technology.

We would also like to know if people have a preference for a particular RAID level, with or without striping.

Sincerely,

Chris Barnes
Recognia Inc.
Senior DBA



Re: Asking for assistance in determining storage requirements

From
Vick Khera
Date:
On Thu, Jul 9, 2009 at 11:15 AM, Chris
Barnes<compuguruchrisbarnes@hotmail.com> wrote:
> We are curious about using SAN with fiber channel hba and if anyone else
> uses this technology.
>
> We would also like to know if people have preference to the level of raid
> with/out striping.

I use SurfRAID Triton external RAID units connected to Sun X4100
boxes via LSI Fibre Channel cards.  I run them as RAID 6 plus a hot
spare with a total of 16 drives.  This is extremely fast and tolerates
up to 2 disk failures.  The key is to have 1 or 2 GB of cache on the
RAID units.  I also crank up the RAM on the servers to at least 20 GB.

Re: Asking for assistance in determining storage requirements

From
Alan McKay
Date:
No other takers on this one?

I'm wondering what exactly "direct-attached storage" entails.

At PGCon I heard a lot about using only direct-attached storage, and not a SAN.
Are there numbers to back this up?

Does Fibre Channel count as direct-attached storage?  I'm thinking it would.

What exactly is recommended against?  Any storage that is TCP/IP based?

On Thu, Jul 9, 2009 at 11:15 AM, Chris
Barnes<compuguruchrisbarnes@hotmail.com> wrote:
> Your assistance is appreciated.
>
>
> I have a question regarding disk storage for Postgres servers.
>
>
>
> We are thinking long term about scalable storage and performance and would
> like some advice or feedback about what other people are using.
>
>
>
> We would like to get as much performance from our file systems as possible.
>
>
>
> We use an IBM 3650 quad-processor server with an onboard SAS controller
> (3 Gb/s) and 15,000 RPM drives.
>
> We use RAID 1 for the CentOS operating system and the WAL archive logs.
>
> The Postgres database is on 5 drives configured as RAID 5 with a global hot
> spare.
>
>
>
> We are curious about using a SAN with a Fibre Channel HBA, and whether
> anyone else uses this technology.
>
> We would also like to know if people have a preference for a particular
> RAID level, with or without striping.
>
> Sincerely,
>
> Chris Barnes
> Recognia Inc.
> Senior DBA
>



--
“Don't eat anything you've ever seen advertised on TV”
         - Michael Pollan, author of "In Defense of Food"

Re: Asking for assistance in determining storage requirements

From
Craig Ringer
Date:
On Thu, 2009-07-09 at 11:15 -0400, Chris Barnes wrote:

>
>         We would like to get as much performance from our file systems
>         as possible.

Then avoid RAID 5. RAID 10 is a pretty good option for most loads.

Actually, RAID 5 is quite decent for read-mostly large volume storage
where you really need to be disk-space efficient. However, if you spread
the RAID 5 out over enough disks for it to start getting fast reads, you
face a high risk of disk failure during RAID rebuild. For that reason,
consider using RAID 6 instead - over a large set of disks - so you're
better protected against disk failures during rebuild.
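
As a rough, illustrative Python sketch of that rebuild-window risk (the
3% annualized failure rate and 12-hour rebuild window are assumptions,
and the model ignores correlated failures and unrecoverable read errors
during the rebuild, which are what make real rebuilds worse than these
numbers suggest):

def second_failure_prob(n_disks, afr=0.03, rebuild_hours=12):
    # Probability that at least one of the surviving (n_disks - 1) drives
    # fails during the rebuild window, assuming independent failures and a
    # constant annualized failure rate (afr).
    hourly_rate = afr / (365 * 24)
    survives_window = (1 - hourly_rate) ** rebuild_hours
    return 1 - survives_window ** (n_disks - 1)

for n in (5, 10, 20):
    print(f"{n}-disk RAID 5: ~{second_failure_prob(n):.3%} chance of a "
          f"second failure during a 12-hour rebuild")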

If you're doing much INSERTing / UPDATEing then RAID 5/6 are not for
you. RAID 10 is pretty much the default choice for write-heavy loads.

>         The postgres database is on 5 drives configured as raid 5 with
>         a global hot spare.

>         We are curious about using SAN with fiber channel hba and if
>         anyone else uses this technology.

There are certainly people on the list using PostgreSQL on a FC SAN. It
comes up in passing quite a bit.

It's really, REALLY important to make sure your SAN honours fsync(),
though - at least to the point of making sure the SAN hardware has the
data in battery-backed cache before returning from the fsync() call.
Otherwise you risk serious data loss. I'd be unpleasantly surprised if
any SAN shipped with a SAN or FC HBA configuration that disregarded
fsync(), but it _would_ make benchmark numbers look better, so it's not
safe to assume without testing.
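
One crude way to sanity-check that, sketched in Python (the file name
and iteration count are arbitrary, and this is no substitute for a
proper test - the test_fsync tool in the PostgreSQL source tree is far
more thorough). Run it on the volume that will hold the WAL:

import os, time

def fsync_rate(path="fsync_probe.dat", iterations=500):
    # Append one 8 kB block and fsync() it, repeatedly.  A single
    # 15,000 RPM spindle with no write-back cache can't honestly do much
    # more than ~250 of these per second; tens of thousands per second
    # means something in the stack is acknowledging writes from cache.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.time()
        for _ in range(iterations):
            os.write(fd, b"x" * 8192)
            os.fsync(fd)
        return iterations / (time.time() - start)
    finally:
        os.close(fd)
        os.unlink(path)

print(f"~{fsync_rate():.0f} fsync()s per second")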

From general impressions gathered from the list (I don't use such
large-scale gear myself and can't speak personally), it does seem like
most systems built for serious performance use direct-attached SAS arrays.
People also seem to separate out read-mostly/archival tables,
update-heavy tables, the WAL, temp table space, and disk sort space into
different RAID sets.

--
Craig Ringer


Re: Asking for assistance in determining storage requirements

From
Scott Marlowe
Date:
On Thu, Jul 9, 2009 at 9:15 AM, Chris
Barnes<compuguruchrisbarnes@hotmail.com> wrote:
> Your assistance is appreciated.
>
> I have a question regarding disk storage for Postgres servers.
>
> We are thinking long term about scalable storage and performance and would
> like some advice or feedback about what other people are using.
>
> We would like to get as much performance from our file systems as possible.
>
> We use an IBM 3650 quad-processor server with an onboard SAS controller
> (3 Gb/s) and 15,000 RPM drives.
>
> We use RAID 1 for the CentOS operating system and the WAL archive logs.
>
> The Postgres database is on 5 drives configured as RAID 5 with a global hot
> spare.

OK, two things jump out at me.  One is that you aren't using a
hardware RAID controller with battery-backed cache; the other is that
you're using RAID-5.

For most non-db applications, RAID-5 and no battery-backed cache is
just fine.  For some DB applications, like a reporting db or batch
processing, it's OK too.  For DB applications that handle lots of small
transactions, it's a really bad choice.

Looking through the pgsql-performance archives, you'll see RAID-10 and
HW RAID with battery-backed cache mentioned over and over again, and
for good reasons.  RAID-10 is much more resilient, and a good HW RAID
controller with battery-backed cache can re-order writes into groups
that are near each other on the same drive pair to make overall
throughput higher, as well as making burst throughput higher by
acknowledging fsync() immediately when you issue a write.
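
The back-of-the-envelope arithmetic behind that, as a small sketch (the
drive speed comes from Chris's setup; the rest is illustrative):

# Without a write-back cache, each commit's fsync() waits on the platter,
# so a single 15,000 RPM mirror tops out near one synchronous commit per
# rotation.  With a battery-backed cache the controller acknowledges the
# fsync() from RAM and destages the writes in a drive-friendly order later.
rpm = 15000
commit_ceiling = rpm / 60.0   # ~250 small commits/sec per spindle, no cache
print(f"no write cache: ~{commit_ceiling:.0f} small commits/sec ceiling")
print("battery-backed cache: ceiling set by the controller, not the platter")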

I'm assuming you have 8 hard drives to play with.  If that's the case,
you can have a RAID-1 for the OS etc and a RAID-10 with 4 disks and
two hot spares, OR a RAID-10 with 6 disks and no hot spares.  As long
as you pay close attention to your server and catch failed drives and
replace them by hand that might work, but it really sits wrong with
me.

> We are curious about using SAN with fiber channel hba and if anyone else
> uses this technology.

Yep, again, check the pgsql-performance archives.  Note that the level of
complexity is much higher, as is the cost, and if you're talking about
a dozen or two dozen drives, you're often much better off just having
a good direct-attached set of disks, either with an embedded RAID
controller, or as JBOD with an internal RAID controller to handle
them.  The top-of-the-line RAID controllers that can handle 24 or so
disks run $1200 to $1500.  Taking the cost of the drives out of the
equation, I'm pretty sure any FC/SAN setup is gonna cost a LOT more
than that single RAID card.  I can buy a 16-drive 32 TB DAS box for
about $6k to $7k or so, plug it into a simple but fast SCSI controller
($400 tops), and be up in a few minutes.  Setting up a new SAN is never
that fast, easy, or cheap.

OTOH, if you've got a dozen servers that need lots and lots of
storage, a SAN will start making more sense since it makes managing
lots of hard drives easier.

> We would also like to know if people have preference to the level of raid
> with/out striping.

RAID-10, then RAID-10 again, then RAID-1.  RAID-6 for really big
reporting dbs where storage is more important than performance and the
data is mostly read anyway.  RAID-5 is to be avoided, period.  If you
have 6 disks in a RAID-6 with no spare, you're better off than with a
RAID-5 with 5 disks and a spare, since in RAID-6 the "spare" is kind of
already built in.
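
To put numbers on that last comparison, a tiny illustrative sketch (the
146 GB per-drive capacity is just an assumed, common 15k SAS size):

def raid5_usable(drives, per_drive_gb):   # one drive's worth of parity
    return (drives - 1) * per_drive_gb

def raid6_usable(drives, per_drive_gb):   # two drives' worth of parity
    return (drives - 2) * per_drive_gb

gb = 146
print("RAID-6, 6 drives:        ", raid6_usable(6, gb), "GB usable, survives any 2 failures")
print("RAID-5, 5 drives + spare:", raid5_usable(5, gb), "GB usable, survives 1 until the spare rebuilds")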