Thread: NAS, SAN or any alternate solution ?

NAS, SAN or any alternate solution ?

From
bsimon@loxane.com
Date:

Hi all,

I've been searching the list for a while but couldn't find any up-to-date information relating to my problem.
We have a production server with postgresql on cygwin that currently deels with about 200 Gigs of data (1 big IDE drive). We plan to move to linux for some reasons I don't have to explain.
Our aim is also to be able to increase our storage capacity up to approximately 1 or 2 terabytes and to speed up our production process. As we are a small "microsoft addicted" company , we have some difficulties to choose the best configuration that would best meet our needs.
Our production process is based on transaction (mostly huge inserts) and disk access is the main bottlle-neck.

Our main concern is hardware related :

Would NAS or SAN be good solutions ? (I've read that NAS uses NFS which could slow down the transfer rate ??)
Has anyone ever tried one of these with postgresql ?

I would appreciate any comments.
Thanks in advance.

Benjamin.

================================================
Benjamin Simon - Ingénieur Développement Cartographie
http://www.loxane.com
tel : 01 30 40 24 00
Fax : 01 30 40 24 04

LOXANE
271, Chaussée Jules César 95250 Beauchamp
France

Re: NAS, SAN or any alternate solution ?

From
"Scott Marlowe"
Date:
On Tue, 2004-07-20 at 01:52, bsimon@loxane.com wrote:
> Hi all,
>
> I've been searching the list for a while but couldn't find any
> up-to-date information relating to my problem.
> We have a production server with postgresql on cygwin that currently
> deels with about 200 Gigs of data (1 big IDE drive). We plan to move
> to linux for some reasons I don't have to explain.
> Our aim is also to be able to increase our storage capacity up to
> approximately 1 or 2 terabytes and to speed up our production process.
> As we are a small "microsoft addicted" company , we have some
> difficulties to choose the best configuration that would best meet our
> needs.
> Our production process is based on transaction (mostly huge inserts)
> and disk access is the main bottlle-neck.
>
> Our main concern is hardware related :
>
> Would NAS or SAN be good solutions ? (I've read that NAS uses NFS
> which could slow down the transfer rate ??)
> Has anyone ever tried one of these with postgresql ?

Your best bet would likely be a large external RAID system with lots o
cache.  Next would be a fast internal RAID card like the LSI Megaraid
cards, with lots of drives and batter backed cache.  Next would be a
SAN, but be careful, there may be issues with some cards and their
drivers under linux, research them well before deciding.  NFS is right
out if you want good performance AND reliability.

The cheapest solution that is likely to meet your needs would be the
internal RAID card with battery backed cache.


Re: NAS, SAN or any alternate solution ?

From
Grega Bremec
Date:
...and on Tue, Jul 20, 2004 at 09:52:56AM +0200, bsimon@loxane.com used the keyboard:
> Hi all,
>
> I've been searching the list for a while but couldn't find any up-to-date
> information relating to my problem.
> We have a production server with postgresql on cygwin that currently deels
> with about 200 Gigs of data (1 big IDE drive). We plan to move to linux
> for some reasons I don't have to explain.
> Our aim is also to be able to increase our storage capacity up to
> approximately 1 or 2 terabytes and to speed up our production process. As
> we are a small "microsoft addicted" company , we have some difficulties to
> choose the best configuration that would best meet our needs.
> Our production process is based on transaction (mostly huge inserts) and
> disk access is the main bottlle-neck.
>
> Our main concern is hardware related :
>
> Would NAS or SAN be good solutions ? (I've read that NAS uses NFS which
> could slow down the transfer rate ??)
> Has anyone ever tried one of these with postgresql ?
>
> I would appreciate any comments.
> Thanks in advance.

Hello Simon,

We're testing 3ware Escalade 9000, which is a hardware-raid SATA
controller with VERY good support for Linux (including direct access
for S.M.A.R.T. applications, which is a serious problem with other
RAID controllers), featuring RAID levels 0, 1, 10, 5, JBOD, up to
12 SATA channels (that's 3ware Escalade 9500S-12, they also come in
4- and 8-channel versions, up to four cards can be fitted into a
system), up to 1GB battery-backed ECC RAM (128MB out-of-the-box)
and most of all, excellent tuning guides that actually manage to
exceed the scope of merely making you come up with good benchmark
results for that controller in a specific test environment.

Our preliminary tests show that a setup of four 250GB SATA Maxtors
that aren't really qualified as fast drives, in RAID5 can deliver
block writes of 50MB/s, rewrites at about 35MB/s and reads of
approximately 180MB/s, which is rougly 2.5-times the performance
of previous Escalades.

You can find more info on Escalade 9000 series, benchmarks and
other stuff here:

    http://www.3ware.com/products/serial_ata9000.asp
    http://www.3ware.com/products/benchmarks_sata.asp
    http://www.3ware.dk/fileadmin/3ware/documents/Benchmarks/Linux_kernel_2.6_Benchmarking.pdf

Oh, and not to forget - the price for a 3ware 9500S-12, the version
we're testing ranges between EUR1000 and EUR1500, depending on the
contract you have with the reseller and the intended use of the
device. SATA disks are dirt-cheap nowadays, as has been mentioned
before.

I do agree on the reliability of cache-usage setting those drives
report though, it may or may not be true. But one never knows that
for sure with SCSI drives either. At least you can assert that
proper controller cache sizing with drives that usually feature
8MB (!!!) cache, will mostly ensure that even the largest amount
of data that could fit into a hard disk cache of the entire array
(96MB) will still be available in the controller cache after a
power failure, for it to be re-checked and ensured it is properly
written.

Hope this helps,
--
    Grega Bremec
    Senior Administrator
    Noviforum Ltd., Software & Media
    http://www.noviforum.si/

Attachment

Re: NAS, SAN or any alternate solution ?

From
Grega Bremec
Date:
>
> Oh, and not to forget - the price for a 3ware 9500S-12, the version
> we're testing ranges between EUR1000 and EUR1500, depending on the
> contract you have with the reseller and the intended use of the
> device. SATA disks are dirt-cheap nowadays, as has been mentioned
> before.
>

Correction, EUR500 and EUR1000, VAT not included. :)

Sorry for the mix-up.
--
    Grega Bremec
    Senior Administrator
    Noviforum Ltd., Software & Media
    http://www.noviforum.si/

Attachment

Re: NAS, SAN or any alternate solution ?

From
Joe Conway
Date:
bsimon@loxane.com wrote:
> Would NAS or SAN be good solutions ? (I've read that NAS uses NFS which
> could slow down the transfer rate ??)

> Has anyone ever tried one of these with postgresql ?

Not (yet) with Postgres, but my company has run ~100GB Oracle database
on NAS (NetApp) for the past couple of years. We've found it to
outperform local attached storage, and it has been extremely reliable
and flexible. Our DBAs wouldn't give it up without a fight.

Joe

Re: NAS, SAN or any alternate solution ?

From
Rod Taylor
Date:
> Would NAS or SAN be good solutions ? (I've read that NAS uses NFS
> which could slow down the transfer rate ??)
> Has anyone ever tried one of these with postgresql ?

I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
work as well as expected, but do require some tweeking as they normally
are not optimized for the datablock size that PostgreSQL likes to deal
with (8k by default) -- this can make as much as a 50% difference in
performance levels.

For a NAS setup, be VERY careful that the NFS implementation you're
using has the semantics that the database requires (do plenty of failure
testing -- pull plugs and things at random). iSCSI looks more promising,
but I've not tested how gracefully it fails.

Have your supplier run a bunch of benchmarks for random IO with 8k
blocks.

One side note, SANs seem to be very good at scaling across multiple jobs
from multiple sources, but beware your Fibre Channel drivers -- mine
seems to spend quite a bit of time managing interrupts and I've not
found a way to put it into a polling mode (I'm not a Linux person and
that trick usually happens for me on the BSDs).



Re: NAS, SAN or any alternate solution ?

From
Andrew Hammond
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rod Taylor wrote:
| I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
| work as well as expected, but do require some tweeking as they normally
| are not optimized for the datablock size that PostgreSQL likes to deal
| with (8k by default) -- this can make as much as a 50% difference in
| performance levels.

I'm looking for documentation about the datablock size you mentioned above.

My goal is to tune the disk / filesystem on our prototype system. It's
an EMC disk array, so sectors on disk are 512 bytes of usable space.
We've decided to go with RAID 10 since the goal is to maximize
performance. Currently the raid element size is set at 16 sectors which
is 8192 bytes of payload. I've got a sysadmin working on getting XFS
going with 8192 byte blocks. My next task will be to calculate the
amount of space used by XFS for headers etc. to find out how much of
those 8192 bytes can be used for the postgres payload. Then configure
postgres to use datablocks that size. So I'm looking for details on how
to manipulate the size of the datablock.

I'm also not entirely sure how to make the datablocks line up with the
filesystem blocks. Any suggestions on this would be greatly appreciated.

- --
Andrew Hammond    416-673-4138    ahammond@ca.afilias.info
Database Administrator, Afilias Canada Corp.
CB83 2838 4B67 D40F D086 3568 81FC E7E5 27AF 4A9A
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFBUeHmgfzn5SevSpoRAu2sAJ4nHHup5lhp4+RcgBPGoJpUFoE1SQCgyvW1
ixyAvqb7ZkB+IIdGb36mpxI=
=uDLW
-----END PGP SIGNATURE-----

Re: NAS, SAN or any alternate solution ?

From
Rod Taylor
Date:
> Rod Taylor wrote:
> | I've used both a NetApp and Hitachi based SANs with PostgreSQL. Both
> | work as well as expected, but do require some tweeking as they normally
> | are not optimized for the datablock size that PostgreSQL likes to deal
> | with (8k by default) -- this can make as much as a 50% difference in
> | performance levels.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

We just played with Veritas settings while running pg_bench on a 200GB
database. I no longer have access to the NetApp, but the settings for
the Hitachi are below.

In tunefstab we have:

read_pref_io=8192,read_nstream=4,write_pref_io=8192,write_nstream=2

In fstab it's:
    defaults,mincache=tmpcache,noatime


If you have better settings, please shoot them over so we can try them
out. Perhaps even get someone over there to write a new SAN section in
the Tuning Chapter.


Re: NAS, SAN or any alternate solution ?

From
Greg Stark
Date:
Andrew Hammond <ahammond@ca.afilias.info> writes:

> My goal is to tune the disk / filesystem on our prototype system. It's
> an EMC disk array, so sectors on disk are 512 bytes of usable space.
> We've decided to go with RAID 10 since the goal is to maximize
> performance. Currently the raid element size is set at 16 sectors which
> is 8192 bytes of payload.

Do people find it works well to have a stripe size that small? It seems like
it would be better to have it be at least a few filesystem/postgres blocks so
that subsequent reads stand a chance of being sequential and not causing
another spindle to have to seek. Does this depend on whether it's an DSS load
vs an OLTP load? If it's a single query at a time DSS system perhaps small
blocksizes work best to get maximum throughput?

> I've got a sysadmin working on getting XFS going with 8192 byte blocks.

Having your filesystem block size match postgres's block size is probably a
good idea. So 8k blocks is good.

> My next task will be to calculate the amount of space used by XFS for
> headers etc. to find out how much of those 8192 bytes can be used for the
> postgres payload.

No filesystem that I know of uses up space in every block. The overhead is all
stored elsewhere in blocks exclusively contain such overhead data. So just
setting postgres to 8k which the default would work well.

> Then configure postgres to use datablocks that size. So I'm looking for
> details on how to manipulate the size of the datablock.

Look in pg_config_manual.h in src/include. Postgres has to be recompiled to
change it and the database has to be reinitialized. But it could be set to 16k
or 32k. In which case you would probably want to adjust your filesystem to
match. But unless you do experiments you won't know if it would be of any
benefit to change.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

They just will. The files start on a block boundary, so every 8k is a new
block. Postgres stores 8k at a time always.

--
greg