Thread: NAS, SAN, or an alternative solution?
Hi all,
I've been searching the list for a while but couldn't find any up-to-date information relating to my problem.
We have a production server running PostgreSQL on Cygwin that currently deals with about 200 gigs of data (one big IDE drive). We plan to move to Linux for reasons I don't have to explain.
Our aim is also to increase our storage capacity to approximately 1 or 2 terabytes and to speed up our production process. As we are a small, "Microsoft-addicted" company, we are having some difficulty choosing the configuration that would best meet our needs.
Our production process is transaction-based (mostly huge inserts), and disk access is the main bottleneck.
Our main concern is hardware-related:
Would NAS or SAN be a good solution? (I've read that NAS uses NFS, which could slow down the transfer rate.)
Has anyone ever tried one of these with PostgreSQL?
I would appreciate any comments.
Thanks in advance.
Benjamin.
================================================
Benjamin Simon - Development Engineer, Cartography
http://www.loxane.com
tel : 01 30 40 24 00
Fax : 01 30 40 24 04
LOXANE
271, Chaussée Jules César 95250 Beauchamp
France
On Tue, 2004-07-20 at 01:52, bsimon@loxane.com wrote:
> Would NAS or SAN be a good solution? (I've read that NAS uses NFS,
> which could slow down the transfer rate.)
> Has anyone ever tried one of these with PostgreSQL?

Your best bet would likely be a large external RAID system with lots of cache. Next would be a fast internal RAID card like the LSI MegaRAID cards, with lots of drives and battery-backed cache. Next would be a SAN, but be careful: there may be issues with some cards and their drivers under Linux, so research them well before deciding. NFS is right out if you want good performance AND reliability.

The cheapest solution that is likely to meet your needs would be the internal RAID card with battery-backed cache.
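One cheap way to see whether a battery-backed write cache is earning its keep is to measure single-client commit throughput with contrib's pgbench. A rough sketch, assuming a scratch database named bench: with fsync enabled and no write cache, tps tends to be capped near the disk's rotation rate, so the difference shows up immediately.

    createdb bench
    pgbench -i -s 10 bench      # populate the test tables
    pgbench -c 1 -t 1000 bench  # one client, 1000 transactions; compare the
                                # reported tps with the cache enabled/disabled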
...and on Tue, Jul 20, 2004 at 09:52:56AM +0200, bsimon@loxane.com used the keyboard:
> Would NAS or SAN be a good solution? (I've read that NAS uses NFS,
> which could slow down the transfer rate.)
> Has anyone ever tried one of these with PostgreSQL?

Hello Simon,

We're testing the 3ware Escalade 9000, a hardware-RAID SATA controller with VERY good support for Linux (including direct access for S.M.A.R.T. applications, which is a serious problem with other RAID controllers). It features RAID levels 0, 1, 10, 5 and JBOD, up to 12 SATA channels (that's the 3ware Escalade 9500S-12; they also come in 4- and 8-channel versions, and up to four cards can be fitted into a system), up to 1GB of battery-backed ECC RAM (128MB out of the box), and most of all, excellent tuning guides that actually go beyond merely making you produce good benchmark results for that controller in one specific test environment.

Our preliminary tests show that a RAID5 setup of four 250GB SATA Maxtors -- drives that don't really qualify as fast -- can deliver block writes of 50MB/s, rewrites at about 35MB/s, and reads of approximately 180MB/s, which is roughly 2.5 times the performance of previous Escalades.

You can find more info on the Escalade 9000 series, benchmarks, and other stuff here:

http://www.3ware.com/products/serial_ata9000.asp
http://www.3ware.com/products/benchmarks_sata.asp
http://www.3ware.dk/fileadmin/3ware/documents/Benchmarks/Linux_kernel_2.6_Benchmarking.pdf

Oh, and not to forget -- the price for a 3ware 9500S-12, the version we're testing, ranges between EUR1000 and EUR1500, depending on the contract you have with the reseller and the intended use of the device. SATA disks are dirt-cheap nowadays, as has been mentioned before.

I do agree that the cache-usage setting those drives report may or may not be truthful -- but one never knows that for sure with SCSI drives either. At least you can assert that, with proper controller cache sizing, even the largest amount of data that could fit into the disk caches of the entire array (96MB, with drives that usually carry 8MB (!!!) each) will still be available in the controller cache after a power failure, so it can be re-checked and confirmed properly written.

Hope this helps,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/
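For anyone wanting to reproduce figures like these on their own hardware, a bonnie++ run along these lines is a reasonable sketch (assuming bonnie++ is installed and the array is mounted at /mnt/array, a hypothetical path; use a test size of at least twice your RAM so the page cache doesn't flatter the results):

    bonnie++ -d /mnt/array -s 8192 -u nobody

The block-write, rewrite, and block-read columns of its report correspond to the kind of numbers quoted above.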
> > Oh, and not to forget -- the price for a 3ware 9500S-12, the version
> > we're testing, ranges between EUR1000 and EUR1500, depending on the
> > contract you have with the reseller and the intended use of the
> > device. SATA disks are dirt-cheap nowadays, as has been mentioned
> > before.

Correction: EUR500 and EUR1000, VAT not included. :) Sorry for the mix-up.
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/
bsimon@loxane.com wrote:
> Would NAS or SAN be a good solution? (I've read that NAS uses NFS,
> which could slow down the transfer rate.)
> Has anyone ever tried one of these with PostgreSQL?

Not (yet) with Postgres, but my company has run a ~100GB Oracle database on NAS (NetApp) for the past couple of years. We've found it to outperform locally attached storage, and it has been extremely reliable and flexible. Our DBAs wouldn't give it up without a fight.

Joe
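As an aside for anyone trying a database over NFS: the mount options matter a great deal for safety. A sketch of a typical database-oriented fstab entry (hypothetical server and paths; check your NAS vendor's recommendations) would be:

    netapp:/vol/pgdata  /var/lib/pgsql  nfs  hard,nointr,tcp,rsize=8192,wsize=8192  0 0

Hard mounts keep the database from seeing spurious I/O errors during a brief server hiccup, and the 8k transfer sizes line up with PostgreSQL's block size.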
> Would NAS or SAN be a good solution? (I've read that NAS uses NFS,
> which could slow down the transfer rate.)
> Has anyone ever tried one of these with PostgreSQL?

I've used both NetApp- and Hitachi-based SANs with PostgreSQL. Both work as well as expected, but they do require some tweaking, as they are normally not optimized for the block size PostgreSQL likes to deal with (8k by default) -- this can make as much as a 50% difference in performance levels.

For a NAS setup, be VERY careful that the NFS implementation you're using has the semantics the database requires (do plenty of failure testing -- pull plugs and things at random). iSCSI looks more promising, but I've not tested how gracefully it fails.

Have your supplier run a bunch of benchmarks for random IO with 8k blocks (see the sketch below for one way to run such a test yourself).

One side note: SANs seem to be very good at scaling across multiple jobs from multiple sources, but beware your Fibre Channel drivers -- mine seems to spend quite a bit of time servicing interrupts, and I've not found a way to put it into a polling mode (I'm not a Linux person, and that trick usually happens for me on the BSDs).
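In case it helps, here is a minimal sketch of such a random-8k-read test in C (hypothetical and untuned; run it against a file at least twice the size of RAM so the OS page cache doesn't dominate the numbers):

    /*
     * rand8k.c -- sketch of a random 8k read test, mirroring
     * PostgreSQL's default block size.
     *
     *   cc -O2 -std=c99 -o rand8k rand8k.c
     *   ./rand8k /mnt/array/bigfile 10000
     */
    #define _XOPEN_SOURCE 500   /* for pread() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <file> <iterations>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        off_t nblocks = lseek(fd, 0, SEEK_END) / 8192; /* file size in 8k blocks */
        if (nblocks < 1) { fprintf(stderr, "file too small\n"); return 1; }

        long iters = atol(argv[2]);
        char buf[8192];
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        for (long i = 0; i < iters; i++) {
            /* read one 8k block at a random 8k-aligned offset */
            off_t off = (off_t)(rand() % nblocks) * 8192;
            if (pread(fd, buf, sizeof buf, off) != sizeof buf) {
                perror("pread");
                return 1;
            }
        }
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%ld random 8k reads in %.2fs = %.0f reads/sec\n",
               iters, secs, iters / secs);
        close(fd);
        return 0;
    }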
Rod Taylor wrote:
| I've used both NetApp- and Hitachi-based SANs with PostgreSQL. Both
| work as well as expected, but they do require some tweaking, as they
| are normally not optimized for the block size PostgreSQL likes to
| deal with (8k by default) -- this can make as much as a 50%
| difference in performance levels.

I'm looking for documentation about the datablock size you mention above. My goal is to tune the disk/filesystem on our prototype system. It's an EMC disk array, so sectors on disk are 512 bytes of usable space. We've decided to go with RAID 10, since the goal is to maximize performance. Currently the RAID element size is set at 16 sectors, which is 8192 bytes of payload.

I've got a sysadmin working on getting XFS going with 8192-byte blocks. My next task will be to calculate the amount of space used by XFS for headers etc., to find out how much of those 8192 bytes can be used for the postgres payload, and then to configure postgres to use datablocks of that size.

So I'm looking for details on how to manipulate the size of the datablock. I'm also not entirely sure how to make the datablocks line up with the filesystem blocks. Any suggestions on this would be greatly appreciated.

--
Andrew Hammond    416-673-4138    ahammond@ca.afilias.info
Database Administrator, Afilias Canada Corp.
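For reference, the XFS side of that would be roughly the following (a sketch only: the device name is hypothetical, sunit/swidth are given in 512-byte sectors, and swidth assumes four data disks in the stripe):

    mkfs.xfs -b size=8192 -d sunit=16,swidth=64 /dev/emcpowera

One caveat, as far as I know: Linux will only mount a filesystem whose block size is no larger than the kernel page size (4096 bytes on i386), so 8k XFS blocks may not actually be mountable on x86 hardware.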
Andrew Hammond wrote:
> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

We just played with Veritas settings while running pg_bench on a 200GB database. I no longer have access to the NetApp, but the settings for the Hitachi are below.

In tunefstab we have:

    read_pref_io=8192,read_nstream=4,write_pref_io=8192,write_nstream=2

In fstab it's:

    defaults,mincache=tmpcache,noatime

If you have better settings, please shoot them over so we can try them out. Perhaps we could even get someone over there to write a new SAN section for the Tuning chapter.
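If memory serves, VxFS also lets you change these values on a mounted filesystem with vxtunefs, so you can experiment without a remount. A sketch, assuming the filesystem is mounted at /pgdata (a hypothetical path; repeat -o for each parameter):

    vxtunefs -o read_pref_io=8192 /pgdata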
Andrew Hammond <ahammond@ca.afilias.info> writes:

> My goal is to tune the disk/filesystem on our prototype system. It's
> an EMC disk array, so sectors on disk are 512 bytes of usable space.
> We've decided to go with RAID 10, since the goal is to maximize
> performance. Currently the RAID element size is set at 16 sectors,
> which is 8192 bytes of payload.

Do people find it works well to have a stripe size that small? It seems like it would be better for it to be at least a few filesystem/postgres blocks, so that subsequent reads stand a chance of being sequential and not forcing another spindle to seek. Does this depend on whether it's a DSS load vs. an OLTP load? If it's a single-query-at-a-time DSS system, perhaps small block sizes work best for maximum throughput.

> I've got a sysadmin working on getting XFS going with 8192-byte blocks.

Having your filesystem block size match postgres's block size is probably a good idea, so 8k blocks are good.

> My next task will be to calculate the amount of space used by XFS for
> headers etc., to find out how much of those 8192 bytes can be used for
> the postgres payload.

No filesystem that I know of uses up space in every block. The overhead is all stored elsewhere, in blocks that exclusively contain such overhead data. So just leaving postgres at its 8k default will work well.

> Then configure postgres to use datablocks of that size. So I'm looking
> for details on how to manipulate the size of the datablock.

Look in pg_config_manual.h in src/include. Postgres has to be recompiled to change it, and the database has to be reinitialized. It could be set to 16k or 32k, in which case you would probably want to adjust your filesystem to match. But unless you run experiments you won't know whether changing it is of any benefit.

> I'm also not entirely sure how to make the datablocks line up with the
> filesystem blocks. Any suggestions on this would be greatly appreciated.

They just will. The files start on a block boundary, so every 8k is a new block. Postgres stores 8k at a time, always.

--
greg
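Concretely, the knob is the BLCKSZ define. A rough sketch of the procedure for a source build (paths assume the standard layout; adjust for yours):

    # in src/include/pg_config_manual.h:
    #define BLCKSZ 16384   /* default is 8192 */

    # then rebuild and reinitialize -- an existing cluster cannot be reused:
    ./configure && make && make install
    initdb -D /usr/local/pgsql/data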