Thread: Solid State Drives with PG (was: in RAM DB)
> Have you considered using one of these:
> http://www.acard.com/english/fb01-product.jsp?idno_no=270&prod_no=ANS-9010&type1_title=Solid State Drive&type1_idno=13

We did some research which suggested that performance may not be so great with them, because the PG engine is not optimized to utilize those drives.

So, I'll change the subject line to see if anyone has experience using these.

--
"Don't eat anything you've ever seen advertised on TV"
   - Michael Pollan, author of "In Defense of Food"
On Fri, Mar 26, 2010 at 10:32 AM, Alan McKay <alan.mckay@gmail.com> wrote:
>> Have you considered using one of these:
>> http://www.acard.com/english/fb01-product.jsp?idno_no=270&prod_no=ANS-9010&type1_title=Solid State Drive&type1_idno=13
>
> We did some research which suggested that performance may not be so
> great with them because the PG engine is not optimized to utilize
> those drives.
>
> So, I'll change the subject line to see if anyone has experience using these.

postgres works fine with flash SSD, understanding that:

*) the postgres disk block is 8k and the ssd erase block is much larger
   (newer ssd controllers minimize this penalty though)
*) many flash drives cheat and buffer writes to delay a full sync, both
   for performance reasons and to extend the life of the drive
*) if you have a relatively small database, the big 'win' of SSD, fast
   random reads, is of little/no use because the o/s will buffer the
   database in RAM anyway.

The ideal candidate for flash SSD, from a database point of view, is someone with I/O problems coming from OLTP-type activity that forces the disks to constantly seek all over the place to write and (especially) read data. This happens when your database grows to the point where its operational (that is, frequently used) data size exceeds RAM enough that o/s buffering of reads starts to become less effective. This can crush database performance. Flash SSD 'fixes' this problem because, relative to a disk head seek, the cost of random read I/O on flash is basically zero.

However, flash has some problems writing, such that you get to choose between volatility of data (irrespective of fsync) or lousy performance. So flash isn't yet a general-purpose database solution, and won't be until the write performance problem is fixed in a way that doesn't compromise on volatility. If/when that happens, and there isn't a huge price premium to pay vs. flash prices today, all my new servers will be spec'd with flash :-).

merlin
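[A quick way to catch the cheating-drive behavior described above is a crude timing test, the same idea behind diskchecker.pl and the test_fsync tool in the postgres source tree: a 7200rpm disk can physically honor only on the order of its rotation rate (~120) fsyncs per second to one file, so thousands of acknowledged synchronous 8k writes per second from spinning media means the drive is answering from volatile cache. A minimal sketch; the file path and thresholds are illustrative, and only a real plug-pull test is conclusive:]

```python
import os
import tempfile
import time


def fsync_rate(path, block=b"x" * 8192, seconds=2.0):
    """Count how many 8k write+fsync cycles per second the drive honors."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        count = 0
        start = time.time()
        while time.time() - start < seconds:
            os.pwrite(fd, block, 0)  # rewrite the same 8k block in place
            os.fsync(fd)             # ask for the data to be made durable
            count += 1
        return count / (time.time() - start)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    try:
        # On a honest 7200rpm disk this should come out near 120/sec;
        # a much larger number on spinning media suggests a lying cache.
        print("%.0f fsyncs/sec" % fsync_rate(path))
    finally:
        os.unlink(path)
```

[Run it on the filesystem that will hold pg_xlog; an honest drive's number should roughly track its rotation rate, while a battery-backed cache or a cheating SSD will report far more.]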
Merlin Moncure wrote:
> So flash isn't yet a general purpose database solution, and won't be
> until the write performance problem is fixed in a way that doesn't
> compromise on volatility.

Flash drives that ship with a supercapacitor large enough to ensure orderly write cache flushing in the event of power loss seem to be the only solution anyone is making progress on for this right now. That would turn them into something even better than the traditional approach of using a regular disk with a battery-backed write caching controller. Given the relatively small write cache involved and the fast write speed, it's certainly feasible to just flush at power loss every time, rather than do what the BBWC products do--recover once power comes back.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On Fri, Mar 26, 2010 at 2:32 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>> So flash isn't yet a general purpose database solution, and won't be
>> until the write performance problem is fixed in a way that doesn't
>> compromise on volatility.
>
> Flash drives that ship with a supercapacitor large enough to ensure orderly
> write cache flushing in the event of power loss seem to be the only solution
> anyone is making progress on for this right now. That would turn them into
> something even better than the traditional approach of using regular
> disk with a battery-backed write caching controller. Given the relatively
> small write cache involved and the fast write speed, it's certainly feasible
> to just flush at power loss every time rather than what the BBWC products
> do--recover once power comes back.

right -- unfortunately there is likely going to be a fairly high cost premium on these devices for a good while yet. right now afaik you only see this stuff on boutique-type devices...yeech.

I have to admit, until your running exposé on this stuff I was led to believe by a few companies (especially Intel) that flash storage technology was a few years ahead of where it really was -- it's going to take me a long time to forgive them for that!

put another way (are you listening, intel?): _NO_ drive should be positioned for the server/enterprise market that does not honor fsync by default, unless that is very clearly documented. This is forgivable for a company geared towards the consumer market...but Intel...ugh!

merlin
On Fri, 2010-03-26 at 15:27 -0400, Merlin Moncure wrote:
> On Fri, Mar 26, 2010 at 2:32 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> > Flash drives that ship with a supercapacitor large enough to ensure orderly
> > write cache flushing in the event of power loss seem to be the only solution
> > anyone is making progress on for this right now.
>
> right -- unfortunately there is likely going to be a fairly high cost
> premium on these devices for a good while yet. right now afaik you
> only see this stuff on boutique type devices...yeech.

TMS RamSan products have more than adequate capacitor power to handle failure cases. They look like a very solid product. In addition, they have internal RAID across the chips to protect against chip failure, and wear-leveling is controlled on the board instead of being offloaded to the host. I haven't gotten my hands on one yet, but should at some point in the not too distant future.

I'm not sure what the price point is, though. But when you factor in the cost of the products they are competing against from a performance perspective, I'd be surprised if they aren't a lot cheaper -- especially when figuring in all the other costs that go along with disk arrays: power, cooling, and rack space. Depends on your vantage point, I guess.
I'm looking at these as potential alternatives to some high-end, expensive storage products, not as a cheap way to get really fast disk.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
On Fri, Mar 26, 2010 at 3:43 PM, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> I'm not sure what the price point is though.

here is a _used_ 320gb ramsan for 15k :-).  dram storage is pricey.

merlin
On Fri, Mar 26, 2010 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> here is a _used_ 320gb ramsan for 15k :-). dram storage is pricey.

I think using DRAM as the base is way better than flash. Just use the flash or a regular disk as the backup, with a battery to power the backup operation.

I have in my storage room a DRAM-based SCSI storage device made by Imperial Technology. It was totally the bee's knees in 2000 when I bought it (with 1GB of RAM) for almost $30k. Upgraded a year later to 5GB for another $15k. It has 4 low-profile/offset SCSI-2 connectors and a full battery-backed UPS internal to it, writes itself to a traditional disk drive on power outage, and continually ran self-diagnostics to ensure that everything was just right.

Free. But it doesn't power up. Probably needs a cap replaced or something simple like that.
On Wed, Apr 7, 2010 at 3:27 PM, Vick Khera <vivek@khera.org> wrote:
> I think using DRAM as the base is way better than flash. Just use the
> flash or a regular disk as the backup with a battery to power the
> backup operation.
>
> I have in my storage room a DRAM based SCSI storage device made by
> Imperial Technology. It was totally the bee's knees in 2000 when I
> bought it (with 1GB of RAM) for almost $30k.

dram storage makes sense in some cases but is generally so expensive that it throws off the whole hardware cost/engineering calculus, even with the insane expense of writing software (even to the 0.0001% of IT managers that understand this). that's saying something. the idea behind flash storage, though, was to provide at least decent performance at a reasonable cost. making dram storage fault tolerant takes a lot of engineering, thus the high cost.

as a dba, the idea of flash being able to be swapped in for sata spinning drives for a 10-20x gain in iops makes me vibrate. except that the fault tolerance issue isn't worked out yet. so I continue to buy bulk fossilized dinosaur plop and waste precious time figuring out how to make it work with otherwise fairly modern equipment.

did i mention that i was annoyed with intel? check out their faq entry on the ssd write cache:

    Does the Intel SSD have a write cache?
    Yes. However data caching is limited to the controller for enhanced
    performance.

huh!?

merlin
On Wed, Apr 7, 2010 at 4:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> except that the fault tolerance issue isn't worked out yet.

Yep. I do not want to be the guy doing the product testing to see if they're suitable for a high-write DB load.
Vick Khera wrote:
> On Wed, Apr 7, 2010 at 4:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> except that the fault tolerance issue isn't worked out yet.
>
> Yep. I do not want to be the guy doing the product testing to see if
> they're suitable for a high-write DB load.

all the enterprise SAN guys I've talked with say the Intel X25 drives are consumer junk; about the only thing they will use is the STEC Zeus, and even then they mirror them. These are SAS or FC, not SATA, so write barriers are well behaved (assuming your OS doesn't toss them like <cough>LVM</cough>).
John R Pierce wrote:
> all the enterprise SAN guys I've talked with say the Intel X25 drives
> are consumer junk; about the only thing they will use is the STEC Zeus,
> and even then they mirror them.

A couple of points there.

1) Mirroring flash drives is a bit ill-advised, since flash has a rather predictable long-term wear-out failure point. It would make more sense to mirror with a mechanical disk and use the SSD for reads, with some clever firmware to buffer up the extra writes to the mechanical disk and return completed status as soon as the data has been committed to the faster flash disk.

2) How much of that dislike of Intel is actually justified by something other than the margins offered / procurement policy (a.k.a. buying from the vendor that sends you the best present rather than from the vendor that has the best product)? Intel X25-E drives have write endurance, performance and power consumption (150mW TDP!) that are at least as good as other enterprise-grade drives. Most enterprise-grade drives don't even have TRIM support yet. I wouldn't knock Intel drives until you've tried them.

Also bear in mind that Intel X25-E drives have high-street prices similar to those of similarly sized 15,000rpm mechanical drives you might buy from a SAN vendor (of course, the same drives without the re-badge can be had for a fraction of the price). Then again, I never did have a very high opinion of big-name SAN vendor hardware - I have always achieved better results at a fraction of the cost with appliances I've built myself.

Gordan
Gordan Bobic wrote:
> How much of that dislike of Intel is actually justified by something
> other than the margins offered / procurement policy (a.k.a. buying
> from the vendor that sends you the best present rather than from the
> vendor that has the best product)? Intel X25-E drives have write
> endurance, performance and power consumption (150mW TDP!) that are at
> least as good as other enterprise grade drives.

Please; there is nobody bashing Intel here who gives a damn about vendor payola in any direction. Intel's drives are not suitable for enterprise database use because their write cache policy both fails testing and isn't documented well enough to figure out how to work around its limitations (if that's even possible). That's the end of the story; if your drive gets corrupted and you lose your database, it doesn't matter how good any of the other things you mention are.

> Then again, I never did have a very high opinion of big name SAN
> vendor hardware - I have always achieved better results at a fraction
> of the cost with appliances I've built myself.

If you're not testing write cache durability under harsh conditions like a power plug pull, you're not doing a fair comparison. SAN hardware should include good behavior under such situations; it's part of what you're paying for, and something many cheaper solutions lack. It's straightforward to beat the performance of a SAN, but what makes people buy them anyway is their ruggedness under really bad failure conditions that direct-attached storage can struggle with.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Gordan Bobic wrote:
> John R Pierce wrote:
>
>> all the enterprise SAN guys I've talked with say the Intel X25 drives
>> are consumer junk; about the only thing they will use is the STEC Zeus,
>> and even then they mirror them.
>
> 1) Mirroring flash drives is a bit ill-advised, since flash has a
> rather predictable long-term wear-out failure point. It would make
> more sense to mirror with a mechanical disk and use the SSD for reads,
> with some clever firmware to buffer up the extra writes to the
> mechanical disk and return completed status as soon as the data has
> been committed to the faster flash disk.

Interesting - a few days ago I read something in the mdadm documentation about an option for mirroring over 'slower' links, and was waiting for a proper use case/excuse to go playing with it ;-) Looking it up again:

   -W, --write-mostly
          subsequent devices listed in a --build, --create, or --add
          command will be flagged as 'write-mostly'. This is valid for
          RAID1 only and means that the 'md' driver will avoid reading
          from these devices if at all possible. This can be useful if
          mirroring over a slow link.

regards,
Yeb Havinga
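[For the mirror-SSD-with-spinning-disk idea discussed above, the option could be used roughly like this. A sketch only -- the device names are hypothetical, it assumes mdadm is installed, and it must run as root against real block devices:]

```shell
# RAID1 pair where reads are served from the SSD and the spinning disk
# is flagged write-mostly (md avoids reading from it when possible).
# /dev/sda1 = SSD, /dev/sdb1 = mechanical disk: both names hypothetical.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/sda1 --write-mostly /dev/sdb1

# With an internal write-intent bitmap, --write-behind additionally lets
# writes to the slow write-mostly leg lag behind the fast one:
# mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal \
#       --write-behind=256 /dev/sda1 --write-mostly /dev/sdb1
```

[Note that --write-mostly only affects reads; every write still has to hit both legs, which is why --write-behind (valid only with a bitmap) exists for the case where the slow mirror would otherwise drag down write latency.]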