Re: IDE and write cache - Mailing list pgsql-admin
From | scott.marlowe |
---|---|
Subject | Re: IDE and write cache |
Date | |
Msg-id | Pine.LNX.4.33.0402110839430.32376-100000@css120.ihs.com |
In response to | IDE and write cache (Mark Lubratt <mark.lubratt@indeq.com>) |
List | pgsql-admin |
On Wed, 11 Feb 2004, Mark Lubratt wrote:

> Interesting discussions on IDE drives and their write caches.
>
> I have a question...
>
> You mentioned that you'd see the problem during a large number of
> concurrent transactions. My question is, is this a necessary condition
> for the database crashing when the plug was pulled, or did you need to use
> a large number of concurrent transactions to "guarantee" that when you
> pulled the plug, that it would be at an inopportune time? In other
> words, is an IDE drive still "more" susceptible to a power outage
> problem even under light load?

Basically, if the data has been written to WAL, and an fsync issued, and the drive has it in cache but hasn't written it to the platters, and you lose power, the database will likely be corrupted and will refuse to start up when the machine boots. Also, of course, some data will be lost that was supposedly committed in a transaction.

So, yeah, the reason for having hundreds of open transactions is that it widens the window of opportunity for a lying drive to corrupt the database.

So, yes, even under light load, you could have a corrupted database if you lose power while a write is happening. Of course, if the database is sitting idle at the time of the power outage then you're ok.

-------------------------------------------------------------------------

Funny little story. We had an electrician working above our main power switch (the big box that switches us from line power, to UPS, to the diesel generator) and said electrician clipped a piece of wire that fell into the switch, shorting it out and taking down our entire hosting center (think $1,000 a minute...)

As I was walking down a hallway, one of the winders / FoxPro guys asked me if my machine would come back up when the power came on (it runs on dual 36 gig 10krpm SCSI drives under an LSI MegaRAID with battery backed cache, and I'd tested it by pulling the plug before it went into production.)
I'd been bragging to him about the power plug pull tests it had passed, so of course, he was just teasing me. I told him that as long as the power cut hadn't spiked the box and fried anything we were gold.

An hour later, when they got the switch fixed and everything came back up, my machine came up fine, but the NAS machines that provide the web storage behind it (not the database, that's local) took about 10 minutes to fsck or mount or whatever it is they do.

So I'm walking by FoxPro guy's desk and I casually say "Well, looks like my box had some problems coming back up." He smiles, thinking he's got me, the bragging PostgreSQL guy, by the short ones. "Yeah, seems it boots faster than the network storage it sits on. Just CTRL-ALT-DEL and it was up and running fine."

He laughed along with me. I trust PostgreSQL. On SCSI or RAID with battery backed cache.
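
For anyone who wants to see the failure mode from the first part of this message in miniature, here is a rough sketch in Python. It is a toy model only, not how PostgreSQL or any real drive firmware works: the `Drive` class, its `honest_fsync` and `battery_backed` flags, and the `commit` and `lost_commits` helpers are all made up for illustration. It also ignores the fact that a real drive drains its cache in the background, so in practice only the most recent writes are at risk.

```python
class Drive:
    """Toy model of a disk with a write cache (hypothetical, for illustration)."""

    def __init__(self, honest_fsync, battery_backed=False):
        self.honest_fsync = honest_fsync      # does fsync really reach the platters?
        self.battery_backed = battery_backed  # does the cache survive a power cut?
        self.cache = []      # blocks sitting in the write cache
        self.platters = []   # blocks that are truly durable

    def write(self, block):
        # Writes always land in the cache first.
        self.cache.append(block)

    def fsync(self):
        # An honest drive flushes the cache before acknowledging.
        # A lying drive acknowledges right away and flushes "later".
        if self.honest_fsync:
            self.platters.extend(self.cache)
            self.cache.clear()
        return True

    def power_loss(self):
        # A battery-backed cache is replayed to disk once power returns;
        # a plain volatile cache is simply gone.
        if self.battery_backed:
            self.platters.extend(self.cache)
        self.cache.clear()


def commit(drive, record):
    """Write a WAL record and report success once fsync returns."""
    drive.write(record)
    return drive.fsync()


def lost_commits(honest_fsync, battery_backed=False, n=5):
    """Return the records that were acknowledged but did not survive a power cut."""
    drive = Drive(honest_fsync, battery_backed)
    acked = []
    for i in range(n):
        record = f"txn-{i}"
        if commit(drive, record):
            acked.append(record)
    drive.power_loss()
    return [r for r in acked if r not in drive.platters]


if __name__ == "__main__":
    print("honest drive lost:", lost_commits(honest_fsync=True))
    print("lying drive lost:", lost_commits(honest_fsync=False))
    print("lying cache with battery lost:",
          lost_commits(honest_fsync=False, battery_backed=True))
```

In this model the lying drive loses every acknowledged commit still in its cache, while the honest drive and the battery-backed cache lose nothing, which is the same distinction as between the IDE drives in question and the SCSI/RAID setup described above.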