Thread: Data Corruption in case of abrupt failure
Hi,
I am trying to do some reliability tests on PostgreSQL. I have a use-case where the power can go off abruptly. I initiated 10,000 insert operations and pulled out the cable in the middle. I had the auto-commit option turned on. I observed that 2 out of 5 times the tables were totally corrupted and I could not read any data, whereas 3 times I was able to read the data which was inserted.

Is there any way that I could avoid that data corruption and ensure that at least the records inserted up to that point are available in the database? Or are there any tools through which I can recover the data in case the database gets corrupted?
Thanks in advance,
Sathish
On Wed, Mar 03, 2004 at 04:27:33 -0800, satish satish <satish_ach2003@yahoo.com> wrote:
> Hi,
>
> I am trying to do some reliability tests on PostgreSQL. I have a use-case
> where the power can go off abruptly. I initiated 10,000 insert operations
> and pulled out the cable in the middle. I had auto-commit option turned on.
> I observed 2 out of 5 times the tables were totally corrupted and could not
> read any data whereas 3 times I was able to read the data which was inserted.
>
> Is there any way that I could avoid that data corruption and ensure that at
> least the records inserted till that point are available in the database. Or
> are there any tools through which I can recover the data in case the
> database gets corrupted?

Are you using IDE disks with write caching enabled? If so, that is probably your problem.
On Wed, 3 Mar 2004, satish satish wrote:
> Hi,
>
> I am trying to do some reliability tests on PostgreSQL. I have a
> use-case where the power can go off abruptly. I initiated 10,000 insert
> operations and pulled out the cable in the middle. I had auto-commit
> option turned on. I observed 2 out of 5 times the tables were totally
> corrupted and could not read any data whereas 3 times I was able to read
> the data which was inserted.
>
> Is there any way that I could avoid that data corruption and ensure
> that at least the records inserted till that point are available in the
> database. Or are there any tools through which I can recover the data in
> case the database gets corrupted?

There are a few steps to making sure your data can survive a power failure when writing:

1: Make sure fsync is enabled in postgresql.conf / postmaster startup.
2: Use a journaling file system. Metadata-only journaling is fine.
3: Use hardware that fsyncs and doesn't lie about it.
4: Ditto #3 for your OS. Most of the OSes pgsql runs on are fine.

#3 above is the big sticking point most of the time: write-caching RAID controllers without battery backup, or write-caching hard drives that lie about their fsync. IDE drives are known to generally do this; SCSI drives generally don't. No one on the list has done a lot of testing with SATA, but if someone wants to send me a drive and a controller card I'd be quite happy to pull the power plug on my box to test it. :-0

Anyway, someone else on the list has reported that the Escalade IDE RAID controller passes the power-pull test. I have personally tested the LSI / MegaRAID controllers (the U160 one) with battery-backed cache and found they survive the power-off test well. Someone else has tested the Adaptec SCSI RAID controllers with battery-backed cache and reported that they worked as well.

Turning off the cache on IDE drives will drop your average performance to about 1/3 of what you get with caching on, but it should make them reliable for power-loss recovery. The command on Linux is:

    hdparm -W0 /dev/hda

Replace hda with the drive you would like to disable write caching for.
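[A minimal sketch of the commands involved, for reference. /dev/hda is a placeholder device name, the commands need root on a real Linux box, and the arithmetic assumes the one-commit-per-disk-revolution bound described elsewhere in this thread:]

```shell
# Placeholder device; run as root against your actual drive:
#   hdparm -W /dev/hda    # show the current write-cache setting
#   hdparm -W0 /dev/hda   # disable write caching
#   hdparm -W1 /dev/hda   # re-enable it (useful for before/after timing)
#
# With the cache off and fsync honored, the commit rate is bounded by
# rotation speed: at most one commit per platter revolution.
rpm=7200
echo "upper bound: $((rpm / 60)) commits/sec"
```

For a 7200 RPM drive this prints an upper bound of 120 commits per second, which is why a far higher single-connection commit rate is suspicious.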
On Wed, 2004-03-03 at 23:27, satish satish wrote:
> Hi,
>
> I am trying to do some reliability tests on PostgreSQL. I have a
> use-case where the power can go off abruptly. I initiated 10,000
> insert operations and pulled out the cable in the middle. I had
> auto-commit option turned on. I observed 2 out of 5 times the tables
> were totally corrupted and could not read any data whereas 3 times I
> was able to read the data which was inserted.
>
> Is there any way that I could avoid that data corruption and ensure
> that at least the records inserted till that point are available in the
> database. Or are there any tools through which I can recover the data
> in case the database gets corrupted?
>
> Thanks in advance,
> Sathish

We'll need more information - what OS and what version of PostgreSQL at the least.

Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
> On Wed, 2004-03-03 at 23:27, satish satish wrote:
>> I am trying to do some reliability tests on PostgreSQL. I have a
>> use-case where the power can go off abruptly. I initiated 10,000
>> insert operations and pulled out the cable in the middle. I had
>> auto-commit option turned on. I observed 2 out of 5 times the tables
>> were totally corrupted and could not read any data whereas 3 times I
>> was able to read the data which was inserted.
>
> We'll need more information - what OS and what version of PostgreSQL at
> the least.

I doubt this has anything to do with either the OS or the Postgres version. My money is on IDE drives with write cache enabled. Use drives that don't lie about write completion, and you'll be in good shape.

			regards, tom lane
On Tue, 2004-03-16 at 16:44, Tom Lane wrote:
> Stephen Robert Norris <srn@commsecure.com.au> writes:
>> On Wed, 2004-03-03 at 23:27, satish satish wrote:
>>> I am trying to do some reliability tests on PostgreSQL. I have a
>>> use-case where the power can go off abruptly. I initiated 10,000
>>> insert operations and pulled out the cable in the middle. I had
>>> auto-commit option turned on. I observed 2 out of 5 times the tables
>>> were totally corrupted and could not read any data whereas 3 times I
>>> was able to read the data which was inserted.
>
>> We'll need more information - what OS and what version of PostgreSQL at
>> the least.
>
> I doubt this has anything to do with either the OS or the Postgres
> version. My money is on IDE drives with write cache enabled.
> Use drives that don't lie about write completion, and you'll be in
> good shape.
>
> 			regards, tom lane

I was wondering if he'd turned off fsync(), as seems common for some reason...

Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
> On Tue, 2004-03-16 at 16:44, Tom Lane wrote:
>> My money is on IDE drives with write cache enabled.
>
> I was wondering if he'd turned off fsync(), as seems common for some
> reason...

That would be bad too. But the nasty thing about IDE write caches is that even when you think you've fsync'd, you haven't :-(

			regards, tom lane
Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> I doubt this has anything to do with either the OS or the Postgres
> version. My money is on IDE drives with write cache enabled.
> Use drives that don't lie about write completion, and you'll be in
> good shape.
>
> 			regards, tom lane

I've read threads like this before, and because I've never lost data on servers with IDE drives after doing some basic torture tests (e.g. pulling the plug in the middle of an update et al), I don't think I've paid close enough attention.

Is there some definite way someone can test their IDE drives to see whether or not they are "lying" about write completions? Perhaps someone somewhere has a list of drives that are known to exhibit this problem - an IDE drive blacklist of sorts?

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com
Keith C. Perry wrote:
> Is there some definite way someone can test their IDE drives to see whether or
> not they are "lying" about write completions? Perhaps someone somewhere has a
> list of drives that are known to exhibit this problem - an IDE drive blacklist
> of sorts?

I believe using hdparm on Linux, one can see whether or not write caching is enabled, and also set the caching. I don't think any IDE disk would lie about its write-caching status. (If one does, then it is really hard to find out unless brute force is used.)

Just a thought..

Shridhar
"Keith C. Perry" <netadmin@vcsn.com> writes: > I've read threads like this before and because I've never lost data on > servers with IDE drives after doing some basic torture tests > (e.g. pulling the plug in the middle of an update et al), I don't > think I've paid close enough attention. On many IDE drives it is possible to turn write caching on and off with some incantation involving "hdparm" (don't have the details but you can probably find 'em in the list archives). Possibly your system is already configured safely. > Is there some definite way someone can test their IDE drives so see > whether or not they are "lying" about write completions? What I'd suggest is to set up a simple test involving a long string of very small transactions (a bunch of separate INSERTs into a table with no indexes works fine). Time it twice, once with "fsync" enabled and once without. If there's not a huge difference, your drive is lying. The reason this works is that successive commits will re-write the same page of WAL (at least till that page fills up, but you can fit quite a few small transactions per page). If fsync is actually waiting for the bits to hit disk, then it is physically impossible to commit more than one transaction per disk revolution for as long as the commit records are going to the same page. If you've got, say, a 10000 RPM disk then ideally you should see about 10000 commits per minute in this scenario. However, most modern CPUs can do way better than that if they don't have to wait for the disk to spin ... so if you see a transaction rate considerably better than your disk's RPM, you know the disk is lying about having written the WAL page. regards, tom lane
Hi,

Postgres 7.3.4
Perl 5.6.0 and Perl 5.8.1
RH 7

I have to implement some new postgres functions that need LWP::Simple, but I've received errors in locating that module. By the way, the server contains two Perl versions, for the previous administrator had problems implementing scripts with LWP::Simple and installed a second, higher version of Perl, i.e. 5.8.1.

Upon running, the following trigger and functions in the db produced this error:

    ERROR: plperl: error from function : Can't locate LWP/Simple.pm in @INC
    (@INC contains: /usr/lib/perl5/5.6.0/i386-linux /usr/lib/perl5/5.6.0
    /usr/lib/perl5/site_perl/5.6.0/i386-linux /usr/lib/perl5/site_perl/5.6.0
    /usr/lib/perl5/site_perl .) at (eval 3) line 8.
    BEGIN failed--compilation aborted at (eval 3) line 8.

The specific sequence of files included or processed is: /hello/world.cfm

    CREATE TRIGGER autokuda
        AFTER INSERT ON temp_spp_kuda_live
        FOR EACH ROW
        EXECUTE PROCEDURE sendkuda ();

    CREATE FUNCTION sendkuda () RETURNS "trigger"
    AS '
    begin
        perform KudaRedirect (NEW.tkl_gateway, NEW.tkl_keyword, NEW.tkl_mobile);
        return NEW;
    end;
    ' LANGUAGE plpgsql;

    CREATE FUNCTION kudaredirect (character varying, character varying,
        character varying) RETURNS integer
    AS '
        $gateway = $_[1];
        $keyword = $_[2];
        $mobileno = $_[3];
        my $url = "http://www.helloworld.com/xxx.asp?gateway=$gateway&text=$keyword&from=$mobileno";
        use LWP::Simple;
        my $content = get $url;
        if (! defined $content) {
            system("/bin/echo KUDA: FAIL REDIRECT TO $gateway $keyword $mobileno >> /path/to/fail.log");
        }
        system("/bin/echo KUDA: REDIRECT $gateway $keyword $mobileno >> /path/to/success.log");
        return 0;
    ' LANGUAGE plperlu;

When I tried to install LWP::Simple, it just installed into 5.8.1 by default. How can I point PostgreSQL to use LWP::Simple in Perl 5.8.1? Or at least install LWP::Simple in Perl 5.6... Other solutions are very much welcomed.

Also, I didn't uninstall Perl 5.8, for I don't know what else depends on it.

Thank you for the time taken to ponder this :)

best regards
siew hui
On Tue, 16 Mar 2004, Tom Lane wrote:
> "Keith C. Perry" <netadmin@vcsn.com> writes:
>> I've read threads like this before and because I've never lost data on
>> servers with IDE drives after doing some basic torture tests
>> (e.g. pulling the plug in the middle of an update et al), I don't
>> think I've paid close enough attention.
>
> On many IDE drives it is possible to turn write caching on and off with
> some incantation involving "hdparm" (don't have the details but you can
> probably find 'em in the list archives). Possibly your system is
> already configured safely.

hdparm -W0 /dev/hda

>> Is there some definite way someone can test their IDE drives to see
>> whether or not they are "lying" about write completions?
>
> What I'd suggest is to set up a simple test involving a long string of
> very small transactions (a bunch of separate INSERTs into a table with
> no indexes works fine). Time it twice, once with "fsync" enabled and
> once without. If there's not a huge difference, your drive is lying.

pgbench is a nice candidate for this:

    pgbench -c 100 -t 100000
"scott.marlowe" <scott.marlowe@ihs.com> writes: > On Tue, 16 Mar 2004, Tom Lane wrote: >> What I'd suggest is to set up a simple test involving a long string of >> very small transactions (a bunch of separate INSERTs into a table with >> no indexes works fine). Time it twice, once with "fsync" enabled and >> once without. If there's not a huge difference, your drive is lying. > pgbench is a nice candidate for this. > pgbench -c 100 -t 100000 I wouldn't do that, first because pgbench transactions are relatively large (several updates per xact IIRC), and second because you'll be measuring contention effects as well as pure WAL write activity. If you simply must use pgbench for this, use -c 1 ... but it's surely easy enough to make a file of a few thousand copies of INSERT INTO foo VALUES(1); and feed it to psql. regards, tom lane
Does anyone know the equivalent to hdparm on FreeBSD?

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of scott.marlowe
> Sent: Wednesday, March 17, 2004 11:27 AM
> To: Tom Lane
> Cc: Keith C. Perry; Stephen Robert Norris; satish satish;
> pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Data Corruption in case of abrupt failure
>
> On Tue, 16 Mar 2004, Tom Lane wrote:
>> On many IDE drives it is possible to turn write caching on and off with
>> some incantation involving "hdparm" (don't have the details but you can
>> probably find 'em in the list archives). Possibly your system is
>> already configured safely.
>
> hdparm -W0 /dev/hda
Rod K wrote:
> Does anyone know the equivalent to hdparm on FreeBSD?

Yes, something like: in FreeBSD, add "hw.ata.wc=0" to /boot/loader.conf.

--
Bruce Momjian                        | http://candle.pha.pa.us
pgman@candle.pha.pa.us               | (610) 359-1001
+ If your life is a hard drive,      | 13 Roberts Road
+ Christ can be your backup.         | Newtown Square, Pennsylvania 19073
On Tue, Mar 16, 2004 at 01:16:56PM +0530, Shridhar Daithankar wrote:
> I believe using hdparm on Linux, one can see whether or not write caching
> is enabled, and also set the caching.
>
> I don't think any IDE disk would lie about its write-caching status. (If one
> does, then it is really hard to find out unless brute force is used.)

I have in fact seen IDE drives lie about this, although not recently. You can be sure it's lying if, even when you turn caching off, performance never changes.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Dear list,

The location of Simple.pm on my server is /usr/local/lib/perl5/site_perl/5.8.1/LWP/. The problem is solved by adding a line that points to LWP/Simple.pm before the "use LWP::Simple":

    <snip!>
    use lib "/usr/local/lib/perl5/site_perl/5.8.1/";
    use LWP::Simple;
    <snip!>

It worked like a charm, but other problems cropped up. But that's more like a Perl error that I shouldn't bother you guys with :)

thanks
best regards,
siew hui

----- Original Message -----
From: "Siew Hui, Wong" <shwong@sebasasia.com>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, March 17, 2004 2:02 PM
Subject: [GENERAL] LWP::Simple in Postgresql 7.3.4

> Hi,
>
> Postgres 7.3.4
> Perl 5.6.0 and Perl 5.8.1
> RH 7
>
> I have to implement some new postgres functions that need LWP::Simple, but
> I've received errors in locating that module. By the way, the server
> contains two Perl versions, for the previous administrator had problems
> implementing scripts with LWP::Simple and installed a second, higher
> version of Perl, i.e. 5.8.1.
>
> Upon running, the following trigger and functions in the db produced this
> error:
>
> ERROR: plperl: error from function : Can't locate LWP/Simple.pm in @INC
> (@INC contains: /usr/lib/perl5/5.6.0/i386-linux /usr/lib/perl5/5.6.0
> /usr/lib/perl5/site_perl/5.6.0/i386-linux /usr/lib/perl5/site_perl/5.6.0
> /usr/lib/perl5/site_perl .) at (eval 3) line 8. BEGIN failed--compilation
> aborted at (eval 3) line 8.
>
> The specific sequence of files included or processed is: /hello/world.cfm
>
> CREATE TRIGGER autokuda
>     AFTER INSERT ON temp_spp_kuda_live
>     FOR EACH ROW
>     EXECUTE PROCEDURE sendkuda ();
>
> CREATE FUNCTION sendkuda () RETURNS "trigger"
> AS '
> begin
>     perform KudaRedirect (NEW.tkl_gateway, NEW.tkl_keyword, NEW.tkl_mobile);
>     return NEW;
> end;
> ' LANGUAGE plpgsql;
>
> CREATE FUNCTION kudaredirect (character varying, character varying,
>     character varying) RETURNS integer
> AS '
>     $gateway = $_[1];
>     $keyword = $_[2];
>     $mobileno = $_[3];
>     my $url = "http://www.helloworld.com/xxx.asp?gateway=$gateway&text=$keyword&from=$mobileno";
>     use LWP::Simple;
>     my $content = get $url;
>     if (! defined $content) {
>         system("/bin/echo KUDA: FAIL REDIRECT TO $gateway $keyword $mobileno >> /path/to/fail.log");
>     }
>     system("/bin/echo KUDA: REDIRECT $gateway $keyword $mobileno >> /path/to/success.log");
>     return 0;
> ' LANGUAGE plperlu;
>
> When I tried to install LWP::Simple, it just installed into 5.8.1 by default.
> How can I point PostgreSQL to use LWP::Simple in Perl 5.8.1? Or at least
> install LWP::Simple in Perl 5.6... Other solutions are very much welcomed.
>
> Also, I didn't uninstall Perl 5.8, for I don't know what else depends on it.
>
> Thank you for the time taken to ponder this :)
>
> best regards
> siew hui