Thread: Data Corruption in case of abrupt failure

Data Corruption in case of abrupt failure

From
satish satish
Date:
Hi,
 
I am trying to do some reliability tests on postgre SQL. I have use-case where the power can go off abruptly. I initiated 10,000 insert operations and pulled out the cable in the middle. I had auto-commit option turned on. I observed 2 out of 5 times the tables were totally corrupted and could not read any data whereas 3 times I was able to read the data which was inserted.
 
Is there any way that I could avoid that data corruption and ensure that atleast the records inserted till that point are available in the database. Or are there any tools through which I can recover the data in case the database gets corrupted?
 
Thanks in advance,
Sathish
 


Do you Yahoo!?
Yahoo! Search - Find what you�re looking for faster.

Re: Data Corruption in case of abrupt failure

From
Bruno Wolff III
Date:
On Wed, Mar 03, 2004 at 04:27:33 -0800,
  satish satish <satish_ach2003@yahoo.com> wrote:
> Hi,
>
> I am trying to do some reliability tests on postgre SQL. I have use-case where the power can go off abruptly. I
initiated10,000 insert operations and pulled out the cable in the middle. I had auto-commit option turned on. I
observed2 out of 5 times the tables were totally corrupted and could not read any data whereas 3 times I was able to
readthe data which was inserted. 
>
> Is there any way that I could avoid that data corruption and ensure that atleast the records inserted till that point
areavailable in the database. Or are there any tools through which I can recover the data in case the database gets
corrupted?

Are you using IDE disks with write caching enabled? If so that is probably
your problem.

Re: Data Corruption in case of abrupt failure

From
"scott.marlowe"
Date:
On Wed, 3 Mar 2004, satish satish wrote:

> Hi,
>
> I am trying to do some reliability tests on postgre SQL. I have
> use-case where the power can go off abruptly. I initiated 10,000 insert
> operations and pulled out the cable in the middle. I had auto-commit
> option turned on. I observed 2 out of 5 times the tables were totally
> corrupted and could not read any data whereas 3 times I was able to read
> the data which was inserted.
>
> Is there any way that I could avoid that data corruption and ensure
> that atleast the records inserted till that point are available in the
> database. Or are there any tools through which I can recover the data in
> case the database gets corrupted?

There are a few steps to making sure your data can survive a power failure
when writing.

1:  Make sure fsync is enabled in postgresql.conf / postmaster startup.
2:  Use a journaling file system.  Meta data only is fine.
3:  Use hardware that fsyncs and doesn't lie about it.
4:  Ditto # 3 for your OS.  Most the OSes pgsql runs on are fine.

#3 above is the big sticker most of the time.  write caching raid
controllers without battery backup or write caching hard drives that lie
about their fsync.  IDEs drives are known to generally do this.  SCSI
drives generally don't.  No one on the list has done a lot of testing with
SATA, but if someone wants to send me a drive and a controller card I'd be
quite happy to pull the power plug on my box to test it.  :-0

Anyway, someone else on the list has reported that the Escalade IDE RAID
controller passes the power pull test.  I have personally tested the LSI /
MegaRAID controllers (the U160 one) with battery backed cache and found
they survive the power off test well.  Someone else has tested the
adaptec SCSI RAID controllers with battery backed cache and reported that
they worked as well.  Turning off the cache on IDE drives will drop your
average performance to about 1/3 that of what you get with caching on.
But it should make them reliable for power loss recovery.  The command in
linux is: hdparm -W0 /dev/hda.  Replace hda with the drive you would like
to disable the write caching for.


Re: Data Corruption in case of abrupt failure

From
Stephen Robert Norris
Date:
On Wed, 2004-03-03 at 23:27, satish satish wrote:
> Hi,
>
> I am trying to do some reliability tests on postgre SQL. I have
> use-case where the power can go off abruptly. I initiated 10,000
> insert operations and pulled out the cable in the middle. I had
> auto-commit option turned on. I observed 2 out of 5 times the tables
> were totally corrupted and could not read any data whereas 3 times I
> was able to read the data which was inserted.
>
> Is there any way that I could avoid that data corruption and ensure
> that atleast the records inserted till that point are available in the
> database. Or are there any tools through which I can recover the data
> in case the database gets corrupted?
>
> Thanks in advance,
> Sathish
>

We'll need more information - what OS and what version of PostgreSQL at
the least.

    Stephen

Attachment

Re: Data Corruption in case of abrupt failure

From
Tom Lane
Date:
Stephen Robert Norris <srn@commsecure.com.au> writes:
> On Wed, 2004-03-03 at 23:27, satish satish wrote:
>> I am trying to do some reliability tests on postgre SQL. I have
>> use-case where the power can go off abruptly. I initiated 10,000
>> insert operations and pulled out the cable in the middle. I had
>> auto-commit option turned on. I observed 2 out of 5 times the tables
>> were totally corrupted and could not read any data whereas 3 times I
>> was able to read the data which was inserted.

> We'll need more information - what OS and what version of PostgreSQL at
> the least.

I doubt this has anything to do with either the OS or the Postgres
version.  My money is on IDE drives with write cache enabled.
Use drives that don't lie about write completion, and you'll be in
good shape.

            regards, tom lane

Re: Data Corruption in case of abrupt failure

From
Stephen Robert Norris
Date:
On Tue, 2004-03-16 at 16:44, Tom Lane wrote:
> Stephen Robert Norris <srn@commsecure.com.au> writes:
> > On Wed, 2004-03-03 at 23:27, satish satish wrote:
> >> I am trying to do some reliability tests on postgre SQL. I have
> >> use-case where the power can go off abruptly. I initiated 10,000
> >> insert operations and pulled out the cable in the middle. I had
> >> auto-commit option turned on. I observed 2 out of 5 times the tables
> >> were totally corrupted and could not read any data whereas 3 times I
> >> was able to read the data which was inserted.
>
> > We'll need more information - what OS and what version of PostgreSQL at
> > the least.
>
> I doubt this has anything to do with either the OS or the Postgres
> version.  My money is on IDE drives with write cache enabled.
> Use drives that don't lie about write completion, and you'll be in
> good shape.
>
>             regards, tom lane

I was wondering if he'd turned off fsync(), as seems common for some
reason...

    Stephen

Attachment

Re: Data Corruption in case of abrupt failure

From
Tom Lane
Date:
Stephen Robert Norris <srn@commsecure.com.au> writes:
> On Tue, 2004-03-16 at 16:44, Tom Lane wrote:
>> My money is on IDE drives with write cache enabled.

> I was wondering if he'd turned off fsync(), as seems common for some
> reason...

That would be bad too.  But the nasty thing about IDE write caches is
that even when you think you've fsync'd, you haven't :-(

            regards, tom lane

Re: Data Corruption in case of abrupt failure

From
"Keith C. Perry"
Date:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> Stephen Robert Norris <srn@commsecure.com.au> writes:
> > On Wed, 2004-03-03 at 23:27, satish satish wrote:
> >> I am trying to do some reliability tests on postgre SQL. I have
> >> use-case where the power can go off abruptly. I initiated 10,000
> >> insert operations and pulled out the cable in the middle. I had
> >> auto-commit option turned on. I observed 2 out of 5 times the tables
> >> were totally corrupted and could not read any data whereas 3 times I
> >> was able to read the data which was inserted.
>
> > We'll need more information - what OS and what version of PostgreSQL at
> > the least.
>
> I doubt this has anything to do with either the OS or the Postgres
> version.  My money is on IDE drives with write cache enabled.
> Use drives that don't lie about write completion, and you'll be in
> good shape.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

I've read threads like this before and because I've never lost data on servers
with IDE drives after doing some basic torture tests (e.g. pulling the plug in
the middle of an update et al), I don't think I've paid close enough attention.

Is there some definite way someone can test their IDE drives so see whether or
not they are "lying" about write completions?  Perhaps someone somewhere has a
list of drives that are known to exhibit this problem- a IDE drive blacklist of
sorts?


--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

Re: Data Corruption in case of abrupt failure

From
Shridhar Daithankar
Date:
Keith C. Perry wrote:
> Is there some definite way someone can test their IDE drives so see whether or
> not they are "lying" about write completions?  Perhaps someone somewhere has a
> list of drives that are known to exhibit this problem- a IDE drive blacklist of
> sorts?

I believe using hdparm on linux, one can see whether or not write caching is
enabled and also set the caching.

I don't think any IDE disk would lie about write caching status. (If one does
then it is really hard to find out unless brute force is used.)

Just a thought..

  Shridhar


Re: Data Corruption in case of abrupt failure

From
Tom Lane
Date:
"Keith C. Perry" <netadmin@vcsn.com> writes:
> I've read threads like this before and because I've never lost data on
> servers with IDE drives after doing some basic torture tests
> (e.g. pulling the plug in the middle of an update et al), I don't
> think I've paid close enough attention.

On many IDE drives it is possible to turn write caching on and off with
some incantation involving "hdparm" (don't have the details but you can
probably find 'em in the list archives).  Possibly your system is
already configured safely.

> Is there some definite way someone can test their IDE drives so see
> whether or not they are "lying" about write completions?

What I'd suggest is to set up a simple test involving a long string of
very small transactions (a bunch of separate INSERTs into a table with
no indexes works fine).  Time it twice, once with "fsync" enabled and
once without.  If there's not a huge difference, your drive is lying.

The reason this works is that successive commits will re-write the same
page of WAL (at least till that page fills up, but you can fit quite
a few small transactions per page).  If fsync is actually waiting for
the bits to hit disk, then it is physically impossible to commit more
than one transaction per disk revolution for as long as the commit
records are going to the same page.  If you've got, say, a 10000 RPM
disk then ideally you should see about 10000 commits per minute in this
scenario.  However, most modern CPUs can do way better than that if they
don't have to wait for the disk to spin ... so if you see a transaction
rate considerably better than your disk's RPM, you know the disk is
lying about having written the WAL page.

            regards, tom lane

LWP::Simple in Postgresql 7.3.4

From
"Siew Hui, Wong"
Date:
Hi,

Postgres 7.3.4
Perl 5.6.0 and Perl 5.8.1
RH 7

I have to implement some new postgres functions that need LWP::Simple , but
i've received errors in locating that module. By the way, the server
contains 2 perl versions for the previous administrator had problems
implementing scripts with LWP::Simple and has installed a second,
higher-version of Perl i.e. 5.8.1.

Upon running, the following trigger and functions in the db produced this
error:

ERROR: plperl: error from function : Can't locate LWP/Simple.pm in @INC
(@INC contains: /usr/lib/perl5/5.6.0/i386-linux /usr/lib/perl5/5.6.0
/usr/lib/perl5/site_perl/5.6.0/i386-linux /usr/lib/perl5/site_perl/5.6.0
/usr/lib/perl5/site_perl .) at (eval 3) line 8. BEGIN failed--compilation
aborted at (eval 3) line 8. The specific sequence of files included or
processed is : /hello/world.cfm


CREATE TRIGGER autokuda
    AFTER INSERT ON temp_spp_kuda_live
    FOR EACH ROW
    EXECUTE PROCEDURE sendkuda ();

CREATE FUNCTION sendkuda () RETURNS "trigger"
    AS '
 begin
  perform KudaRedirect (NEW.tkl_gateway, NEW.tkl_keyword , NEW.tkl_mobile);
  return NEW;
 end;
'
    LANGUAGE plpgsql;


CREATE FUNCTION kudaredirect (character varying, character varying,
character varying) RETURNS integer
    AS '
 $gateway = $_[1];
 $keyword = $_[2];
 $mobileno = $_[3];
 my $url =
"http://www.helloworld.com/xxx.asp?gateway=$gateway&text=$keyword&from=$mobi
leno";
 use LWP::Simple;
  my $content = get $url;
  if (! defined $content) {
  system("/bin/echo KUDA: FAIL REDIRECT TO $gateway $keyword $mobileno >>
/path/to/fail.log");
 }

 system("/bin/echo KUDA: REDIRECT $gateway $keyword $mobileno >>
/path/to/success.log");
 return 0;
'
    LANGUAGE plperlu;


When i tried to install LWP::Simple, it just installed in 5.8.1 by default.
How can i point postgresql to use LWP::Simple in perl 5.8.1? Or at least
install LWP::Simple in perl 5.6...Other solutions are very much welcomed.

Also,I didn't uninstall perl 5.8 for i don't know what else depends on it.

Thank you for the time taken to ponder this :)


best regards
siew hui




Re: Data Corruption in case of abrupt failure

From
"scott.marlowe"
Date:
On Tue, 16 Mar 2004, Tom Lane wrote:

> "Keith C. Perry" <netadmin@vcsn.com> writes:
> > I've read threads like this before and because I've never lost data on
> > servers with IDE drives after doing some basic torture tests
> > (e.g. pulling the plug in the middle of an update et al), I don't
> > think I've paid close enough attention.
>
> On many IDE drives it is possible to turn write caching on and off with
> some incantation involving "hdparm" (don't have the details but you can
> probably find 'em in the list archives).  Possibly your system is
> already configured safely.

hdparm -W0 /dev/hda

> > Is there some definite way someone can test their IDE drives so see
> > whether or not they are "lying" about write completions?
>
> What I'd suggest is to set up a simple test involving a long string of
> very small transactions (a bunch of separate INSERTs into a table with
> no indexes works fine).  Time it twice, once with "fsync" enabled and
> once without.  If there's not a huge difference, your drive is lying.

pgbench is a nice candidate for this.

pgbench -c 100 -t 100000


Re: Data Corruption in case of abrupt failure

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Tue, 16 Mar 2004, Tom Lane wrote:
>> What I'd suggest is to set up a simple test involving a long string of
>> very small transactions (a bunch of separate INSERTs into a table with
>> no indexes works fine).  Time it twice, once with "fsync" enabled and
>> once without.  If there's not a huge difference, your drive is lying.

> pgbench is a nice candidate for this.
> pgbench -c 100 -t 100000

I wouldn't do that, first because pgbench transactions are relatively
large (several updates per xact IIRC), and second because you'll be
measuring contention effects as well as pure WAL write activity.
If you simply must use pgbench for this, use -c 1 ... but it's surely
easy enough to make a file of a few thousand copies of
    INSERT INTO foo VALUES(1);
and feed it to psql.

            regards, tom lane


Re: Data Corruption in case of abrupt failure

From
"Rod K"
Date:
Does anyone know the equivalent to hdparm on FreeBSD?

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of scott.marlowe
> Sent: Wednesday, March 17, 2004 11:27 AM
> To: Tom Lane
> Cc: Keith C. Perry; Stephen Robert Norris; satish satish;
> pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Data Corruption in case of abrupt failure
>
>
> On Tue, 16 Mar 2004, Tom Lane wrote:
>
> > "Keith C. Perry" <netadmin@vcsn.com> writes:
> > > I've read threads like this before and because I've never lost data on
> > > servers with IDE drives after doing some basic torture tests
> > > (e.g. pulling the plug in the middle of an update et al), I don't
> > > think I've paid close enough attention.
> >
> > On many IDE drives it is possible to turn write caching on and off with
> > some incantation involving "hdparm" (don't have the details but you can
> > probably find 'em in the list archives).  Possibly your system is
> > already configured safely.
>
> hdparm -W0 /dev/hda
>



Re: Data Corruption in case of abrupt failure

From
Bruce Momjian
Date:
Rod K wrote:
> Does anyone know the equivalent to hdparm on FreeBSD?

Yes, something like:

    In FreeBSD, add "hw.ata.wc=0" to /boot/loader.conf.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Data Corruption in case of abrupt failure

From
Andrew Sullivan
Date:
On Tue, Mar 16, 2004 at 01:16:56PM +0530, Shridhar Daithankar wrote:

> I believe using hdparm on linux, one can see whether or not write caching
> is enabled and also set the caching.
>
> I don't think any IDE disk would lie about write caching status. (If one
> does then it is really hard to find out unless brute force is used.)

I have in fact seen IDE drives lie about this, although not recently.
You can be sure it's lying if, even when you turn caching off,
performance never changes.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca

Re: LWP::Simple in Postgresql 7.3.4 - SOLUTION

From
"Siew Hui, Wong"
Date:
Dear list,

THe location of  Simple.pm in my server is
/usr/local/lib/perl5/site_perl/5.8.1/LWP/. The problem is solved by adding a
code that points to LWP/Simple.pm before the "use LWP::Simple".

<snip!>

use lib "/usr/local/lib/perl5/site_perl/5.8.1/;
use LWP::Simple;

<snip!>

It worked like a charm, but other problems cropped up. But that's more like
a perl error that i shouldn't bother you guys :)

thanks

best regards,
siew hui



----- Original Message -----
From: "Siew Hui, Wong" <shwong@sebasasia.com>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, March 17, 2004 2:02 PM
Subject: [GENERAL] LWP::Simple in Postgresql 7.3.4


> Hi,
>
> Postgres 7.3.4
> Perl 5.6.0 and Perl 5.8.1
> RH 7
>
> I have to implement some new postgres functions that need LWP::Simple ,
but
> i've received errors in locating that module. By the way, the server
> contains 2 perl versions for the previous administrator had problems
> implementing scripts with LWP::Simple and has installed a second,
> higher-version of Perl i.e. 5.8.1.
>
> Upon running, the following trigger and functions in the db produced this
> error:
>
> ERROR: plperl: error from function : Can't locate LWP/Simple.pm in @INC
> (@INC contains: /usr/lib/perl5/5.6.0/i386-linux /usr/lib/perl5/5.6.0
> /usr/lib/perl5/site_perl/5.6.0/i386-linux /usr/lib/perl5/site_perl/5.6.0
> /usr/lib/perl5/site_perl .) at (eval 3) line 8. BEGIN failed--compilation
> aborted at (eval 3) line 8. The specific sequence of files included or
> processed is : /hello/world.cfm
>
>
> CREATE TRIGGER autokuda
>     AFTER INSERT ON temp_spp_kuda_live
>     FOR EACH ROW
>     EXECUTE PROCEDURE sendkuda ();
>
> CREATE FUNCTION sendkuda () RETURNS "trigger"
>     AS '
>  begin
>   perform KudaRedirect (NEW.tkl_gateway, NEW.tkl_keyword ,
NEW.tkl_mobile);
>   return NEW;
>  end;
> '
>     LANGUAGE plpgsql;
>
>
> CREATE FUNCTION kudaredirect (character varying, character varying,
> character varying) RETURNS integer
>     AS '
>  $gateway = $_[1];
>  $keyword = $_[2];
>  $mobileno = $_[3];
>  my $url =
>
"http://www.helloworld.com/xxx.asp?gateway=$gateway&text=$keyword&from=$mobi
> leno";
>  use LWP::Simple;
>   my $content = get $url;
>   if (! defined $content) {
>   system("/bin/echo KUDA: FAIL REDIRECT TO $gateway $keyword $mobileno >>
> /path/to/fail.log");
>  }
>
>  system("/bin/echo KUDA: REDIRECT $gateway $keyword $mobileno >>
> /path/to/success.log");
>  return 0;
> '
>     LANGUAGE plperlu;
>
>
> When i tried to install LWP::Simple, it just installed in 5.8.1 by
default.
> How can i point postgresql to use LWP::Simple in perl 5.8.1? Or at least
> install LWP::Simple in perl 5.6...Other solutions are very much welcomed.
>
> Also,I didn't uninstall perl 5.8 for i don't know what else depends on it.
>
> Thank you for the time taken to ponder this :)
>
>
> best regards
> siew hui
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>