Thread: Win32 Powerfail testing - results

Win32 Powerfail testing - results

From
"Dave Page"
Date:
Well the results are finally in. Hopefully we can concentrate on putting
them right, rather than having a round of "told you so's" :-)

I modified the test program slightly to improve the consistency checks.
The updated version is attached.

Regards, Dave.

System
======

Gigabyte GA-6VTXD Motherboard
Dual 1GHz PIII Processors
1GB Non-ECC RAM
Fujitsu MPG3240AH IDE Disk Drive

Enhanced IDE Performance disabled in the BIOS.

Test
====

Test program run from a separate machine.
20 Tests per OS.
Powerfail randomly applied.


Windows 2000 Testing
====================

Write back cache on IDE disk disabled.
Clean installation of Windows 2000 Server with Service Pack 3

Run | Errors Detected
=============================================================
 01 | None
 02 | None
 03 | None
 04 | None
 05 | None
 06 | None
 07 | COUNT CHECK - Duplicate or missing rows detected (10262)!!
 08 | None
 09 | DISTINCT CHECK - Duplicate or missing rows detected (9893)!!
    | COUNT CHECK - Duplicate or missing rows detected (9893)!!
 10 | None
 11 | None
 12 | None
 13 | None
 14 | COUNT CHECK - Duplicate or missing rows detected (10024)!!
 15 | None
 16 | None
 17 | None
 18 | None
 19 | None
 20 | None

Linux Testing
=============

Clean installation of Slackware Linux 8.1 on ext3
Kernel 2.4.18

Run | Errors Detected
=============================================================
 01 | None
 02 | None
 03 | None
 04 | None
 05 | None
 06 | None
 07 | None
 08 | None
 09 | None
 10 | None
 11 | None
 12 | None
 13 | None
 14 | None
 15 | None
 16 | None
 17 | None
 18 | None
 19 | None
 20 | None
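
The attached test program is not reproduced in this thread, so the following is only a rough sketch of what checks labelled COUNT CHECK and DISTINCT CHECK might do. It assumes a hypothetical table `pf_test` with an `id` column and a known expected row count, and it uses an in-memory SQLite database as a stand-in for the PostgreSQL server under test. Neither the table layout nor the comparison logic is confirmed by the original post.

```python
import sqlite3

TABLE = "pf_test"  # hypothetical table name, not taken from the test program


def run_checks(conn, expected_rows):
    """Return a list of error strings; an empty list means no errors detected."""
    total = conn.execute(f"SELECT count(*) FROM {TABLE}").fetchone()[0]
    distinct = conn.execute(f"SELECT count(DISTINCT id) FROM {TABLE}").fetchone()[0]
    errors = []
    # Assumption: both checks compare against the same expected row count.
    if distinct != expected_rows:
        errors.append(
            f"DISTINCT CHECK - Duplicate or missing rows detected ({distinct})!!")
    if total != expected_rows:
        errors.append(
            f"COUNT CHECK - Duplicate or missing rows detected ({total})!!")
    return errors


def demo():
    """Run the checks on a clean table, then again after adding a duplicate id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {TABLE} (id INTEGER)")
    conn.executemany(f"INSERT INTO {TABLE} VALUES (?)", [(i,) for i in range(10)])
    clean = run_checks(conn, 10)
    conn.execute(f"INSERT INTO {TABLE} VALUES (3)")  # simulate a duplicated row
    duplicated = run_checks(conn, 10)
    conn.close()
    return clean, duplicated
```

Under these assumptions a duplicated row trips only the count check, since the number of distinct ids is unchanged, while lost rows would trip both checks.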

Re: Win32 Powerfail testing - results

From
Vince Vielhaber
Date:
On Mon, 3 Feb 2003, Dave Page wrote:

> Well the results are finally in. Hopefully we can concentrate on putting
> them right, rather than having a round of "told you so's" :-)
>
> I modified the test program slightly to improve the consistency checks.
> The updated version is attached.

[...]

>
> Run | Errors Detected
> =============================================================
>  07 | COUNT CHECK - Duplicate or missing rows detected (10262)!!
>  09 | DISTINCT CHECK - Duplicate or missing rows detected (9893)!!
>     | COUNT CHECK - Duplicate or missing rows detected (9893)!!
>  14 | COUNT CHECK - Duplicate or missing rows detected (10024)!!

Out of curiosity, what was required to return things to normal
again?

Vince.
-- Fast, inexpensive internet service 56k and beyond!  http://www.pop4.net/  http://www.meanstreamradio.com
http://www.unknown-artists.com       Internet radio: It's not file sharing, it's just radio.
 



Re: Win32 Powerfail testing - results

From
Rod Taylor
Date:
> I modified the test program slightly to improve the consistency checks.
> The updated version is attached.

For curiosity's sake, I've compiled it and am running it on FreeBSD with
soft-updates enabled.

A few variable declarations needed to be bumped up to the top of their
respective function.

Any chance of tossing in a periodic VACUUM or would that throw off the
results?


--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Win32 Powerfail testing - results

From
"Dave Page"
Date:
 Vince Vielhaber allegedly said:
> On Mon, 3 Feb 2003, Dave Page wrote:
>
>> Run | Errors Detected
>> =============================================================
>>  07 | COUNT CHECK - Duplicate or missing rows detected (10262)!!
>>  09 | DISTINCT CHECK - Duplicate or missing rows detected (9893)!!
>>     | COUNT CHECK - Duplicate or missing rows detected (9893)!!
>>  14 | COUNT CHECK - Duplicate or missing rows detected (10024)!!
>
> Out of curiosity, what was required to return things to normal
> again?

I ran the test app in reset mode which drops the table, then re-creates it
and populates it with fresh data. I thought it best to drop first to
eliminate possible problems with corrupt, but invisible tuples (if such a
thing could have occurred).
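
As a minimal sketch of the drop, re-create and populate sequence described above (not the actual reset code from the test app, which is not shown here), assuming a hypothetical table `pf_test` and using an in-memory SQLite database in place of PostgreSQL:

```python
import sqlite3


def reset_table(conn, rows):
    """Drop the test table, re-create it, and load fresh rows.

    Table and column names are hypothetical. The SQL is SQLite syntax,
    used here only as a stand-in for the real server.
    """
    with conn:  # commits on success, rolls back on error
        # Dropping first discards any old contents, rather than
        # trying to repair them in place.
        conn.execute("DROP TABLE IF EXISTS pf_test")
        conn.execute("CREATE TABLE pf_test (id INTEGER, payload TEXT)")
        conn.executemany(
            "INSERT INTO pf_test (id, payload) VALUES (?, ?)",
            [(i, f"row {i}") for i in range(rows)],
        )
    return conn.execute("SELECT count(*) FROM pf_test").fetchone()[0]


def demo():
    """Reset twice, with a stray row added in between, and return both counts."""
    conn = sqlite3.connect(":memory:")
    first = reset_table(conn, 5)
    conn.execute("INSERT INTO pf_test (id, payload) VALUES (2, 'stray')")
    second = reset_table(conn, 5)
    conn.close()
    return first, second
```

Because the table is dropped rather than emptied, the second reset starts from the same state as the first regardless of what the previous run left behind.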
Regards, Dave.




Re: Win32 Powerfail testing - results

From
"Dave Page"
Date:
 Rod Taylor allegedly said:
>> I modified the test program slightly to improve the consistency
>> checks. The updated version is attached.
>
> For curiosity's sake, I've compiled it and am running it on FreeBSD with
> soft-updates enabled.
>
> A few variable declarations needed to be bumped up to the top of their
> respective function.

I've been doing a fair bit of C++ recently...

> Any chance of tossing in a periodic VACUUM or would that throw off the
> results?

Dunno, Tom could best answer that, but a *complete guess* based on piecing
together tidbits of how it all works from various threads here, would be
that it would merely increase the time period during which a powerfail
would be unlikely to cause duplicate rows. Reasoning for this is that
vacuum would be messing with tuples that are already dead.
Please correct me if I'm wrong :-)

Regards, Dave.




Re: Win32 Powerfail testing - results

From
Tom Lane
Date:
"Dave Page" <dpage@vale-housing.co.uk> writes:
>  Rod Taylor allegedly said:
>> Any chance of tossing in a periodic VACUUM or would that throw off the
>> results?

> Dunno, Tom could best answer that, but a *complete guess* based on piecing
> together tidbits of how it all works from various threads here, would be
> that it would merely increase the time period during which a powerfail
> would be unlikely to cause duplicate rows. Reasoning for this is that
> vacuum would be messing with tuples that are already dead.

I think it'd be interesting to try it both ways.  VACUUM might throw in
new failure modes.  I'm not sure if it could mask the failure mode you
already found.
        regards, tom lane


Re: Win32 Powerfail testing - results

From
"Dave Page"
Date:

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 03 February 2003 21:52
> To: Dave Page
> Cc: rbt@rbt.ca; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Win32 Powerfail testing - results
>
>
> "Dave Page" <dpage@vale-housing.co.uk> writes:
> >  Rod Taylor allegedly said:
> >> Any chance of tossing in a periodic VACUUM or would that throw off
> >> the results?
>
> > Dunno, Tom could best answer that, but a *complete guess* based on
> > piecing together tidbits of how it all works from various threads
> > here, would be that it would merely increase the time period during
> > which a powerfail would be unlikely to cause duplicate rows.
> > Reasoning for this is that vacuum would be messing with tuples that
> > are already dead.
>
> I think it'd be interesting to try it both ways.  VACUUM
> might throw in new failure modes.  I'm not sure if it could
> mask the failure mode you already found.

OK, I'll bung Win2K back on the test box tomorrow. Any preference as to
the type of vacuum? I assume full would be most likely to cause
problems. I'll add the vacuum after the commit...
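
A rough sketch of where the vacuum would sit in each test iteration, assuming the loop issues plain SQL statements. The table name `pf_test` and the work statements are hypothetical, and the real test program may be structured quite differently. PostgreSQL does not allow VACUUM inside a transaction block, so it has to follow the COMMIT as a separate statement.

```python
def iteration_statements(work, vacuum=None, table="pf_test"):
    """Build the SQL statements for one test iteration.

    work   -- list of SQL strings to run inside the transaction (hypothetical)
    vacuum -- None for no vacuum, "plain" for VACUUM, "full" for VACUUM FULL
    table  -- hypothetical test table name
    """
    if vacuum not in (None, "plain", "full"):
        raise ValueError(f"unknown vacuum mode: {vacuum!r}")
    statements = ["BEGIN", *work, "COMMIT"]
    # The vacuum is issued after COMMIT, outside the transaction block.
    if vacuum == "plain":
        statements.append(f"VACUUM {table}")
    elif vacuum == "full":
        statements.append(f"VACUUM FULL {table}")
    return statements
```

Calling it with `vacuum=None`, `"plain"` and `"full"` gives the three variants that could be run against each other.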

Regards, Dave.


Re: Win32 Powerfail testing - results

From
Hannu Krosing
Date:
Dave Page wrote on Mon, 03.02.2003 at 18:51:
> Well the results are finally in. Hopefully we can concentrate on putting
> them right, rather than having a round of "told you so's" :-)
> 
> I modified the test program slightly to improve the consistency checks.
> The updated version is attached.
> 
> Regards, Dave.
> 
> System
> ======
> 
> Gigabyte GA-6VTXD Motherboard
> Dual 1GHz PIII Processors
> 1GB Non-ECC RAM
> Fujitsu MPG3240AH IDE Disk Drive
> 
> Enhanced IDE Performance disabled in the BIOS.
> 
> Test
> ====
> 
> Test program run from a separate machine.
> 20 Tests per OS.
> Powerfail randomly applied.

Your hardware should also be able to run Postgres on BeOS

http://www.bebits.com/app/2752

Being the only non-unix "port" before/besides win32, it could be an
interesting exercise.

You should be able to get an installable BeOS itself from SourceForge

http://sourceforge.net/projects/crux/

> Windows 2000 Testing
> ====================

Is this NTFS ?

Any possibility of trying the same tests with SCSI disks ?

> Write back cache on IDE disk disabled.
> Clean installation of Windows 2000 Server with Service Pack 3
> 
> Run | Errors Detected
> =============================================================
>  01 | None
>  02 | None
>  03 | None
>  04 | None
>  05 | None
>  06 | None
>  07 | COUNT CHECK - Duplicate or missing rows detected (10262)!!
>  08 | None
>  09 | DISTINCT CHECK - Duplicate or missing rows detected (9893)!!

I remember having problems with UNIQUE columns having duplicate values a
few versions back on Linux-ext2-IDE. Could this be the same problem or
must it be something completely different ?

>     | COUNT CHECK - Duplicate or missing rows detected (9893)!!
>  10 | None
>  11 | None
>  12 | None
>  13 | None
>  14 | COUNT CHECK - Duplicate or missing rows detected (10024)!!
>  15 | None
>  16 | None
>  17 | None
>  18 | None
>  19 | None
>  20 | None
> 
> Linux Testing
> =============
> 
> Clean installation of Slackware Linux 8.1 on ext3
> Kernel 2.4.18
> 
> Run | Errors Detected
> =============================================================
>  01 | None
> ...
>  20 | None

BTW, are the tests portable enough to run also on MSSQL, Oracle and
DB2?

I know that you can't publish exact results, but perhaps something like
the GreatBridge results - the one that runs only on Win32 did so-and-so,
the one that has 'i' at the end of version number this, and the one
whose name consists of two letters and a number did that ?

-- 
Hannu Krosing <hannu@tm.ee>


Re: Win32 Powerfail testing - results

From
"Dave Page"
Date:

> -----Original Message-----
> From: Hannu Krosing [mailto:hannu@tm.ee]
> Sent: 03 February 2003 22:30
> To: Dave Page
> Cc: PostgreSQL Hackers; Katie Ward
> Subject: Re: [HACKERS] Win32 Powerfail testing - results
>
>
> Your hardware should also be able to run Postgres on BeOS
>
> http://www.bebits.com/app/2752
>
> Being the only non-unix "port" before/besides win32, it could
> be an interesting exercise.

One that will have to go untested I'm afraid. These tests take a fair
while and you know how many pies I've got my fingers in right now just
on this project, never mind my paying gig and Uni!!

> > Windows 2000 Testing
> > ====================
>
> Is this NTFS ?

Yes.

> Any possibility of trying the same tests with SCSI disks ?

Depends on my time. I have a couple of 29160's and some Seagate Cheetah
X15's knocking about.

>
> I remember having problems with UNIQUE columns having
> duplicate values a few versions back on Linux-ext2-IDE. Could
> this be the same problem or must it be something completely
> different ?

Pass. I don't know the details of your problem, or how Peerdirect have
handled the IO. If I'm honest, I'm probably not experienced enough in
that sort of thing to know what's going wrong anyway :-(

>
> BTW, are the tests portable enough to run also on MSSQL,
> Oracle and DB2 ?

Well I posted the source. If you pull out the libpq stuff then I guess
so. I only have DB2 and MSSQL here though (and they both fall over at
will anyway). Again though, I can't really spend time testing them just
for interest's sake (not at present anyway).

Regards, Dave.