Thread: Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

From
Félix GERZAGUET
Date:
Hello,

I am trying to assess disk reliability.
After reading http://www.postgresql.org/docs/current/static/wal-reliability.html, I tried the recommended diskchecker.pl but I am not satisfied:

I always get:
Total errors: 0

even though I tested with an HGST HTS721010A9E630, which the vendor's datasheet (http://www.hgst.com/sites/default/files/resources/TS7K1000_ds.pdf) advertises as "
Designed for low duty cycle, non mission-critical applications in PC,nearline and consumer electronics environments, which vary application to application
"

Since it is not a high-end disk, I expected some errors.

Here is my methodology:

On another machine: diskchecker.pl -l
On the tested machine: diskchecker.pl -s IP_OF_OTHER_MACHINE create test_file 500
Perform a hard (electrical) reboot of the tested machine (the web application of my hosting service warns me that my application will not be stopped properly)
On the tested machine: diskchecker.pl -s IP_OF_OTHER_MACHINE verify test_file
[...]
Total errors: 0

I re-did the test 5 times. The tested machine is a real machine, not a VM.
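
For readers unfamiliar with the test, the sequence above can be sketched roughly as follows. This is not diskchecker.pl's actual code, only a minimal Python illustration of the idea that a block is reported to the other machine only after fsync has returned. BLOCK_SIZE, the function names, and the in-memory `acknowledged` dict (standing in for the log kept by the listening machine) are all assumptions made for the example.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def write_durable_block(fd, block_no, payload):
    """Write one block at its offset, then fsync before reporting success."""
    os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
    os.write(fd, payload)
    os.fsync(fd)  # returns once the OS reports the data as flushed to the device
    return block_no  # only now may the caller acknowledge this block remotely


def verify_blocks(path, acknowledged):
    """Count acknowledged blocks whose on-disk content does not match."""
    errors = 0
    with open(path, "rb") as f:
        for block_no, expected in acknowledged.items():
            f.seek(block_no * BLOCK_SIZE)
            if f.read(BLOCK_SIZE) != expected:
                errors += 1
    return errors


def demo():
    """Write a few blocks, acknowledge them, and verify (no power cut here)."""
    acknowledged = {}  # stands in for the log kept on the other machine
    fd, path = tempfile.mkstemp()
    try:
        for block_no in range(8):
            payload = bytes([block_no]) * BLOCK_SIZE
            acknowledged[write_durable_block(fd, block_no, payload)] = payload
    finally:
        os.close(fd)
    errors = verify_blocks(path, acknowledged)
    os.unlink(path)
    return errors
```

In a real test the power is cut while blocks are still being written, and verification runs after the machine comes back up. Any acknowledged block that fails verification then means something in the stack reported the data as flushed before it was actually on stable storage.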

Could you please correct me if I am doing something wrong, or give me some pointers to other methods?

Félix

Re: Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

From
Jim Nasby
Date:
On 12/20/15 1:09 PM, Félix GERZAGUET wrote:
> After reading
> http://www.postgresql.org/docs/current/static/wal-reliability.html, I
> tried the recommended diskchecker.pl
> <http://brad.livejournal.com/2116715.html> but I am not satisfied:
>
> I always get:
> Total errors: 0
>
> even if I tested with with a HGST HTS721010A9E630 that the vendor's
> datasheet
> (http://www.hgst.com/sites/default/files/resources/TS7K1000_ds.pdf)
> advertise as "
> Designed for low duty cycle, non mission-critical applications in
> PC,nearline and consumer electronics environments, which vary
> application to application
> "
>
> Since it is not, a high end disk, I expect some errors.

Why? Just because a disk isn't enterprise-grade doesn't mean it has to
lie about fsync, which is the only thing diskchecker.pl tests for.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

From
Félix GERZAGUET
Date:
On Mon, Dec 21, 2015 at 12:31 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 12/20/15 1:09 PM, Félix GERZAGUET wrote:
>> After reading
>> http://www.postgresql.org/docs/current/static/wal-reliability.html, I
>> tried the recommended diskchecker.pl
>> <http://brad.livejournal.com/2116715.html> but I am not satisfied:
>>
>> I always get:
>> Total errors: 0
>>
>> even if I tested with with a HGST HTS721010A9E630 that the vendor's
>> datasheet
>> (http://www.hgst.com/sites/default/files/resources/TS7K1000_ds.pdf)
>> advertise as "
>> Designed for low duty cycle, non mission-critical applications in
>> PC,nearline and consumer electronics environments, which vary
>> application to application
>> "
>>
>> Since it is not, a high end disk, I expect some errors.
>
> Why? Just because a disk isn't enterprise-grade doesn't mean it has to lie about fsync, which is the only thing diskchecker.pl tests for.

I was thinking that since the disk has a 32 MB write cache (with no battery), it would lie to the OS (and Postgres) about when data is really on disk (rather than still in the disk write cache). But maybe that thinking was wrong.

Re: Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

From
Bill Moran
Date:
On Mon, 21 Dec 2015 14:54:14 +0100
Félix GERZAGUET <felix.gerzaguet@gmail.com> wrote:

> On Mon, Dec 21, 2015 at 12:31 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
> wrote:
>
> > On 12/20/15 1:09 PM, Félix GERZAGUET wrote:
> >
> >> After reading
> >> http://www.postgresql.org/docs/current/static/wal-reliability.html, I
> >> tried the recommended diskchecker.pl
> >> <http://brad.livejournal.com/2116715.html> but I am not satisfied:
> >>
> >> I always get:
> >> Total errors: 0
> >>
> >> even if I tested with with a HGST HTS721010A9E630 that the vendor's
> >> datasheet
> >> (http://www.hgst.com/sites/default/files/resources/TS7K1000_ds.pdf)
> >> advertise as "
> >> Designed for low duty cycle, non mission-critical applications in
> >> PC,nearline and consumer electronics environments, which vary
> >> application to application
> >> "
> >>
> >> Since it is not, a high end disk, I expect some errors.
> >>
> >
> > Why? Just because a disk isn't enterprise-grade doesn't mean it has to lie
> > about fsync, which is the only thing diskchecker.pl tests for.
> >
>
> I was thinking that since the disk have a 32M write-cache (with not
> battery) it would lie to the OS (and postgres) about when data are really
> on disk (not in the disk write cache). But maybe that thinking was wrong.

It varies by vendor and product, which is why diskchecker.pl exists.
It's even possible that the behavior is configurable ... check to see
if the vendor provides a utility for configuring it.

--
Bill Moran


Re: Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

From
Jim Nasby
Date:
On 12/21/15 8:22 AM, Bill Moran wrote:
>>> Why? Just because a disk isn't enterprise-grade doesn't mean it has to lie
>>> about fsync, which is the only thing diskchecker.pl tests for.
>>
>> I was thinking that since the disk have a 32M write-cache (with not
>> battery) it would lie to the OS (and postgres) about when data are really
>> on disk (not in the disk write cache). But maybe that thinking was wrong.

There are ways to make on-disk write caches safe without a battery. IIRC
some hard drives would use the inertia of the platter (turning the motor
into a generator) to write contents out on power-off. You could also use
a "super cap".

> It varies by vendor and product, which is why diskchecker.pl exists.
> It's even possible that the behavior is configurable ... check to see
> if the vendor provides a utility for configuring it.

Your OS might let you control it too; I know FreeBSD has support for
this. (Whether the drive obeys or not is a different matter...)
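
As one concrete illustration, using Linux rather than FreeBSD (an assumption that goes beyond what this thread covers): the kernel exposes its view of a drive's write cache in /sys/block/<device>/queue/write_cache, which reads "write back" or "write through". The Python sketch below only reads that file. The device name "sda" and the function names are hypothetical, and the result reflects what the kernel believes, not necessarily what the drive actually does.

```python
from pathlib import Path


def parse_write_cache_mode(text):
    """Return True for a write-back cache, False for write-through, None if unknown."""
    mode = text.strip().lower()
    if mode == "write back":
        return True
    if mode == "write through":
        return False
    return None


def write_cache_enabled(device, sysfs_root="/sys/block"):
    """Read the kernel's view of the write cache for a block device such as 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return parse_write_cache_mode(path.read_text())
    except OSError:
        return None  # attribute missing (older kernel, other OS) or unreadable


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the device under test.
    print(write_cache_enabled("sda"))
```

A result of True would mean a volatile write-back cache is reported as enabled, which is the situation where an honest flush on fsync matters most.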