Thread: Postgresql data integrity during RAID10 drive rebuild

Postgresql data integrity during RAID10 drive rebuild

From
"Steve Poe"
Date:
I need some input from the Postgresql community.

Our animal hospital runs Postgresql 7.4 on a 6-disc RAID10. The database logs are on a separate RAID1.

We're using an LSI MegaRAID 320-2X controller. The controller reports that one 146GB SCSI disc in the RAID10 has failed and the array is in "DEGRADED" mode. The database seems to be running fine but slower.

I've never had to replace a disc in an array with Postgresql running on it. LSI says I can replace the disc and do a rebuild while everything is running. I am of course concerned about data integrity/corruption.

Has anyone had to rebuild one of the discs in their database's array?

Thanks for your help.

Steve Poe




Re: Postgresql data integrity during RAID10 drive rebuild

From
Scott Marlowe
Date:
On Wed, 2006-11-29 at 10:56, Steve Poe wrote:
> I need some input from the Postgresql community.
>
> Our animal hospital runs Postgresql 7.4 on a 6-disc RAID10. The
> database logs are on a separate RAID1.
>
> We're using an LSI MegaRAID 320-2X controller. The controller reports
> that one 146GB SCSI disc in the RAID10 has failed and the array is in
> "DEGRADED" mode. The database seems to be running fine but slower.
>
> I've never had to replace a disc in an array with Postgresql running
> on it. LSI says I can replace the disc and do a rebuild while
> everything is running. I am of course concerned about data
> integrity/corruption.
>
> Has anyone had to rebuild one of the discs in their database's
> array?

Yep, I've done it a few times.  A few tips:

Back up your database with pg_dump.  Confirm you can restore on a test
machine.

Replace the drive during the lowest traffic period for your database.

The LSI cards are very stable.  I've replaced drives under them back in
the Ultra-320 days with removable caddies.  I would imagine that as long
as you have the proper drive caddies you're set.  If the drives are not
in removable caddies, you'll need to power down to safely unplug them.
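
The dump-and-trial-restore tip above can be sketched as a short script. This is only an illustration, not a tested procedure for this site: the database name "clinic", the dump path, and the scratch database name are hypothetical, connection options are omitted, and in practice the dump would be taken on the production server and the restore run on the test machine.

```python
import subprocess


def backup_and_verify_commands(dbname, dump_path, scratch_db):
    """Return the argv lists for a dump and a trial restore, in run order."""
    return [
        # Custom-format dump (-Fc) written to dump_path (-f).
        ["pg_dump", "-Fc", "-f", dump_path, dbname],
        # Empty scratch database to restore into.
        ["createdb", scratch_db],
        # Load the dump into the scratch database (-d).
        ["pg_restore", "-d", scratch_db, dump_path],
    ]


def run_all(commands):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names; print the commands instead of running them.
    for argv in backup_and_verify_commands(
        "clinic", "/backup/clinic.dump", "clinic_verify"
    ):
        print(" ".join(argv))
```

A restore that completes without errors is only a first check; comparing row counts on a few key tables afterwards gives more confidence that the dump is usable.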

Re: Postgresql data integrity during RAID10 drive rebuild

From
"Joshua D. Drake"
Date:
On Wed, 2006-11-29 at 08:56 -0800, Steve Poe wrote:
> I need some input from the Postgresql community.
>
> Our animal hospital runs Postgresql 7.4 on a 6-disc RAID10. The
> database logs are on a separate RAID1.
>
> We're using an LSI MegaRAID 320-2X controller. The controller reports
> that one 146GB SCSI disc in the RAID10 has failed and the array is in
> "DEGRADED" mode. The database seems to be running fine but slower.
> I've never had to replace a disc in an array with Postgresql running
> on it. LSI says I can replace the disc and do a rebuild while
> everything is running. I am of course concerned about data
> integrity/corruption.

Well, of course do a backup, but LSI is correct: it will rebuild
correctly.

>
> Has anyone had to rebuild one of the discs in their database's
> array?

Of course :) and you should be fine. However, make sure you grab a backup
just in case, and check the firmware version on your LSI. There was a
parity bug that caused data corruption a couple of revs ago.

Joshua D. Drake

>
> Thanks for your help.
>
> Steve Poe
>
>
>
>




Re: Postgresql data integrity during RAID10 drive rebuild

From
Vivek Khera
Date:

On Nov 29, 2006, at 11:56 AM, Steve Poe wrote:

> I've never had to replace a disc in an array with Postgresql running on it. LSI says I can replace the disc and do a rebuild while everything is running. I am of course concerned about data integrity/corruption.

This is the whole purpose of having a RAID card and hot-swap drives: to make it transparent to the layers above the disk interface.

> Has anyone had to rebuild one of their disc in an array of their database?

Yes.  The OS (let alone an application such as the DB) has no clue other than possibly slower response from the mirrored pair being rebuilt.
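
To put the "mirrored pair being rebuilt" in context, here is a rough sketch of why the backup advice matters while the array is degraded. It assumes the 6-disc RAID10 is laid out as three two-disc mirrored pairs (the thread does not state the layout), and the disc numbering is made up for illustration.

```python
def fatal_second_failures(pairs, failed):
    """Return the surviving discs whose loss would take the array down."""
    fatal = []
    for a, b in pairs:
        if a == failed:
            fatal.append(b)
        elif b == failed:
            fatal.append(a)
    return fatal


# Assumed layout: three mirrored pairs striped together (hypothetical numbering).
pairs = [(0, 1), (2, 3), (4, 5)]
failed = 3

fatal = fatal_second_failures(pairs, failed)
survivors = [d for pair in pairs for d in pair if d != failed]

print(fatal)                        # [2]
print(len(fatal) / len(survivors))  # 0.2
```

Under that assumption, one of the five remaining discs is a single point of failure until the rebuild finishes, which is the window the dump is meant to cover.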

Re: Postgresql data integrity during RAID10 drive rebuild

From
"Steve Poe"
Date:

> Yep, I've done it a few times.  A few tips:
>
> Back up your database with pg_dump.  Confirm you can restore on a test
> machine.

Thanks. We do a nightly dump and restore to a second server for testing/backup purposes.
The data is intact. Are you recommending a dump before we begin the drive rebuild?

> Replace the drive during the lowest traffic period for your database.

Sounds good. According to LSI, the drive will take 8 hrs to rebuild a 146GB disc (at a 30% rebuild rate), so doing this in the middle of the day is not ideal.

> The LSI cards are very stable.  I've replaced drives under them back in
> the Ultra-320 days with removable caddies.  I would imagine that as long
> as you have the proper drive caddies you're set.  If the drives are not
> in removable caddies, you'll need to power down to safely unplug them.

The drive is in a removable caddy in a 4-disc array backplane.

Scott, Joshua, and Vivek...thanks for your feedback.

Steve Poe
Adobe Animal Hospital

Re: Postgresql data integrity during RAID10 drive rebuild

From
Scott Marlowe
Date:
On Wed, 2006-11-29 at 13:21, Steve Poe wrote:
>         Yep, I've done it a few times.  A few tips:
>
>         Back up your database with pg_dump.  Confirm you can restore
>         on a test machine.
>
> Thanks. We do a nightly dump and restore to a second server for
> testing/backup purposes.
> The data is intact. Are you recommending a dump before we begin the
> drive rebuild?

Either that, or wait until the nightly dump / restore has run and start
then.

>         Replace the drive during the lowest traffic period for your
>         database.
>
> Sounds good. According to LSI, the drive will take 8 hrs to rebuild a
> 146GB disc (at a 30% rebuild rate), so doing this in the middle of the
> day is not ideal.

The rebuild time also tends to depend on how full the array is.  If
you're only using 5% or so, it won't take the full 8 hours they're
projecting.
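
For scale, the quoted figures (146GB in 8 hours) imply a fairly modest average copy rate. The arithmetic below is a back-of-the-envelope sketch in decimal units; how the controller's "30% rebuild rate" setting maps to throughput is not stated in this thread, so nothing here models that setting.

```python
def implied_rebuild_rate_mb_s(capacity_gb, hours):
    """Average MB/s needed to copy capacity_gb in the given time (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / (hours * 3600)


def rebuild_hours(capacity_gb, rate_mb_s):
    """Hours needed to copy capacity_gb at a steady rate_mb_s."""
    return capacity_gb * 1000 / rate_mb_s / 3600


rate = implied_rebuild_rate_mb_s(146, 8)
print(f"{rate:.2f} MB/s")                     # 5.07 MB/s

# If the controller could sustain twice that rate, the copy would take half as long.
print(f"{rebuild_hours(146, 2 * rate):.1f} h")  # 4.0 h
```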



Re: Postgresql data integrity during RAID10 drive rebuild

From
Vivek Khera
Date:
On Nov 29, 2006, at 2:39 PM, Scott Marlowe wrote:

>> Sounds good. According to LSI, the drive will take 8 hrs to rebuild a
>> 146GB disc (at a 30% rebuild rate), so doing this in the middle of
>> the
>> day is not ideal.
>
> The rebuild time also tends to depend on how full the array is.  If
> you're only using 5% or so, it won't take the full 8 hours they're
> projecting.

But how does the RAID card know what is and what is not "full" in the
unix file system stored on it? It has to rebuild the entire drive.
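
The point can be illustrated with a toy model of a mirror rebuild. It assumes a controller that sees only numbered blocks and has no view of the filesystem; the block size and disc contents are made up, and whether a particular controller does anything smarter is not established in this thread.

```python
def rebuild_mirror(surviving_disc):
    """Copy every block from the surviving disc to a fresh replacement.

    The copy loop has no notion of "used" versus "free": blocks that the
    filesystem considers empty are copied just like blocks holding data.
    """
    replacement = []
    blocks_copied = 0
    for block in surviving_disc:
        replacement.append(bytes(block))
        blocks_copied += 1
    return replacement, blocks_copied


BLOCK = 512
# Hypothetical 1000-block disc where only the first 50 blocks (5%) hold data.
disc = [b"\x01" * BLOCK] * 50 + [b"\x00" * BLOCK] * 950

new_disc, copied = rebuild_mirror(disc)
print(copied)            # 1000
print(new_disc == disc)  # True
```

In this model the work is proportional to the size of the disc, not to how much of it the filesystem is using.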



Re: Postgresql data integrity during RAID10 drive rebuild

From
Scott Marlowe
Date:
On Wed, 2006-11-29 at 14:16, Vivek Khera wrote:
> On Nov 29, 2006, at 2:39 PM, Scott Marlowe wrote:
>
> >> Sounds good. According to LSI, the drive will take 8 hrs to rebuild a
> >> 146GB disc (at a 30% rebuild rate), so doing this in the middle of
> >> the
> >> day is not ideal.
> >
> > The rebuild time also tends to depend on how full the array is.  If
> > you're only using 5% or so, it won't take the full 8 hours they're
> > projecting.
>
> But how does the RAID card know what is and what is not "full" in the
> unix file system stored on it? It has to rebuild the entire drive.

Not sure how they do it exactly, but it seems a lot of RAID controllers
know which parts of a drive have been written to and which haven't.  I
recall seeing it happen on rebuilding a RAID 5 on an old LSI card.

It could just be that it's a lot faster if it's got zeros on a block and
can short circuit the parity there.

As for RAID 1+0, not sure if it will be faster or not.
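
For reference, the zero-block remark can be checked against how RAID 5 parity works in general: parity is the byte-wise XOR of the data blocks in a stripe, and an all-zero block does not change an XOR. The sketch below uses made-up 4-byte blocks on three data discs. It shows the arithmetic only; whether any controller actually skips work for zero blocks is a guess, and the controller would still need some way to know a block is zero.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte stripe units on three data discs.
d0 = bytes([0x12, 0x34, 0x56, 0x78])
d1 = bytes(4)  # a never-written, all-zero block
d2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

parity = xor_blocks([d0, d1, d2])

# Lose d2 and rebuild it from the surviving data blocks plus parity.
rebuilt_d2 = xor_blocks([d0, d1, parity])
print(rebuilt_d2 == d2)                # True

# A zero block is the XOR identity, so leaving it out changes nothing.
print(xor_blocks([d0, d2]) == parity)  # True
```

A RAID 1+0 rebuild involves no parity at all, only a copy from the surviving mirror, so this particular shortcut would not apply there.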