Thread: What is the best way to store music files in PostgreSQL
I am going to embark on building a music library using apache, postgresql and php. What is the best way to store the music files? Which file type should I use?
Rich wrote:
> I am going to embark on building a music library using apache,
> postgresql and php. What is the best way to store the music files?
> Which file type should I use?

In Postgres, it's all just binary data. It's entirely up to you which particular format you use. mp2, mp3, mp4, wmv, avi, whatever - it's all the same to Postgres.

A better question is: should you store the binary data in Postgres itself, or keep it in files and only store the filenames? The Postgres archives have several discussions of this topic, and the answer is, "it depends."

Craig
Rich wrote:
> I am going to embark on building a music library using apache,
> postgresql and php. What is the best way to store the music files?

Your options are either to use a BLOB within the database or to store paths to normal files in the file system in the database. I suspect using normal files will make backup and management a great deal easier than using in-database BLOBs, so personally I'd do it that way.

Storing the audio files in the database does make it easier to keep the database and file system backups in sync, but I'm not really sure that's worth the costs.

I'm sure that what you're doing has been done many times before, though, so even if you're not going to use one of the existing options you might at least want to have a look at how they work.

> Which file type should I use?

I'm not sure I understand this question. Are you asking which audio compression codec and audio container file type (like "mp3", "aac", etc) you should use? If so, this is really not the right place to ask that.

Do you mean which file /system/ ?

--
Craig Ringer
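[Editor's note: to make the two options above concrete, here is a minimal SQL sketch of both table layouts. The table and column names are hypothetical, not anything proposed in the thread.]

    -- Option 1: store the audio itself in the database as a BLOB (bytea).
    CREATE TABLE track_blob (
        id     serial PRIMARY KEY,
        title  text NOT NULL,
        mime   text NOT NULL,      -- e.g. 'audio/mpeg'
        audio  bytea NOT NULL      -- the raw file contents
    );

    -- Option 2: keep the file on disk and store only its path.
    CREATE TABLE track_file (
        id     serial PRIMARY KEY,
        title  text NOT NULL,
        mime   text NOT NULL,
        path   text NOT NULL UNIQUE  -- e.g. '/srv/music/artist/album/01.mp3'
    );

With the first schema every read and backup goes through PostgreSQL; with the second the database stays small, but the application has to keep the path column and the file system in step.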
> > I am going to embark on building a music library using apache,
> > postgresql and php. What is the best way to store the music files?
>
> Your options are either to use a BLOB within the database or to store
> paths to normal files in the file system in the database. I suspect
> using normal files will make backup and management a great deal easier
> than using in-database BLOBs, so personally I'd do it that way.

I discussed something like this with some co-workers recently, and here's what I had to say. Not all of these apply to the original message, but they are things to consider when marrying a database to a file storage system.

Storing the files in the database as BLOBs:
Pros:
- The files can always be seen by the database system as long as it's up (there's no dependence on an external file system).
- There is one set of locking mechanisms, meaning that the file operations can be atomic with the database operations.
- There is one set of permissions to deal with.
Cons:
- There is almost no way to access files outside of the database. If the database goes down, you are screwed.
- If you don't make good use of tablespaces and put blobs on a separate disk system, the disk could thrash going between data and blobs, affecting performance (a tablespace sketch follows this message).
- There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read).

Storing files externally, storing pathnames in the database:
Pros:
- You can access and manage files from outside the database and possibly using different interfaces.
- There's a lot less to store directly in the database.
- You can use existing file-system permissions, mechanisms, and limits.
Cons:
- You are dealing with two storage systems and two different locking systems which are unlikely to play nice with each other. Transactions are not guaranteed to be atomic (e.g. a database rollback will not roll back a file system operation, and a commit will not guarantee that data in a file will stay).
- The file system has to be seen by the database system and any remote clients that wish to use your application, meaning that a networked FS is likely to be used (depending on how many clients you have and how you like to separate services), with all the fun that comes from administering one of those. Note that this one in particular really only applies to enterprise-level installations, not smaller installations like the original poster's.
- If you don't put files on a separate disk system or networked FS, you can get poor performance from the disk thrashing between the database and the files.

There are a couple of main points:
1. The favorite answer in computing, "it depends", applies here. What you decide depends on your storage system, your service and installation policies, and how important fully atomic transactions are to you.
2. If you want optimal performance out of either of these basic models, you should make proper use of separate disk systems. I have no idea which one is faster (it depends, I'm sure), nor do I have much of an idea of how to benchmark this properly.

Peter
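[Editor's note: a minimal sketch of the tablespace point above, assuming a second disk system mounted at a hypothetical /mnt/blobdisk and re-using the hypothetical track_blob table from the earlier sketch.]

    -- Run as a superuser; the directory must already exist and be owned
    -- by the postgres system user.
    CREATE TABLESPACE blobspace LOCATION '/mnt/blobdisk/pgdata';

    -- The same hypothetical table as before, now kept on its own disks
    -- so large audio reads don't thrash against the rest of the database.
    CREATE TABLE track_blob (
        id     serial PRIMARY KEY,
        title  text NOT NULL,
        audio  bytea NOT NULL
    ) TABLESPACE blobspace;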
On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <pjkoczan@gmail.com> wrote:

-->8 snip 8<--

> There are a couple of main points:
> 1. The favorite answer in computing, "it depends", applies here. What
> you decide depends on your storage system, your service and
> installation policies, and how important fully atomic transactions are
> to you.
> 2. If you want optimal performance out of either of these basic
> models, you should make proper use of separate disk systems. I have no
> idea which one is faster (it depends, I'm sure), nor do I have much of
> an idea of how to benchmark this properly.
>
> Peter

It seems to me as such a database gets larger, it will become much harder to manage with the 2 systems. I am talking mostly about music, so each song should not get too large. I have read a lot on this list and on other resources, and there seems to be a leaning toward 1+0 RAIDs for storage. It seems to be the most flexible when it comes to speed, redundancy and recovery time. I do want my database to be fully atomic. I think that is important as this database grows. Are my assumptions wrong?
On Sun, Mar 16, 2008 at 2:25 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Rich wrote:
> > I am going to embark on building a music library using apache,
> > postgresql and php. What is the best way to store the music files?
>
> Your options are either to use a BLOB within the database or to store
> paths to normal files in the file system in the database. I suspect
> using normal files will make backup and management a great deal easier
> than using in-database BLOBs, so personally I'd do it that way.
>
> Storing the audio files in the database does make it easier to keep the
> database and file system backups in sync, but I'm not really sure that's
> worth the costs.

What costs are you speaking of?

> I'm sure that what you're doing has been done many times before, though,
> so even if you're not going to use one of the existing options you might
> at least want to have a look at how they work.
>
> > Which file type should I use?
>
> I'm not sure I understand this question. Are you asking which audio
> compression codec and audio container file type (like "mp3", "aac", etc)
> you should use? If so, this is really not the right place to ask that.
>
> Do you mean which file /system/ ?
>
> --
> Craig Ringer
> It seems to me as such a database gets larger, it will become much harder
> to manage with the 2 systems. I am talking mostly about music, so each
> song should not get too large.

I was just talking about points to consider in general. Getting to your specific situation...

As far as BLOBs vs. file pointers: test it out, and use what you're most comfortable using. I would not set up a networked file system for the sole purpose of managing and storing files a database will point to. If you already have access to a networked file system, consider that as an option, but don't add more work for yourself if you don't have to. Many applications I work on use the database to store pathnames while the files themselves are stored in a networked file system. It's honestly not a huge pain to manage this if it's already available, but as I mentioned before, there are caveats.

Also, in my experience, the amount of management you do in a database doesn't directly depend on the amount of data you put in. In other words, your database shouldn't become much more difficult to manage over time if all you are doing is adding more rows to tables.

> I have read a lot on this list and on other resources, and there seems to
> be a leaning toward 1+0 RAIDs for storage. It seems to be the most
> flexible when it comes to speed, redundancy and recovery time. I do want
> my database to be fully atomic. I think that is important as this
> database grows. Are my assumptions wrong?

As far as RAID levels go, RAID 10 is usually optimal for databases, so your assumptions are correct. The extra cost for disks, I believe, is paid off by the advantages you mentioned, at least for typical database-related workloads. RAID 0 doesn't allow for any disaster recovery, RAID 1 is OK as long as you can handle having only 2 disks available, and RAID 5 and RAID 6 are just huge pains and terribly slow for writes. Note that you should go for a battery backup if you use hardware RAID.

Hope this helps.

Peter
On 18/03/2008, Peter Koczan <pjkoczan@gmail.com> wrote:
> available, and RAID 5 and RAID 6 are just huge pains and terribly slow
> for writes.

RAID 5 and RAID 6 are just huge pains and terribly slow for writes with small numbers of spindles.... ;}

In my testing I found that once you hit 10 spindles in a RAID 5 the differences between it and a RAID 10 started to become negligible (around 6% slower on writes on average with 10 runs of bonnie++ on 10 spindles), while the read speed (if you're doing similar amounts of reads & writes it's a fair criterion) was about 10% faster. With 24 spindles I couldn't see any difference at all. Those were 73GB 15K SCAs, btw, and the SAN connected via 2GB fibre.

> Peter

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
Andrej Ricnik-Bay wrote:
> In my testing I found that once you hit 10 spindles in a RAID 5 the
> differences between it and a RAID 10 started to become negligible
> (around 6% slower on writes on average with 10 runs of bonnie++ on
> 10 spindles), while the read speed (if you're doing similar amounts
> of reads & writes it's a fair criterion) was about 10% faster. With
> 24 spindles I couldn't see any difference at all. Those were 73GB
> 15K SCAs, btw, and the SAN connected via 2GB fibre.

Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a double disk failure (such as during the load imposed by rebuild onto a spare)?

I guess if you have good backups - as you must - it's not that big a deal, but I'd be pretty nervous with anything less than RAID 6 or RAID 10.

--
Craig Ringer
On 18/03/2008, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
> double disk failure (such as during the load imposed by rebuild onto a
> spare)?

I never said that we actually USED that set-up. I just said I did extensive testing with varied RAID setups. ;} We did go with the 10 in the end because of that very consideration. It's just that the mantra "RAID5 = slow writes" isn't quite true.

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:
> On 18/03/2008, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
>> double disk failure (such as during the load imposed by rebuild onto a
>> spare)?

That's why you should use RAID 6 (allowing for dual failures).

David Lang
On Tue, 18 Mar 2008, Gregory Stark wrote:
> <david@lang.hm> writes:
>> On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:
>>> On 18/03/2008, Craig Ringer <craig@postnewspapers.com.au> wrote:
>>>> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
>>>> double disk failure (such as during the load imposed by rebuild onto a
>>>> spare)?
>>
>> That's why you should use RAID 6 (allowing for dual failures).
>
> You can have as many parity drives as you want with RAID 5 too.

You can? I've never seen a RAID 5 setup with more than a single parity drive (or even the option of having more than one drive's worth of redundancy). You can have hot-spare drives, but that's a different thing.

What controller/software lets you do this?

David Lang
<david@lang.hm> writes:
> On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:
>> On 18/03/2008, Craig Ringer <craig@postnewspapers.com.au> wrote:
>>> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
>>> double disk failure (such as during the load imposed by rebuild onto a
>>> spare)?
>
> That's why you should use RAID 6 (allowing for dual failures).

You can have as many parity drives as you want with RAID 5 too.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Gregory Youngblood wrote:
> Also, a very informative read:
> http://research.google.com/archive/disk_failures.pdf
> In short, best thing to do is watch SMART and be prepared to try and
> swap a drive out before it fails completely. :)

I currently have four brand new 1TB disks (7200RPM SATA - they're for our backup server). Two of them make horrible clicking noises - they're rapidly parking and unparking or doing constant seeks. One of those two also spins up very loudly, and on spin down rattles and buzzes.

Their internal SMART "health check" reports the two problem drives to be just fine, and both pass a short SMART self test (smartctl -d ata -t short). Both have absurdly huge seek_error_rate values, but the SMART thresholds see nothing wrong with this. The noisy spin-down one is so defective that I can't even write data to it successfully, and the other problem disk has regular I/O errors and fails an extended SMART self test (the short test still passes).

I see this sort of thing regularly. Vendors are obviously setting the SMART health thresholds so that there's absolutely no risk of reporting an issue with a working drive, and in the process making it basically useless for detecting failing or faulty drives.

I rely on manual examination of the vendor attributes like the seek error rate, ECC recovered sectors, offline uncorrectable sectors (usually a REALLY bad sign if this grows), etc, combined with regular extended SMART tests (which do a surface scan). Just using SMART - say, the basic health check - really isn't enough.

--
Craig Ringer
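[Editor's note: for reference, the manual checks described above can be done with smartctl from smartmontools. /dev/sda is a hypothetical device name; -d ata matches the invocation quoted in this message.]

    # Dump the vendor attribute table (seek error rate, reallocated
    # sectors, offline uncorrectable, etc.) rather than trusting the
    # overall health verdict.
    smartctl -d ata -A /dev/sda

    # Kick off self tests: a quick check, and an extended test that
    # includes a surface scan.
    smartctl -d ata -t short /dev/sda
    smartctl -d ata -t long /dev/sda

    # Read back the self-test log once the tests have finished.
    smartctl -d ata -l selftest /dev/sda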
david@lang.hm wrote:
> You can? I've never seen a RAID 5 setup with more than a single parity
> drive (or even the option of having more than one drive's worth of
> redundancy). You can have hot-spare drives, but that's a different thing.

With RAID 4, where the "parity drives" are in fact dedicated to parity information, the controller could just store the parity data mirrored on more than one drive. Unfortunately write performance on RAID 4 is absolutely horrible, and a second or third parity disk would not help with that.

I suppose there's nothing stopping a controller adding a second disk's worth of duplicate parity information when striping a four or more disk RAID 5 array, but I thought that's basically what RAID 6 was.

--
Craig Ringer
On Wed, 19 Mar 2008, Craig Ringer wrote:
> david@lang.hm wrote:
>> You can? I've never seen a RAID 5 setup with more than a single parity
>> drive (or even the option of having more than one drive's worth of
>> redundancy). You can have hot-spare drives, but that's a different thing.
>
> With RAID 4, where the "parity drives" are in fact dedicated to parity
> information, the controller could just store the parity data mirrored on
> more than one drive. Unfortunately write performance on RAID 4 is
> absolutely horrible, and a second or third parity disk would not help
> with that.
>
> I suppose there's nothing stopping a controller adding a second disk's
> worth of duplicate parity information when striping a four or more disk
> RAID 5 array, but I thought that's basically what RAID 6 was.

Just duplicating the RAID 4 or 5 parity information will not help you if the parity drive is not one of the drives that fail. RAID 6 uses a different parity algorithm so that any two drives in the array can fail with no data loss.

Even this isn't completely error proof. I just went through a scare with a 15 disk array where it reported 3 dead drives after a power outage. One of the dead drives ended up being the hot spare, and another drive that acted up worked well enough to let me eventually recover all the data (seek errors), but it was a very scary week while I worked through this.

David Lang
<david@lang.hm> writes:
> On Tue, 18 Mar 2008, Gregory Stark wrote:
>> You can have as many parity drives as you want with RAID 5 too.
>
> You can? I've never seen a RAID 5 setup with more than a single parity
> drive (or even the option of having more than one drive's worth of
> redundancy). You can have hot-spare drives, but that's a different thing.
>
> What controller/software lets you do this?

Hm, some research shows I may have completely imagined this. I don't see why you couldn't, but I can't find any evidence that this feature exists. I could have sworn I've seen it before though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
david@lang.hm wrote:
> Just duplicating the RAID 4 or 5 parity information will not help you
> if the parity drive is not one of the drives that fail.

Good point - and no doubt why nothing supports extra disks' worth of parity on RAID 5, which would be entirely useless (still only protecting against a 1-disk failure but wasting more space). Except, apparently, the earlier poster's RAID 5 controller that DOES support extra parity disks. It must just be hot spares, nothing else makes any sense.

> Even this isn't completely error proof. I just went through a scare with
> a 15 disk array where it reported 3 dead drives after a power outage.
> One of the dead drives ended up being the hot spare, and another drive
> that acted up worked well enough to let me eventually recover all the
> data (seek errors), but it was a very scary week while I worked through
> this.

As file systems can be corrupted, files deleted, etc, I try to make sure that all my data is sufficiently well backed up that a week's worth of recovery effort is never needed. Dead array? Rebuild and restore from backups. Admittedly this practice has arisen because of a couple of scares much like you describe, but at least now it happens. I even test the backups ;-)

Big SATA 7200rpm disks are so cheap compared to high performance SAS or even 10kRPM SATA disks that it seems like a really bad idea not to have a disk-based backup server with everything backed up quick to hand.

For that reason I'm absolutely loving PostgreSQL's WAL archiving feature and support for a warm spare server. I can copy the WAL files to another machine and immediately restore them there (providing a certain level of inherent testing) as well as writing them to tape. It's absolutely wonderful. Sure, the warm spare will run like a man in knee-deep mud, but it'll do in an emergency.

The existing "database" used by the app I'm working to replace is an ISAM-based single-host shared-file DB where all the user processes access the DB directly. Concurrency is only supported through locking; there's no transaction support, referential integrity checking, data typing, or SQL of any sort, AND it's prone to corruption and data loss if a user process is killed. User processes are killed when the user's terminal is closed or loses its connection. Backups are only possible once a day when all users are logged off. It's not an application where losing half a day of data is fun. On top of all that it runs on SCO OpenServer 5.0.5 (which has, among other things, the most broken C toolchain I've ever seen).

So ... hooray for up-to-date, well tested backups and how easy PostgreSQL makes them.

--
Craig Ringer
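[Editor's note: a minimal sketch of the WAL-archiving setup described above, assuming an 8.3-era postgresql.conf; the spare host name and paths are hypothetical. The warm spare additionally needs its own recovery setup (restore_command) to replay the copied segments.]

    # postgresql.conf on the master - copy each completed WAL segment
    # to the warm spare (from there it can also be written to tape).
    archive_mode = on
    archive_command = 'rsync -a %p spare-host:/var/lib/pgsql/wal_archive/%f'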
Gregory Youngblood wrote:
> My way of thinking, and what I was referring to above, was to use those
> error conditions to identify drives to change before they report
> complete failures. Yes, that will mean changing drives before SMART
> actually says there is a full failure, and you may have to fight to get
> a drive replaced under warranty when you do so, but you are protecting
> your data.

I actually find it surprisingly easy to get a disk replaced based on a printed SMART report showing uncorrectable sectors or just very high reallocated sector counts etc. Almost suspiciously easy.

I would not be at all surprised if the disk vendors are, at least for their 7200rpm SATA disks, recording a "black mark" against the serial number, doing a low level reformat and sending them back out as a new disk to another customer. Some of the "new" disks I've received have lifetimes and logs that suggest they might be such refurbs - much longer test logs than most new drives, for example, as well as earlier serial numbers than others ordered at the same time. They're also much, much more likely to be DOA or develop defects early.

> I agree with you completely that waiting for SMART to actually indicate
> a true failure is pointless due to the thresholds set by mfrs. But using
> SMART for early warning signs still has value IMO.

I could not agree more. smartmontools is right up there with tools like wireshark, mtr, and tcptraceroute in my most-vital toolbox, and it's mostly because of its ability to examine the vendor attributes and kick off scheduled self tests.

I've saved a great deal of dead-disk-replacement hassle by ensuring that smartd is configured to run extended self tests on the disks in all the machines I operate at least fortnightly, and short tests at least weekly. Being able to plan ahead to swap a dying disk is very nice indeed.

--
Craig Ringer
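[Editor's note: a sketch of an /etc/smartd.conf entry along the lines described above - short tests weekly, extended tests twice a month as an approximation of "fortnightly". The device name is hypothetical; the -s schedule uses smartd's T/MM/DD/d/HH regex scheme.]

    # Monitor all attributes, enable automatic offline data collection
    # and attribute autosave, and schedule self tests:
    #   S/../../7/03        - short self test every Sunday at 03:00
    #   L/../(01|15)/./04   - long (extended) self test on the 1st and
    #                         15th of each month at 04:00
    /dev/sda -a -o on -S on -s (S/../../7/03|L/../(01|15)/./04)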
On Wed, 2008-03-19 at 07:44 +0900, Craig Ringer wrote:
-->8 snip 8<--
In that Google report, one of their conclusions was that after the first scan error, drives were 39 times more likely to fail within the next 60 days. And first errors in reallocations, etc., also correlated with higher failure probabilities.
My way of thinking, and what I was referring to above, was to use those error conditions to identify drives to change before they report complete failures. Yes, that will mean changing drives before SMART actually says there is a full failure, and you may have to fight to get a drive replaced under warranty when you do so, but you are protecting your data.
I agree with you completely that waiting for SMART to actually indicate a true failure is pointless due to the thresholds set by mfrs. But using SMART for early warning signs still has value IMO.
On Tue, 2008-03-18 at 11:04 -0700, david@lang.hm wrote:
And, if possible, try to use drives from a couple of different batches. I wouldn't go crazy trying to do this, but sometimes a combination of drives from two or three batches can make a difference (though not always).
Also, a very informative read: http://research.google.com/archive/disk_failures.pdf
In short, best thing to do is watch SMART and be prepared to try and swap a drive out before it fails completely. :)