Re: What is the best way to storage music files in Postgresql - Mailing list pgsql-performance

From Rich
Subject Re: What is the best way to storage music files in Postgresql
Date
Msg-id 8fb205ea0803171139h63b55fdrf62776851583a74d@mail.gmail.com
Whole thread Raw
In response to Re: What is the best way to storage music files in Postgresql  ("Peter Koczan" <pjkoczan@gmail.com>)
Responses Re: What is the best way to storage music files in Postgresql
List pgsql-performance
On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <pjkoczan@gmail.com> wrote:
> >  > I am going to embarkon building a music library using apache,
>  >  > postgresql and php.  What is the best way to store the music files?
>  >
>  >  Your options are either to use a BLOB within the database or to store
>  >  paths to normal files in the file system in the database. I suspect
>  >  using normal files will make backup and management a great deal easier
>  >  than using in-database BLOBs, so personally I'd do it that way.
>
>  I discussed something like this with some co-workers recently, and
>  here's what I had to say. Not all of these apply to the original
>  message, but they are things to consider when marrying a database to a
>  file storage system.
>
>  Storing the files in the database as BLOBs:
>  Pros:
>  - The files can always be seen by the database system as long as it's
>  up (there's no dependence on an external file system).
>  - There is one set of locking mechanisms, meaning that the file
>  operations can be atomic with the database operations.
>  - There is one set of permissions to deal with.
>  Cons:
>  - There is almost no way to access files outside of the database. If
>  the database goes down, you are screwed.
>  - If you don't make good use of tablespaces and put blobs on a
>  separate disk system, the disk could thrash going between data and
>  blobs, affecting performance.
>  - There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read).
>
>  Storing files externally, storing pathnames in the database:
>  Pros:
>  - You can access and manage files from outside the database and
>  possibly using different interfaces.
>  - There's a lot less to store directly in the database.
>  - You can use existing file-system permissions, mechanisms, and limits.
>  Cons:
>  - You are dealing with two storage systems and two different locking
>  systems which are unlikely to play nice with each other. Transactions
>  are not guaranteed to be atomic (e.g. a database rollback will not
>  rollback a file system operation, a commit will not guarantee that
>  data in a file will stay).
>  - The file system has to be seen by the database system and any remote
>  clients that wish to use your application, meaning that a networked FS
>  is likely to be used (depending on how many clients you have and how
>  you like to separate services), with all the fun that comes from
>  administering one of those. Note that this one in particular really
>  only applies to enterprise-level installations, not smaller
>  installations like the original poster's.
>  - If you don't put files on a separate disk-system or networked FS,
>  you can get poor performance from the disk thrashing between the
>  database and the files.
>
>  There are a couple main points:
>  1. The favorite answer in computing, "it depends", applies here. What
>  you decide depends on your storage system, your service and
>  installation policies, and how important fully atomic transactions are
>  to you.
>  2. If you want optimal performance out of either of these basic
>  models, you should make proper use of separate disk systems. I have no
>  idea which one is faster (it depends, I'm sure) nor do I have much of
>  an idea of how to benchmark this properly.
>
>  Peter
>  It seems to me as such a database gets larger, it will become much harder to manage with the 2 systems.  I am
talkingmostly about music.  So each song should not get too large.  I have read alot on this list and on other
resourcesand there seems to be leanings toward 1+0 raids for storage.  It seems to the most flexible when it comes to
speed,redundancy and recovery time.  I do want my database to be fully atomic.  I think that is important as this
databasegrows.  Are my assumptions wrong? 

pgsql-performance by date:

Previous
From: Justin
Date:
Subject: Re: Benchmark: Dell/Perc 6, 8 disk RAID 10
Next
From: Rich
Date:
Subject: Re: What is the best way to storage music files in Postgresql