Re: What is the best way to storage music files in Postgresql - Mailing list pgsql-performance
From | Rich |
---|---|
Subject | Re: What is the best way to storage music files in Postgresql |
Date | |
Msg-id | 8fb205ea0803171139h63b55fdrf62776851583a74d@mail.gmail.com Whole thread Raw |
In response to | Re: What is the best way to storage music files in Postgresql ("Peter Koczan" <pjkoczan@gmail.com>) |
Responses |
Re: What is the best way to storage music files in Postgresql
|
List | pgsql-performance |
On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <pjkoczan@gmail.com> wrote: > > > I am going to embarkon building a music library using apache, > > > postgresql and php. What is the best way to store the music files? > > > > Your options are either to use a BLOB within the database or to store > > paths to normal files in the file system in the database. I suspect > > using normal files will make backup and management a great deal easier > > than using in-database BLOBs, so personally I'd do it that way. > > I discussed something like this with some co-workers recently, and > here's what I had to say. Not all of these apply to the original > message, but they are things to consider when marrying a database to a > file storage system. > > Storing the files in the database as BLOBs: > Pros: > - The files can always be seen by the database system as long as it's > up (there's no dependence on an external file system). > - There is one set of locking mechanisms, meaning that the file > operations can be atomic with the database operations. > - There is one set of permissions to deal with. > Cons: > - There is almost no way to access files outside of the database. If > the database goes down, you are screwed. > - If you don't make good use of tablespaces and put blobs on a > separate disk system, the disk could thrash going between data and > blobs, affecting performance. > - There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read). > > Storing files externally, storing pathnames in the database: > Pros: > - You can access and manage files from outside the database and > possibly using different interfaces. > - There's a lot less to store directly in the database. > - You can use existing file-system permissions, mechanisms, and limits. > Cons: > - You are dealing with two storage systems and two different locking > systems which are unlikely to play nice with each other. Transactions > are not guaranteed to be atomic (e.g. a database rollback will not > rollback a file system operation, a commit will not guarantee that > data in a file will stay). > - The file system has to be seen by the database system and any remote > clients that wish to use your application, meaning that a networked FS > is likely to be used (depending on how many clients you have and how > you like to separate services), with all the fun that comes from > administering one of those. Note that this one in particular really > only applies to enterprise-level installations, not smaller > installations like the original poster's. > - If you don't put files on a separate disk-system or networked FS, > you can get poor performance from the disk thrashing between the > database and the files. > > There are a couple main points: > 1. The favorite answer in computing, "it depends", applies here. What > you decide depends on your storage system, your service and > installation policies, and how important fully atomic transactions are > to you. > 2. If you want optimal performance out of either of these basic > models, you should make proper use of separate disk systems. I have no > idea which one is faster (it depends, I'm sure) nor do I have much of > an idea of how to benchmark this properly. > > Peter > It seems to me as such a database gets larger, it will become much harder to manage with the 2 systems. I am talkingmostly about music. So each song should not get too large. I have read alot on this list and on other resourcesand there seems to be leanings toward 1+0 raids for storage. It seems to the most flexible when it comes to speed,redundancy and recovery time. I do want my database to be fully atomic. I think that is important as this databasegrows. Are my assumptions wrong?
pgsql-performance by date: