Re: What is the best way to storage music files in Postgresql - Mailing list pgsql-performance

From Peter Koczan
Subject Re: What is the best way to storage music files in Postgresql
Date
Msg-id 4544e0330803171101g680ffa5bj8ac32908a9f3dfb5@mail.gmail.com
Whole thread Raw
In response to Re: What is the best way to storage music files in Postgresql  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: What is the best way to storage music files in Postgresql
List pgsql-performance
>  > I am going to embarkon building a music library using apache,
>  > postgresql and php.  What is the best way to store the music files?
>
>  Your options are either to use a BLOB within the database or to store
>  paths to normal files in the file system in the database. I suspect
>  using normal files will make backup and management a great deal easier
>  than using in-database BLOBs, so personally I'd do it that way.

I discussed something like this with some co-workers recently, and
here's what I had to say. Not all of these apply to the original
message, but they are things to consider when marrying a database to a
file storage system.

Storing the files in the database as BLOBs:
Pros:
- The files can always be seen by the database system as long as it's
up (there's no dependence on an external file system).
- There is one set of locking mechanisms, meaning that the file
operations can be atomic with the database operations.
- There is one set of permissions to deal with.
Cons:
- There is almost no way to access files outside of the database. If
the database goes down, you are screwed.
- If you don't make good use of tablespaces and put blobs on a
separate disk system, the disk could thrash going between data and
blobs, affecting performance.
- There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read).

Storing files externally, storing pathnames in the database:
Pros:
- You can access and manage files from outside the database and
possibly using different interfaces.
- There's a lot less to store directly in the database.
- You can use existing file-system permissions, mechanisms, and limits.
Cons:
- You are dealing with two storage systems and two different locking
systems which are unlikely to play nice with each other. Transactions
are not guaranteed to be atomic (e.g. a database rollback will not
rollback a file system operation, a commit will not guarantee that
data in a file will stay).
- The file system has to be seen by the database system and any remote
clients that wish to use your application, meaning that a networked FS
is likely to be used (depending on how many clients you have and how
you like to separate services), with all the fun that comes from
administering one of those. Note that this one in particular really
only applies to enterprise-level installations, not smaller
installations like the original poster's.
- If you don't put files on a separate disk-system or networked FS,
you can get poor performance from the disk thrashing between the
database and the files.

There are a couple main points:
1. The favorite answer in computing, "it depends", applies here. What
you decide depends on your storage system, your service and
installation policies, and how important fully atomic transactions are
to you.
2. If you want optimal performance out of either of these basic
models, you should make proper use of separate disk systems. I have no
idea which one is faster (it depends, I'm sure) nor do I have much of
an idea of how to benchmark this properly.

Peter

pgsql-performance by date:

Previous
From: Justin
Date:
Subject: Re: Benchmark: Dell/Perc 6, 8 disk RAID 10
Next
From: Craig James
Date:
Subject: Re: Benchmark: Dell/Perc 6, 8 disk RAID 10