Re: Storing many big files in database- should I do it? - Mailing list pgsql-general

From David Wall
Subject Re: Storing many big files in database- should I do it?
Date
Msg-id 4BD9AED8.5060502@computer.org
In response to Re: Storing many big files in database- should I do it?  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Responses Re: Storing many big files in database- should I do it?  (Justin Graf <justin@magwerks.com>)
List pgsql-general
Things to consider when not storing them in the DB:

1) Backups of the DB are incomplete without a corresponding backup of the files.

2) There is no transactional integrity between the filesystem and the DB, so you will have to deal with orphans from both INSERT and DELETE (assuming you don't also update the files).

3) There is no built-in ability for replication, such as WAL shipping.
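
To make point 2 concrete, here is a rough sketch of the kind of orphan check you would end up writing yourself. It assumes the DB stores file paths relative to some storage root. The function name find_orphans and the idea of passing in the paths already fetched from the DB are just for illustration, not anything PostgreSQL provides.

```python
import os


def find_orphans(db_paths, root):
    """Compare the paths recorded in the DB with the files actually on disk.

    db_paths: iterable of '/'-separated paths relative to root, as a
              (hypothetical) table of file metadata might store them.
    Returns (files_without_row, rows_without_file) as two sorted lists.
    """
    recorded = set(db_paths)
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalise to '/' so the comparison does not depend on the OS.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            on_disk.add(rel)
    # Files left behind by a rolled-back INSERT or a DELETE that never
    # removed the file.
    files_without_row = sorted(on_disk - recorded)
    # Rows whose file was never written or has since gone missing.
    rows_without_file = sorted(recorded - on_disk)
    return files_without_row, rows_without_file
```

A check like this only detects orphans after the fact. It does nothing to prevent them, which is the whole point of the complaint.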

The big downside of storing them in the DB is that all large objects appear to be stored together in pg_catalog.pg_largeobject. That seems troubling by design: you know you have lots of big data, yet you store it all in one place, and then you have to worry about running out of 'loids'.

David

On 4/29/2010 2:10 AM, Cédric Villemain wrote:
2010/4/28 Adrian Klaver <adrian.klaver@gmail.com>: 
On Tuesday 27 April 2010 5:45:43 pm Anthony wrote:   
On Tue, Apr 27, 2010 at 5:17 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote:
Store your files in a filesystem, and keep the path to the file (plus
metadata, ACL, etc.) in the database.
What type of filesystem is good for this?  A filesystem with support for
storing tens of thousands of files in a single directory, or should one
play the 41/56/34/41563489.ext game?     
I'd go with XFS or ext{3-4}, in both cases with a path game.
Your path game will let you handle the scalability of your uploads (so
the first increment is the first directory), something like
1/2/3/4/foo.file, 2/2/3/4/bar.file, etc. You might explore a hash
function, or something that splits a SHA1 (or other) sum of the file to
get the path.
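
For what it's worth, both path games mentioned above can be sketched in a few lines. This is only an illustration under assumptions: the function names id_path and hash_path, the two-character directory width, and the number of levels are made up here, and nobody in the thread specified them.

```python
import hashlib
from pathlib import PurePosixPath


def id_path(file_id, ext, levels=3, width=2):
    """Split a numeric id into directories: 41563489 -> 41/56/34/41563489.ext"""
    # Zero-pad so that short ids still yield the full set of directory levels.
    digits = str(file_id).zfill(levels * width + width)
    parts = [digits[i * width:(i + 1) * width] for i in range(levels)]
    return str(PurePosixPath(*parts, digits + ext))


def hash_path(data, ext="", levels=2, width=2):
    """Derive the path from the SHA1 sum of the file contents."""
    digest = hashlib.sha1(data).hexdigest()
    # The leading hex characters of the digest pick the directories.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return str(PurePosixPath(*parts, digest + ext))


print(id_path(41563489, ".ext"))  # 41/56/34/41563489.ext
```

One design difference worth noting: with the hash variant, identical files map to the same path, so duplicates collapse into one file, whereas the id variant keeps one file per row.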

 
Are there any open source systems which handle keeping a filesystem and
database in sync for this purpose, or is it a wheel that keeps getting
reinvented?

I know "store your files in a filesystem" is the best long-term solution.
But it's just so much easier to just throw everything in the database.     
In the for-what-it-is-worth department, check out this wiki:
http://sourceforge.net/apps/mediawiki/fuse/index.php?title=DatabaseFileSystems   
and postgres fuse also :-D
 
--
Adrian Klaver
adrian.klaver@gmail.com
   

 
