On Fri, 2009-06-12 at 19:53 +1000, Yaroslav Tykhiy wrote:
> DimitryASuplatov wrote:
> >
> > My task is to store a lot (10^5) of small ( <10 MB) text files in the
> > database with the ability to restore them back to the hard drive on
> > demand.
>
> I cannot but ask the community a related question here: Can such
> design, that is, storing quite large objects of varying size in a
> PostgreSQL database, be a good idea in the first place? I used to
> believe that what RDBMS were really good at was storing a huge number
> of relations, each of a small and mostly uniform size if expressed in
> bytes; but today people tend to put big things, e.g., email or files,
> in relational databases because it's convenient to them. That's
> absolutely normal as typical data objects we have to deal with keep
> growing in size, but how well can databases stand the pressure? And
> can't it still be better to store large things as plain files and put
> just their names in the database? File systems were designed for such
> kind of job after all, unlike RDBMS.
It depends a great deal on what you need.
Using the DB allows you to access that data using the same tools,
methods, connections, and security credentials you use for your other
data. It also allows you to manage it in the same transactional
environment, and verify its consistency.
Using the file system can be faster and offers a wider variety of
methods for accessing and manipulating the data. It can be easier to
back up efficiently (differential/incremental backups etc) may take up
less space, and more.
To me, a nearly ideal option would be a file system that supported
transactional operations and two phase commit. You could work with your
data objects normally in the FS (at least for read access), but you
could _ALSO_ modify them in tandem with the DB:
- Begin DB trans
- Begin FS trans
... do your work...
- prepare db trans for commit
- prepare fs trans for commit
- commit db trans
- commit fs trans
A transaction manager could hide those from you, or the DB could take
care of the FS 2PC as part of its own LOB and transaction management. In
fact, I think that's how Microsoft have done it with transactional NTFS
integration in MS SQL Server, which I must say sounds awfully nice.
--
Craig Ringer