Thread: Large objetcs performance
Hello All, I've been searching the archives for something similar, without success.. We have an application subjected do sign documents and store them somewhere. The files size may vary from Kb to Mb. Delelopers are arguing about the reasons to store files direcly on operating system file system or on the database, as large objects. My boss is considering file system storing, because he is concerned about integrity, backup/restore corruptions. I'd like to know some reasons to convince them to store these files on PosgtreSQL, including integrity, and of course, performance. I would like to know the file system storing disadvantages as well. Thanks in advace. Alex
On 04.04.2007, at 08:03, Alexandre Vasconcelos wrote: > We have an application subjected do sign documents and store them > somewhere. The files size may vary from Kb to Mb. Delelopers are > arguing about the reasons to store files direcly on operating system > file system or on the database, as large objects. My boss is > considering file system storing, because he is concerned about > integrity, backup/restore corruptions. I'd like to know some reasons > to convince them to store these files on PosgtreSQL, including > integrity, and of course, performance. I would like to know the file > system storing disadvantages as well. It is not directly PostgreSQL related, but this might give you something to think about: http://en.wikibooks.org/wiki/Programming:WebObjects/Web_Applications/ Development/Database_vs_Filesystem cug
On 4/4/07, Alexandre Vasconcelos <alex.vasconcelos@gmail.com> wrote: > We have an application subjected do sign documents and store them > somewhere. The files size may vary from Kb to Mb. Delelopers are > arguing about the reasons to store files direcly on operating system > file system or on the database, as large objects. My boss is > considering file system storing, because he is concerned about > integrity, backup/restore corruptions. I'd like to know some reasons > to convince them to store these files on PosgtreSQL, including > integrity, and of course, performance. I would like to know the file > system storing disadvantages as well. This topic actually gets debated about once a month on the lists :-). Check the archives, but here is a quick summary: Storing objects on the file system: * usually indexed on the database for searching * faster than database (usually) * more typical usage pattern * requires extra engineering if you want to store huge numbers of objects * requires extra engineering to keep your database in sync. on postgresql irc someone suggested a clever solution with inotify * backup can be a pain (even rsync has its limits) -- for really big systems, look at clustering solutions (drbd for example) * lots of people will tell you this 'feels' right or wrong -- ignore them :-) * well traveled path. it can be made to work. Storing objects on the database: * slower, but getting faster -- its mostly cpu bound currently * get very recent cpu. core2 xeons appear to be particularly good at this. * use bytea, not large objects * will punish you if your client interface does not communicate with database in binary * less engineering in the sense you are not maintaining two separate systems * forget backing up with pg_dump...go right to pitr (maybe slony?) * 1gb limit. be aware of high memory requirements * you get to work with all your data with single interface and administrate one system -- thats the big payoff. * less well traveled path. put your r&d cap on and be optimistic but skeptical. do some tests. merlin
Hello Alexandre, <We have an application subjected do sign documents and store them somewhere.> I developed a relative simple "file archive" with PostgreSQL (web application with JSF for user interface). The major structure is one table with some "key word fields", and 3 blob-fields (because exactly 3 files belong to one record). I have do deal with millions of files (95% about 2-5KB, 5% are greater than 1MB). The great advantage is that I don't have to "communicate" with the file system (try to open a directory with 300T files on a windows system... it's horrible, even on the command line). The database now is 12Gb, but searching with the web interface has a maximum of 5 seconds (most searches are faster). The one disadvantage is the backup (I use pg_dump once a week which needs about 10 hours). But for now, this is acceptable for me. But I want to look at slony or port everything to a linux machine. Ulrich
Hello Alexandre,
<We have an application subjected do sign documents and store them somewhere.>
I developed a relative simple "file archive" with PostgreSQL (web application with JSF for user interface). The major structure is one table with some "key word fields", and 3 blob-fields (because exactly 3 files belong to one record). I have do deal with millions of files (95% about 2-5KB, 5% are greater than 1MB).
The great advantage is that I don't have to "communicate" with the file system (try to open a directory with 300T files on a windows system... it's horrible, even on the command line).
The database now is 12Gb, but searching with the web interface has a maximum of 5 seconds (most searches are faster). The one disadvantage is the backup (I use pg_dump once a week which needs about 10 hours). But for now, this is acceptable for me. But I want to look at slony or port everything to a linux machine.
Ulrich
<We have an application subjected do sign documents and store them somewhere.>
I developed a relative simple "file archive" with PostgreSQL (web application with JSF for user interface). The major structure is one table with some "key word fields", and 3 blob-fields (because exactly 3 files belong to one record). I have do deal with millions of files (95% about 2-5KB, 5% are greater than 1MB).
The great advantage is that I don't have to "communicate" with the file system (try to open a directory with 300T files on a windows system... it's horrible, even on the command line).
The database now is 12Gb, but searching with the web interface has a maximum of 5 seconds (most searches are faster). The one disadvantage is the backup (I use pg_dump once a week which needs about 10 hours). But for now, this is acceptable for me. But I want to look at slony or port everything to a linux machine.
Ulrich