Thread: Large objetcs performance

Large objetcs performance

From
"Alexandre Vasconcelos"
Date:
Hello All,

I've been searching the archives for something similar, without success..

We have an application subjected do sign documents and store them
somewhere. The files size may vary from Kb to Mb. Delelopers are
arguing about the reasons to store files direcly on operating system
file system or on the database, as large objects. My boss is
considering file system storing, because he is concerned about
integrity, backup/restore corruptions. I'd like to know some reasons
to convince them to store these files on PosgtreSQL, including
integrity, and of course, performance. I would like to know the file
system storing disadvantages as well.

Thanks in advace.
Alex

Re: Large objetcs performance

From
Guido Neitzer
Date:
On 04.04.2007, at 08:03, Alexandre Vasconcelos wrote:

> We have an application subjected do sign documents and store them
> somewhere. The files size may vary from Kb to Mb. Delelopers are
> arguing about the reasons to store files direcly on operating system
> file system or on the database, as large objects. My boss is
> considering file system storing, because he is concerned about
> integrity, backup/restore corruptions. I'd like to know some reasons
> to convince them to store these files on PosgtreSQL, including
> integrity, and of course, performance. I would like to know the file
> system storing disadvantages as well.

It is not directly PostgreSQL related, but this might give you
something to think about:

http://en.wikibooks.org/wiki/Programming:WebObjects/Web_Applications/
Development/Database_vs_Filesystem

cug

Re: Large objetcs performance

From
"Merlin Moncure"
Date:
On 4/4/07, Alexandre Vasconcelos <alex.vasconcelos@gmail.com> wrote:
> We have an application subjected do sign documents and store them
> somewhere. The files size may vary from Kb to Mb. Delelopers are
> arguing about the reasons to store files direcly on operating system
> file system or on the database, as large objects. My boss is
> considering file system storing, because he is concerned about
> integrity, backup/restore corruptions. I'd like to know some reasons
> to convince them to store these files on PosgtreSQL, including
> integrity, and of course, performance. I would like to know the file
> system storing disadvantages as well.

This topic actually gets debated about once a month on the lists :-).
Check the archives, but here is a quick summary:

Storing objects on the file system:
* usually indexed on the database for searching
* faster than database (usually)
* more typical usage pattern
* requires extra engineering if you want to store huge numbers of objects
* requires extra engineering to keep your database in sync.  on
postgresql irc someone suggested a clever solution with inotify
* backup can be a pain (even rsync has its limits) -- for really big
systems, look at clustering solutions (drbd for example)
* lots of people will tell you this 'feels' right or wrong -- ignore them :-)
* well traveled path. it can be made to work.

Storing objects on the database:
* slower, but getting faster -- its mostly cpu bound currently
* get very recent cpu. core2 xeons appear to be particularly good at this.
* use bytea, not large objects
* will punish you if your client interface does not communicate with
database in binary
* less engineering in the sense you are not maintaining two separate systems
* forget backing up with pg_dump...go right to pitr (maybe slony?)
* 1gb limit. be aware of high memory requirements
* you get to work with all your data with single interface and
administrate one system -- thats the big payoff.
* less well traveled path.  put your r&d cap on and be optimistic but
skeptical.  do some tests.

merlin

Re: Large objetcs performance

From
Ulrich Cech
Date:
Hello Alexandre,

<We have an application subjected do sign documents and store them
somewhere.>

I developed a relative simple "file archive" with PostgreSQL (web
application with JSF for user interface). The major structure is one
table with some "key word fields", and 3 blob-fields (because exactly 3
files belong to one record). I have do deal with millions of files (95%
about 2-5KB, 5% are greater than 1MB).
The great advantage is that I don't have to "communicate" with the file
system (try to open a directory with 300T files on a windows system...
it's horrible, even on the command line).

The database now is 12Gb, but searching with the web interface has a
maximum of 5 seconds (most searches are faster). The one disadvantage is
the backup (I use pg_dump once a week which needs about 10 hours). But
for now, this is acceptable for me. But I want to look at slony or port
everything to a linux machine.

Ulrich

Re: Large objetcs performance

From
Ulrich Cech
Date:
Hello Alexandre,

<
We have an application subjected do sign documents and store them somewhere.>

I developed a relative simple "file archive" with PostgreSQL (web application with JSF for user interface). The major structure is one table with some "key word fields", and 3 blob-fields (because exactly 3 files belong to one record). I have do deal with millions of files (95% about 2-5KB, 5% are greater than 1MB).
The great advantage is that I don't have to "communicate" with the file system (try to open a directory with 300T files on a windows system... it's horrible, even on the command line).

The database now is 12Gb, but searching with the web interface has a maximum of 5 seconds (most searches are faster). The one disadvantage is the backup (I use pg_dump once a week which needs about 10 hours). But for now, this is acceptable for me. But I want to look at slony or port everything to a linux machine.

Ulrich