Re: bytea or large object - Mailing list pgsql-sql

From Frank Joerdens
Subject Re: bytea or large object
Date
Msg-id 20020404114051.A28343@superfly.archi-me-des.de
Whole thread Raw
In response to bytea or large object  (Jean-Christophe FABRE <jean-christophe.fabre@ensam.inra.fr>)
Responses Re: bytea or large object
List pgsql-sql
On Wed, Apr 03, 2002 at 04:47:29PM +0200, Jean-Christophe FABRE wrote:
> 
> Hi,
> 
> I would like to store .pdf files in a database, which is accessed through a 
> PHP interface.
> which is the best way to store binary data?
> - bytea? (with the tricky handling of escape chars)
> - large objects?

I asked the same question a week ago on pgsql-general. Here's my 2
cents: The Postgres large object interface is older and therefore more
tried and tested. It used to have a reputation for slowness, but the
code was overhauled in 7.1 to address this issue. The LO interface is a
little convoluted and takes some getting used to, but works well. For
Bytea you can use "normal" SQL syntax. Escaping the data is not
really difficult.

The Jury is still out on the perfomance issue. You might guesstimate though
that LO works better for really big chunks because with bytea,
everything needs to be read into memory on select/insert. This probably
also means that bytea is likely to be faster for smaller chunks (i.e.
those that easily fit into physical memory) because the server OS's and
PostgreSQL's cacheing mechanisms would come to the fore if the data is
small enough to be held in RAM a bit until it's written out to disk
(actually I don't have the faintest idea how large objects are handled
internally in Postres so this is more of a hunch on my part).

Last but not least it needs to be said that there is ample controversy
on the subject of whether to stuff binary data into the SQL database at
all - the alternative is to put them into the file system and store the
path to the file in an SQL table column. For web applications in
particular, the case _against_ putting e.g.  images (think of a content
management app, for instance) into the database is pretty strong because
- especially on a loaded webserver - the overhead of retrieving images
from the database versus serving them straight from the file system is
pretty huge. Remember that web servers are designed to do this a quickly
as possible, whereas with transactional databases, the design focus is
elsewhere: data integrity and consistency, transactional functionality,
insert/update speeds, and then select speeds. You might see orders of
magnitude in performance difference.

However, if you do this - keep the big binary chunks out of the database
- then you partly compromise the very rationale for using a database in
the first place: namely data integrity, consistency, and transactional
functionality. What if someone accesses the files through the file
system directly rather than through your application which manipulates
filesystem and database simultaneously - your system would go out of
sync. What if 2 users try to manipulate the same set of data
simultaneously? How do you ensure transactional atomicity if you have
related data in 2 different places? Hence what I decided to do for the
next web project is this: Keeping the image data in the database, but
putting a copy of the image into the filesystem, as in a "permanent
image cache". The data in the database is then always considered
authoritative, and the image is only written out to the filesystem cache
once a transaction is successfully completed.

For a more philosophical, visionary slant on the subject matter, I would
recommend Hans Reiser's seminal paper on what he thinks ReiserFS might
be, or should be in the future:

http://www.namesys.com/whitepaper.html

He argues for a conversion between filesystems and databases and
approaches the problem from the filesystem end, i.e. he wants to make
filesystems more database-like.

I was thinking that it would be actually neat to have filesystem-like
functionality with an SQL database, and did some searching: There is a
little paper on how to do this with Postgres about something the author
calls PgFS (allows you to NFS mount a database), but as someone else on
some other list notes "code has not been sighted in years". There is
another approach called PerlFS which one might use for something
interesting with Posgres DBI, but the site where it used to be has been
down for months.

Regards, Frank


pgsql-sql by date:

Previous
From: Iklódi Lajos
Date:
Subject: Re: Update in trigger
Next
From: Frank Joerdens
Date:
Subject: Re: bytea or large object