Re: bytea or large object - Mailing list pgsql-sql

From: Frank Joerdens
Subject: Re: bytea or large object
Date:
Msg-id: 20020404114051.A28343@superfly.archi-me-des.de
In response to: bytea or large object (Jean-Christophe FABRE <jean-christophe.fabre@ensam.inra.fr>)
Responses: Re: bytea or large object
List: pgsql-sql
On Wed, Apr 03, 2002 at 04:47:29PM +0200, Jean-Christophe FABRE wrote:
> Hi,
>
> I would like to store .pdf files in a database, which is accessed through
> a PHP interface. Which is the best way to store binary data?
> - bytea? (with the tricky handling of escape chars)
> - large objects?

I asked the same question a week ago on pgsql-general. Here's my 2 cents:

The Postgres large object interface is older and therefore more tried and tested. It used to have a reputation for slowness, but the code was overhauled in 7.1 to address this issue. The LO interface is a little convoluted and takes some getting used to, but it works well.

For bytea you can use "normal" SQL syntax, and escaping the data is not really difficult. The jury is still out on the performance question. You might guesstimate, though, that large objects work better for really big chunks, because with bytea everything needs to be read into memory on select/insert. This probably also means that bytea is likely to be faster for smaller chunks (i.e. those that easily fit into physical memory), because the OS's and PostgreSQL's caching mechanisms come into play when the data is small enough to be held in RAM for a while before it is written out to disk (actually I don't have the faintest idea how large objects are handled internally in Postgres, so this is more of a hunch on my part).
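
To make the two approaches a bit more concrete, here is a minimal sketch - Python with psycopg2, purely for illustration; the "documents" table and its columns are made up, and roughly equivalent calls exist in the PHP and C client libraries. It stores the same PDF once in a bytea column, letting the driver take care of the escaping, and once as a large object addressed by its OID:

# Minimal sketch: store a PDF as bytea and as a large object.
# Assumes a table like:
#   CREATE TABLE documents (id serial PRIMARY KEY, name text,
#                           data bytea, lo_oid oid);
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
pdf_bytes = open("report.pdf", "rb").read()

# bytea: the driver does the escaping, no manual quoting needed
cur.execute("INSERT INTO documents (name, data) VALUES (%s, %s)",
            ("report.pdf", psycopg2.Binary(pdf_bytes)))

# large object: a file-like handle addressed by OID
lob = conn.lobject(0, "wb")        # oid 0 = have the server assign a new OID
lob.write(pdf_bytes)
cur.execute("UPDATE documents SET lo_oid = %s WHERE name = %s",
            (lob.oid, "report.pdf"))
lob.close()
conn.commit()

# reading back: the bytea value arrives with the row,
# the large object is read through its handle (in chunks, if need be)
cur.execute("SELECT data, lo_oid FROM documents WHERE name = %s",
            ("report.pdf",))
data, oid = cur.fetchone()
lob = conn.lobject(oid, "rb")
restored = lob.read()
lob.close()
conn.close()

Note that the large object calls have to run inside a transaction (psycopg2 opens one implicitly), while with bytea the whole file travels through the statement in one piece, which is where the memory argument above comes from.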
Last but not least, it needs to be said that there is ample controversy on the subject of whether to put binary data into the SQL database at all - the alternative is to put the files into the file system and store the path to each file in a table column. For web applications in particular, the case _against_ putting e.g. images (think of a content management app, for instance) into the database is pretty strong, because - especially on a loaded webserver - the overhead of retrieving images from the database, versus serving them straight from the file system, is pretty huge. Remember that web servers are designed to do this as quickly as possible, whereas with transactional databases the design focus is elsewhere: data integrity and consistency, transactional functionality, insert/update speeds, and only then select speeds. You might see orders of magnitude of difference in performance.

However, if you do this - keep the big binary chunks out of the database - then you partly compromise the very rationale for using a database in the first place: namely data integrity, consistency, and transactional functionality. What if someone accesses the files through the file system directly, rather than through your application, which manipulates the filesystem and the database simultaneously? Your system would go out of sync. What if two users try to manipulate the same set of data simultaneously? How do you ensure transactional atomicity if you have related data in two different places?

Hence what I decided to do for the next web project is this: keep the image data in the database, but put a copy of each image into the filesystem, as a kind of "permanent image cache". The data in the database is always considered authoritative, and an image is only written out to the filesystem cache once a transaction has completed successfully.

For a more philosophical, visionary slant on the subject, I would recommend Hans Reiser's seminal paper on what he thinks ReiserFS might be, or should be, in the future:

http://www.namesys.com/whitepaper.html

He argues for a convergence of filesystems and databases and approaches the problem from the filesystem end, i.e. he wants to make filesystems more database-like. I was thinking that it would actually be neat to have filesystem-like functionality on top of an SQL database, and did some searching: there is a little paper on how to do this with Postgres, about something the author calls PgFS (it allows you to NFS-mount a database), but as someone on another list notes, "code has not been sighted in years". There is another approach called PerlFS, which one might use for something interesting with Postgres DBI, but the site where it used to live has been down for months.

Regards, Frank