Thread: bytea or large object
Hi, I would like to store .pdf files in a database, which is accessed through a PHP interface. which is the best way to store binary data? - bytea? (with the tricky handling of escape chars) - large objects? thanks JC PS: I didn't found any example of scripts using bytea in PHP, does someone has some? _________________________________________________________ Jean-Christophe FABRE INRA - UMR Sol & Environnement tel: +33 (0)4 99 61 23 51 2, place Viala fax: +33 (0)4 67 63 26 14 34060 MONTPELLIER cedex 1 FRANCE
Jean-Christophe FABRE wrote: > > Hi, > > I would like to store .pdf files in a database, which is accessed through > a PHP interface. which is the best way to store binary data? - > bytea? (with the tricky handling of escape chars) - large objects? > > thanks > > JC > > PS: I didn't found any example of scripts using bytea in PHP, does someone > has some? Here's an escape script that was sent to me by Thomas T. Thai. function escByteA($binData) { /** * \134 = 92 = backslash, \000 = 00 = NULL, \047 = 39 = Single Quote * * str_replace() replaces the searches array in order. We must * process the 'backslash' character first. If we process it last, * it'll replace all the escaped backslashes from the other searches * that came before. */ $search = array(chr(92), chr(0), chr(39)); $replace = array('\\\134', '\\\000', '\\\047'); $binData = str_replace($search, $replace, $binData); return $binData; } There is also a function built in to PHP 4.1.2, available if you're using PostgreSQL 7.2, called pg_escape_bytea. Read the pdf file into a string, then use either of these to escape the string. The you can do a simple SQL insert. To retrieve the original files, use stripcslashes() to restore the query result string. Hope this helps, Joe
an alternative that I do for situations like this is store the file location in the db, then when you want to call it, you can generate a link on the web page from the db pointing to the file. The file is stored on the server, not in the db. Tim. Timothy P. Maguire Web Developer II Harte-Hanks 978 436 3325 Jean-Christophe FABRE <jean-christophe.fabre@ens To: pgsql-sql@postgresql.org, pgsql-php@postgresql.org am.inra.fr> cc: Sent by: Subject: bytea or large object pgsql-php-owner@postgresql .org 04/03/2002 09:47 AM Hi, I would like to store .pdf files in a database, which is accessed through a PHP interface. which is the best way to store binary data? - bytea? (with the tricky handling of escape chars) - large objects? thanks JC PS: I didn't found any example of scripts using bytea in PHP, does someone has some? _________________________________________________________ Jean-Christophe FABRE INRA - UMR Sol & Environnement tel: +33 (0)4 99 61 23 51 2, place Viala fax: +33 (0)4 67 63 26 14 34060 MONTPELLIER cedex 1 FRANCE ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org HARTE- HANKS PRIVILEGED AND CONFIDENTIAL INFORMATION- if you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose or take any action based on this message or any information herein and should delete this message.
I can do full joins just fine on two tables at a time: FROM t1 FULL JOIN t2 ON (t1.id = t2.id) Now, how would I full joint in a third table? FROM t1 FULL JOIN t2 ON (t1.id = t2.id) FULL JOIN t3 ON t1.id = t3.id That previous line would not join together two rows from t2 and t3. Thanks, Mike
On Wed, Apr 03, 2002 at 04:47:29PM +0200, Jean-Christophe FABRE wrote: > > Hi, > > I would like to store .pdf files in a database, which is accessed through a > PHP interface. > which is the best way to store binary data? > - bytea? (with the tricky handling of escape chars) > - large objects? I asked the same question a week ago on pgsql-general. Here's my 2 cents: The Postgres large object interface is older and therefore more tried and tested. It used to have a reputation for slowness, but the code was overhauled in 7.1 to address this issue. The LO interface is a little convoluted and takes some getting used to, but works well. For Bytea you can use "normal" SQL syntax. Escaping the data is not really difficult. The Jury is still out on the perfomance issue. You might guesstimate though that LO works better for really big chunks because with bytea, everything needs to be read into memory on select/insert. This probably also means that bytea is likely to be faster for smaller chunks (i.e. those that easily fit into physical memory) because the server OS's and PostgreSQL's cacheing mechanisms would come to the fore if the data is small enough to be held in RAM a bit until it's written out to disk (actually I don't have the faintest idea how large objects are handled internally in Postres so this is more of a hunch on my part). Last but not least it needs to be said that there is ample controversy on the subject of whether to stuff binary data into the SQL database at all - the alternative is to put them into the file system and store the path to the file in an SQL table column. For web applications in particular, the case _against_ putting e.g. images (think of a content management app, for instance) into the database is pretty strong because - especially on a loaded webserver - the overhead of retrieving images from the database versus serving them straight from the file system is pretty huge. Remember that web servers are designed to do this a quickly as possible, whereas with transactional databases, the design focus is elsewhere: data integrity and consistency, transactional functionality, insert/update speeds, and then select speeds. You might see orders of magnitude in performance difference. However, if you do this - keep the big binary chunks out of the database - then you partly compromise the very rationale for using a database in the first place: namely data integrity, consistency, and transactional functionality. What if someone accesses the files through the file system directly rather than through your application which manipulates filesystem and database simultaneously - your system would go out of sync. What if 2 users try to manipulate the same set of data simultaneously? How do you ensure transactional atomicity if you have related data in 2 different places? Hence what I decided to do for the next web project is this: Keeping the image data in the database, but putting a copy of the image into the filesystem, as in a "permanent image cache". The data in the database is then always considered authoritative, and the image is only written out to the filesystem cache once a transaction is successfully completed. For a more philosophical, visionary slant on the subject matter, I would recommend Hans Reiser's seminal paper on what he thinks ReiserFS might be, or should be in the future: http://www.namesys.com/whitepaper.html He argues for a conversion between filesystems and databases and approaches the problem from the filesystem end, i.e. he wants to make filesystems more database-like. I was thinking that it would be actually neat to have filesystem-like functionality with an SQL database, and did some searching: There is a little paper on how to do this with Postgres about something the author calls PgFS (allows you to NFS mount a database), but as someone else on some other list notes "code has not been sighted in years". There is another approach called PerlFS which one might use for something interesting with Posgres DBI, but the site where it used to be has been down for months. Regards, Frank
On Thu, Apr 04, 2002 at 11:40:51AM +0200, Frank Joerdens wrote: > On Wed, Apr 03, 2002 at 04:47:29PM +0200, Jean-Christophe FABRE wrote: > > > > Hi, > > > > I would like to store .pdf files in a database, which is accessed through a > > PHP interface. > > which is the best way to store binary data? > > - bytea? (with the tricky handling of escape chars) > > - large objects? > > I asked the same question a week ago on pgsql-general. Here's my 2 > cents: The Postgres large object interface is older and therefore more > tried and tested. It used to have a reputation for slowness, but the > code was overhauled in 7.1 to address this issue. The LO interface is a > little convoluted and takes some getting used to, but works well. For > Bytea you can use "normal" SQL syntax. Escaping the data is not > really difficult. I forgot: A distinct advantage of bytea over large objects is that you can still use pg_dumpall to back up your entire server in a single step. It doesn't work with large objects because pg_dumpall can't create binary output in an e.g. tar file. This means that if you use large objects, you have to back up your databases one by one, using a binary output format with pg_dump. Regards, Frank
On Wed, 3 Apr 2002 15:58:55 -0500 (EST) Michael Adler <adler@glimpser.org> wrote: > I can do full joins just fine on two tables at a time: > > FROM t1 FULL JOIN t2 ON (t1.id = t2.id) > > Now, how would I full joint in a third table? > > FROM t1 FULL JOIN t2 ON (t1.id = t2.id) FULL JOIN t3 ON t1.id = t3.id > > That previous line would not join together two rows from t2 and t3. This topic is the same as the previous discussions(see the following URL). It seems to be still impossible to merge all the tables by that query. http://groups.google.com/groups?hl=en&threadm=3507.1006111223%40sss.pgh.pa.us&rnum=1&prev=/groups%3Fq%3DCOALESCE(t1.name,t2.name)%26hl%3Den%26selm%3D3507.1006111223%2540sss.pgh.pa.us%26rnum%3D1 But, if using a COALESCE(), you'll be able to merge. t1.id: 1,2, 4,5 t2.id: 1, 3,4 t3.id: 2,3, 5,6 SELECT t1.id AS id1, t2.id AS id2, t3.id AS id3 FROM t1 FULL JOIN t2 ON (t1.id = t2.id) FULL JOIN t3 ON (t1.id = t3.id); id1 | id2 | id3 -----+-----+----- 1 | 1 | 2 | | 2 | 3 | | | 3 4 | 4 | 5 | | 5 | | 6 (7 rows) explain analyze SELECT t.id1, t.id2 , t3.id AS id3 FROM (SELECT COALESCE(t1.id, t2.id) AS id12, t1.id AS id1, t2.id AS id2 FROM t1 FULL JOIN t2 ON (t1.id = t2.id) ) AS t FULL JOIN t3 ON (t.id12 = t3.id); id1 | id2 | id3 -----+-----+----- 1 | 1 | 2 | | 2 | 3 | 3 <-- being merged 4 | 4 | 5 | | 5 | | 6 (6 rows) Regards, Masaru Sugawara
Thanks for the link Masaru. We're exploring a different design that will be more scalable and also uses LEFT JOINs. On Fri, 5 Apr 2002, Masaru Sugawara wrote: > Date: Fri, 05 Apr 2002 00:01:46 +0900 > From: Masaru Sugawara <rk73@sea.plala.or.jp> > To: Michael Adler <adler@glimpser.org> > Cc: pgsql-sql@postgresql.org > Subject: Re: [SQL] FULL JOIN with 3 or more tables > > On Wed, 3 Apr 2002 15:58:55 -0500 (EST) > Michael Adler <adler@glimpser.org> wrote: > > > > I can do full joins just fine on two tables at a time: > > > > FROM t1 FULL JOIN t2 ON (t1.id = t2.id) > > > > Now, how would I full joint in a third table? > > > > FROM t1 FULL JOIN t2 ON (t1.id = t2.id) FULL JOIN t3 ON t1.id = t3.id > > > > That previous line would not join together two rows from t2 and t3. > > > This topic is the same as the previous discussions(see the following URL). > It seems to be still impossible to merge all the tables by that query. > > http://groups.google.com/groups?hl=en&threadm=3507.1006111223%40sss.pgh.pa.us&rnum=1&prev=/groups%3Fq%3DCOALESCE(t1.name,t2.name)%26hl%3Den%26selm%3D3507.1006111223%2540sss.pgh.pa.us%26rnum%3D1 > > > > But, if using a COALESCE(), you'll be able to merge. > > t1.id: 1,2, 4,5 > t2.id: 1, 3,4 > t3.id: 2,3, 5,6 > > SELECT t1.id AS id1, t2.id AS id2, t3.id AS id3 > FROM t1 FULL JOIN t2 ON (t1.id = t2.id) > FULL JOIN t3 ON (t1.id = t3.id); > > id1 | id2 | id3 > -----+-----+----- > 1 | 1 | > 2 | | 2 > | 3 | > | | 3 > 4 | 4 | > 5 | | 5 > | | 6 > (7 rows) > explain analyze > SELECT t.id1, t.id2 , t3.id AS id3 > FROM (SELECT COALESCE(t1.id, t2.id) AS id12, > t1.id AS id1, t2.id AS id2 > FROM t1 FULL JOIN t2 ON (t1.id = t2.id) > ) AS t FULL JOIN t3 ON (t.id12 = t3.id); > > id1 | id2 | id3 > -----+-----+----- > 1 | 1 | > 2 | | 2 > | 3 | 3 <-- being merged > 4 | 4 | > 5 | | 5 > | | 6 > (6 rows) > > > Regards, > Masaru Sugawara > > Mike