Thread: Storing images in PG?
I am creating a DB which is going to have to store a lot of small images. Nothing will be over 10k. This is with PG7.2. I have a few questions: Should I use blob or bytea (neither seem to be well documented), and how do I access that binary information? For instance, if I have this table in PG:

CREATE TABLE imagetable (
    ....
    image BYTEA,
    ....
);

and then I do this in PHP:

$result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
$ary = pg_fetch_array($result, 0);
$imagedata = $ary["image"];

Will all of the above work? If I can handle images in the DB, this is going to be the coolest wireless app.

Thanks
On Thu, Aug 16, 2001 at 06:02:00AM -0000, Dr. Evil wrote:
> I am creating a DB which is going to have to store a lot of small
> images. Nothing will be over 10k. This is with PG7.2. I have a few
> questions: Should I use blob or bytea (neither seem to be well
> documented), and how do I access that binary information? For
> instance, if I have this table in PG:
>
> CREATE TABLE imagetable (
>     ....
>     image BYTEA,
>     ....
> );
>
> and then I do this in PHP:
>
> $result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
> $ary = pg_fetch_array($result, 0);
> $imagedata = $ary["image"];
>
> Will all of the above work?

Yes, but it is best to encode the binary data (the image) in some ASCII-safe encoding such as base64; otherwise you will have problems in INSERT/UPDATE queries.

Karel

--
Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
On Thu, 16 Aug 2001, Karel Zak wrote:
> On Thu, Aug 16, 2001 at 06:02:00AM -0000, Dr. Evil wrote:
> > CREATE TABLE imagetable (
> >     ....
> >     image BYTEA,
> >     ....
> > );
> >
> > and then I do this in PHP:
> >
> > $result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
> > $ary = pg_fetch_array($result, 0);
> > $imagedata = $ary["image"];
> >
> > Will all of the above work?
>
> Yes, but it is best to encode the binary data (the image) in some
> ASCII-safe encoding such as base64; otherwise you will have problems
> in INSERT/UPDATE queries.

If you encode with base64 you can use varchar or text to store the data. Then what is the point of the 'bytea' type? Encoding/decoding base64 is a lot of overhead for large images (both in CPU time and disk space) -- is there no way to store and retrieve raw binary data for bytea columns?

--
Tod McQuillin
> Yes, but it is best to encode the binary data (the image) in some
> ASCII-safe encoding such as base64; otherwise you will have problems
> in INSERT/UPDATE queries.

Correct me if I'm wrong, but the only thing "lztext" can't store is NULs, which could be escaped somehow with other (rare) characters. Still overhead, but you shouldn't be storing large pieces of binary data this way, I think.

- Andrew
On Thu, Aug 16, 2001 at 06:52:32PM +1000, Andrew SNow wrote:
> > Yes, but it is best to encode the binary data (the image) in some
> > ASCII-safe encoding such as base64; otherwise you will have problems
> > in INSERT/UPDATE queries.
>
> Correct me if I'm wrong, but the only thing "lztext" can't store is
> NULs, which could be escaped somehow with other (rare) characters.
> Still overhead, but you shouldn't be storing large pieces of binary
> data this way, I think.

The problem is not with storing binary data in some datatype (like bytea); the problem is with other parts of PG. For example, the libpq functions read queries as strings, and query parsing requires that special characters (like quotes) be escaped correctly, etc. IMHO, if you want to work with queries without problems or limits, it is better to prevent all of this with some encoding. This solution is also portable to other SQL databases, because it depends only on your frontend application. I wouldn't worry about the CPU time for this -- the base64 algorithm is just bit operations -- but a real cost is the roughly 33% growth of the data... (you can use "lztext", which is a compressed datatype :-).

Karel

--
Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
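[Editor's note: the size estimate is easy to check, since base64 emits 4 output bytes for every 3 input bytes. A quick sketch, in Python rather than the thread's PHP purely for illustration:]

```python
import base64

# Stand-in for one of the ~3k icons discussed in the thread.
raw = bytes(i % 256 for i in range(3000))

encoded = base64.b64encode(raw)

print(len(raw), len(encoded))            # 3000 4000 -> 4/3 growth
assert base64.b64decode(encoded) == raw  # the round trip is lossless
```

So the overhead is 33% plus a little padding at the end, independent of the data's contents.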
> ... a real cost is the roughly 33% growth of the data... (you can use
> "lztext", which is a compressed datatype :-).

Alternatively, you can store "pointers" to the images in the database, such as a local filepath or a URL (file:///usr/local/myimages/foo.gif or ftp://user:passwd@host/pub/myimages/foo.gif). Then you could use wget to fetch the file bytes for you. The nice thing about doing this is that you don't need to store images and data on the same server. It gives you room to expand your storage. It also improves performance, because I'm sure FTP or a local file copy is more efficient than the Postgres network protocol. Plus, there's no need to encode and constantly decode the bytes. Also, if you're not going to be searching the bytes of the file (which I'm sure you're not), why put it in the database?

eric
> > Correct me if I'm wrong, but the only thing "lztext" can't store is
> > NULs, which could be escaped somehow with other (rare) characters.
> > Still overhead, but you shouldn't be storing large pieces of binary
> > data this way, I think.
>
> The problem is not with storing binary data in some datatype (like
> bytea); the problem is with other parts of PG. For example, the libpq
> functions read queries as strings, and query parsing requires that
> special characters (like quotes) be escaped correctly, etc.

Right, and I'm saying you can save that space by simply escaping all your characters in the normal way, e.g. SELECT '\377' is the ASCII character 0xFF. This would work for all characters except NUL, which you could represent by \377\377 (and you could represent a real \377 by \377\376 or something). Then the space wastage would be minimal compared to using base64. I wrote a Perl module which wraps around Pg, in which you can use parameters in queries, and it automatically escapes extended ASCII characters as above. However, I have never tried storing binary data using the whole 0x01 -> 0xFF range as I just described, though I think it would work.

- Andrew
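[Editor's note: the pair-escaping idea above can be sketched as follows. The byte assignments are only Andrew's "or something" suggestion, so treat the choice of \377 as the escape byte and \376 as the substitute as illustrative; Python is used here instead of the thread's Perl/PHP:]

```python
ESC = 0xFF  # \377: the escape byte in Andrew's proposal

def esc_encode(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += bytes([ESC, 0xFF])  # NUL -> \377\377
        elif b == ESC:
            out += bytes([ESC, 0xFE])  # a real \377 -> \377\376
        else:
            out.append(b)              # everything else passes through
    return bytes(out)

def esc_decode(data: bytes) -> bytes:
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            nxt = next(it)             # the byte after ESC picks the original
            out.append(0x00 if nxt == 0xFF else ESC)
        else:
            out.append(b)
    return bytes(out)

# The whole 0x00-0xFF range survives a round trip:
sample = bytes(range(256))
assert esc_decode(esc_encode(sample)) == sample
```

The space overhead is two bytes per NUL or 0xFF byte in the input, which for typical image data is far below base64's uniform 33%.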
* "Eric Ridge" <ebr@tcdi.com> wrote: | | Also, if you're not going to be searching the bytes of the file (which | I'm sure you're not), why put it in the database? It would be convenient to have fast access to binary data in the database that is similar to what other databases do. This would allow more applications to run on PostgreSQL. Not everybody will have the opportunity to access the local filesystem, e.g. ISPs providing servlet hosting will likely turn of access to the filesystem. It is also more work to write code to manage files in the filesystem. Why store pointers to the actual files in the database, if I can store the files themselves there. As for speed of access you could always provide a cache for the images in your application. I've done the pointer to filesystem type of implementation in a CMS system before and that turned out be messy to manage. E.g. If the disk is full when adding a image, rollback transaction. What if some ignorant user messes up the files ? Access control to the images, for that we need to go through the application anyway. -- Gunnar Rønning - gunnar@polygnosis.com Senior Consultant, Polygnosis AS, http://www.polygnosis.com/
> It would be convenient to have fast access to binary data in the
> database, similar to what other databases offer. This would allow more
> applications to run on PostgreSQL.

For small sets of files you're probably right. When you start to reach millions of files, it makes much more sense to separate things. I live in an environment where we manage just over 18 million images. We do the pointer thing and have a completely separate "image farm" to store the images.

> Not everybody will have the opportunity to access the local
> filesystem; e.g. ISPs providing servlet hosting will likely turn off
> access to the filesystem.

This is very true. But if you do have access to the local filesystem, why not use it? It's far more efficient than executing a query against the database and forcing the database to chunk in the data.

> It is also more work to write code to manage files in the filesystem.
> Why store pointers to the actual files in the database if I can store
> the files themselves there?

In my mind, databases are for storing things that you want to search. It makes no sense to search the bytes of an image, but it does make sense to search the database to find where the image is located. Plus, if you do the pointer thing and the pointers are URLs, images can live anywhere: local filesystem, remote FTP server, remote web server, remote samba server, whatever.

> As for speed of access, you could always provide a cache for the
> images in your application.

True. You'd probably want to do that regardless of where the images are physically located.

> I've done the pointer-to-filesystem type of implementation in a CMS
> system before, and that turned out to be messy to manage. E.g., if the
> disk is full when adding an image, you must roll back the transaction.
> What if some ignorant user messes up the files? And for access control
> to the images, we need to go through the application anyway.
If you're managing large systems things are going to be complicated, whether you're tracking images or chickens. You've got to find the solution that will give you the most long-term flexibility, even if that means more complication. eric
Eric Ridge writes:
> [files in or out of the DB? ]
> For small sets of files you're probably right. When you start to
> reach millions of files, it makes much more sense to separate things.

I think it's probably got more to do with the purpose of the database and how desirable it is to have it self-contained. Many installations just use a single database, but the software I'm involved in has lots of databases. It makes a lot of sense to have everything contained in a single location so backups can be done effortlessly using standard tools.

Lee.
* "Eric Ridge" <ebr@tcdi.com> wrote: | | If you're managing large systems things are going to be complicated, | whether you're tracking images or chickens. You've got to find the | solution that will give you the most long-term flexibility, even if that | means more complication. True, but we actually lost some flexibility in our CMS product with this approach as some potential customers just didn't want to buy the product because it depended on filesystem storage for images... The reason we went with filesystem storage in the first place was however performance, but I don't think we would make the same choice the next time around... regards, Gunnar -- Gunnar Rønning - gunnar@polygnosis.com Senior Consultant, Polygnosis AS, http://www.polygnosis.com/
> I am creating a DB which is going to have to store a lot of small
> images. Nothing will be over 10k. This is with PG7.2. I have a few
> questions: Should I use blob or bytea (neither seem to be well
> documented), and how do I access that binary information? For
> instance, if I have this table in PG:
>
> CREATE TABLE imagetable (
>     ....
>     image BYTEA,
>     ....
> );
>
> and then I do this in PHP:
>
> $result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
> $ary = pg_fetch_array($result, 0);
> $imagedata = $ary["image"];
>
> Will all of the above work?
>
> If I can handle images in the DB, this is going to be the coolest
> wireless app.

I'll stay out of the "images should be stored in the database vs. the filesystem" discussion -- I've seen it many times and it always gets religious ;-) But regardless of why you want to store binary data in PostgreSQL, here's some info based on recent experience.

I have found (and confirmed by studying the PostgreSQL source) that to reliably insert arbitrary binary data into a bytea column there are only 3 characters which need to be escaped: \000, \047 ( ' ), and \134 ( \ ). Here's the PHP function that I've been using:

function sqlesc($ct)
{
    $buf = "";
    for ($i = 0; $i < strlen($ct); $i++) {
        if (ord($ct[$i]) == 0)
            $buf .= "\\\\000";
        else if (ord($ct[$i]) == 39)
            $buf .= "\\\\047";
        else if (ord($ct[$i]) == 92)
            $buf .= "\\\\134";
        else
            $buf .= $ct[$i];
    }
    return $buf;
}

As far as storage is concerned, all escaped characters get converted back into their single-byte equivalents for storage, so using bytea is the most efficient way to store binary data in the database. Encoding to base64 or hex and storing as text wastes CPU cycles and storage space, at least on INSERT. However, on retrieval, PostgreSQL will escape all "nonprintable" characters (based on the C "isprint()" function), which is quite a few. Your PHP app will have to unescape all of the nonprintable characters.
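[Editor's note: the three-character escaping translates directly to other languages; here is a sketch in Python for comparison with the PHP above. The doubled backslash in the output is deliberate: one level is consumed by the PostgreSQL string-literal parser, and byteain() then decodes the remaining \NNN octal escape:]

```python
def sqlesc(data: bytes) -> str:
    out = []
    for b in data:
        if b == 0x00:
            out.append("\\\\000")  # NUL
        elif b == 0x27:
            out.append("\\\\047")  # single quote
        elif b == 0x5C:
            out.append("\\\\134")  # backslash
        else:
            out.append(chr(b))     # everything else passes through
    return "".join(out)

# b"ab\x00'\\" becomes the SQL-safe string ab\\000\\047\\134
print(sqlesc(b"ab\x00'\\"))
```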
I haven't written a function for that yet, but it shouldn't be too hard. I guess another option would be to encode the retrieved data as base64, like this (works in 7.2devel *only*):

SELECT encode(image, 'hex') FROM imagetable WHERE ...

And then convert from hex back to bin:

function hex2bin($data)
{
    $data = trim($data);
    $len = strlen($data);
    return pack("H" . $len, $data);
}

If PHP could open a binary cursor (which I don't think it can, but I could be wrong), the retrieval conversion step could be avoided also (I think). I might just try to hack PHP and add that capability, but no promises ;)

Hope this helps,

-- Joe
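[Editor's note: the retrieval-side conversion is the same idea in any language. A sketch of the hex path in Python, where binascii stands in for PHP's pack-based hex2bin; the sample bytes are made up for illustration:]

```python
import binascii

def hex2bin(data: str) -> bytes:
    # Counterpart of the PHP hex2bin() above: trim, then decode hex pairs.
    return binascii.unhexlify(data.strip())

# What a query like SELECT encode(image, 'hex') ... would hand the client:
hex_text = binascii.hexlify(b"\x00\x27\x5cGIF89a").decode()

# The awkward bytes (NUL, quote, backslash) come back without any escaping:
assert hex2bin(hex_text) == b"\x00\x27\x5cGIF89a"
```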
> I have found (and confirmed by studying the PostgreSQL source) that to
> reliably insert arbitrary binary data into a bytea column there are
> only 3 characters which need to be escaped: \000, \047 ( ' ), and \134
> ( \ ). Here's the PHP function that I've been using:

PostgreSQL, in treating things as strings, handles \000 as a NUL, i.e. as an end of string:

select 'abc\000def' as hehehe;
 hehehe
--------
 abc
(1 row)

How do you get the data back from a query, with all the NULs in it?

- Andrew
> I guess another option would be to encode the retrieved data as
> base64, like this (works in 7.2devel *only*):
>
> SELECT encode(image, 'hex') FROM imagetable WHERE ...
>
> And then convert from hex back to bin:
>
> function hex2bin($data)
> {
>     $data = trim($data);
>     $len = strlen($data);
>     return pack("H" . $len, $data);
> }

Sorry, I meant to say "to encode the retrieved data as hex, like . . ." above. But you could also use base64:

SELECT encode(image, 'base64') FROM imagetable WHERE ...

function base64_to_bin($data)
{
    $data = trim($data);
    return base64_decode($data);
}

-- Joe
> > I have found (and confirmed by studying the PostgreSQL source) that
> > to reliably insert arbitrary binary data into a bytea column there
> > are only 3 characters which need to be escaped: \000, \047 ( ' ),
> > and \134 ( \ ). Here's the PHP function that I've been using:
>
> PostgreSQL, in treating things as strings, handles \000 as a NUL, i.e.
> as an end of string:
>
> select 'abc\000def' as hehehe;
>  hehehe
> --------
>  abc
> (1 row)
>
> How do you get the data back from a query, with all the NULs in it?

Try this:

test=# select 'abc\\000def' as hehehe;
   hehehe
------------
 abc\000def
(1 row)

Notice that in the PHP function:

    if (ord($ct[$i]) == 0)
        $buf .= "\\\\000";

The reason for the 4 backslashes (\\\\) in the function is that PHP interprets '\\' as an escaped '\' and turns the result into '\\000'. Postgres then sees the '\\' as an escaped '\' and converts this into '\000' (I think this happens in the backend; I'm not sure at exactly which stage, but prior to byteain()). Finally, byteain() looks for '\###', interprets it as an escaped octal value, and converts it into a single binary character.

On the way back to psql (or whatever your client app is), the binary data stored in the heap is run through byteaout(), which converts "nonprintable" characters back into escaped octal equivalents before sending the data to the client (i.e. this all happens in the backend). The client receives only the escaped version of the data, hence the '\000' in the example above.

In 7.2devel there are two new functions (actually, I think they exist in release versions as part of pgcrypto) called encode and decode. So you can use encode to convert the binary data to either hex or base64 in the backend, before it gets sent to the client. This may be more convenient for you than the octal escaped form if your client already has hex-to-binary or base64-to-binary functions available to it.
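[Editor's note: a toy model of the two decoding layers described above may help. pg_parse is a deliberate simplification of the backend's string-literal scanner (it only collapses doubled backslashes), and toy_byteain mimics just the \NNN octal handling; neither is the real PostgreSQL code:]

```python
import re

def pg_parse(s: str) -> str:
    # Layer 1 (simplified): the string-literal scanner turns \\ into \.
    return s.replace("\\\\", "\\")

def toy_byteain(s: str) -> bytes:
    # Layer 2: \NNN octal escapes become single bytes.
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "\\" and re.fullmatch(r"[0-3][0-7][0-7]", s[i+1:i+4]):
            out.append(int(s[i+1:i+4], 8))  # e.g. \000 -> byte 0x00
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)

wire = "abc\\\\000def"                      # the client sends: abc\\000def
assert pg_parse(wire) == "abc\\000def"      # after layer 1: abc\000def
assert toy_byteain(pg_parse(wire)) == b"abc\x00def"  # after layer 2
```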
As I said in the previous post, I think if your client uses a binary cursor to retrieve data, the bytea data will be returned to the client in its original binary form (i.e. unescaped and unencoded), which avoids unnecessary (and cpu cycle wasting) conversions. I'll be trying this myself soon, so I guess I'll find out ;) -- Joe
> PostgreSQL, in treating things as strings, handles \000 as a NUL, i.e.
> as an end of string:
>
> select 'abc\000def' as hehehe;
>  hehehe
> --------
>  abc
> (1 row)

By the way, the '\000' string in the select statement above does get converted to '\0' prior to byteain(), and that is precisely why the value returned is truncated at that point. Take a look at the following snippet of code:

<snip>
Datum
byteain(PG_FUNCTION_ARGS)
{
    char    *inputText = PG_GETARG_CSTRING(0);
    char    *tp;
    char    *rp;
    int      byte;
    bytea   *result;

    for (byte = 0, tp = inputText; *tp != '\0'; byte++)
</snip>

Notice that byteain() gives up as soon as it hits a '\0' in the input string. So the '\\000' on the client end turns into '\000' by the time it hits byteain(), and byteain() converts it to a single character '\0'. Hope this all makes sense.

-- Joe
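[Editor's note: the truncation follows directly from that loop condition. byteain() receives a C string, so any embedded NUL terminates the walk; a hypothetical one-line model:]

```python
def byteain_sees(input_text: bytes) -> bytes:
    # byteain() walks the input until the first NUL terminator ('\0'),
    # so anything after an unescaped zero byte is silently lost.
    return input_text.split(b"\x00", 1)[0]

# 'abc\000def' without the extra backslash reaches byteain() as
# the C string "abc<NUL>def", which it reads as just "abc":
assert byteain_sees(b"abc\x00def") == b"abc"

# The escaped form 'abc\\000def' reaches byteain() intact, as the
# 10-character string abc\000def, with no embedded NUL to trip over:
assert byteain_sees(b"abc\\000def") == b"abc\\000def"
```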
Andrew Snow wrote:
> > I have found (and confirmed by studying the PostgreSQL source) that
> > to reliably insert arbitrary binary data into a bytea column there
> > are only 3 characters which need to be escaped: \000, \047 ( ' ),
> > and \134 ( \ ). Here's the PHP function that I've been using:
>
> PostgreSQL, in treating things as strings, handles \000 as a NUL, i.e.
> as an end of string:
>
> select 'abc\000def' as hehehe;
>  hehehe
> --------
>  abc
> (1 row)

You forgot to quote the \ and cast it to bytea.

> How do you get the data back from a query, with all the NULs in it?
>
> - Andrew

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Well, thanks for the suggestions to all who responded to my question. After reading all the suggestions, I have come to the conclusion that I'm going to store everything as base64-encoded TEXT. The main reason for this is that all the images are going to be small icons or things which can display on WAP phones. I looked at them in PNG and WBMP format, and the images end up between 1k and 3k. For images this small, PG should be able to handle it with no difficulty. There is some CPU cost in doing the base64 conversion on the fly, but CPU is cheap and I'm expensive, so we'll buy CPU. In fact, they may be stored as PNGs and even converted on the fly to WBMP as needed. I guess if I were storing 1MB images, I might consider using the FS instead of PG, but in this case it seems reasonable to use PG, just for simplicity.

Thanks