Thread: Storing images in PG?

Storing images in PG?

From

"Dr. Evil"

Date:

16 August 2001, 02:07:05


I am creating a DB which is going to have to store a lot of small
images.  Nothing will be over 10k.  This is with PG7.2.  I have a few
questions: Should I use blob or bytea (neither seem to be well
documented), and how do I access that binary information?  For
instance, if I have this table in PG:

CREATE TABLE imagetable (
       ....
       image BYTEA,
       ....
);

and then I do this in PHP:

$result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
$ary = pg_fetch_array($result, 0);
$imagedata = $ary["image"];

Will all of the above work?

If I can handle images in the DB, this is going to be the coolest
wireless app.

Thanks

Re: Storing images in PG?

From

Karel Zak

Date:

16 August 2001, 04:11:14

On Thu, Aug 16, 2001 at 06:02:00AM -0000, Dr. Evil wrote:
>
>
> I am creating a DB which is going to have to store a lot of small
> images.  Nothing will be over 10k.  This is with PG7.2.  I have a few
> questions: Should I use blob or bytea (neither seem to be well
> documented), and how do I access that binary information?  For
> instance, if I have this table in PG:
>
> CREATE TABLE imagetable (
>        ....
>        image BYTEA,
>        ....
> );
>
> and then I do this in PHP:
>
> $result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
> $ary = pg_fetch_array($result, 0);
> $imagedata = $ary["image"];
>
> Will all of the above work?

 Yes, but good is encode binary data (image) to same ascii safe encoding
like base64, else you will have problem in INSERT/UPDATE queries.

            Karel

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: Re: Storing images in PG?

From

Tod McQuillin

Date:

16 August 2001, 04:41:52

On Thu, 16 Aug 2001, Karel Zak wrote:

> On Thu, Aug 16, 2001 at 06:02:00AM -0000, Dr. Evil wrote:
>
> > CREATE TABLE imagetable (
> >        ....
> >        image BYTEA,
> >        ....
> > );
> >
> > and then I do this in PHP:
> >
> > $result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
> > $ary = pg_fetch_array($result, 0);
> > $imagedata = $ary["image"];
> >
> > Will all of the above work?
>
>  Yes, but good is encode binary data (image) to same ascii safe encoding
> like base64, else you will have problem in INSERT/UPDATE queries.

If you encode with base64 you can use use varchar or text to store the
data.  Then what is the point of the 'bytea' type?

Encoding/decoding base64 is a lot of overhead for large images (both in
cpu time and disk space) -- is there no way to store and retrieve raw
binary data for bytea columns?
--
Tod McQuillin

RE: Re: Storing images in PG?

From

"Andrew SNow"

Date:

16 August 2001, 04:52:55

>  Yes, but good is encode binary data (image) to same ascii safe
> encoding like base64, else you will have problem in INSERT/UPDATE
> queries.

Correct me if I'm wrong, but the only thing "lztext" can't store is
NULLs, which could escaped somehow with other (rare) characters.  Still
overhead, but you shouldn't be storing large pieces of binary data this
way, I think.


- Andrew

Re: Re: Storing images in PG?

From

Karel Zak

Date:

16 August 2001, 05:17:44

On Thu, Aug 16, 2001 at 06:52:32PM +1000, Andrew SNow wrote:
>
> >  Yes, but good is encode binary data (image) to same ascii safe
> > encoding like base64, else you will have problem in INSERT/UPDATE
> > queries.
>
> Correct me if I'm wrong, but the only thing "lztext" can't store is
> NULLs, which could escaped somehow with other (rare) characters.  Still
> overhead, but you shouldn't be storing large pieces of binary data this
> way, I think.

 Here is not a problem with storing binary data to some datetype (like
bytea), but problem is with other parts of PG. For example libpq
functions read queries as string, other thing is query parsing where
is needful set correct chars (like quote), etc.
 IMHO if you want to work with queries with no problems and limits is
better prevent all by some encoding. This solution is also portable to
other SQLs because depend on your frontend application only. I haven't
care about CPU time for this, the base64-algorithm is bit operation
only, a problem can be 30% grow of data... (you can use "lztext" that is
compressed datype:-).

             Karel

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

RE: Re: Re: Storing images in PG?

From

"Eric Ridge"

Date:

16 August 2001, 05:42:19

> only, a problem can be 30% grow of data... (you can use
> "lztext" that is compressed datype:-).

Alternativly, you can store "pointers" to the images in the database.
Such as a local filepath or url (file:///usr/local/myimages/foo.gif or
ftp://user:passwd@host/pub/myimages/foo.gif).  Then you could use wget
to get the file bytes for you.

The nice thing about doing this is that you don't need to store images
and data on the same server.  It gives you room to expand your storage.
It also improves performance b/c I'm sure FTP or a local file copy is
more efficient than the Postgres network protocol.  Plus, no need to
encode and constantly decode the bytes.

Also, if you're not going to be searching the bytes of the file (which
I'm sure you're not), why put it in the database?

eric

RE: Re: Storing images in PG?

From

"Andrew SNow"

Date:

16 August 2001, 05:59:17

> > Correct me if I'm wrong, but the only thing "lztext" can't store is
> > NULLs, which could escaped somehow with other (rare) characters.
> > Still overhead, but you shouldn't be storing large pieces of binary
> > data this way, I think.
>
>  Here is not a problem with storing binary data to some
> datetype (like bytea), but problem is with other parts of PG.
> For example libpq
> functions read queries as string, other thing is query
> parsing where is needful set correct chars (like quote), etc.
>  IMHO if you want to work with queries with no problems and limits is

Right, and I'm saying you can save that space by simply escaping all
your characters in the normal way, e.g. SELECT '\377'  is the ASCII
character 0xFF.  This would work for all characters except NULL, which
you could represent by \377\377  (and you could represent a real \377 by
\377\376 or something).  Then the space wastage would be minimal
compared to using base 64.

I wrote a perl module which wraps around Pg, in which you can use
parameters in queries, and it automatically escapes extended ascii
characters as per above, however I have never tried storing binary data
using the whole 0x01 -> 0xFF range as I just described, but I think it
would work.


- Andrew

Re: Re: Re: Storing images in PG?

From

Gunnar Rønning

Date:

16 August 2001, 07:26:18

* "Eric Ridge" <ebr@tcdi.com> wrote:

|
| Also, if you're not going to be searching the bytes of the file (which
| I'm sure you're not), why put it in the database?

It would be convenient to have fast access to binary data in the database
that is similar to what other databases do. This would allow more
applications to run on PostgreSQL.

Not everybody will have the opportunity to access the local filesystem, e.g.
ISPs providing servlet hosting will likely turn of access to the filesystem.

It is also more work to write code to manage files in the filesystem. Why
store pointers to the actual files in the database, if I can store the
files themselves there.

As for speed of access you could always provide a cache for the images
in your application.


I've done the pointer to filesystem type of implementation in a CMS
system before and that turned out be messy to manage. E.g. If the disk is
full when adding a image, rollback transaction. What if some ignorant user
messes up the files ? Access control to the images, for that we need to
go through the application anyway.



--
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/

RE: Re: Re: Storing images in PG?

From

"Eric Ridge"

Date:

16 August 2001, 09:07:29

> It would be convenient to have fast access to binary data in
> the database
> that is similar to what other databases do. This would allow more
> applications to run on PostgreSQL.

For small sets of files you're probably right.  When you start to reach
millions of files, it makes much more sense to separate things.

I live in an environment where we manage just over 18 million images.
We do the pointer thing and have a completely separate "image farm" to
store the images.

> Not everybody will have the opportunity to access the local
> filesystem, e.g.
> ISPs providing servlet hosting will likely turn of access to
> the filesystem.

This is very true.  But if you do have access to the local filesystem,
why not use it?  It's far more efficient that executing a query against
the database and forcing the database to chunk in the data.

> It is also more work to write code to manage files in the
> filesystem. Why
> store pointers to the actual files in the database, if I can store the
> files themselves there.

In my mind, databases are for storing things that you want to search.
It makes no sense to search the bytes of an image, but it does make
sense to search the database to find where the image is located.

Plus, if you do the pointer thing and if the pointers are URL's, images
can live anywhere: local fileystem, remote ftp server, remote web
server, remote samba server, whatever.

> As for speed of access you could always provide a cache for the images
> in your application.

true.  You'd probably want to do that regardless of where the images are
physically located.

>
> I've done the pointer to filesystem type of implementation in a CMS
> system before and that turned out be messy to manage. E.g. If
> the disk is
> full when adding a image, rollback transaction. What if some
> ignorant user
> messes up the files ? Access control to the images, for that
> we need to go through the application anyway.

If you're managing large systems things are going to be complicated,
whether you're tracking images or chickens.  You've got to find the
solution that will give you the most long-term flexibility, even if that
means more complication.

eric

RE: Re: Re: Storing images in PG?

From

Lee Kindness

Date:

16 August 2001, 10:08:50

Eric Ridge writes:
 > [files in or out of the DB? ]
 > For small sets of files you're probably right. When you start to
 > reach millions of files, it makes much more sense to separate
 > things.

I think it's probably got more to do with the purpose of the database
and how desirable it is to have it self contained. Many installations
just use a single database but personally the software I'm involved in
has lots of databases. It makes a lot of sense to have everything
contained in a single location so backups can be done effortlessly
using standard tools.

Lee.

Re: Re: Re: Storing images in PG?

From

Gunnar Rønning

Date:

16 August 2001, 10:22:35

* "Eric Ridge" <ebr@tcdi.com> wrote:

|
| If you're managing large systems things are going to be complicated,
| whether you're tracking images or chickens.  You've got to find the
| solution that will give you the most long-term flexibility, even if that
| means more complication.

True, but we actually lost some flexibility in our CMS product with this
approach as some potential customers just didn't want to buy the product
because it depended on filesystem storage for images...

The reason we went with filesystem storage in the first place was however
performance,  but I don't think we would make the same choice the next
time around...

regards,

        Gunnar
--
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/

Re: Storing images in PG?

From

"Joe Conway"

Date:

16 August 2001, 11:52:49

> I am creating a DB which is going to have to store a lot of small
> images.  Nothing will be over 10k.  This is with PG7.2.  I have a few
> questions: Should I use blob or bytea (neither seem to be well
> documented), and how do I access that binary information?  For
> instance, if I have this table in PG:
>
> CREATE TABLE imagetable (
>        ....
>        image BYTEA,
>        ....
> );
>
> and then I do this in PHP:
>
> $result = pg_exec($db, "SELECT image FROM imagetable WHERE ...");
> $ary = pg_fetch_array($result, 0);
> $imagedata = $ary["image"];
>
> Will all of the above work?
>
> If I can handle images in the DB, this is going to be the coolest
> wireless app.

I'll stay out of the images should be stored in the database vs filesystem
discussion -- I've seen it many times and it always gets religious ;-) But
regardless of why you want to store binary data in PostgreSQL, here's some
info based on recent experience.

I have found (and confirmed by studying the PostgreSQL source) that to
reliably insert arbitrary binary data into a bytea column there are only 3
characters which need to be escaped: \000, \047 ( ' ), \134 ( \ ). Here's
the PHP function that I've been using:

function sqlesc($ct)
{
 $buf = "";
 for ($i = 0; $i < strlen($ct); $i++)
 {
  if (ord($ct[$i]) == 0)
   $buf .= "\\\\000";
  else if (ord($ct[$i]) == 39)
   $buf .= "\\\\047";
  else if (ord($ct[$i]) == 92)
   $buf .= "\\\\134";
  else
   $buf .= $ct[$i];
 }
 return $buf;
}

As far as storage is concerned, all escaped characters get converted back
into their single byte equivilent for storage, so using bytea is the most
efficient way to store binary in the database. Encoding to base64 or hex and
storing as text wastes cpu cycles and storage space, at least on INSERT.
However on retrieval, PostgreSQL will escape all "nonprintable" characters
(based on the C "isprint()" function), which is quite a few. Your PHP app
will have to unescape all of the nonprintable characters. I haven't written
a function for that yet, but it shouldn't be too hard.

I guess another option would be to encode the retrieved data as base64, like
this (works in 7.2devel *only*):

    SELECT encode(image, 'hex') FROM imagetable WHERE ...

And then convert from hex back to bin:

function hex2bin($data)
{
 $data = trim($data);
 $len = strlen($data);
 return pack("H" . $len, $data);
}


If PHP could open a binary cursor (which I don't think it can, but could be
wrong), the retrieval conversion step could be avoided also (I think). I
might just try to hack PHP and add that capability, but no promises ;)

Hope this helps,

-- Joe

RE: Storing images in PG?

From

"Andrew Snow"

Date:

16 August 2001, 12:38:42


> I have found (and confirmed by studying the PostgreSQL
> source) that to reliably insert arbitrary binary data into a
> bytea column there are only 3 characters which need to be
> escaped: \000, \047 ( ' ), \134 ( \ ). Here's the PHP
> function that I've been using:


Postgresl, in treating things as strings, handles \000 as NULL as an end
of string.

 select 'abc\000def' as hehehe;
 hehehe
--------
 abc
(1 row)


How do you get the data back from a query, with all the nulls in it?


- Andrew

Re: Storing images in PG?

From

"Joe Conway"

Date:

16 August 2001, 13:38:18

> I guess another option would be to encode the retrieved data as base64,
like
> this (works in 7.2devel *only*):
>
>     SELECT encode(image, 'hex') FROM imagetable WHERE ...
>
> And then convert from hex back to bin:
>
> function hex2bin($data)
> {
>  $data = trim($data);
>  $len = strlen($data);
>  return pack("H" . $len, $data);
> }
>

Sorry, I meant to say "to encode the retrieved data as hex, like . . ."
above. But you could also use base64:

     SELECT encode(image, 'base64') FROM imagetable WHERE ...

function base64_to_bin($data)
{
 $data = trim($data);
 return base64_decode($data);
}

-- Joe

Re: Storing images in PG?

From

"Joe Conway"

Date:

16 August 2001, 14:26:47

> > I have found (and confirmed by studying the PostgreSQL
> > source) that to reliably insert arbitrary binary data into a
> > bytea column there are only 3 characters which need to be
> > escaped: \000, \047 ( ' ), \134 ( \ ). Here's the PHP
> > function that I've been using:
>
>
> Postgresl, in treating things as strings, handles \000 as NULL as an end
> of string.
>
>  select 'abc\000def' as hehehe;
>  hehehe
> --------
>  abc
> (1 row)
>
>
> How do you get the data back from a query, with all the nulls in it?

Try this:

test=# select 'abc\\000def' as hehehe;
   hehehe
------------
 abc\000def
(1 row)

Notice that in the PHP function:
{
  if (ord($ct[$i]) == 0)
   $buf .= "\\\\000";

The reason for 4 '\'s (\\\\) in the function is that PHP interprets '\\' as
an escaped '\', and turns the result into '\\000'. Postgres then sees the
'\\' as an escaped '\' and converts this into '\000' ( I think this happens
in the backend, not sure exactly which stage, but prior to byteain()).
Finally, byteain() looks for '\###', and interprets it as an escaped octal
value, and converts it into a single binary character.

On the way back to psql (or whatever your client app is), the binary data
stored in the heap is run through byteaout() which converts "non-printable"
characters back to escaped octal equivilents before sending the data to the
client (i.e. this all happens in the backend). The client receives only the
escaped version of the data, hence the '\000' in the example above.

In 7.2devel there are two new functions (actually, I think they exist in
release versions as part of pgcrypto) called encode and decode. So you can
use encode to convert the binary data to either hex or base64 in the
backend, before it gets sent to the client. This may be more convenient for
you than the octal escaped form if your client already has hex-to-binary or
base64-to-binary functions available to it.

As I said in the previous post, I think if your client uses a binary cursor
to retrieve data, the bytea data will be returned to the client in its
original binary form (i.e. unescaped and unencoded), which avoids
unnecessary (and cpu cycle wasting) conversions. I'll be trying this myself
soon, so I guess I'll find out ;)

-- Joe

Re: Storing images in PG?

From

"Joe Conway"

Date:

16 August 2001, 14:52:00

> Postgresl, in treating things as strings, handles \000 as NULL as an end
> of string.
>
>  select 'abc\000def' as hehehe;
>  hehehe
> --------
>  abc
> (1 row)
>

By the way, the '\000' string in the select statement above does get
converted to '\0' prior to byteain(), and that is precisely why the value
returned is truncated at that point. Take a look at the following snipit of
code:

<snip>
Datum
byteain(PG_FUNCTION_ARGS)
{
 char    *inputText = PG_GETARG_CSTRING(0);
 char    *tp;
 char    *rp;
 int   byte;
 bytea    *result;

 for (byte = 0, tp = inputText; *tp != '\0'; byte++)
</snip>

Notice that byteain() gives up as soon as it hits a '\0' in the input
string. So the '\\000' on the client end turns into '\000' by the time it
hits byteain(), and byteain converts it to a single character '\0'. Hope
this all makes sense.

-- Joe

Re: Storing images in PG?

From

Jan Wieck

Date:

16 August 2001, 15:08:31

Andrew Snow wrote:
>
>
> > I have found (and confirmed by studying the PostgreSQL
> > source) that to reliably insert arbitrary binary data into a
> > bytea column there are only 3 characters which need to be
> > escaped: \000, \047 ( ' ), \134 ( \ ). Here's the PHP
> > function that I've been using:
>
>
> Postgresl, in treating things as strings, handles \000 as NULL as an end
> of string.
>
>  select 'abc\000def' as hehehe;
>  hehehe
> --------
>  abc
> (1 row)

    You forgot to quote the \ and cast it to bytea.

>
>
> How do you get the data back from a query, with all the nulls in it?
>
>
> - Andrew
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Re: Storing images in PG?

From

"Dr. Evil"

Date:

16 August 2001, 17:03:33

Well, thanks for the sugestions to all who responded to my question.
After reading all the sugestions, I have come to the conclusion that
I'm going to store everything as base64 encoded TEXT.  The main reason
for this is that all the images are going to be be small icons or
things which can display on WAP phones.  I looked at them in PNG and
WBPM format, and the images end up between 1k and 3k.  For images this
small, PG should be able to handle it with no difficulty.  There is
some CPU cost in doing the base64 conversion on the fly, but CPU is
cheap and I'm expensive, so we'll buy CPU.  In fact, they may be
stored as PNGs and even converted on the fly to WBPM as needed.

I guess if I were storing 1mb images, I might consider using the FS
instead of PG, but in this case, it seems reasonable to use PG, just
for simplicity.

Thanks