Thread: Storing images as BYTEA or large objects

Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
Hi,

I would like to store binary data in a PostgreSQL database. The size of
the data is about 2 to 20 MB and is always stored or retrieved as a
block (i.e., I do not need to get only part of the data). As I
understand, I have two options for storing this data: As BYTEA or as
large objects. As I understand, the disadvantage of the first method is
that I need to write a custom routine to escape some binary values - or
is there some code available to do this? The disadvantage of large
objects is their existence outside of the main database, which may be
problematic when backing up a database. In addition, I need special
routines to store/retrieve the data.

My two questions are: Is this summary correct? And: Which method should
I choose?

Best,
Koen


Re: Storing images as BYTEA or large objects

From
"Leonel Nunez"
Date:
> Hi,
>
> I would like to store binary data in a PostgreSQL database. The size of
> the data is about 2 to 20 MB and is always stored or retrieved as a
> block (i.e., I do not need to get only part of the data). As I
> understand, I have two options for storing this data: As BYTEA or as
> large objects. As I understand, the disadvantage of the first method is
> that I need to write a custom routine to escape some binary values - or
> is there some code available to do this? The disadvantage of large
> objects is their existence outside of the main database, which may be
> problematic when backing up a database. In addition, I need special
> routines to store/retrieve the data.
>
> My two questions are: Is this summary correct? And: Which method should
> I choose?
>
> Best,
> Koen
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>


With Java , Python , Perl  you've got functions that  escapes the data for
you


Leonel



Re: Storing images as BYTEA or large objects

From
Andy Colson
Date:
Koen Vermeer wrote:
> Hi,
>
> I would like to store binary data in a PostgreSQL database. The size of
> the data is about 2 to 20 MB and is always stored or retrieved as a
> block (i.e., I do not need to get only part of the data). As I
> understand, I have two options for storing this data: As BYTEA or as
> large objects. As I understand, the disadvantage of the first method is
> that I need to write a custom routine to escape some binary values - or
> is there some code available to do this? The disadvantage of large
> objects is their existence outside of the main database, which may be
> problematic when backing up a database. In addition, I need special
> routines to store/retrieve the data.
>
> My two questions are: Is this summary correct? And: Which method should
> I choose?
>
> Best,
> Koen

Having used the large objects, I can tell you they do backup (pg_dump,
etc) and they are not hard to use.  There is even a contrib that helps
you hook them up to a table so they get deleted/etc at appropriate times
(I have not used it though so not sure 100%)  (they were easy enough to
use even without the contrib).

The only reason I used lo was to avoid the escaping.  I'm not sure how
slow escaping a 20 meg file would be, but it just sounded slow.  And you
have to do it twice, once to send it, and once when you get it back.

I'd love to hear from others about that... is it something I should even
worry about?  We store lots of photos in the db, they usually run a meg
or less, but we store lots of 'em.

-Andy

Re: Storing images as BYTEA or large objects

From
"Adam Rich"
Date:
> > I have two options for storing this data: As BYTEA or as large objects.

Is it true that if you update a row containing a large BYTEA value, (even if
you're not updating the BYTEA field itself, just another field), it requires

the entire BYTEA value to be copied to a new row (because of MVCC) ?  Or is
this not true because of TOAST?

If true, would this have an impact on the buffer cache and/or checkpoints ?
(You could always separate out the BYTEA values to their own table by
themselves to avoid this drawback)







Re: Storing images as BYTEA or large objects

From
Tom Lane
Date:
"Adam Rich" <adam.r@sbcglobal.net> writes:
>>> I have two options for storing this data: As BYTEA or as large objects.

> Is it true that if you update a row containing a large BYTEA value, (even if
> you're not updating the BYTEA field itself, just another field), it requires
> the entire BYTEA value to be copied to a new row (because of MVCC) ?  Or is
> this not true because of TOAST?

It is not true.  As long as you don't change the toasted value, it
contributes nothing much to the cost of updating the row.  All that
has to be copied is a 20-byte pointer structure.

However, if you *do* change the toasted value, it is always updated as a
unit.  So if you have use for writing into portions of a large value,
large objects are what you want.

            regards, tom lane

Re: Storing images as BYTEA or large objects

From
"Gevik Babakhani"
Date:
Hi,

Which programming language are you using?

Regards,
Gevik Babakhani
------------------------------------------------
PostgreSQL NL       http://www.postgresql.nl
TrueSoftware BV     http://www.truesoftware.nl
------------------------------------------------


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Koen Vermeer
> Sent: Tuesday, February 12, 2008 11:41 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Storing images as BYTEA or large objects
>
> Hi,
>
> I would like to store binary data in a PostgreSQL database.
> The size of the data is about 2 to 20 MB and is always stored
> or retrieved as a block (i.e., I do not need to get only part
> of the data). As I understand, I have two options for storing
> this data: As BYTEA or as large objects. As I understand, the
> disadvantage of the first method is that I need to write a
> custom routine to escape some binary values - or is there
> some code available to do this? The disadvantage of large
> objects is their existence outside of the main database,
> which may be problematic when backing up a database. In
> addition, I need special routines to store/retrieve the data.
>
> My two questions are: Is this summary correct? And: Which
> method should I choose?
>
> Best,
> Koen
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>


Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
On Wed, 2008-02-13 at 07:37 +0100, Gevik Babakhani wrote:
> Which programming language are you using?

That would be C++ for storing and both C++ and PHP for retrieving the
data. Maybe also PL/SQL for retrieval (in addition to or instead of
PHP).

Koen


Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
On Tue, 2008-02-12 at 17:16 -0700, Leonel Nunez wrote:
> > My two questions are: Is this summary correct? And: Which method should
> > I choose?
> With Java , Python , Perl  you've got functions that  escapes the data for
> you

What about C++ and PHP?

Koen


Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
On Tue, 2008-02-12 at 21:14 -0600, Andy Colson wrote:
> Having used the large objects, I can tell you they do backup (pg_dump,
> etc) and they are not hard to use.  There is even a contrib that helps
> you hook them up to a table so they get deleted/etc at appropriate times
> (I have not used it though so not sure 100%)  (they were easy enough to
> use even without the contrib).

In my case, I do not really worry about cleaning them up, as I expect
not to delete the objects (or only very rarely) anyway. But it's good to
know (I hope I still remember it in a few months) that there is a
contrib that handles this.

> The only reason I used lo was to avoid the escaping.  I'm not sure how
> slow escaping a 20 meg file would be, but it just sounded slow.  And you
> have to do it twice, once to send it, and once when you get it back.

I thought about that too, and didn't find conclusive arguments to choose
one or the other. So I decided to just ask, but the jury is still out...

Best,
Koen


Re: Storing images as BYTEA or large objects

From
"Gevik Babakhani"
Date:
In hour case we where switching between databases so what I have done in the
past was:

For inserting:

1. create a TEXT column in my table. (In PG this can be 1GB in size)
2. read file contents in a buffer/string and Base64 encode that string.
3. write the string into db.

For reading:

1. read the text column into a buffer/string Base64 decode to get your
original image/binary data.

The approach above worked for us. This was a PHP5 website and C# frontend.

Regards,
Gevik Babakhani
------------------------------------------------
PostgreSQL NL       http://www.postgresql.nl
TrueSoftware BV     http://www.truesoftware.nl
------------------------------------------------



> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Koen Vermeer
> Sent: Wednesday, February 13, 2008 9:43 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Storing images as BYTEA or large objects
>
> On Tue, 2008-02-12 at 17:16 -0700, Leonel Nunez wrote:
> > > My two questions are: Is this summary correct? And: Which method
> > > should I choose?
> > With Java , Python , Perl  you've got functions that
> escapes the data
> > for you
>
> What about C++ and PHP?
>
> Koen
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
On Wed, 2008-02-13 at 09:57 +0100, Gevik Babakhani wrote:
> In hour case we where switching between databases so what I have done in the
> past was:
> For inserting:
> 1. create a TEXT column in my table. (In PG this can be 1GB in size)
> 2. read file contents in a buffer/string and Base64 encode that string.
> 3. write the string into db.
> For reading:
> 1. read the text column into a buffer/string Base64 decode to get your
> original image/binary data.
> The approach above worked for us. This was a PHP5 website and C# frontend.


Instead of base64 encoding, I guess it would be easier to just escape
the required bytes and store them in a bytea.

But, just to be clear: I think I can get both methods (lo or bytea) to
work. All I was trying to find out is which option would be better in my
case.

Best,
Koen


Re: Storing images as BYTEA or large objects

From
Peter Wilson
Date:
Koen Vermeer wrote:
> Hi,
>
> I would like to store binary data in a PostgreSQL database. The size of
> the data is about 2 to 20 MB and is always stored or retrieved as a
> block (i.e., I do not need to get only part of the data). As I
> understand, I have two options for storing this data: As BYTEA or as
> large objects. As I understand, the disadvantage of the first method is
> that I need to write a custom routine to escape some binary values - or
> is there some code available to do this? The disadvantage of large
> objects is their existence outside of the main database, which may be
> problematic when backing up a database. In addition, I need special
> routines to store/retrieve the data.
>
I've used both methods. The only real problem is that none of the
trigger based replication schemes
such as Slony can't deal with large objects.

Depending on what programming language you're using you do *not* need to
escape the binary
data for BYTEA. Using libpq from C/C++ you can pass the binary data
straight into the database. See
PQexecParams :


http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN

We use server-side JavaScript here that makes use of that interface so I
can take an image directly
from the web-server and move it into the database with no escape overhead.

My preference : if I don't need the file-like interface to large objects
I'd use BYTEA every time.

Pete
--
http://www.whitebeam.org - JavaScript web application server.
http://www.yellowhawk.co.uk
> My two questions are: Is this summary correct? And: Which method should
> I choose?
>
> Best,
> Koen
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>
>


Re: Storing images as BYTEA or large objects

From
Alvaro Herrera
Date:
Koen Vermeer wrote:

> Instead of base64 encoding, I guess it would be easier to just escape
> the required bytes and store them in a bytea.

Actually, if you have access to the pqExecParams() call, you can pass
the bytes to a bytea column unescaped, which AFAIK saves some processing
on both the client and server.

To get the bytes back unescaped, I think you could use a binary cursor.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
On Wed, 2008-02-13 at 09:35 +0000, Peter Wilson wrote:
> I've used both methods. The only real problem is that none of the
> trigger based replication schemes
> such as Slony can't deal with large objects.

I can live with that for now. If the project ever gets that big, I
probably need to rethink some design choices anyway.

> Depending on what programming language you're using you do *not* need to
> escape the binary
> data for BYTEA. Using libpq from C/C++ you can pass the binary data
> straight into the database. See
> PQexecParams :
> http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN
> We use server-side JavaScript here that makes use of that interface so I
> can take an image directly
> from the web-server and move it into the database with no escape overhead.

Ah, I see, that seems to be a better way indeed. I can probably do
something like that with PHP as well.

> My preference : if I don't need the file-like interface to large objects
> I'd use BYTEA every time.

Right, so that basically means that when 'large objects' are files,
which should be saved and restored as a whole, it may be more natural to
use the large objects. I guess that applies to some uses of media
storage (music, photos, video).

The large-objects-are-actually-files thing applies to my situation, so
unless there is some 'large objects are / will be deprecated' argument,
I guess I stick with large objects.

Thanks!

Koen


Re: Storing images as BYTEA or large objects

From
Alvaro Herrera
Date:
Koen Vermeer wrote:

> The large-objects-are-actually-files thing applies to my situation, so
> unless there is some 'large objects are / will be deprecated' argument,
> I guess I stick with large objects.

Certainly there is no such argument.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Storing images as BYTEA or large objects

From
Peter Wilson
Date:
Koen Vermeer wrote:
> On Wed, 2008-02-13 at 09:35 +0000, Peter Wilson wrote:
>
>> My preference : if I don't need the file-like interface to large objects
>> I'd use BYTEA every time.
>>
>
> Right, so that basically means that when 'large objects' are files,
> which should be saved and restored as a whole, it may be more natural to
> use the large objects. I guess that applies to some uses of media
> storage (music, photos, video).
>
No - I meant the Postgres large object interface allows you to read and
write sections of a
large object. It provides a seek/read/write interface.

If you're only going to read or write the whole contents as a single
block then use BYTEA. In
my case I store uploaded images as BYTEA - I only every need to
read/write the image as a whole.
If you were dealing with very large images/music/video in a web
environment then I could see a
web application wanting to read a chunk - write to the web client - read
next chunk etc and thus
avoid the overhead of the entire contents being in memory at one time.
That probably doesn't
help with upload though.

Pete
--
http://www.whitebeam.org - JavaScript web application server.
http://www.yellowhawk.co.uk
> The large-objects-are-actually-files thing applies to my situation, so
> unless there is some 'large objects are / will be deprecated' argument,
> I guess I stick with large objects.
>
> Thanks!
>
> Koen
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>
>


--
------------------------------------------------------------------------
Peter Wilson
T: 01707 891840
M: 07796 656566
http://www.yellowhawk.co.uk     The information in this email is
confidential and is intended for the addressee/s only. Access to this
email by anyone else is unauthorised. If you are not the intended
recipient, you must not read, use or disseminate the information
contained in or attached to this email.


Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
Op woensdag 13-02-2008 om 15:21 uur [tijdzone +0000], schreef Peter
Wilson:
> > Right, so that basically means that when 'large objects' are files,
> > which should be saved and restored as a whole, it may be more natural to
> > use the large objects. I guess that applies to some uses of media
> > storage (music, photos, video).
> No - I meant the Postgres large object interface allows you to read and
> write sections of a
> large object. It provides a seek/read/write interface.
> If you're only going to read or write the whole contents as a single
> block then use BYTEA. In
> my case I store uploaded images as BYTEA - I only every need to
> read/write the image as a whole.
> If you were dealing with very large images/music/video in a web
> environment then I could see a
> web application wanting to read a chunk - write to the web client - read
> next chunk etc and thus
> avoid the overhead of the entire contents being in memory at one time.
> That probably doesn't
> help with upload though.

In my case, the web application only downloads the data. Upload is done
through a custom, local application. So I guess BYTEA would work fine
for me. I'll do some more reading to see if any of its other properties
wouldn't match my needs and if both BYTEA and large objects seem to
suite me, I'll go for BYTEA.

Thanks for the help!

Koen


Re: Storing images as BYTEA or large objects

From
Koen Vermeer
Date:
Op woensdag 13-02-2008 om 10:45 uur [tijdzone -0300], schreef Alvaro
Herrera:
> > Instead of base64 encoding, I guess it would be easier to just escape
> > the required bytes and store them in a bytea.
> Actually, if you have access to the pqExecParams() call, you can pass
> the bytes to a bytea column unescaped, which AFAIK saves some processing
> on both the client and server.
> To get the bytes back unescaped, I think you could use a binary cursor.

I'll check to see what the options are for reading in the data in PHP.
Thanks for the help!

Best,
Koen


Re: Storing images as BYTEA or large objects

From
John DeSoi
Date:
On Feb 13, 2008, at 2:53 PM, Koen Vermeer wrote:

> I'll check to see what the options are for reading in the data in PHP.
> Thanks for the help!


If you use prepared statements, you don't need to do anything special
at all for bytea with PHP. No worries about escaping and all that.

Using the schema below and a simple prepared statement API (http://pgedit.com/resource/php/pgfuncall
), I can insert/load documents with a single line like:

$db->blob_insert($content);

$content = $db->blob_content($this->object_ref);



John DeSoi, Ph.D.



--
-- blobs
--
create table blob (
    dbid serial primary key,
    content bytea
);


create or replace function blob_insert(p_content bytea)
returns integer as $$
declare
    new_dbid integer = nextval(pg_get_serial_sequence('blob', 'dbid'));
begin
    insert into blob (dbid, content) values (new_dbid, p_content);
    return new_dbid;
end;
$$ language plpgsql;



create or replace function blob_update(p_dbid integer, p_content bytea)
returns integer as $$
begin
    update blob set content = p_content where dbid = p_dbid;
    if found then
        return 1;
    else
        return 0;
    end if;
end;
$$ language plpgsql;



create or replace function blob_content(p_dbid integer)
returns bytea as $$
declare
    v_content bytea;
begin
    select into v_content content from blob where dbid = p_dbid;
    return v_content;
end;
$$ language plpgsql;