Thread: bytea or blobs?

bytea or blobs?

From
beyaNet Consultancy
Date:
Hi,
what I am trying to do is to be able to store images in my database. 
What I wanted to know is this:

1. Would it be better to have the image field type as a bytea or a 
blob? I have heard it mentioned that bytea would be better as doing 
data dumps would also insure that the image was saved as well!

2. Would it be better to make reference to mp3 files (i.e. storing the 
address of the image /images/*.jpg)  or is it feasible to store the mp3 
in the database as bytea or blobs as well?

many thanks in adavance



Re: bytea or blobs?

From
Dana Hudes
Date:
I'm in the same situation. I struggled with it for days
considering various implementation issues for my application
and what I hoped to achieve. I opted to use bytea but getting the
data in and out is going to be interesting . have to use encode/decode.
However I don't have to worry about making triggers and associated
to invoke lo_import/export/unlink. 

There is no true BLOB in Postgresql.
There is use of OID column and lo_import et al and there is bytea.

My objective to assure integrity of the database lead me to want bytea.
I am not by any means done with the situation and would welcome 
discussion.
I'm using Perl for my application language.

On Wed, 11 Feb 2004, beyaNet Consultancy wrote:

> Hi,
> what I am trying to do is to be able to store images in my database. 
> What I wanted to know is this:
> 
> 1. Would it be better to have the image field type as a bytea or a 
> blob? I have heard it mentioned that bytea would be better as doing 
> data dumps would also insure that the image was saved as well!
> 
> 2. Would it be better to make reference to mp3 files (i.e. storing the 
> address of the image /images/*.jpg)  or is it feasible to store the mp3 
> in the database as bytea or blobs as well?
> 
> many thanks in adavance
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
> 


Re: bytea or blobs?

From
Jan Wieck
Date:
beyaNet Consultancy wrote:

> Hi,
> what I am trying to do is to be able to store images in my database. 
> What I wanted to know is this:
> 
> 1. Would it be better to have the image field type as a bytea or a 
> blob? I have heard it mentioned that bytea would be better as doing 
> data dumps would also insure that the image was saved as well!
> 
> 2. Would it be better to make reference to mp3 files (i.e. storing the 
> address of the image /images/*.jpg)  or is it feasible to store the mp3 
> in the database as bytea or blobs as well?
> 
> many thanks in adavance

If you want the same access and data protection (including transactional 
semantics and network access) as for your other data, it has to be 
inside the database. Now unless you're going for video streams, I think 
most databases (even MySQL as of 4.0) can handle multi-megabyte columns 
just fine, and as long as they contain just some 7bit ascii you'll be 
absolutely portable. Storing the data in Postgres in regular tables will 
give you the least amount of backup etc. problems, as they just don't 
exist in that case.

To achieve this, I'd recommend to let the application convert the binary 
data to and from base64, which is a well defined and not too bloated 
standard. It is reasonably fast too. That will let you easily embed any 
binary data into a text or varchar column. You don't even need to quote 
it any more when inserting it into the query string.

To get the ultimate out of Postgres' storage capabilities then, I would 
create a data table with a bytea column, hidden behind a view and 
rewrite rules that use encode(data, 'base64') and decode(data, 'base64') 
when rewriting the queries. The bytea column of that table will be 
configured without toast compression if the intended data usually is 
compressed, like jpeg or mp3.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: bytea or blobs?

From
"Jeremy Smith"
Date:
On this subject,  isn't it actually better to just store image names in the
database and pull the image itself from a directory?  That's what I do on my
site because I didn't want to bloat up my database unnecessarily.  Are there
additional benefits to storing the image information in the database that
I'm missing?

Thanks,
Jeremy

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Jan Wieck
Sent: Tuesday, February 17, 2004 10:08 AM
To: beyaNet Consultancy
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] bytea or blobs?


beyaNet Consultancy wrote:

> Hi,
> what I am trying to do is to be able to store images in my database.
> What I wanted to know is this:
>
> 1. Would it be better to have the image field type as a bytea or a
> blob? I have heard it mentioned that bytea would be better as doing
> data dumps would also insure that the image was saved as well!
>
> 2. Would it be better to make reference to mp3 files (i.e. storing the
> address of the image /images/*.jpg)  or is it feasible to store the mp3
> in the database as bytea or blobs as well?
>
> many thanks in adavance

If you want the same access and data protection (including transactional
semantics and network access) as for your other data, it has to be
inside the database. Now unless you're going for video streams, I think
most databases (even MySQL as of 4.0) can handle multi-megabyte columns
just fine, and as long as they contain just some 7bit ascii you'll be
absolutely portable. Storing the data in Postgres in regular tables will
give you the least amount of backup etc. problems, as they just don't
exist in that case.

To achieve this, I'd recommend to let the application convert the binary
data to and from base64, which is a well defined and not too bloated
standard. It is reasonably fast too. That will let you easily embed any
binary data into a text or varchar column. You don't even need to quote
it any more when inserting it into the query string.

To get the ultimate out of Postgres' storage capabilities then, I would
create a data table with a bytea column, hidden behind a view and
rewrite rules that use encode(data, 'base64') and decode(data, 'base64')
when rewriting the queries. The bytea column of that table will be
configured without toast compression if the intended data usually is
compressed, like jpeg or mp3.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster




Re: bytea or blobs?

From
Jan Wieck
Date:
Jeremy Smith wrote:
> On this subject,  isn't it actually better to just store image names in the
> database and pull the image itself from a directory?  That's what I do on my
> site because I didn't want to bloat up my database unnecessarily.  Are there
> additional benefits to storing the image information in the database that
> I'm missing?

Sure, you don't backup/restore the images together (and in a consistent 
snapshot) with the rest of the data and you cannot access the images 
through the same, authenticated, database connection in a transactional 
way.

If you for example not only store the path, but the dimensions of the 
image and let's say an imagemap for a clickable image on a web page as 
well and now replace the image. The new image data in the directory and 
the change to the meta information in the database will not change for 
other transactions at the same time of a transaction boundary. And the 
changes to the image file will not roll back if something goes wrong 
before you can commit the transaction. That can lead to funny effects on 
said website.


Jan

> 
> Thanks,
> Jeremy
> 
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Jan Wieck
> Sent: Tuesday, February 17, 2004 10:08 AM
> To: beyaNet Consultancy
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] bytea or blobs?
> 
> 
> beyaNet Consultancy wrote:
> 
>> Hi,
>> what I am trying to do is to be able to store images in my database.
>> What I wanted to know is this:
>>
>> 1. Would it be better to have the image field type as a bytea or a
>> blob? I have heard it mentioned that bytea would be better as doing
>> data dumps would also insure that the image was saved as well!
>>
>> 2. Would it be better to make reference to mp3 files (i.e. storing the
>> address of the image /images/*.jpg)  or is it feasible to store the mp3
>> in the database as bytea or blobs as well?
>>
>> many thanks in adavance
> 
> If you want the same access and data protection (including transactional
> semantics and network access) as for your other data, it has to be
> inside the database. Now unless you're going for video streams, I think
> most databases (even MySQL as of 4.0) can handle multi-megabyte columns
> just fine, and as long as they contain just some 7bit ascii you'll be
> absolutely portable. Storing the data in Postgres in regular tables will
> give you the least amount of backup etc. problems, as they just don't
> exist in that case.
> 
> To achieve this, I'd recommend to let the application convert the binary
> data to and from base64, which is a well defined and not too bloated
> standard. It is reasonably fast too. That will let you easily embed any
> binary data into a text or varchar column. You don't even need to quote
> it any more when inserting it into the query string.
> 
> To get the ultimate out of Postgres' storage capabilities then, I would
> create a data table with a bytea column, hidden behind a view and
> rewrite rules that use encode(data, 'base64') and decode(data, 'base64')
> when rewriting the queries. The bytea column of that table will be
> configured without toast compression if the intended data usually is
> compressed, like jpeg or mp3.
> 
> 
> Jan
> 
> --
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me.                                  #
> #================================================== JanWieck@Yahoo.com #
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: bytea or blobs?

From
sad
Date:
On Tuesday 17 February 2004 18:08, you wrote:

> I'd recommend to let the application convert the binary
> data to and from base64,

Don't, please don't !

Since you have the good bytea rule to convert so called "binary" data into so
called "text". You have no need another encoding at all.

Generally, the problem is to represent zero (0x00) when input/output. Any
other byte might be stored, dumped, input, output without any problem. Then
why to avoid 8bit chars ?

Bytea notation rule completely resolve the problem of zeroes.
(and also apostrophes :-) naturally)






Re: bytea or blobs?

From
Dana Hudes
Date:
The documentation indicates that bytea will -store- binary data
The problem is getting the data into the column through SQL.
The Postgresql SQL is character data only (side note: what about unicode? 
is this USASCII only?). You cannot just wrap your binary stream in a pair
of quotes and off you go. You must encode before saving and decode after 
retrieving. The encoded form is stored in the column.

What I have not figured out is how to pass my data to encode.
This is a builtin function but it doesn't take a filename so how can
you use it!
At least with base64 I have ample libraries and can convert my data
before sending to sql or after receiving from sql. It becomes my
application's issue. Mind, this bloats the data considerably.
escape is less bloat but I have to recreate the encode/decode in my app,
so far as I see.


On Wed, 18 
Feb 2004, sad wrote:

> On Tuesday 17 February 2004 18:08, you wrote:
> 
> > I'd recommend to let the application convert the binary
> > data to and from base64,
> 
> Don't, please don't !
> 
> Since you have the good bytea rule to convert so called "binary" data into so 
> called "text". You have no need another encoding at all.
> 
> Generally, the problem is to represent zero (0x00) when input/output. Any 
> other byte might be stored, dumped, input, output without any problem. Then 
> why to avoid 8bit chars ?
> 
> Bytea notation rule completely resolve the problem of zeroes.
> (and also apostrophes :-) naturally) 
> 
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
> 


Re: bytea or blobs?

From
Achilleus Mantzios
Date:
Generally this is the task of various drivers.
The postgresql jdbc for instance makes this task
easy with the ResultSet.getBytes() method.

The servlet code to display the contents of any mimetype
bytea column looks like:

PreparedStatement st = 
con.prepareStatement("select mimetype,image from images where id=?");

st.setInt(1,id);
ResultSet rs = st.executeQuery();
if (!rs.next()) throw new Exception("Error Getting this image. Contact IT DEPT.");
String mimetype = rs.getString(1);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] img = rs.getBytes(2);        
baos.write(img);
res.setContentType(mimetype);        
baos.writeTo(out);
out.flush();
out.close();

Besides backups, a reason for sticking with bytea fields
is the ability to even replicate an image column across
databases in the usual manner that holds for the rest
of the datatypes, but then again these decisions are
dominated by the special conditions of each case.

Maybe if bandwidth is a restriction the base64 solution
saves some bandwith, since base64 file is ~ 1.3 times larger
than the original, whereas the escaped octal representation
will be ~ 4 times larger.

O kyrios Dana Hudes egrapse stis Feb 18, 2004 :

> The documentation indicates that bytea will -store- binary data
> The problem is getting the data into the column through SQL.
> The Postgresql SQL is character data only (side note: what about unicode? 
> is this USASCII only?). You cannot just wrap your binary stream in a pair
> of quotes and off you go. You must encode before saving and decode after 
> retrieving. The encoded form is stored in the column.
> 
> What I have not figured out is how to pass my data to encode.
> This is a builtin function but it doesn't take a filename so how can
> you use it!
> At least with base64 I have ample libraries and can convert my data
> before sending to sql or after receiving from sql. It becomes my
> application's issue. Mind, this bloats the data considerably.
> escape is less bloat but I have to recreate the encode/decode in my app,
> so far as I see.
> 
> 
> On Wed, 18 
> Feb 2004, sad wrote:
> 
> > On Tuesday 17 February 2004 18:08, you wrote:
> > 
> > > I'd recommend to let the application convert the binary
> > > data to and from base64,
> > 
> > Don't, please don't !
> > 
> > Since you have the good bytea rule to convert so called "binary" data into so 
> > called "text". You have no need another encoding at all.
> > 
> > Generally, the problem is to represent zero (0x00) when input/output. Any 
> > other byte might be stored, dumped, input, output without any problem. Then 
> > why to avoid 8bit chars ?
> > 
> > Bytea notation rule completely resolve the problem of zeroes.
> > (and also apostrophes :-) naturally) 
> > 
> > 
> > 
> > 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >       joining column's datatypes do not match
> > 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

-- 
-Achilleus



Re: bytea or blobs?

From
Richard Huxton
Date:
On Wednesday 18 February 2004 06:44, Dana Hudes wrote:
>
> At least with base64 I have ample libraries and can convert my data
> before sending to sql or after receiving from sql. It becomes my
> application's issue. Mind, this bloats the data considerably.
> escape is less bloat but I have to recreate the encode/decode in my app,
> so far as I see.

Less bloat than you might expect - large values are TOASTed and compressed. 
I'm guessing a lot of your redundancy will be eliminated.

Having said that, bytea's purpose in life is to store your binary data.

--  Richard Huxton Archonet Ltd


Re: bytea or blobs?

From
Dana Hudes
Date:
How can one measure the result of the compression -- can I see this in
some table or with some pgsql command?
At what threshold does it take place, I think its 8192?
The nasty bit is not one picture of 100kb.
Its 20 pictures of 5kb.

On Wed, 18 Feb 2004, Richard Huxton wrote:

> On Wednesday 18 February 2004 06:44, Dana Hudes wrote:
> >
> > At least with base64 I have ample libraries and can convert my data
> > before sending to sql or after receiving from sql. It becomes my
> > application's issue. Mind, this bloats the data considerably.
> > escape is less bloat but I have to recreate the encode/decode in my app,
> > so far as I see.
> 
> Less bloat than you might expect - large values are TOASTed and compressed. 
> I'm guessing a lot of your redundancy will be eliminated.
> 
> Having said that, bytea's purpose in life is to store your binary data.
> 
> 


Re: bytea or blobs?

From
Richard Huxton
Date:
On Wednesday 18 February 2004 15:17, Dana Hudes wrote:
> How can one measure the result of the compression -- can I see this in
> some table or with some pgsql command?

Hmm - not so far as I know.

> At what threshold does it take place, I think its 8192?
> The nasty bit is not one picture of 100kb.
> Its 20 pictures of 5kb.

I'd have thought 5KB would trigger it - 8192 is the limit for a row (which 
TOASTing is designed to remove). There are/were some technical notes on TOAST 
when it was being built/introduced - googling might well find them.

What I'd suggest is grab some suitably representative images, base64 encode 
them and see how much it takes to store 100,000 copies of them.
--  Richard Huxton Archonet Ltd


Re: bytea or blobs?

From
Igor Shevchenko
Date:
On Wednesday 18 February 2004 09:18, you wrote:
> Maybe if bandwidth is a restriction the base64 solution
> saves some bandwith, since base64 file is ~ 1.3 times larger
> than the original, whereas the escaped octal representation
> will be ~ 4 times larger.

If you use libpq's v3 protocol with binary format (PostgreSQL 7.4+), the 
overhead is minimal both ways.

-- 
Best regards,
Igor Shevchenko


Re: bytea or blobs?

From
Richard Huxton
Date:
beyaNet Consultancy wrote:
> Hi,
> what I am trying to do is to be able to store images in my database. 
> What I wanted to know is this:
> 
> 1. Would it be better to have the image field type as a bytea or a blob? 
> I have heard it mentioned that bytea would be better as doing data dumps 
> would also insure that the image was saved as well!
> 
> 2. Would it be better to make reference to mp3 files (i.e. storing the 
> address of the image /images/*.jpg)  or is it feasible to store the mp3 
> in the database as bytea or blobs as well?

Depends on the precise details of your situation.

Check the mailing-list archives (this list and general would be a good 
start) for plenty of discussion on these.

--   Richard Huxton  Archonet Ltd