Thread: bytea or blobs?
Hi, what I am trying to do is to be able to store images in my database. What I wanted to know is this: 1. Would it be better to have the image field type as a bytea or a blob? I have heard it mentioned that bytea would be better as doing data dumps would also insure that the image was saved as well! 2. Would it be better to make reference to mp3 files (i.e. storing the address of the image /images/*.jpg) or is it feasible to store the mp3 in the database as bytea or blobs as well? many thanks in adavance
I'm in the same situation. I struggled with it for days considering various implementation issues for my application and what I hoped to achieve. I opted to use bytea but getting the data in and out is going to be interesting . have to use encode/decode. However I don't have to worry about making triggers and associated to invoke lo_import/export/unlink. There is no true BLOB in Postgresql. There is use of OID column and lo_import et al and there is bytea. My objective to assure integrity of the database lead me to want bytea. I am not by any means done with the situation and would welcome discussion. I'm using Perl for my application language. On Wed, 11 Feb 2004, beyaNet Consultancy wrote: > Hi, > what I am trying to do is to be able to store images in my database. > What I wanted to know is this: > > 1. Would it be better to have the image field type as a bytea or a > blob? I have heard it mentioned that bytea would be better as doing > data dumps would also insure that the image was saved as well! > > 2. Would it be better to make reference to mp3 files (i.e. storing the > address of the image /images/*.jpg) or is it feasible to store the mp3 > in the database as bytea or blobs as well? > > many thanks in adavance > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend >
beyaNet Consultancy wrote: > Hi, > what I am trying to do is to be able to store images in my database. > What I wanted to know is this: > > 1. Would it be better to have the image field type as a bytea or a > blob? I have heard it mentioned that bytea would be better as doing > data dumps would also insure that the image was saved as well! > > 2. Would it be better to make reference to mp3 files (i.e. storing the > address of the image /images/*.jpg) or is it feasible to store the mp3 > in the database as bytea or blobs as well? > > many thanks in adavance If you want the same access and data protection (including transactional semantics and network access) as for your other data, it has to be inside the database. Now unless you're going for video streams, I think most databases (even MySQL as of 4.0) can handle multi-megabyte columns just fine, and as long as they contain just some 7bit ascii you'll be absolutely portable. Storing the data in Postgres in regular tables will give you the least amount of backup etc. problems, as they just don't exist in that case. To achieve this, I'd recommend to let the application convert the binary data to and from base64, which is a well defined and not too bloated standard. It is reasonably fast too. That will let you easily embed any binary data into a text or varchar column. You don't even need to quote it any more when inserting it into the query string. To get the ultimate out of Postgres' storage capabilities then, I would create a data table with a bytea column, hidden behind a view and rewrite rules that use encode(data, 'base64') and decode(data, 'base64') when rewriting the queries. The bytea column of that table will be configured without toast compression if the intended data usually is compressed, like jpeg or mp3. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On this subject, isn't it actually better to just store image names in the database and pull the image itself from a directory? That's what I do on my site because I didn't want to bloat up my database unnecessarily. Are there additional benefits to storing the image information in the database that I'm missing? Thanks, Jeremy -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Jan Wieck Sent: Tuesday, February 17, 2004 10:08 AM To: beyaNet Consultancy Cc: pgsql-sql@postgresql.org Subject: Re: [SQL] bytea or blobs? beyaNet Consultancy wrote: > Hi, > what I am trying to do is to be able to store images in my database. > What I wanted to know is this: > > 1. Would it be better to have the image field type as a bytea or a > blob? I have heard it mentioned that bytea would be better as doing > data dumps would also insure that the image was saved as well! > > 2. Would it be better to make reference to mp3 files (i.e. storing the > address of the image /images/*.jpg) or is it feasible to store the mp3 > in the database as bytea or blobs as well? > > many thanks in adavance If you want the same access and data protection (including transactional semantics and network access) as for your other data, it has to be inside the database. Now unless you're going for video streams, I think most databases (even MySQL as of 4.0) can handle multi-megabyte columns just fine, and as long as they contain just some 7bit ascii you'll be absolutely portable. Storing the data in Postgres in regular tables will give you the least amount of backup etc. problems, as they just don't exist in that case. To achieve this, I'd recommend to let the application convert the binary data to and from base64, which is a well defined and not too bloated standard. It is reasonably fast too. That will let you easily embed any binary data into a text or varchar column. You don't even need to quote it any more when inserting it into the query string. To get the ultimate out of Postgres' storage capabilities then, I would create a data table with a bytea column, hidden behind a view and rewrite rules that use encode(data, 'base64') and decode(data, 'base64') when rewriting the queries. The bytea column of that table will be configured without toast compression if the intended data usually is compressed, like jpeg or mp3. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
Jeremy Smith wrote: > On this subject, isn't it actually better to just store image names in the > database and pull the image itself from a directory? That's what I do on my > site because I didn't want to bloat up my database unnecessarily. Are there > additional benefits to storing the image information in the database that > I'm missing? Sure, you don't backup/restore the images together (and in a consistent snapshot) with the rest of the data and you cannot access the images through the same, authenticated, database connection in a transactional way. If you for example not only store the path, but the dimensions of the image and let's say an imagemap for a clickable image on a web page as well and now replace the image. The new image data in the directory and the change to the meta information in the database will not change for other transactions at the same time of a transaction boundary. And the changes to the image file will not roll back if something goes wrong before you can commit the transaction. That can lead to funny effects on said website. Jan > > Thanks, > Jeremy > > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Jan Wieck > Sent: Tuesday, February 17, 2004 10:08 AM > To: beyaNet Consultancy > Cc: pgsql-sql@postgresql.org > Subject: Re: [SQL] bytea or blobs? > > > beyaNet Consultancy wrote: > >> Hi, >> what I am trying to do is to be able to store images in my database. >> What I wanted to know is this: >> >> 1. Would it be better to have the image field type as a bytea or a >> blob? I have heard it mentioned that bytea would be better as doing >> data dumps would also insure that the image was saved as well! >> >> 2. Would it be better to make reference to mp3 files (i.e. storing the >> address of the image /images/*.jpg) or is it feasible to store the mp3 >> in the database as bytea or blobs as well? >> >> many thanks in adavance > > If you want the same access and data protection (including transactional > semantics and network access) as for your other data, it has to be > inside the database. Now unless you're going for video streams, I think > most databases (even MySQL as of 4.0) can handle multi-megabyte columns > just fine, and as long as they contain just some 7bit ascii you'll be > absolutely portable. Storing the data in Postgres in regular tables will > give you the least amount of backup etc. problems, as they just don't > exist in that case. > > To achieve this, I'd recommend to let the application convert the binary > data to and from base64, which is a well defined and not too bloated > standard. It is reasonably fast too. That will let you easily embed any > binary data into a text or varchar column. You don't even need to quote > it any more when inserting it into the query string. > > To get the ultimate out of Postgres' storage capabilities then, I would > create a data table with a bytea column, hidden behind a view and > rewrite rules that use encode(data, 'base64') and decode(data, 'base64') > when rewriting the queries. The bytea column of that table will be > configured without toast compression if the intended data usually is > compressed, like jpeg or mp3. > > > Jan > > -- > #======================================================================# > # It's easier to get forgiveness for being wrong than for being right. # > # Let's break this rule - forgive me. # > #================================================== JanWieck@Yahoo.com # > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Tuesday 17 February 2004 18:08, you wrote: > I'd recommend to let the application convert the binary > data to and from base64, Don't, please don't ! Since you have the good bytea rule to convert so called "binary" data into so called "text". You have no need another encoding at all. Generally, the problem is to represent zero (0x00) when input/output. Any other byte might be stored, dumped, input, output without any problem. Then why to avoid 8bit chars ? Bytea notation rule completely resolve the problem of zeroes. (and also apostrophes :-) naturally)
The documentation indicates that bytea will -store- binary data The problem is getting the data into the column through SQL. The Postgresql SQL is character data only (side note: what about unicode? is this USASCII only?). You cannot just wrap your binary stream in a pair of quotes and off you go. You must encode before saving and decode after retrieving. The encoded form is stored in the column. What I have not figured out is how to pass my data to encode. This is a builtin function but it doesn't take a filename so how can you use it! At least with base64 I have ample libraries and can convert my data before sending to sql or after receiving from sql. It becomes my application's issue. Mind, this bloats the data considerably. escape is less bloat but I have to recreate the encode/decode in my app, so far as I see. On Wed, 18 Feb 2004, sad wrote: > On Tuesday 17 February 2004 18:08, you wrote: > > > I'd recommend to let the application convert the binary > > data to and from base64, > > Don't, please don't ! > > Since you have the good bytea rule to convert so called "binary" data into so > called "text". You have no need another encoding at all. > > Generally, the problem is to represent zero (0x00) when input/output. Any > other byte might be stored, dumped, input, output without any problem. Then > why to avoid 8bit chars ? > > Bytea notation rule completely resolve the problem of zeroes. > (and also apostrophes :-) naturally) > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match >
Generally this is the task of various drivers. The postgresql jdbc for instance makes this task easy with the ResultSet.getBytes() method. The servlet code to display the contents of any mimetype bytea column looks like: PreparedStatement st = con.prepareStatement("select mimetype,image from images where id=?"); st.setInt(1,id); ResultSet rs = st.executeQuery(); if (!rs.next()) throw new Exception("Error Getting this image. Contact IT DEPT."); String mimetype = rs.getString(1); ByteArrayOutputStream baos = new ByteArrayOutputStream(); byte[] img = rs.getBytes(2); baos.write(img); res.setContentType(mimetype); baos.writeTo(out); out.flush(); out.close(); Besides backups, a reason for sticking with bytea fields is the ability to even replicate an image column across databases in the usual manner that holds for the rest of the datatypes, but then again these decisions are dominated by the special conditions of each case. Maybe if bandwidth is a restriction the base64 solution saves some bandwith, since base64 file is ~ 1.3 times larger than the original, whereas the escaped octal representation will be ~ 4 times larger. O kyrios Dana Hudes egrapse stis Feb 18, 2004 : > The documentation indicates that bytea will -store- binary data > The problem is getting the data into the column through SQL. > The Postgresql SQL is character data only (side note: what about unicode? > is this USASCII only?). You cannot just wrap your binary stream in a pair > of quotes and off you go. You must encode before saving and decode after > retrieving. The encoded form is stored in the column. > > What I have not figured out is how to pass my data to encode. > This is a builtin function but it doesn't take a filename so how can > you use it! > At least with base64 I have ample libraries and can convert my data > before sending to sql or after receiving from sql. It becomes my > application's issue. Mind, this bloats the data considerably. > escape is less bloat but I have to recreate the encode/decode in my app, > so far as I see. > > > On Wed, 18 > Feb 2004, sad wrote: > > > On Tuesday 17 February 2004 18:08, you wrote: > > > > > I'd recommend to let the application convert the binary > > > data to and from base64, > > > > Don't, please don't ! > > > > Since you have the good bytea rule to convert so called "binary" data into so > > called "text". You have no need another encoding at all. > > > > Generally, the problem is to represent zero (0x00) when input/output. Any > > other byte might be stored, dumped, input, output without any problem. Then > > why to avoid 8bit chars ? > > > > Bytea notation rule completely resolve the problem of zeroes. > > (and also apostrophes :-) naturally) > > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 9: the planner will ignore your desire to choose an index scan if your > > joining column's datatypes do not match > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- -Achilleus
On Wednesday 18 February 2004 06:44, Dana Hudes wrote: > > At least with base64 I have ample libraries and can convert my data > before sending to sql or after receiving from sql. It becomes my > application's issue. Mind, this bloats the data considerably. > escape is less bloat but I have to recreate the encode/decode in my app, > so far as I see. Less bloat than you might expect - large values are TOASTed and compressed. I'm guessing a lot of your redundancy will be eliminated. Having said that, bytea's purpose in life is to store your binary data. -- Richard Huxton Archonet Ltd
How can one measure the result of the compression -- can I see this in some table or with some pgsql command? At what threshold does it take place, I think its 8192? The nasty bit is not one picture of 100kb. Its 20 pictures of 5kb. On Wed, 18 Feb 2004, Richard Huxton wrote: > On Wednesday 18 February 2004 06:44, Dana Hudes wrote: > > > > At least with base64 I have ample libraries and can convert my data > > before sending to sql or after receiving from sql. It becomes my > > application's issue. Mind, this bloats the data considerably. > > escape is less bloat but I have to recreate the encode/decode in my app, > > so far as I see. > > Less bloat than you might expect - large values are TOASTed and compressed. > I'm guessing a lot of your redundancy will be eliminated. > > Having said that, bytea's purpose in life is to store your binary data. > >
On Wednesday 18 February 2004 15:17, Dana Hudes wrote: > How can one measure the result of the compression -- can I see this in > some table or with some pgsql command? Hmm - not so far as I know. > At what threshold does it take place, I think its 8192? > The nasty bit is not one picture of 100kb. > Its 20 pictures of 5kb. I'd have thought 5KB would trigger it - 8192 is the limit for a row (which TOASTing is designed to remove). There are/were some technical notes on TOAST when it was being built/introduced - googling might well find them. What I'd suggest is grab some suitably representative images, base64 encode them and see how much it takes to store 100,000 copies of them. -- Richard Huxton Archonet Ltd
On Wednesday 18 February 2004 09:18, you wrote: > Maybe if bandwidth is a restriction the base64 solution > saves some bandwith, since base64 file is ~ 1.3 times larger > than the original, whereas the escaped octal representation > will be ~ 4 times larger. If you use libpq's v3 protocol with binary format (PostgreSQL 7.4+), the overhead is minimal both ways. -- Best regards, Igor Shevchenko
beyaNet Consultancy wrote: > Hi, > what I am trying to do is to be able to store images in my database. > What I wanted to know is this: > > 1. Would it be better to have the image field type as a bytea or a blob? > I have heard it mentioned that bytea would be better as doing data dumps > would also insure that the image was saved as well! > > 2. Would it be better to make reference to mp3 files (i.e. storing the > address of the image /images/*.jpg) or is it feasible to store the mp3 > in the database as bytea or blobs as well? Depends on the precise details of your situation. Check the mailing-list archives (this list and general would be a good start) for plenty of discussion on these. -- Richard Huxton Archonet Ltd