Thread: Storing images as BYTEA or large objects
Hi,

I would like to store binary data in a PostgreSQL database. The size of the data is about 2 to 20 MB, and it is always stored or retrieved as a block (i.e., I do not need to get only part of the data). As I understand it, I have two options for storing this data: as BYTEA or as large objects. The disadvantage of the first method is that I need to write a custom routine to escape some binary values - or is there some code available to do this? The disadvantage of large objects is their existence outside of the main database tables, which may be problematic when backing up a database. In addition, I need special routines to store/retrieve the data.

My two questions are: Is this summary correct? And: Which method should I choose?

Best,
Koen
> I would like to store binary data in a PostgreSQL database. [...] As I
> understand, the disadvantage of the first method is that I need to
> write a custom routine to escape some binary values - or is there some
> code available to do this?

With Java, Python, or Perl you've got functions that escape the data for you.

Leonel
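For what it's worth, the escaping Leonel mentions is mechanical. PostgreSQL's textual "escape" bytea format passes printable ASCII through, doubles backslashes, and turns other bytes into octal escapes. A rough, illustrative sketch (Python here for brevity; in practice you would rely on the driver's routine, e.g. PQescapeBytea in libpq or pg_escape_bytea in PHP, rather than hand-rolling this):

```python
def escape_bytea(data: bytes) -> str:
    """Illustrative sketch of PostgreSQL's "escape" bytea text format:
    printable ASCII passes through, backslash is doubled, and every
    other byte becomes a \\ooo octal escape."""
    out = []
    for b in data:
        if b == 0x5C:             # backslash -> \\
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:   # printable ASCII as-is
            out.append(chr(b))
        else:                     # non-printable -> \ooo
            out.append("\\%03o" % b)
    return "".join(out)

print(escape_bytea(b"\x00abc\\"))  # -> \000abc\\
```

Note the cost Andy worries about elsewhere in this thread: every byte is inspected, and non-printable bytes quadruple in size, which is why binary parameter passing (discussed later) is preferable for multi-megabyte values.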
Koen Vermeer wrote:
> As I understand, I have two options for storing this data: As BYTEA or
> as large objects. [...] The disadvantage of large objects is their
> existence outside of the main database, which may be problematic when
> backing up a database.

Having used large objects, I can tell you they do get backed up (pg_dump, etc.) and they are not hard to use. There is even a contrib module that helps you hook them up to a table so they get deleted at the appropriate times (I have not used it, though, so I'm not 100% sure), but they were easy enough to use even without the contrib.

The only reason I used large objects was to avoid the escaping. I'm not sure how slow escaping a 20 MB file would be, but it just sounded slow. And you have to do it twice: once to send it, and once when you get it back. I'd love to hear from others about that... is it something I should even worry about?

We store lots of photos in the db; they usually run a meg or less, but we store lots of 'em.

-Andy
>> I have two options for storing this data: As BYTEA or as large objects.

Is it true that if you update a row containing a large BYTEA value (even if you're not updating the BYTEA field itself, just another field), it requires the entire BYTEA value to be copied to a new row (because of MVCC)? Or is this not true because of TOAST?

If true, would this have an impact on the buffer cache and/or checkpoints? (You could always separate the BYTEA values into a table of their own to avoid this drawback.)
"Adam Rich" <adam.r@sbcglobal.net> writes:
> Is it true that if you update a row containing a large BYTEA value
> (even if you're not updating the BYTEA field itself, just another
> field), it requires the entire BYTEA value to be copied to a new row
> (because of MVCC)? Or is this not true because of TOAST?

It is not true. As long as you don't change the toasted value, it contributes nothing much to the cost of updating the row. All that has to be copied is a 20-byte pointer structure.

However, if you *do* change the toasted value, it is always updated as a unit. So if you have use for writing into portions of a large value, large objects are what you want.

			regards, tom lane
Hi,

Which programming language are you using?

Regards,
Gevik Babakhani
------------------------------------------------
PostgreSQL NL           http://www.postgresql.nl
TrueSoftware BV         http://www.truesoftware.nl
------------------------------------------------
On Wed, 2008-02-13 at 07:37 +0100, Gevik Babakhani wrote:
> Which programming language are you using?

That would be C++ for storing, and both C++ and PHP for retrieving the data. Maybe also PL/pgSQL for retrieval (in addition to or instead of PHP).

Koen
On Tue, 2008-02-12 at 17:16 -0700, Leonel Nunez wrote:
> > My two questions are: Is this summary correct? And: Which method
> > should I choose?
> With Java, Python, Perl you've got functions that escape the data
> for you

What about C++ and PHP?

Koen
On Tue, 2008-02-12 at 21:14 -0600, Andy Colson wrote:
> Having used the large objects, I can tell you they do get backed up
> (pg_dump, etc.) and they are not hard to use. There is even a contrib
> that helps you hook them up to a table so they get deleted at the
> appropriate times.

In my case, I do not really worry about cleaning them up, as I expect not to delete the objects (or only very rarely) anyway. But it's good to know (I hope I still remember it in a few months) that there is a contrib that handles this.

> The only reason I used lo was to avoid the escaping. I'm not sure how
> slow escaping a 20 meg file would be, but it just sounded slow. And
> you have to do it twice, once to send it, and once when you get it
> back.

I thought about that too, and didn't find conclusive arguments to choose one or the other. So I decided to just ask, but the jury is still out...

Best,
Koen
In our case we were switching between databases, so what I have done in the past was:

For inserting:
1. Create a TEXT column in my table. (In PG this can be 1 GB in size.)
2. Read the file contents into a buffer/string and Base64 encode that string.
3. Write the string into the db.

For reading:
1. Read the text column into a buffer/string and Base64 decode it to get your original image/binary data.

The approach above worked for us. This was a PHP5 website and a C# frontend.

Regards,
Gevik Babakhani
------------------------------------------------
PostgreSQL NL           http://www.postgresql.nl
TrueSoftware BV         http://www.truesoftware.nl
------------------------------------------------
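Gevik's Base64 approach can be sketched in a few lines (Python here so it is runnable anywhere; the PHP equivalents would be base64_encode()/base64_decode()). The trade-off to keep in mind is that Base64 inflates the stored size by about a third:

```python
import base64

def encode_for_text_column(data: bytes) -> str:
    # Base64 turns arbitrary bytes into plain ASCII, safe for a TEXT column.
    return base64.b64encode(data).decode("ascii")

def decode_from_text_column(text: str) -> bytes:
    # Reverse step when reading the column back.
    return base64.b64decode(text)

raw = bytes(range(256))                # arbitrary binary data, all byte values
stored = encode_for_text_column(raw)
assert decode_from_text_column(stored) == raw
print(len(stored) / len(raw))          # -> 1.34375: roughly a third larger
```

For 2-20 MB files that overhead (plus the encode/decode CPU on every store and fetch) is exactly the kind of cost the BYTEA binary-parameter route avoids.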
On Wed, 2008-02-13 at 09:57 +0100, Gevik Babakhani wrote:
> In our case we were switching between databases, so what I have done
> in the past was: [...] read the file contents into a buffer/string
> and Base64 encode that string.

Instead of Base64 encoding, I guess it would be easier to just escape the required bytes and store them in a BYTEA.

But, just to be clear: I think I can get both methods (lo or bytea) to work. All I was trying to find out is which option would be better in my case.

Best,
Koen
Koen Vermeer wrote:
> As I understand, I have two options for storing this data: As BYTEA or
> as large objects. [...]

I've used both methods. The only real problem is that the trigger-based replication schemes such as Slony can't deal with large objects.

Depending on what programming language you're using, you do *not* need to escape the binary data for BYTEA. Using libpq from C/C++ you can pass the binary data straight into the database. See PQexecParams:

http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN

We use server-side JavaScript here that makes use of that interface, so I can take an image directly from the web server and move it into the database with no escaping overhead.

My preference: if I don't need the file-like interface to large objects, I'd use BYTEA every time.

Pete
--
http://www.whitebeam.org - JavaScript web application server.
http://www.yellowhawk.co.uk
Koen Vermeer wrote:
> Instead of base64 encoding, I guess it would be easier to just escape
> the required bytes and store them in a bytea.

Actually, if you have access to the PQexecParams() call, you can pass the bytes to a bytea column unescaped, which AFAIK saves some processing on both the client and server.

To get the bytes back unescaped, I think you could use a binary cursor.

--
Alvaro Herrera                    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
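The point generalizes: with parameterized queries the driver ships the bytes separately from the SQL text, so no escaping happens in your code at all. A runnable illustration using Python's stdlib sqlite3 purely as a stand-in (with PostgreSQL the same shape would be PQexecParams from C/C++, or pg_query_params()/PDO prepared statements from PHP):

```python
import sqlite3

# In-memory database stands in for PostgreSQL here; the pattern is the
# same: bind the raw bytes as a parameter and let the driver transport them.
conn = sqlite3.connect(":memory:")
conn.execute("create table blob (dbid integer primary key, content blob)")

# NULs, quotes, backslashes - none of it needs escaping when bound.
payload = bytes([0, 39, 92, 10]) * 4
conn.execute("insert into blob (dbid, content) values (?, ?)", (1, payload))

(roundtrip,) = conn.execute("select content from blob where dbid = 1").fetchone()
assert roundtrip == payload
```

The exact call names on the PostgreSQL side depend on the client library, but the principle (bound parameters instead of string interpolation) is the same everywhere.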
On Wed, 2008-02-13 at 09:35 +0000, Peter Wilson wrote:
> I've used both methods. The only real problem is that the
> trigger-based replication schemes such as Slony can't deal with large
> objects.

I can live with that for now. If the project ever gets that big, I probably need to rethink some design choices anyway.

> Depending on what programming language you're using, you do *not* need
> to escape the binary data for BYTEA. Using libpq from C/C++ you can
> pass the binary data straight into the database. See PQexecParams:
> [...]
> We use server-side JavaScript here that makes use of that interface,
> so I can take an image directly from the web server and move it into
> the database with no escaping overhead.

Ah, I see, that seems to be a better way indeed. I can probably do something like that with PHP as well.

> My preference: if I don't need the file-like interface to large
> objects, I'd use BYTEA every time.

Right, so that basically means that when 'large objects' are files which should be saved and restored as a whole, it may be more natural to use large objects. I guess that applies to some uses of media storage (music, photos, video).

The large-objects-are-actually-files thing applies to my situation, so unless there is some 'large objects are / will be deprecated' argument, I guess I'll stick with large objects.

Thanks!

Koen
Koen Vermeer wrote:
> The large-objects-are-actually-files thing applies to my situation, so
> unless there is some 'large objects are / will be deprecated' argument,
> I guess I'll stick with large objects.

Certainly there is no such argument.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Koen Vermeer wrote:
> Right, so that basically means that when 'large objects' are files
> which should be saved and restored as a whole, it may be more natural
> to use large objects.

No - I meant that the Postgres large object interface allows you to read and write sections of a large object. It provides a seek/read/write interface.

If you're only going to read or write the whole contents as a single block, then use BYTEA. In my case I store uploaded images as BYTEA - I only ever need to read/write the image as a whole.

If you were dealing with very large images/music/video in a web environment, then I could see a web application wanting to read a chunk, write it to the web client, read the next chunk, etc., and thus avoid the overhead of the entire contents being in memory at one time. That probably doesn't help with upload, though.

Pete
--
http://www.whitebeam.org - JavaScript web application server.
http://www.yellowhawk.co.uk
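The chunk-at-a-time pattern Peter describes has the same shape regardless of the source; sketched here in Python over an ordinary file-like object (with large objects, the read would be lo_read() on an opened large object instead):

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KB per read keeps memory use flat

def stream(src, write):
    """Copy src to write() one chunk at a time, never holding the whole
    object in memory. An lo_open/lo_read loop follows the same shape."""
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:       # empty read signals end of data
            break
        write(chunk)

out = io.BytesIO()
stream(io.BytesIO(b"x" * 200_000), out.write)
assert out.getvalue() == b"x" * 200_000
```

With BYTEA, by contrast, the whole value arrives in one result set, so the peak memory use is at least the size of the value - fine for 20 MB, worth thinking about for anything much bigger.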
On Wednesday 2008-02-13 at 15:21 +0000, Peter Wilson wrote:
> If you're only going to read or write the whole contents as a single
> block, then use BYTEA. [...] If you were dealing with very large
> images/music/video in a web environment, then I could see a web
> application wanting to read a chunk, write it to the web client, read
> the next chunk, etc. [...] That probably doesn't help with upload,
> though.

In my case, the web application only downloads the data. Upload is done through a custom, local application. So I guess BYTEA would work fine for me. I'll do some more reading to see if any of its other properties wouldn't match my needs, and if both BYTEA and large objects seem to suit me, I'll go for BYTEA.

Thanks for the help!

Koen
On Wednesday 2008-02-13 at 10:45 -0300, Alvaro Herrera wrote:
> Actually, if you have access to the PQexecParams() call, you can pass
> the bytes to a bytea column unescaped, which AFAIK saves some
> processing on both the client and server.
> To get the bytes back unescaped, I think you could use a binary cursor.

I'll check to see what the options are for reading the data in PHP. Thanks for the help!

Best,
Koen
On Feb 13, 2008, at 2:53 PM, Koen Vermeer wrote:
> I'll check to see what the options are for reading the data in PHP.
> Thanks for the help!

If you use prepared statements, you don't need to do anything special at all for bytea with PHP. No worries about escaping and all that. Using the schema below and a simple prepared statement API (http://pgedit.com/resource/php/pgfuncall), I can insert/load documents with a single line like:

$db->blob_insert($content);
$content = $db->blob_content($this->object_ref);

John DeSoi, Ph.D.

--
-- blobs
--

create table blob (
    dbid serial primary key,
    content bytea
);

create or replace function blob_insert(p_content bytea)
returns integer as $$
declare
    new_dbid integer = nextval(pg_get_serial_sequence('blob', 'dbid'));
begin
    insert into blob (dbid, content) values (new_dbid, p_content);
    return new_dbid;
end;
$$ language plpgsql;

create or replace function blob_update(p_dbid integer, p_content bytea)
returns integer as $$
begin
    update blob set content = p_content where dbid = p_dbid;
    if found then
        return 1;
    else
        return 0;
    end if;
end;
$$ language plpgsql;

create or replace function blob_content(p_dbid integer)
returns bytea as $$
declare
    v_content bytea;
begin
    select into v_content content from blob where dbid = p_dbid;
    return v_content;
end;
$$ language plpgsql;