Thread: Images in Database
Hi all, Is it possible to have an image object in a database, like a gif, bmp, jpeg, no matter which format? How? Couldn't find its datatype on the docs. Cheers, -- Paulo J. Matos : pocm [_at_] mega . ist . utl . pt Instituto Superior Tecnico - Lisbon Computer and Software Eng. - A.I. - > http://mega.ist.utl.pt/~pocm --- -> God had a deadline... So, he wrote it all in Lisp!
Attachment
Hi, Possible, and this is called BLOB data type. Binary large object. Some people use it with the possible cost in db size, the images as data in the db. Some prefer to store images as paths ie. text instead. BR, Aarni On Friday 19 March 2004 14:06, you wrote: > Hi all, > > Is it possible to have an image object in a database, like a gif, bmp, > jpeg, no matter which format? > > How? Couldn't find its datatype on the docs. > > Cheers, -- ------------------------------------------------- Aarni Ruuhimäki | Megative Tmi | KYMI.com Pääsintie 26 | 45100 Kouvola | FINLAND www.kymi.com | cfm.kymi.com aarni.ruuhimaki@kymi.com | info@kymi.com +358-5-3755 035 | +358-50-4910 037 ------------------------------------------------- This is a bugfree broadcast to you from a linux system.
You have 2 possibilities, the bytea data type or the large object mechanism. The bytea is generally the best because it is the most flexible. For a discussion comparing the two you can read the relevant part of the pgSQL4RB manual which you can get here: <http://aliacta.com/download> Marc >Possible, and this is called BLOB data type. Binary large object. > > Is it possible to have an image object in a database, like a gif, bmp, > > jpeg, no matter which format?
I think unless the DB provides some image-searching capabilities, it's better to store them as paths to save the space. A lot of storage will be wasted otherwise. Isn't that so?
Aarni Ruuhimäki wrote:
-- Sincerely, Reshat. --- If you see my certificate with this message, you should be able to send me encrypted e-mail. Please consult your e-mail client for details if you would like to do that.
Aarni Ruuhimäki wrote:
Hi, Possible, and this is called BLOB data type. Binary large object. Some people use it with the possible cost in db size, the images as data in the db. Some prefer to store images as paths ie. text instead. BR, Aarni On Friday 19 March 2004 14:06, you wrote:Hi all, Is it possible to have an image object in a database, like a gif, bmp, jpeg, no matter which format? How? Couldn't find its datatype on the docs. Cheers,
Attachment
At 12:34 PM -0600 3/27/04, Reshat Sabiq wrote: >I think unless the DB provides some image-searching capabilities, >it's better to store them as paths to save the space. A lot of >storage will be wasted otherwise. Isn't that so? PostgreSQL has no limit on storage except for the hard disk's limit. You shouldn't worry about that. The trouble with paths is that you must be able to resolve them from any client that connects to your database. It's also harder to move the entire database afterwards if you must, or to otherwise reorganize your directories. Having everything in your database is much much cleaner and will save you a lot of headache when you modify your solution in a next programming round. Cheers, Marc
I couldn't agree more. I used to subscribe to the notion that there wasn't a benefit to storing images in the DB. After some heartache, I've had to eat my words. We have a solution where we receive hourly updates from an external source. The updates include CSV files that are parsed into the DB and tar'd jpgs. The original procedure called for the images to be stored in the filesystem, and that worked fine for awhile. Unfortunately, it didn't scale very well. Now, we have multiple clients using the same data/images on their websites. For now, all sites are served off the same server so a symbolic link to the directory where the images exist for each site would work, but we'll most probably not have all the sites on one web server as we grow. Now, we're talking about over 4GB of images and growing. Maintaining multiple copies would be a nightmare. Moving the images to the RDBMS was the only scalable solution. Furthermore, having the images in the DB means they get backed up with the DB. Since the web site pages/scripts/etc are very static, we can do with less frequent backups of that system now that the images are gone from there. > -----Original Message----- > From: pgsql-novice-owner@postgresql.org > [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of M. Bastin > Sent: Saturday, March 27, 2004 2:12 PM > To: Reshat Sabiq > Cc: pgsql-novice@postgresql.org > Subject: Re: [NOVICE] Images in Database > > > At 12:34 PM -0600 3/27/04, Reshat Sabiq wrote: > >I think unless the DB provides some image-searching capabilities, > >it's better to store them as paths to save the space. A lot of > >storage will be wasted otherwise. Isn't that so? > > PostgreSQL has no limit on storage except for the hard disk's limit. > You shouldn't worry about that. > > The trouble with paths is that you must be able to resolve them from > any client that connects to your database. It's also harder to move > the entire database afterwards if you must, or to otherwise > reorganize your directories. > > Having everything in your database is much much cleaner and will save > you a lot of headache when you modify your solution in a next > programming round. > > Cheers, > > Marc > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >
Valid point. Maintenance might be easier (although i don't think you can dump and move images easily to another DB; you'd probably have to do some kind of direct connection to another DB to move them). When in a filesystem, you could bzip everything, and move them easily. There could also be an URL-prefix field for each client, followed by a URL suffix field for each image. Thus maintenance most of the time would be as easy as changing the prefix.
However, no matter how small an image is, it takes the same amount of BLOB space, doesn't it. This, IMHO, means a lot of wasted storage. Not sure if that also affects performance to some small degree. Storage is cheap, but still it costs money.
So, still it looks to me storing multi-media w/o additional benefits isn't quite worthwhile. But if there was something like find image LIKE another image, then i'd change my opinion in a sec. :)
But then again, i'm not against it. I just think it doesn't buy much, and wastes storage space.
Rod K wrote:
However, no matter how small an image is, it takes the same amount of BLOB space, doesn't it. This, IMHO, means a lot of wasted storage. Not sure if that also affects performance to some small degree. Storage is cheap, but still it costs money.
So, still it looks to me storing multi-media w/o additional benefits isn't quite worthwhile. But if there was something like find image LIKE another image, then i'd change my opinion in a sec. :)
But then again, i'm not against it. I just think it doesn't buy much, and wastes storage space.
-- Sincerely, Reshat. --- If you see my certificate with this message, you should be able to send me encrypted e-mail. Please consult your e-mail client for details if you would like to do that.
Rod K wrote:
I couldn't agree more. I used to subscribe to the notion that there wasn't a benefit to storing images in the DB. After some heartache, I've had to eat my words. We have a solution where we receive hourly updates from an external source. The updates include CSV files that are parsed into the DB and tar'd jpgs. The original procedure called for the images to be stored in the filesystem, and that worked fine for awhile. Unfortunately, it didn't scale very well. Now, we have multiple clients using the same data/images on their websites. For now, all sites are served off the same server so a symbolic link to the directory where the images exist for each site would work, but we'll most probably not have all the sites on one web server as we grow. Now, we're talking about over 4GB of images and growing. Maintaining multiple copies would be a nightmare. Moving the images to the RDBMS was the only scalable solution. Furthermore, having the images in the DB means they get backed up with the DB. Since the web site pages/scripts/etc are very static, we can do with less frequent backups of that system now that the images are gone from there.-----Original Message----- From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of M. Bastin Sent: Saturday, March 27, 2004 2:12 PM To: Reshat Sabiq Cc: pgsql-novice@postgresql.org Subject: Re: [NOVICE] Images in Database At 12:34 PM -0600 3/27/04, Reshat Sabiq wrote:I think unless the DB provides some image-searching capabilities, it's better to store them as paths to save the space. A lot of storage will be wasted otherwise. Isn't that so?PostgreSQL has no limit on storage except for the hard disk's limit. You shouldn't worry about that. The trouble with paths is that you must be able to resolve them from any client that connects to your database. It's also harder to move the entire database afterwards if you must, or to otherwise reorganize your directories. Having everything in your database is much much cleaner and will save you a lot of headache when you modify your solution in a next programming round. Cheers, Marc ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Attachment
Yes, they can be dumped just fine.
I also don't understand what information you have that leads you to make the claim that space is wasted. I guess it is due to your assumption that a BLOB is used. A bytea type takes 4 extra bytes than the actual size of the data. Furthermore, the field becomes part of a bigger file and therefore will have less space loss due to differences between block size and file size. Since that difference is a maxium per file, the fewer filesystem files you have, the less space is lost.
As far as image searching, there are utilities that compare image files already. If you need this for your app, I'm sure the procedure could be duplicated in the RDBMS.
-----Original Message-----
From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of Reshat Sabiq
Sent: Saturday, March 27, 2004 6:33 PM
To: Rod K
Cc: M. Bastin; pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Images in DatabaseValid point. Maintenance might be easier (although i don't think you can dump and move images easily to another DB; you'd probably have to do some kind of direct connection to another DB to move them). When in a filesystem, you could bzip everything, and move them easily. There could also be an URL-prefix field for each client, followed by a URL suffix field for each image. Thus maintenance most of the time would be as easy as changing the prefix.
However, no matter how small an image is, it takes the same amount of BLOB space, doesn't it. This, IMHO, means a lot of wasted storage. Not sure if that also affects performance to some small degree. Storage is cheap, but still it costs money.
So, still it looks to me storing multi-media w/o additional benefits isn't quite worthwhile. But if there was something like find image LIKE another image, then i'd change my opinion in a sec. :)
But then again, i'm not against it. I just think it doesn't buy much, and wastes storage space.-- Sincerely, Reshat.
"Rod K" <rod@23net.net> writes: > I also don't understand what information you have that leads you to make the > claim that space is wasted. I guess it is due to your assumption that a > BLOB is used. A bytea type takes 4 extra bytes than the actual size of the > data. Even when using BLOBs, there's not very significant overhead --- certainly much less than the overhead of putting a small image into a file, which is probably going to occupy at least 1K on disk. I would guess that the separate-files approach would be the most bulky of the three choices. regards, tom lane
At 5:33 PM -0600 3/27/04, Reshat Sabiq wrote: >Valid point. Maintenance might be easier (although i don't think you >can dump and move images easily to another DB; Yes you can with COPY or pg_dump. > you'd probably have to do some kind of direct connection to another >DB to move them). When in a filesystem, you could bzip everything, >and move them easily. There could also be an URL-prefix field for >each client, followed by a URL suffix field for each image. Thus >maintenance most of the time would be as easy as changing the prefix. Like you say: "most of the time" and of course you'll run into the other situations sooner or later... >However, no matter how small an image is, it takes the same amount >of BLOB space, doesn't it. This, IMHO, means a lot of wasted storage. There is no wasted storage. You'll use up as much disk space whether you store separate files or whether you put those bytes inside the database. There's no minimum size for bytea data. > Not sure if that also affects performance to some small degree. Nope, your searches are done on indexes (which don't contain the image data). Marc
I guess i had wrong assumptions. Since the following points raised are indeed true (although i'm not sure about x.a's), then yes storing them in a DB will not waste storage, probably even save some:
1. Images use a bytea type, which is not fixed size and is proportional (and close) to the image size. I see that it uses 4 bytes more than the value size (apparently to store the size in them).
1.a. Even a BLOB type, is not fixed size, and is proportional to the value stored.
2. Regular DB dump utilities can be used to transfer databases containing bytea fields. I see that it uses byte escapes to accomplish that.
2.a. I assume the same is true for BLOBs, although apparently it's a little more difficult.
Then the only reasons i can see to not store images in a DB, and the following:
i. If the same images also need to be accessed from a file system, such as by a web server, as regular URLs.
ii. If there are values that exceed the type limit (1GB for bytea); this is unlikely.
iii. I'm also seeing that BLOBs have some security concerns, which probably can be worked around, but an alternative would be to rely on the file system for permissions, etc.
Of these 3, only i. appears to be a realistic concern for images. It's a lot easier to list a URL on an [X][HT]ML page. But fat clients would probably be fine either way.
Thanks for correcting me. Learning never stops. :)
Rod K wrote:
1. Images use a bytea type, which is not fixed size and is proportional (and close) to the image size. I see that it uses 4 bytes more than the value size (apparently to store the size in them).
1.a. Even a BLOB type, is not fixed size, and is proportional to the value stored.
2. Regular DB dump utilities can be used to transfer databases containing bytea fields. I see that it uses byte escapes to accomplish that.
2.a. I assume the same is true for BLOBs, although apparently it's a little more difficult.
Then the only reasons i can see to not store images in a DB, and the following:
i. If the same images also need to be accessed from a file system, such as by a web server, as regular URLs.
ii. If there are values that exceed the type limit (1GB for bytea); this is unlikely.
iii. I'm also seeing that BLOBs have some security concerns, which probably can be worked around, but an alternative would be to rely on the file system for permissions, etc.
Of these 3, only i. appears to be a realistic concern for images. It's a lot easier to list a URL on an [X][HT]ML page. But fat clients would probably be fine either way.
Thanks for correcting me. Learning never stops. :)
-- Sincerely, Reshat. --- If you see my certificate with this message, you should be able to send me encrypted e-mail. Please consult your e-mail client for details if you would like to do that.
Rod K wrote:
Yes, they can be dumped just fine.
I also don't understand what information you have that leads you to make the claim that space is wasted. I guess it is due to your assumption that a BLOB is used. A bytea type takes 4 extra bytes than the actual size of the data. Furthermore, the field becomes part of a bigger file and therefore will have less space loss due to differences between block size and file size. Since that difference is a maxium per file, the fewer filesystem files you have, the less space is lost.
As far as image searching, there are utilities that compare image files already. If you need this for your app, I'm sure the procedure could be duplicated in the RDBMS.
-----Original Message-----
From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org]On Behalf Of Reshat Sabiq
Sent: Saturday, March 27, 2004 6:33 PM
To: Rod K
Cc: M. Bastin; pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Images in DatabaseValid point. Maintenance might be easier (although i don't think you can dump and move images easily to another DB; you'd probably have to do some kind of direct connection to another DB to move them). When in a filesystem, you could bzip everything, and move them easily. There could also be an URL-prefix field for each client, followed by a URL suffix field for each image. Thus maintenance most of the time would be as easy as changing the prefix.
However, no matter how small an image is, it takes the same amount of BLOB space, doesn't it. This, IMHO, means a lot of wasted storage. Not sure if that also affects performance to some small degree. Storage is cheap, but still it costs money.
So, still it looks to me storing multi-media w/o additional benefits isn't quite worthwhile. But if there was something like find image LIKE another image, then i'd change my opinion in a sec. :)
But then again, i'm not against it. I just think it doesn't buy much, and wastes storage space.-- Sincerely, Reshat.
Attachment
>Then the only reasons i can see to not store images in a DB, and the >following: >... >ii. If there are values that exceed the type limit (1GB for bytea); >this is unlikely. >iii. I'm also seeing that BLOBs have some security concerns, which >probably can be worked around, but an alternative would be to rely >on the file system for permissions, etc. ii isn't really a concern since you can store your bytea data in several chunks over several related records. Personally I use chunks of 500k to minimize RAM consumption when escaping. For the same reason I don't recommend using the large object mechanism since it is so easily imitated with bytea in chunks spread over several related records, which avoids the security concerns--and has a few more advantages. Both of these remarks are in the pgSQL4RB manual (aliacta.com) in the discussion on binary data. Didn't I start out on this thread by recommending you to read that? ;-) (Well, the long way works too.) Marc
Reshat Sabiq wrote: > [...] > Then the only reasons i can see to not store images in a DB, and the > following: > i. If the same images also need to be accessed from a file system, such > as by a web server, as regular URLs. > [...] > Of these 3, only i. appears to be a realistic concern for images. It's a > lot easier to list a URL on an [X][HT]ML page. That's not really a problem. You can write a trivial CGI program in your favourite language (Python, PHP, Perl or whatever, even /bin/sh) that fetches an image from the DB, so you can write URLs like http://.../imgfetch.cgi?foo.jpg. You're even a lot more flexible that way, because you can fetch images by various criteria, not just by filename. Best regards Oliver PS: I've trimmed the "Cc:" list a bit. -- Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way. "Unix gives you just enough rope to hang yourself -- and then a couple of more feet, just to be sure." -- Eric Allman
Oliver Fromme wrote: >Reshat Sabiq wrote: > > [...] > > Then the only reasons i can see to not store images in a DB, and the > > following: > > i. If the same images also need to be accessed from a file system, such > > as by a web server, as regular URLs. > > [...] > > Of these 3, only i. appears to be a realistic concern for images. It's a > > lot easier to list a URL on an [X][HT]ML page. > >That's not really a problem. You can write a trivial CGI >program in your favourite language (Python, PHP, Perl or >whatever, even /bin/sh) that fetches an image from the DB, >so you can write URLs like http://.../imgfetch.cgi?foo.jpg. > >You're even a lot more flexible that way, because you can >fetch images by various criteria, not just by filename. > >Best regards > Oliver > >PS: I've trimmed the "Cc:" list a bit. > > > Thanks. Good point. However, this would has a performance penalty. I'd prefer not to use this approach for images accessed frequently, especially if they are large, and even more so if the DB is not on the same box as the web server. But in general, you guys are right, most of the time storing images in the DB is better. IMHO, there are however exceptions to that approach, where referring to the file system is better. -- Sincerely, Reshat. --- My email address has changed to sabiq@csociety.org. Current one, sabiq@purdue.edu, will be discontinued sometime after April5, 2004. --- If you see my certificate with this message, you should be able to send me encrypted e-mail. Please consult your e-mail client for details if you would like to do that.
Attachment
Reshat Sabiq wrote: > Oliver Fromme wrote: > > That's not really a problem. You can write a trivial CGI > > program in your favourite language (Python, PHP, Perl or > > whatever, even /bin/sh) that fetches an image from the DB, > > so you can write URLs like http://.../imgfetch.cgi?foo.jpg. > > > > You're even a lot more flexible that way, because you can > > fetch images by various criteria, not just by filename. > > Thanks. Good point. However, this would has a performance penalty. No, not necessarily. If you do it right, then it can be as fast as filesystem access, or even faster. You should take into account that a filesystem (like UFS) is some kind of a "database", too. It is accessed via path names, which can be more or less efficient than indices inside a "real" database (such as PostgreSQL). Apart from that, if you require the performance, then you can put reverse proxies in front of your web servers to accelerate them, or let your CGI cache the images. That's common practice. > I'd > prefer not to use this approach for images accessed frequently, > especially if they are large, and even more so if the DB is not on the > same box as the web server. Nope. If they're accesses frequently, a reverse proxy will help, and also your CGI can easily cache such images (also take into account that a database performs some caching on its own). It shouldn't matter at all whether they're large or not. The database operation which takes the most time is to find the actual location of the object (image). That's also true for a filesystem. Once the object is located, it can be fetched at whatever speed the hardware allows. It also doesn't make a big difference whether the DB is on the same physical machine as the web server. On seriously big installations, the storage is on a separate file server anyway (e.g. a NetApp Filer or some other RAID box). The network overhead is small if you use gigabit ethernet, which is standard nowadays. > But in general, you guys are right, most of the time storing images in > the DB is better. IMHO, there are however exceptions to that approach, > where referring to the file system is better. Well, everything has advantages as well as disadvantages, and everyone has to make his own decision. But in this particular case I think the disadvantages (of not storing images in the DB itself) are negligible, or can be worked around without problems. Just my two cents, and YMMV. :-) Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way. Passwords are like underwear. You don't share them, you don't hang them on your monitor or under your keyboard, you don't email them, or put them on a web site, and you must change them very often.