Thread: Storing images in PostgreSQL databases (again)
Hello,

I think I'm about to ask a traditional (almost religious) question, but I haven't been able to find a crystal-clear answer in the mailing lists so far. Thus, here is my question:

I need to store a large number of images in a PostgreSQL database. In my application, this represents a few hundred thousand images. The size of each image is about 100-200 KB. There is a large turnover in my database, i.e. each image stays about one week in the database, then it is deleted.

Of course, I need relatively fast access to each one of these images. But more importantly, I need to periodically delete a large number of images in a batch process. Moreover, the disk space used to store the images should be kept as small as possible: precisely, after the aforementioned batch deletions, the table that contains the images should be compacted immediately (I cannot afford the internal use of a "to be deleted" flag, because of the large amount of disk space my database requires).

I have three possible implementation choices in PostgreSQL:

1) Storing the images directly on disk, and storing a URI in the database tables (but this would require a trickier implementation, and ACIDity would be difficult to ensure -- after all, a database should abstract the internal storage of data, be it images).

2) Storing the images in a "bytea" column (but what about access times, and the batch deletion process?).

3) Storing the images as large objects (this sounds like the best solution to me, but the documentation lacks clarity about the scope of these large objects).

Finally, my question is: which method would you recommend?

Thank you very much in advance for your answers!
TIJod wrote:
> I need to store a large number of images in a PostgreSQL database. In
> my application, this represents a few hundred thousand images. [...]
> 2) Storing the images in a "bytea" column (but what about access times,
> and the batch deletion process?).

Encode the image in base64 and insert it into a text field. If you use bytea, the data has to be escaped on the way in, and that escaped form is larger than the base64 encoding. If you store the images on disk, you need to keep the database and the file system consistent.

leonel
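Leonel's size claim is easy to check with the standard library. This is a rough stdlib sketch, not a measurement of on-disk size (a bytea column stores raw bytes server-side, and TOAST may compress them): it only compares the length of a base64 encoding against the length of PostgreSQL's old "escape" bytea form as it travels inside a SQL string literal, where every non-printable byte becomes a five-character \\ooo sequence.

```python
import base64

def bytea_escape_len(data: bytes) -> int:
    """Length of `data` escaped into the old (pre-9.0) bytea text format,
    as it appears inside a single-quoted SQL literal."""
    n = 0
    for b in data:
        if b == 0x27:            # single quote -> ''   (two characters)
            n += 2
        elif b == 0x5C:          # backslash -> \\\\    (four characters)
            n += 4
        elif 0x20 <= b <= 0x7E:  # printable ASCII passes through
            n += 1
        else:                    # anything else -> \\ooo (five characters)
            n += 5
    return n

# Compressed image data is roughly uniform random, so every byte value
# appears; use all 256 values as a stand-in for a 4 KB slice of a JPEG.
data = bytes(range(256)) * 16
print(len(base64.b64encode(data)), bytea_escape_len(data))  # → 5464 14464
```

So for typical compressed image bytes the escaped bytea literal is roughly 3.5x the raw size, against base64's fixed 4/3 overhead, which is the comparison leonel is making. (Sending the bytes as a binary parameter avoids both encodings entirely.)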
TIJod wrote:
> I need to store a large number of images in a PostgreSQL database. In
> my application, this represents a few hundred thousand images. The
> size of each image is about 100-200 KB. There is a large turnover in
> my database, i.e. each image stays about 1 week in the database, then
> it is deleted.

I see little value in storing the images in the database. For me that's a general statement (I'm sure others will disagree); but especially in your case, where you have a high volume and only want to store them for about a week. Why incur all the overhead of putting them in the DB? You can't search on them or sort on them. I would just store them in the file system and put a reference in the DB.

> but this would require a trickier implementation, and ACIDity would be
> difficult to ensure -- after all, a database should abstract the
> internal storage of data, be it images.

I can't get excited about this. First, given the amount of overhead you'll be avoiding, checking the return code from storing the image in the file system seems relatively trivial. Store the image first, and if you get a failure code, don't store the rest of the data in the DB; you've just implemented data consistency. That assumes, of course, that the image is the only meaningful data you have, which in most situations is not the case. Meaning you'd want to store the rest of the data anyway with a message saying "image not available."

-- 
Guy Rouillier
On Oct 4, 2006, at 12:56 PM, Guy Rouillier wrote:
> I would just store them in the file system and put a reference in the
> DB. [...] Store the image first, and if you get a failure code, don't
> store the rest of the data in the DB; you've just implemented data
> consistency.

Combine that with an ON DELETE trigger that adds the filename to a deletion queue (within the transaction) and a separate process that runs through the deletion queue occasionally, and you get something quite usable, while still being able to use sendfile() to throw the image over the wire rather than squeezing all that data through the database.

Cheers,
Steve
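Steve's trigger-plus-queue idea looks roughly like this in SQL (a sketch only -- the images table, its filename column, and the queue table are made-up names):

```sql
-- Filenames are queued inside the same transaction that deletes the row,
-- so a rolled-back DELETE never loses an image. A separate process later
-- unlinks the queued files and clears the queue.
CREATE TABLE image_delete_queue (filename text NOT NULL);

CREATE FUNCTION queue_image_delete() RETURNS trigger AS $$
BEGIN
    INSERT INTO image_delete_queue (filename) VALUES (OLD.filename);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER images_on_delete
    AFTER DELETE ON images
    FOR EACH ROW EXECUTE PROCEDURE queue_image_delete();
```

The sweeper should unlink each file before removing its queue row; then the worst failure mode is a harmless retry of an already-unlinked file, never a database row pointing at a missing image.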
On 10/4/06, TIJod <tijod@yahoo.fr> wrote:
> I think I'm about to ask a traditional (almost religious) question,
> but I haven't been able to find a crystal clear answer in the mailing
> lists so far.

I think the key in deciding this, in your case, is your requirement for space reclamation:

> There is a large turnover in my database, i.e. each image stays about
> 1 week in the database, then it is deleted.
> ... But more importantly, I need to periodically delete a large number
> of images in batch process. Moreover, the disk space that is used on
> the hard-disk to store the images should be kept as small as possible
> [...]

If I understand what postgresql is doing, then DELETE will not reclaim the space immediately. What happens internally is not all that different from marking the space as deleted. A VACUUM will allow that space to be reused (assuming your free space map is big enough), and a VACUUM FULL would be necessary to compress the space away. All of these seem incompatible with your requirements.

I agree with another responder who suggested using the filesystem for your images.

Jack Orenstein
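Concretely, the weekly purge would look something like this (a sketch; the table and timestamp column are made-up names):

```sql
-- The DELETE only marks rows dead; no space is returned yet.
DELETE FROM images WHERE stored_at < now() - interval '1 week';

-- Plain VACUUM makes the dead space reusable by future inserts
-- (provided the free space map is large enough), but does not
-- shrink the files on disk.
VACUUM images;

-- Only VACUUM FULL actually compacts the table, and it holds an
-- exclusive lock on the table while it runs.
VACUUM FULL images;
```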
Hi.

I can provide some "insight" on the difference between the two interfaces.

AFAIK, the differences are in the size of the file you can store, and in the interface you use to access it.

The size is not important (I think), since you are far below the limit.

As for the interface, bytea gives you a query-based interface, while large objects provide a file-like interface. With a large object, you can avoid reading the whole object in one read, and you can even seek inside it, which can be useful if you have large files stored.

I think there are differences also in how the space is reclaimed, but my PostgreSQL-fu stops here.

Regards
Marco

-- 
Marco Bizzarri
http://notenotturne.blogspot.com/
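The server-side large object functions make the file-like usage concrete. A sketch (the images table and the path are illustrative; note that server-side lo_import reads from the *server's* file system and needs superuser):

```sql
-- A large object lives in pg_largeobject; the row stores only its OID.
CREATE TABLE images (id serial PRIMARY KEY, img oid);
INSERT INTO images (img) VALUES (lo_import('/tmp/photo.jpg'));

-- Client APIs (lo_open/lo_lseek/lo_read in libpq) can then seek inside
-- the object and read it in chunks, without fetching all of it.

-- Caveat on "scope": deleting the row does NOT delete the large object;
-- it must be unlinked explicitly (or swept up with contrib's vacuumlo).
SELECT lo_unlink(img) FROM images WHERE id = 1;
DELETE FROM images WHERE id = 1;
```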
On 10/4/06, Guy Rouillier <guyr@masergy.com> wrote:
> I see little value to storing the images in the database. [...] Why
> incur all the overhead of putting them in the DB? You can't search on
> them or sort on them. I would just store them in the file system and
> put a reference in the DB.

no, you can't search or sort on them, but you can put metadata in fields and search on that, and you can do things like use RI to delete images that are associated with other things, etc. this would probably fit the OP's methodology quite nicely.

> I can't get excited about this. First, given the amount of overhead
> you'll be avoiding, checking the return code from storing the image in
> the file system seems relatively trivial. Store the image first, and
> if you get a failure code, don't store the rest of the data in the DB;
> you've just implemented data consistency.

i think this topic is interesting and deserves better treatment than assumptions.
postgresql will toast all images over a certain size, which is actually pretty efficient, although it can be a problem if your images are really big. on the downside you have more vacuuming overhead, and postgresql can't match filesystem speed for raw writing. also you can pretty much forget decent performance if the code that does the actual insertion is not in c/c++ and doesn't use the parameterized api.

on the flip side, you have a central interface, a single point of failure, and you don't have to deal with thousands or millions of image files, which can become its own problem (although solvable). also you don't have to write plumbing code to get something like atomicity. PostgreSQL is getting more and more efficient at moving large streams in and out of the database, and the answer here is not as cut and dried as you might think (historically, it was insane to even attempt it).

i'm wondering if anybody has ever attempted to manage large collections of binary objects inside the database and has advice here.

merlin
"Merlin Moncure" <mmoncure@gmail.com> writes:
> ... postgresql will toast all images over a certain size which is
> actually pretty efficient although can be a problem if your images are
> really big.

But any reasonable image format is compressed already (or at least, if you are using an uncompressed format while worried about disk space then you need some time at image processing boot camp). TOAST isn't going to accomplish anything.

I think the arguments for keeping stuff inside the database are (a) far easier to maintain transactional semantics for insert/delete, and (b) easier to serve the data out to clients that aren't on the same machine. You aren't going to find a performance win though.

regards, tom lane
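Since compression is wasted effort on JPEG/PNG data, one follow-on is to tell TOAST to store the column out of line but skip the compression attempt (a sketch; the table and column names are illustrative):

```sql
-- EXTERNAL = out-of-line TOAST storage without compression; saves the
-- CPU that would otherwise be burned trying to recompress image bytes.
ALTER TABLE images ALTER COLUMN data SET STORAGE EXTERNAL;
```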
On Oct 5, 2006, at 16:18, Merlin Moncure wrote:
>> I see little value to storing the images in the database. [...] I
>> would just store them in the file system and put a reference in the DB.
>
> no, you can't search or sort on them but you can put metadata on fields
> and search on that, and you can do things like use RI to delete images
> that are associated with other things, etc. this would probably fit the
> OP's methodology quite nicely.

I second this sentiment; there is a lot to be said for keeping your data together in a unified storage/retrieval system with ACID semantics. There is nothing inherently wrong with this model.

[...]

> i'm wondering if anybody has ever attempted to manage large
> collections of binary objects inside the database and has advice here.

We have a production system containing 10,000 images (JPEG and PNG of various sizes) totaling roughly 4 GB. We have Lighttpd running against a couple of Rails processes which crop, scale and convert images on the fly using ImageMagick; converted images are cached in the file system and subsequently served directly by Lighttpd. Functionally I have absolutely no quibbles with this system; PostgreSQL stores the data smoothly and everything works as designed.

Performance-wise, I'm not sure; the amount of data seems to put a certain load on the database server, though it's impossible to tell how much. Backups are hell, taking hours and hours to do just a single dump of the database. Rails' PostgreSQL adapter uses SQL for inserts and quotes every byte as an octal escape sequence; storing a single image can take several seconds. Single-image retrieval is similarly slow, but since the adapter uses bindings that talk directly to libpq, I believe it's caused by the overall load on the database.

Because of this, we see no recourse but to move the images into the file system. Since our cluster consists of three separate machines all running the same Rails application, with no dedicated box handling the image storage, such a solution requires the use of NFS or some other type of shared storage for centralized image storage; we're not sure yet what we will end up with.

Alexander.
On 10/5/06, Alexander Staubo <alex@purefiction.net> wrote:
> We have a production system containing 10,000 images (JPEG and PNG of
> various sizes) totaling roughly 4 GB. [...]
> Performance-wise, I'm not sure; the amount of data seems to put a
> certain load on the database server, though it's impossible to tell
> how much. Backups are hell, taking hours and hours to do just a
> single dump of the database.

i admit, backups could be a problem. maybe pitr is the answer. (dump style backups are a problem for any big database)

> Rails' PostgreSQL adapter uses SQL for inserts and quotes every byte
> as an octal escape sequence; storing a single image can take several
> seconds.

ouch...the only way to do this quickly imo is to send the raw binary data directly to the database using parameterized queries...this eliminates both the escaping and the unescaping step. likewise the data should be pulled out binary (this will likely be several times faster).

> Because of this, we see no recourse but to move the images into the
> file system. Since our cluster consists of three separate machines
> all running the same Rails application, with no dedicated box handling
> the image storage, such a solution requires the use of NFS or other
> type of shared storage for centralized image storage; we're not sure
> yet about what we will end up with.

can't fault you for that decision, web applications are usually pretty aggressive on caching. they also usually fit pretty well in the various replication technologies as well...something to consider.

merlin
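Merlin's binary, parameterized insert looks roughly like this in C with libpq (a sketch under assumptions: the images table with name and data columns is made up, and error handling is minimal):

```c
/* Insert image bytes as a binary parameter: no octal escaping on the
 * way in, and no unescaping on the server side. */
#include <libpq-fe.h>
#include <stddef.h>

int store_image(PGconn *conn, const char *name,
                const char *buf, size_t len)
{
    Oid types[2]          = { 25, 17 };       /* OIDs of text, bytea */
    const char *values[2] = { name, buf };
    int lengths[2]        = { 0, (int) len }; /* length ignored for text */
    int formats[2]        = { 0, 1 };         /* 0 = text, 1 = binary */

    PGresult *res = PQexecParams(conn,
        "INSERT INTO images (name, data) VALUES ($1, $2)",
        2, types, values, lengths, formats,
        0 /* text-format results */);

    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    PQclear(res);
    return ok;
}
```

Retrieval works the same way in reverse: passing 1 as the last argument to PQexecParams returns the bytea column in binary, skipping the unescaping step merlin mentions.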
marco.bizzarri@gmail.com ("Marco Bizzarri") writes:
> For the interface, the bytea gives you a "query" based interfaces,
> while largeobject are able to provide a file based interface.
> [...]
> I think there are differences also in how the space is reclaimed, but
> my PostgreSQL - Fu stops here.

That seems a reasonable explanation...

There is another thing that is worth observing for the "store data as an ordinary column" idea... Very large columns are stored in what is called a TOAST table. Consider the following table:

tbig@[local]:5832=# \d foo
         Table "public.foo"
  Column  |  Type   |                    Modifiers
----------+---------+--------------------------------------------------
 id       | integer | not null default nextval('foo_id_seq'::regclass)
 name     | text    |
 contents | text    |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)

tbig@[local]:5832=# vacuum verbose foo;
INFO:  vacuuming "public.foo"
INFO:  index "foo_pkey" now contains 3 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": found 0 removable, 3 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 12 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_113203"
INFO:  index "pg_toast_113203_index" now contains 54 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_113203": found 0 removable, 54 nonremovable row versions in 14 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 5 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

tbig@[local]:5832=# select id, name, length(contents) from foo;
 id | name  | length
----+-------+---------
  1 | file1 | 1860342
  2 | file2 | 1860342
  3 | file3 | 1860342
(3 rows)

The "contents" columns each contain 1.8 MB of data. Note that if you run a query that doesn't access the "contents" column, it never gets drawn in. What the table 'physically' looks like is rather like:

 id | name  | toast pointer
----+-------+---------------
  1 | file1 | 1341234
  2 | file2 | 3241341
  3 | file3 | 3421892

[where those pointers point into the "toast" table]. You can get decent efficiency out of that...

-- 
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://cbbrowne.com/info/linuxdistributions.html
"...[Linux's] capacity to talk via any medium except smoke signals."
-- Dr. Greg Wettstein, Roger Maris Cancer Center
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> ... postgresql will toast all images over a certain size which is
>> actually pretty efficient although can be a problem if your images
>> are really big.
>
> But any reasonable image format is compressed already [...]
>
> I think the arguments for keeping stuff inside the database are
> (a) far easier to maintain transactional semantics for insert/delete,
> and (b) easier to serve the data out to clients that aren't on the same
> machine. You aren't going to find a performance win though.

(c) easy to replicate
(d) easy to load balance

leonel
I apologize for the duplicate posting, but I am not sure if this is getting posted to the newsgroup.

Problem:
The PostgreSQL "service" is not running on a Linux box, but I have a database on the Linux box that I want to relocate to another machine, Windows or Linux.

Question:
What files do I need to transfer to get this to work?
Brian J. Erickson <ericksbr@infocon-inc.com> schrieb:
> I apologize for the duplicate posting, but I am not sure if this is
> getting posted to the newsgroup.
>
> Problem:
> The PostgreSQL "service" is not running on a Linux box,

Why not?

> but I have a database on the Linux box that I want to relocate to
> another machine, Windows or Linux.
>
> Question:
> What files do I need to transfer to get this to work?

Start the database, do a pg_dumpall, and build the db on the new machine from this dump.

Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Merlin Moncure wrote:
> i'm wondering if anybody has ever attempted to manage large
> collections of binary objects inside the database and has advice here.

We have designed and built an image library using Postgres and NFS servers which currently holds 1.4 million images totalling more than 250 GB. Not having the images in the database keeps the database lightweight, and allows us to use Slony to replicate -- something we could not do with blob data (I believe).

If you want to scale up, I think it is worth keeping the images out of the database. Just use the database to store the filename/location and the metadata associated with each image. Otherwise the images bloat the database...

Backups are small (the metadata is lightweight), and we can use Slony for data redundancy. The NFS servers are rsync'd to physically back the images up.

This setup means that we have to have a layer above the database which keeps the database and the images on the file system in sync. The database and the Java layer above it, together with the NFS servers, have effectively allowed us to build a virtualized file system for storing images and metadata. We can plug in more NFS shares as our storage requirements grow, and the database keeps track of disk usage within each physical disk volume.

This setup appears to give us good performance and hasn't been too difficult to maintain or administer. For a setup this size, which is growing daily, storing the images in the database was not really a sensible option. However, with a smaller setup, keeping the images in the database definitely keeps things simpler.

John
I have seen several posts pertaining to the "overhead" difference between storing in a db table versus the file system. What is this difference?

I also think the decision whether to store in a db table or the file system comes down to how the files will be accessed. If all the clients are on the same network as the server, using the server's file system is viable. If your clients are on different networks, then storing in the db may be a better option.

-----Original Message-----
From: Guy Rouillier
Sent: Wednesday, October 04, 2006 3:57 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)

> I see little value to storing the images in the database. [...] Why
> incur all the overhead of putting them in the DB? You can't search on
> them or sort on them. I would just store them in the file system and
> put a reference in the DB.
> [...] Store the image first, and if you get a failure code, don't
> store the rest of the data in the DB; you've just implemented data
> consistency.
>> Problem:
>> The PostgreSQL "service" is not running on a Linux box,
> Why not?

The file "/etc/mtab" has been corrupted; when I start PostgreSQL, the service dies because of an "input/output" error. Therefore, "pg_dumpall" won't work.

----- Original Message -----
From: "Andreas Kretschmer" <akretschmer@spamfence.net>
Sent: Thursday, October 05, 2006 8:52 AM
Subject: Re: [GENERAL] PostgreSQL Database Transfer between machines (again)

> Start the database, do a pg_dumpall, and build the db on the new
> machine from this dump.
On 10/05/06 14:50, Brian J. Erickson wrote:
>>> Problem:
>>> The PostgreSQL "service" is not running on a Linux box,
>> Why not?
> The file "/etc/mtab" has been corrupted; when I start PostgreSQL, the
> service dies because of an "input/output" error.
>
> Therefore, "pg_dumpall" won't work.

Doesn't /etc/mtab get recreated at boot?

And since it's a text file, can't someone fix it with $EDITOR?

-- 
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people. However, that "common sense" is obviously wrong.
Leonel Nunez wrote:
>> I think the arguments for keeping stuff inside the database are
>> (a) far easier to maintain transactional semantics for insert/delete,
>> and (b) easier to serve the data out to clients that aren't on the
>> same machine. You aren't going to find a performance win though.
>
> (c) easy to replicate

I don't follow that. Suppose your database minus images is 3 GB, and your images are another 50 gigabytes. Which is easier to replicate, 3 or 53? Put the images on a file server, separate from the DBs -- no need to replicate them. And if you do want to copy (e.g., your replicated DB is in a remote location), you can do a simple file system copy to the corresponding remote file server.

> (d) easy to load balancing

If you're load balancing, both databases are in the same location, right? In which case you only need one set of images on a central file server.

-- 
Guy Rouillier
On Oct 5, 2006, at 19:47, DEV wrote:
> I have seen several posts pertaining to the "overhead" difference in
> storing in a db table versus the file system. What is this difference?

Well, there's not much space overhead to speak of. I tested with a bunch of JPEG files:

$ find files | wc -l
2724
$ du -hs files
213M    files

With an empty database and the following schema:

create table files (id serial, data bytea);
alter table files alter column data set storage external;

When loaded into the database:

$ du -hs /opt/local/var/db/postgresql/base/16386
223M    /opt/local/var/db/postgresql/base/16386

On my MacIntel with PostgreSQL from DarwinPorts -- a configuration/port where PostgreSQL performance does *not* shine, incidentally -- PostgreSQL can insert the image data at a pretty stable 2.5 MB/s. It's still around 30 times slower than the file system at reading the data. (I would love to run a benchmark to provide detailed timings, but that would tie up my laptop for too long.)

Alexander.
> I don't follow that. Suppose your database minus images is 3 GB, and
> your images are another 50 gigabytes. Which is easier to replicate, 3
> or 53? Put the images on a file server, separate from the DBs - no
> need to replicate them.

Yes, 3 GB is *faster* to replicate than 53 GB, but it is just as *easy* either way.

> And if you do want to copy (e.g., your replicated DB is in a remote
> location), you can do a simple file system copy to the corresponding
> remote file server.

This is done with automatic replication as the data is inserted, deleted or updated. When I say data, I mean records, images, PDFs and all the objects I use.

> If you're load balancing, both databases are in the same location,
> right? In which case you only need one set of images on a central
> file server.

The bulk of the data is the images, so one set of images on a central file server is not a real solution for me. Storing all the data in the database gives me more scalability, consistency and flexibility than putting the "files" on disk and the "data" in the database.

Leonel
> And since it's a text file, can't someone fix it with $EDITOR?

I tried to edit the file, but I get an Input/Output error.

The recommendation was to re-install the OS. However, I DO NOT want to lose my database, so I am trying to back up the database.

----- Original Message -----
From: "Ron Johnson" <ron.l.johnson@cox.net>
To: <pgsql-general@postgresql.org>
Sent: Thursday, October 05, 2006 1:55 PM
Subject: Re: [GENERAL] PostgreSQL Database Transfer between machines (again)

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 10/05/06 14:50, Brian J. Erickson wrote:
> >>> Problem:
> >>> PostgreSQL "Service" is not running on a Linux Box,
> >> Why not?
> > The file "/etc/mtab" has been corrupted; when starting
> > postgresql the "Service" dies because of an "input/output"
> > error.
> >
> > Therefore, "pg_dumpall" won't work.
>
> Doesn't /etc/mtab get recreated at boot?
>
> And since it's a text file, can't someone fix it with $EDITOR?
>
> - --
> Ron Johnson, Jr.
> Jefferson LA USA
>
> Is "common sense" really valid?
> For example, it is "common sense" to white-power racists that
> whites are superior to blacks, and that those with brown skins
> are mud people.
> However, that "common sense" is obviously wrong.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (GNU/Linux)
>
> iD8DBQFFJXFbS9HxQb37XmcRAqSZAJsG2iHXySUdiooXVun5hJsX6zfEPwCggAEb
> OjDEKQ5VErzaRv+UcH0P/Kk=
> =vs7r
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/05/06 18:10, Brian J. Erickson wrote:
>> And since it's a text file, can't someone fix it with $EDITOR?
>
> I tried to edit the file, but I get the Input/Output error.
>
> The recommendation was to re-install the OS.
> However, I DO NOT want to lose my database,
> so I am trying to back up the database.

Booting from Knoppix should let you fix it.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFJZEoS9HxQb37XmcRAhGVAJ0YQzP6+5jDJO4/kwdw44GxVRig4QCfYGNh
/mfxBi9tNKKjMtkeYsbU4S0=
=e1v8
-----END PGP SIGNATURE-----
On Oct 5, 2006, at 19:10 , Brian J. Erickson wrote:
>> And since it's a text file, can't someone fix it with $EDITOR?
>
> I tried to edit the file, but I get the Input/Output error.
>
> The recommendation was to re-install the OS.
> However, I DO NOT want to lose my database,
> so I am trying to back up the database.

You can boot from any rescue CD, mount the partition, copy the database directory away and then copy it back once you have reinstalled. This is safe because it is on the same machine. It is not safe to copy the database to some arbitrary computer and expect it to run. Make sure to match the database version. Good luck!

-M
If the database had built-in functions to manipulate images (make a thumbnail, add text onto it, make a montage of two pictures) and I could write something like
select thumbnail(image_field, 100, 100) from images_table
that would be a good reason to go the db route versus the filesystem route. A database does more than store data; it makes it convenient to work with it. Once my pictures are stored in the database, how do I make thumbnails, for instance? Maybe the solution already exists; I am curious here. Is there a way to integrate ImageMagick into a PostgreSQL workflow?
By the way, is it practical to set a bytea column (containing pictures) as primary key? That would severely slow down many operations I guess.
JCR
From: Alexander Staubo <alex@purefiction.net>
To: pgsql-general@postgresql.org
Cc: DEV <dev@umpa-us.com>
Sent: Thursday, October 5, 2006 6:30:07 PM
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)
> I have seen several posts pertaining to the "overhead" difference
> in storing
> in a db table versus the file system. What is this difference?
Well, there's not much space overhead to speak of. I tested with a
bunch of JPEG files:
$ find files | wc -l
2724
$ du -hs files
213M files
With an empty database and the following schema:
create table files (id serial, data bytea);
alter table files alter column data set storage external;
When loaded into the database:
$ du -hs /opt/local/var/db/postgresql/base/16386
223M /opt/local/var/db/postgresql/base/16386
On my MacIntel with PostgreSQL from DarwinPorts -- a configuration/
port where PostgreSQL performance does *not* shine, incidentally --
PostgreSQL can insert the image data at a pretty stable 2.5MB/s. It's
still around 30 times slower than the file system at reading the
data. (I would love to run a benchmark to provide detailed timings,
but that would tie up my laptop for too long.)
Alexander.
On Oct 6, 2006, at 01:29 , Jean-Christophe Roux wrote:

> By the way, is it practical to set a bytea column (containing
> pictures) as primary key? That would severely slow down many
> operations I guess.

Why would you? It's possible, but completely impractical, since image data typically exceeds the index page size. Moreover, are you really going to retrieve an image row by its image data?

Alexander.
> Hi,
> If the database had built-in functions to manipulate images (make a
> thumbnail, add text onto it, make a montage of two pictures) and I could
> write something like
> select thumbnail(image_field, 100, 100) from images_table
> that would be a good reason to go the db route versus the filesystem
> route. A database does more than store data; it makes it convenient to
> work with it. Once my pictures are stored in the database, how do I make
> thumbnails for instance? Maybe the solution already exists; I am curious
> here. Is there a way to integrate ImageMagick into a PostgreSQL workflow?
> By the way, is it practical to set a bytea column (containing pictures) as
> primary key? That would severely slow down many operations I guess.
> JCR

With Python and the Python Imaging Library you can do this (image is a bytea field; note that thumbnail() takes a size tuple):

curs = conn.cursor()
curs.execute("select image from images where name = %s", (thename,))
row = curs.fetchone()
if row:
    im = Image.open(StringIO.StringIO(row[0]))
    im.thumbnail((160, 120))
    imagetmp = StringIO.StringIO()
    im.save(imagetmp, "JPEG")
    print("Content-type: image/jpeg\n\n")
    print(imagetmp.getvalue())

With this you get your image to the browser, thumbnailed, without touching the filesystem. That's all.

Leonel
"Leonel Nunez" <lnunez@enelserver.com> wrote: > > > If the database had built-in functions to manipulate images (make a > > thumbnail, add text ont it.., make a montage of two pictures) and I could > > write something like > > select thumbnail(image_field, 100, 100) from images_table > > that would be a good reason to go the db route versus the filesystem > > route. <snip> > With Python and the python imaging library you can do this : > > image is a bytea field > > curs = conn.cursor () > curs.execute( "select image from images where name = %s" ,(thename, )) > row = curs.fetchone() > if row: > im = Image.open (StringIO.StringIO(row[0])) > im.thumbnail (160,120 ) > imagetmp = StringIO.StringIO() > im.save ( imagetmp , "JPEG") > print ("Content-type: image/jpeg\n\n") > print ( imagetmp.getvalue()) I think part of the point, which you missed, is the convenience of having the thumbnailing as part of the SQL language by making it a stored procedure. I did a presentation for WPLUG not too long ago where I created C functions in Postgres compiled against the GSOAP library that allowed you to make simple SOAP calls in SQL within PostgreSQL. Neat stuff. The problem with creating those kinds of functions is the CPU overhead. We'll be generating the thumbnails and storing them in a "thumbnail" field in the record, so we don't have to regenerate the thumbnail each time it's needed. BTW: our reason for keeping the thumbnails in fields is so they can be replicated with Slony along with the rest of the database. -- Bill Moran We meddle. People don't like to be meddled with. We tell them what to do, what to think. Don't run, don't walk. We're in their homes and in their heads and we haven't the right. River Tam
AgentM wrote: > > On Oct 5, 2006, at 19:10 , Brian J. Erickson wrote: > >>> And since it's a text file, can't someone fix it with $EDITOR? >> >> I tried to edit the file, but I get the Input/Output error. >> O.k. hold on... are you getting any errors in /var/log/messages? Joshua D. Drake >> The recommendatation was to re-install the OS. >> However, I DO NOT want to lose my database, >> so I am tring to backup the database. > > You can boot from any rescue CD, mount the partition, copy the database > directory away and then copy it back once you have reinstalled. This is > safe because it is on the same machine. It is not safe to copy the > database to some arbitrary computer and expect it to run. Make sure to > match the database version. Good luck! > > -M > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
> "Leonel Nunez" <lnunez@enelserver.com> wrote: >> >> > If the database had built-in functions to manipulate images (make a >> > thumbnail, add text ont it.., make a montage of two pictures) and I >> could >> > write something like >> > select thumbnail(image_field, 100, 100) from images_table >> > that would be a good reason to go the db route versus the filesystem >> > route. > > <snip> > >> With Python and the python imaging library you can do this : >> >> image is a bytea field >> >> curs = conn.cursor () >> curs.execute( "select image from images where name = %s" ,(thename, )) >> row = curs.fetchone() >> if row: >> im = Image.open (StringIO.StringIO(row[0])) >> im.thumbnail (160,120 ) >> imagetmp = StringIO.StringIO() >> im.save ( imagetmp , "JPEG") >> print ("Content-type: image/jpeg\n\n") >> print ( imagetmp.getvalue()) > > I think part of the point, which you missed, is the convenience of having > the thumbnailing as part of the SQL language by making it a stored > procedure. > > I did a presentation for WPLUG not too long ago where I created C > functions in Postgres compiled against the GSOAP library that allowed > you to make simple SOAP calls in SQL within PostgreSQL. Neat stuff. > > The problem with creating those kinds of functions is the CPU overhead. > We'll be generating the thumbnails and storing them in a "thumbnail" > field in the record, so we don't have to regenerate the thumbnail each > time it's needed. > > BTW: our reason for keeping the thumbnails in fields is so they can be > replicated with Slony along with the rest of the database. > > -- > Bill Moran > > We meddle. People don't like to be meddled with. We tell them what to > do, > what to think. Don't run, don't walk. We're in their homes and in their > heads and we haven't the right. > > River Tam > > you are 100% right Leonel
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/05/06 19:41, Bill Moran wrote: > "Leonel Nunez" <lnunez@enelserver.com> wrote: >>> If the database had built-in functions to manipulate images (make a >>> thumbnail, add text ont it.., make a montage of two pictures) and I could >>> write something like >>> select thumbnail(image_field, 100, 100) from images_table >>> that would be a good reason to go the db route versus the filesystem >>> route. > > <snip> > >> With Python and the python imaging library you can do this : >> >> image is a bytea field >> >> curs = conn.cursor () >> curs.execute( "select image from images where name = %s" ,(thename, )) >> row = curs.fetchone() >> if row: >> im = Image.open (StringIO.StringIO(row[0])) >> im.thumbnail (160,120 ) >> imagetmp = StringIO.StringIO() >> im.save ( imagetmp , "JPEG") >> print ("Content-type: image/jpeg\n\n") >> print ( imagetmp.getvalue()) > > I think part of the point, which you missed, is the convenience of having > the thumbnailing as part of the SQL language by making it a stored > procedure. Would untrusted pl/python be able to do this? > I did a presentation for WPLUG not too long ago where I created C > functions in Postgres compiled against the GSOAP library that allowed > you to make simple SOAP calls in SQL within PostgreSQL. Neat stuff. > > The problem with creating those kinds of functions is the CPU overhead. > We'll be generating the thumbnails and storing them in a "thumbnail" > field in the record, so we don't have to regenerate the thumbnail each > time it's needed. > > BTW: our reason for keeping the thumbnails in fields is so they can be > replicated with Slony along with the rest of the database. > - -- Ron Johnson, Jr. Jefferson LA USA Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFJbVOS9HxQb37XmcRAoFXAKCy8+MIMuWCAaxeJyvijAVGP/RwhACgzO3T Q1pruQgrFSvsdiUEwtLvgDk= =XZD2 -----END PGP SIGNATURE-----
FWIW, the company I work for stores its terabytes of imagery on disk, using a database to track them (spatial coordinates, metadata, location, etc.); I have worked on projects in which we stored images in a database (blobs in Informix) and it worked fine. Both approaches can have their merits.

Personally, I'd do thumbnails on intake and handle them on their own, either on disk or in the db. But I have a preference for preprocessing data so runtime response is maximized.

Assuming you don't have access to a blade/suite of functions that allow you to use the image in the database as a useful data type (Informix at least used to have a blade that did this), you can still use information about the image as a primary key, to wit, a sufficiently large hash (MD5 for instance). Of course, there's the time to create the hash, which might be an issue in a high-volume system. Extending a hash with some other data (a date?) can considerably decrease the chance of collisions. It's still a longish key, but workable I suspect (untested; we used an artificial key, a serial).

$0.02 worth ...

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Jean-Christophe Roux
Sent: Thu 10/5/2006 4:29 PM
To: pgsql-general@postgresql.org
Cc:
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)

Hi,
If the database had built-in functions to manipulate images (make a thumbnail, add text onto it, make a montage of two pictures) and I could write something like
select thumbnail(image_field, 100, 100) from images_table
that would be a good reason to go the db route versus the filesystem route. A database does more than store data; it makes it convenient to work with it. Once my pictures are stored in the database, how do I make thumbnails for instance? Maybe the solution already exists; I am curious here. Is there a way to integrate ImageMagick into a PostgreSQL workflow?
By the way, is it practical to set a bytea column (containing pictures) as primary key? That would severely slow down many operations I guess.
JCR

----- Original Message ----
From: Alexander Staubo <alex@purefiction.net>
To: pgsql-general@postgresql.org
Cc: DEV <dev@umpa-us.com>
Sent: Thursday, October 5, 2006 6:30:07 PM
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)

On Oct 5, 2006, at 19:47 , DEV wrote:

> I have seen several posts pertaining to the "overhead" difference
> in storing
> in a db table versus the file system. What is this difference?

Well, there's not much space overhead to speak of. I tested with a bunch of JPEG files:

$ find files | wc -l
2724
$ du -hs files
213M    files

With an empty database and the following schema:

create table files (id serial, data bytea);
alter table files alter column data set storage external;

When loaded into the database:

$ du -hs /opt/local/var/db/postgresql/base/16386
223M    /opt/local/var/db/postgresql/base/16386

On my MacIntel with PostgreSQL from DarwinPorts -- a configuration/port where PostgreSQL performance does *not* shine, incidentally -- PostgreSQL can insert the image data at a pretty stable 2.5MB/s. It's still around 30 times slower than the file system at reading the data. (I would love to run a benchmark to provide detailed timings, but that would tie up my laptop for too long.)

Alexander.
> > You can boot from any rescue CD, mount the partition, copy the database > > directory away and then copy it back once you have reinstalled. This is > > safe because it is on the same machine. It is not safe to copy the > > database to some arbitrary computer and expect it to run. That is basically the plan but I want to make sure that I have all of the data. > O.k. hold on... are you getting any errors in /var/log/messages? Here is some of the "/var/log/messages" file -------------------------------------------Begin---------------------------- ---------------------- Oct 6 07:57:27 Info1A kernel: PCI: Using configuration type 1 Oct 6 07:57:27 Info1A kernel: PCI: Probing PCI hardware Oct 6 07:57:27 Info1A mount: mount: can't open /etc/mtab for writing: Input/output error Oct 6 07:57:27 Info1A kernel: PCI: Probing PCI hardware (bus 00) Oct 6 07:57:27 Info1A netfs: Mounting other filesystems: failed . . . Oct 6 07:57:33 Info1A su(pam_unix)[1229]: session opened for user postgres by (uid=0) Oct 6 07:57:33 Info1A su(pam_unix)[1229]: session closed for user postgres Oct 6 07:57:34 Info1A postgresql: Starting postgresql service: failed ---------------------------------------------End---------------------------- --------------------- ----- Original Message ----- From: "Joshua D. Drake" <jd@commandprompt.com> To: "AgentM" <agentm@themactionfaction.com> Cc: "PostgreSQL Mailing lists" <pgsql-general@postgresql.org> Sent: Thursday, October 05, 2006 5:42 PM Subject: Re: [GENERAL] PostgreSQL Database Transfer between machines(again) > AgentM wrote: > > > > On Oct 5, 2006, at 19:10 , Brian J. Erickson wrote: > > > >>> And since it's a text file, can't someone fix it with $EDITOR? > >> > >> I tried to edit the file, but I get the Input/Output error. > >> > > O.k. hold on... are you getting any errors in /var/log/messages? > > Joshua D. Drake > > > > > >> The recommendatation was to re-install the OS. 
> >> However, I DO NOT want to lose my database, > >> so I am tring to backup the database. > > > > You can boot from any rescue CD, mount the partition, copy the database > > directory away and then copy it back once you have reinstalled. This is > > safe because it is on the same machine. It is not safe to copy the > > database to some arbitrary computer and expect it to run. Make sure to > > match the database version. Good luck! > > > > -M > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org > > > > > -- > > === The PostgreSQL Company: Command Prompt, Inc. === > Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 > Providing the most comprehensive PostgreSQL solutions since 1997 > http://www.commandprompt.com/ > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
Brian J. Erickson wrote:
>>> You can boot from any rescue CD, mount the partition, copy the database
>>> directory away and then copy it back once you have reinstalled. This is
>>> safe because it is on the same machine. It is not safe to copy the
>>> database to some arbitrary computer and expect it to run.
> That is basically the plan but I want to make sure that I have all of the
> data.
>
>> O.k. hold on... are you getting any errors in /var/log/messages?
> Here is some of the "/var/log/messages" file

That's odd. Have you tried removing /etc/mtab manually and rebooting? Does the problem still occur?

As far as PostgreSQL... you need to look into the postgresql logs. Did you compile from source? What version of Linux is this?

Joshua D. Drake

> -------------------------------------------Begin----------------------------
> ----------------------
> Oct 6 07:57:27 Info1A kernel: PCI: Using configuration type 1
> Oct 6 07:57:27 Info1A kernel: PCI: Probing PCI hardware
> Oct 6 07:57:27 Info1A mount: mount: can't open /etc/mtab for writing:
> Input/output error
> Oct 6 07:57:27 Info1A kernel: PCI: Probing PCI hardware (bus 00)
> Oct 6 07:57:27 Info1A netfs: Mounting other filesystems: failed
> .
> .
> .
> Oct 6 07:57:33 Info1A su(pam_unix)[1229]: session opened for user postgres
> by (uid=0)
> Oct 6 07:57:33 Info1A su(pam_unix)[1229]: session closed for user postgres
> Oct 6 07:57:34 Info1A postgresql: Starting postgresql service: failed
> ---------------------------------------------End----------------------------
> ---------------------
>
> ----- Original Message -----
> From: "Joshua D. Drake" <jd@commandprompt.com>
> To: "AgentM" <agentm@themactionfaction.com>
> Cc: "PostgreSQL Mailing lists" <pgsql-general@postgresql.org>
> Sent: Thursday, October 05, 2006 5:42 PM
> Subject: Re: [GENERAL] PostgreSQL Database Transfer between machines (again)
>
>> AgentM wrote:
>>> On Oct 5, 2006, at 19:10 , Brian J.
Erickson wrote: >>> >>>>> And since it's a text file, can't someone fix it with $EDITOR? >>>> I tried to edit the file, but I get the Input/Output error. >>>> >> O.k. hold on... are you getting any errors in /var/log/messages? >> >> Joshua D. Drake >> >> >> >> >>>> The recommendatation was to re-install the OS. >>>> However, I DO NOT want to lose my database, >>>> so I am tring to backup the database. >>> You can boot from any rescue CD, mount the partition, copy the database >>> directory away and then copy it back once you have reinstalled. This is >>> safe because it is on the same machine. It is not safe to copy the >>> database to some arbitrary computer and expect it to run. Make sure to >>> match the database version. Good luck! >>> >>> -M >>> >>> ---------------------------(end of broadcast)--------------------------- >>> TIP 4: Have you searched our list archives? >>> >>> http://archives.postgresql.org >>> >> >> -- >> >> === The PostgreSQL Company: Command Prompt, Inc. === >> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 >> Providing the most comprehensive PostgreSQL solutions since 1997 >> http://www.commandprompt.com/ >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 3: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faq >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Am I going to retrieve an image row by its image data? I would certainly like to! For instance, I would like to get the pictures whose main color is green (requirement from a real project), and a
select * from images where main_color(image) = 'green' would be nice.
JCR
From: Alexander Staubo <alex@purefiction.net>
To: Jean-Christophe Roux <jcxxr@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Thursday, October 5, 2006 7:35:04 PM
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)
> By the way, is it practical to set a bytea column (containing
> pictures) as primary key? That would severely slow down many
> operations I guess.
Why would you? It's possible, but completely impractical, since image
data typically exceeds the index page size. Moreover, are you really
going to retrieve an image row by its image data?
Alexander.
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/05/06 18:54, Jean-Christophe Roux wrote:
> Why would I set a bytea column (containing pictures) as a primary
> key? Because I want to be sure that the same image is inserted
> only once (that requirement comes from a real project) and using
> a primary key for that purpose makes sense to me.

A hash function makes better sense. SHA-1 creates a 20 byte string that is the digest of the file. Almost perfect as a primary key. SHA-256 (so far!) guarantees uniqueness in 32 bytes.

> Am I going to
> retrieve an image row by its image data? I would certainly like!
> For instance, I would like to get the pictures whose main color
> is green (requirement from a real project), and a select * from
> images where main_color(image) = 'green' would be nice.

That would speedily be handled by preprocessing metadata records as the image is inserted.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFJql5S9HxQb37XmcRAgYjAKCeTur3N/AUwQLI79GYbde8nCMYwACfejg+
Xe3qGl7GHlhC4jssvGkAJ6w=
=dW03
-----END PGP SIGNATURE-----
On 10/5/06, Jean-Christophe Roux <jcxxr@yahoo.com> wrote:
> Why would I set a bytea column (containing pictures) as a primary key?
> Because I want to be sure that the same image is inserted only once (that
> requirement comes from a real project) and using a primary key for that
> purpose makes sense to me.
> Am I going to retrieve an image row by its image data? I would certainly
> like! For instance, I would like to get the pictures whose main color is
> green (requirement from a real project), and a
> select * from images where main_color(image) = 'green' would be nice.
> JCR

Standard postgresql indexes use a btree, which is completely impractical for indexing bitmaps. What you would want is a hand-rigged unique constraint: a combination of an insert trigger, a hash algorithm, and some code to resolve collisions. This is what I would do whether the bitmaps were stored in or out of the database.

merlin
You could store the pkey as an md5 or sha1 of the image's data, or any of the other large hashing algorithms. That way your index only has to compare 32 or 40 bytes instead of kilobytes per row.
As for the main color, you could generate histogram-like columns (or even a single column) with the relative strengths of each channel and store them in a smallint or bitstring. You could then do whatever you wanted per channel, and it could be indexed.
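A minimal sketch of the digest-as-key idea from the last few messages (the function name is made up; the surrounding table and driver code are omitted): hash the raw image bytes and put the UNIQUE constraint on the digest column instead of on the bytea itself.

```python
import hashlib

def image_key(data):
    """SHA-1 hex digest of the raw image bytes -- the 40-character value
    the index compares, instead of kilobytes of bytea per row."""
    return hashlib.sha1(data).hexdigest()

# Duplicate images produce the same key, so a UNIQUE index on the key
# column rejects re-inserts; distinct images get distinct keys (up to
# the astronomically unlikely chance of a hash collision).
```

Swapping in hashlib.sha256 gives the 64-character (32-byte) digests mentioned elsewhere in the thread.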
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jean-Christophe Roux
Sent: Thursday, October 05, 2006 7:55 PM
To: Alexander Staubo
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)
Why would I set a bytea column (containing pictures) as a primary key? Because I want to be sure that the same image is inserted only once (that requirement comes from a real project) and using a primary key for that purpose makes sense to me.
Am I going to retrieve an image row by its image data? I would certainly like to! For instance, I would like to get the pictures whose main color is green (requirement from a real project), and a
select * from images where main_color(image) = 'green' would be nice.
JCR
----- Original Message ----
From: Alexander Staubo <alex@purefiction.net>
To: Jean-Christophe Roux <jcxxr@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Thursday, October 5, 2006 7:35:04 PM
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)
On Oct 6, 2006, at 01:29 , Jean-Christophe Roux wrote:
> By the way, is it practical to set a bytea column (containing
> pictures) as primary key? That would severely slow down many
> operations I guess.
Why would you? It's possible, but completely impractical, since image
data typically exceeds the index page size. Moreover, are you really
going to retrieve an image row by its image data?
Alexander.
Hate to suggest corporate software, but there is an Informix/Illustra blade that could do something like what you're after (I remember a demo of sunset/sunrise photos being selected on the basis of color values) ... But I think they used smart blobs and didn't use them as key values.

G

-----Original Message-----
From: pgsql-general-owner@postgresql.org on behalf of Jean-Christophe Roux
Sent: Thu 10/5/2006 4:54 PM
To: Alexander Staubo
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)

Why would I set a bytea column (containing pictures) as a primary key? Because I want to be sure that the same image is inserted only once (that requirement comes from a real project) and using a primary key for that purpose makes sense to me.
Am I going to retrieve an image row by its image data? I would certainly like to! For instance, I would like to get the pictures whose main color is green (requirement from a real project), and a
select * from images where main_color(image) = 'green' would be nice.
JCR

----- Original Message ----
From: Alexander Staubo <alex@purefiction.net>
To: Jean-Christophe Roux <jcxxr@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Thursday, October 5, 2006 7:35:04 PM
Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)

On Oct 6, 2006, at 01:29 , Jean-Christophe Roux wrote:

> By the way, is it practical to set a bytea column (containing
> pictures) as primary key? That would severely slow down many
> operations I guess.

Why would you? It's possible, but completely impractical, since image data typically exceeds the index page size. Moreover, are you really going to retrieve an image row by its image data?

Alexander.
On Thu, Oct 05, 2006 at 05:08:27PM -0600, Leonel Nunez wrote:
> Subject: Re: [GENERAL] Storing images in PostgreSQL databases (again)
> User-Agent: SquirrelMail/1.4.8
>
> > Leonel Nunez wrote:
> >>> I think the arguments for keeping stuff inside the database are
> >>> (a) far easier to maintain transactional semantics for insert/delete,
> >>> and (b) easier to serve the data out to clients that aren't on the
> >>> same machine. You aren't going to find a performance win though.
> >>
> >> (c) easy to replicate
> >
> > I don't follow that. Suppose your database minus images is 3 GB, and
> > your images are another 50 gigabytes. Which is easier to replicate, 3
> > or 53? Put the images on a file server, separate from the DBs - no need
> > to replicate them.
>
> yes 3GB are *faster* han 53 gb but is the same as easy as 3 or 100

And, the above only applies to *initial* costs of replication.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 2006-10-05 21:22:04, Gregory S. Williamson wrote:

> a sufficiently large hash (MD5 for instance). Of course, there's

I do this already, but have problems since I have stored around 130 million files on a server...

> time to create the hash which might be an issue in a high volume
> system. Extending a hash with some other data (date ?) can

MD5 hashes are 32 bytes long; maybe they will change it to 64 bytes? I already have over 2000 collisions and have checked that the files are NOT the same.

> considerably decrease the chance of collisions. It's still a
> longish key, but workable I suspect (untested, we used an
> artificial key, a serial).

Greetings
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant

--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack    Apt. 917               ICQ #328449886
50, rue de Soultz   MSM LinuxMichi
0033/6/61925193     67100 Strasbourg/France    IRC #Debian (irc.icq.com)
In response to Michelle Konzack <linux4michelle@freenet.de>:

> On 2006-10-05 21:22:04, Gregory S. Williamson wrote:
>
> > a sufficiently large hash (MD5 for instance). Of course, there's
>
> I do this already, but have problems since I have
> stored around 130 million files on a server...
>
> > time to create the hash which might be an issue in a high volume
> > system. Extending a hash with some other data (date ?) can
>
> MD5 hashes are 32 bytes long; maybe they will change
> it to 64 bytes?
>
> I already have over 2000 collisions and have checked
> that the files are NOT the same.

Try sha1 or sha256.

--
Bill Moran
Collaborative Fusion Inc.
Michelle Konzack <linux4michelle@freenet.de> wrote:

> I do this already, but have problems since I have
> stored around 130 million files on a server...
>
> ...
>
> MD5 hashes are 32 bytes long; maybe they will change
> it to 64 bytes?
>
> I already have over 2000 collisions and have checked
> that the files are NOT the same.

You mean you have 2000 collisions out of the checksums of 130 million different files? That can't be right.

An MD5 hash is 128 bits, and using the values found in <http://en.wikipedia.org/wiki/Birthday_attack>, you don't reach a 50% chance of a single collision until you've checksummed 2.2 x 10^19 different inputs. That's, ummm, 22,000,000,000,000,000,000, I think, which is much larger than 130,000,000.

In other words, you should not expect even a single collision until you have 169,230,769,231 times as many files as you currently have, which should not be an issue before the end of the useful life of the solar system.

If you have 2000 collisions after 130 million different files (or even if you have two collisions), something is almost certainly wrong with your code, unfortunately.

--
Robert L Mathews

"The trouble with doing something right the first time is that nobody appreciates how difficult it was."
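Robert's 2.2 x 10^19 figure follows from the standard birthday-paradox approximation P ≈ 1 - exp(-n²/2N), where N = 2^128 for MD5. A quick Python check of both of his numbers (this calculation is mine, not from the thread):

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-paradox approximation: P(at least one collision) when
    n_items values are drawn uniformly from a space of 2**hash_bits."""
    space = 2.0 ** hash_bits
    # expm1 preserves precision when the exponent is extremely small
    return -math.expm1(-(n_items ** 2) / (2.0 * space))

# 130 million files hashed into a 128-bit space: vanishingly small
print(collision_probability(130_000_000, 128))   # ~2.5e-23

# Robert's 50% threshold of about 2.2e19 inputs
print(collision_probability(22_000_000_000_000_000_000, 128))  # ~0.51
```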
On Oct 13, 2006, at 01:36, Robert L Mathews wrote:

> Michelle Konzack <linux4michelle@freenet.de> wrote:
>
>> I do this already, but have problems since I have
>> stored around 130 million files on a server...
>
> ...
>
>> MD5 hashes are 32 bytes long; maybe they will change
>> it to 64 bytes?
>> I already have over 2000 collisions and have checked
>> that the files are NOT the same.
>
> You mean you have 2000 collisions out of the checksums of 130
> million different files? That can't be right. [...]
>
> If you have 2000 collisions after 130 million different files (or
> even if you have two collisions), something is almost certainly
> wrong with your code, unfortunately.

Or Michelle is storing a bunch of duplicate images.

Alexander.
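Alexander's point can be checked mechanically: when two files share a digest, compare their bytes; identical bytes mean a duplicate upload, different bytes mean a genuine collision. A sketch of that distinction (the function is hypothetical, not code from the thread):

```python
import hashlib

def duplicates_vs_collisions(files: list[bytes]) -> tuple[int, int]:
    """Count duplicate files and true hash collisions separately."""
    seen: dict[str, bytes] = {}
    duplicates = collisions = 0
    for data in files:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen[digest] = data
        elif seen[digest] == data:
            duplicates += 1    # same digest, same bytes: a re-uploaded file
        else:
            collisions += 1    # same digest, different bytes: real collision
    return duplicates, collisions

# Two copies of one image plus one distinct image: 1 duplicate, 0 collisions
print(duplicates_vs_collisions([b"same", b"same", b"other"]))  # (1, 0)
```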
The machine did boot and PostgreSQL started back up, etc.

Thanks.

----- Original Message -----
From: "Joshua D. Drake" <jd@commandprompt.com>
To: "Brian J. Erickson" <ericksbr@infocon-inc.com>
Cc: "PostgreSQL Mailing lists" <pgsql-general@postgresql.org>
Sent: Friday, October 06, 2006 9:17 AM
Subject: Re: [GENERAL] PostgreSQL Database Transfer between machines (again)

> Brian J. Erickson wrote:
> >>> You can boot from any rescue CD, mount the partition, copy the database
> >>> directory away and then copy it back once you have reinstalled. This is
> >>> safe because it is on the same machine. It is not safe to copy the
> >>> database to some arbitrary computer and expect it to run.
> > That is basically the plan, but I want to make sure that I have all of the
> > data.
> >
> >> O.k., hold on... are you getting any errors in /var/log/messages?
> > Here is some of the "/var/log/messages" file
>
> That's odd. Have you tried removing /etc/mtab manually and rebooting?
> Does the problem still occur? As for PostgreSQL... you need to look
> into the postgresql logs.
>
> Did you compile from source? What version of Linux is this?
>
> Joshua D. Drake
>
> > -------------------------------------------Begin-------------------------------------------------
> > Oct 6 07:57:27 Info1A kernel: PCI: Using configuration type 1
> > Oct 6 07:57:27 Info1A kernel: PCI: Probing PCI hardware
> > Oct 6 07:57:27 Info1A mount: mount: can't open /etc/mtab for writing:
> > Input/output error
> > Oct 6 07:57:27 Info1A kernel: PCI: Probing PCI hardware (bus 00)
> > Oct 6 07:57:27 Info1A netfs: Mounting other filesystems: failed
> > .
> > .
> > .
> > Oct 6 07:57:33 Info1A su(pam_unix)[1229]: session opened for user postgres
> > by (uid=0)
> > Oct 6 07:57:33 Info1A su(pam_unix)[1229]: session closed for user postgres
> > Oct 6 07:57:34 Info1A postgresql: Starting postgresql service: failed
> > ---------------------------------------------End-------------------------------------------------
> >
> > ----- Original Message -----
> > From: "Joshua D. Drake" <jd@commandprompt.com>
> > To: "AgentM" <agentm@themactionfaction.com>
> > Cc: "PostgreSQL Mailing lists" <pgsql-general@postgresql.org>
> > Sent: Thursday, October 05, 2006 5:42 PM
> > Subject: Re: [GENERAL] PostgreSQL Database Transfer between machines (again)
> >
> >> AgentM wrote:
> >>> On Oct 5, 2006, at 19:10, Brian J. Erickson wrote:
> >>>
> >>>>> And since it's a text file, can't someone fix it with $EDITOR?
> >>>> I tried to edit the file, but I get the Input/output error.
> >>
> >> O.k., hold on... are you getting any errors in /var/log/messages?
> >>
> >> Joshua D. Drake
> >>
> >>>> The recommendation was to re-install the OS.
> >>>> However, I DO NOT want to lose my database,
> >>>> so I am trying to back up the database.
> >>> You can boot from any rescue CD, mount the partition, copy the database
> >>> directory away and then copy it back once you have reinstalled. This is
> >>> safe because it is on the same machine. It is not safe to copy the
> >>> database to some arbitrary computer and expect it to run. Make sure to
> >>> match the database version. Good luck!
> >>>
> >>> -M
>
> --
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive PostgreSQL solutions since 1997
> http://www.commandprompt.com/
Hello,

I get the following message in the "pgstartup.log":

    postmaster cannot access the server configuration file
    "/database/postgresql.conf": Permission denied

I have:
1) read the file by doing the following:
   a) su postgres
   b) less postgresql.conf
2) made sure each directory leading to the file is readable by "postgres"
3) made postgresql.conf and each directory readable by everyone
4) verified that the owner and group are postgres
5) run initdb as postgres

Any ideas as to what could be the problem?

Brian Erickson
"Brian J. Erickson" <ericksbr@infocon-inc.com> writes:

> I get the following message in the "pgstartup.log":
> postmaster cannot access the server configuration file
> "/database/postgresql.conf": Permission denied

Are you trying to change the standard config file location on a Red Hat or Fedora system? If so, you're likely to run into SELinux restrictions on where the postmaster daemon can touch the filesystem. You'd need to modify the SELinux policy (or turn off SELinux, but I don't really recommend that) to use a nonstandard config file together with the standard postgres package.

regards, tom lane