Thread: BLOBs bypassing the OS filesystem for better image loading speed?
Hi all again,

My next question is about BLOBs in my web application (using Tomcat 5 and JDBC, integrated into the J2EE application server JBoss).

Filesystems with many filesystem objects can slow down performance when opening and reading data.

My question: can I speed up my web application if I store my small JPEG images inside my PostgreSQL database (on very large databases of 1 GByte and above, without images at this time)?

I hope someone can give me a tip or hint on where I can find some useful information about this!

Thanks,
Josh
apoc9009@yahoo.de wrote:
> Hi all again,
>
> My next question is about BLOBs in my web application (using
> Tomcat 5 and JDBC, integrated into the J2EE application server JBoss).
>
> Filesystems with many filesystem objects can slow down performance
> when opening and reading data.

Which filesystems? I know ext2 used to have issues with many thousands of files in one directory, but that was a directory scanning issue rather than a file reading issue.

> My question: can I speed up my web application if I store my small
> JPEG images inside my PostgreSQL database (on very large databases of
> 1 GByte and above, without images at this time)?

No. Otherwise the filesystem people would build their filesystems on top of PostgreSQL, not the other way around. Of course, if you want image updates to be part of a database transaction, then it might be worth storing them in the database.

> I hope someone can give me a tip or hint on where I can find some
> useful information about this!

Look into having a separate server (process or actual hardware) to handle requests for static text and images. Keep the Java server for actually processing data.

--
Richard Huxton
Archonet Ltd
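As an illustration of the one case mentioned above where the database earns its keep (image updates as part of a transaction), here is a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in so that it runs without a server; with PostgreSQL the same pattern would go through JDBC or another driver. The table names, column names and the helpers make_demo_db and store_image are hypothetical and do not come from the thread.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, image_count INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE images ("
        "id INTEGER PRIMARY KEY, cust_id INTEGER NOT NULL, "
        "filename TEXT NOT NULL, data BLOB NOT NULL)"
    )
    conn.execute("INSERT INTO customers (id) VALUES (1)")
    conn.commit()
    return conn


def store_image(conn, cust_id, filename, data):
    """Store the image bytes and update the customer row atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO images (cust_id, filename, data) VALUES (?, ?, ?)",
            (cust_id, filename, data),
        )
        cur = conn.execute(
            "UPDATE customers SET image_count = image_count + 1 WHERE id = ?",
            (cust_id,),
        )
        if cur.rowcount != 1:
            # Unknown customer: raising here also undoes the INSERT above.
            raise ValueError(f"unknown customer {cust_id}")
```

The point of the sketch is only that the image row and the customer row change together or not at all. A file written to disk next to a database update cannot be rolled back the same way.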
> Which filesystems? I know ext2 used to have issues with many thousands
> of files in one directory, but that was a directory scanning issue
> rather than a file reading issue.

From my point of view, it is better to let one process handle all operations on the Postgres cluster's file structure than to bypass it with a second process.

For example: a user uploads some JPEG images over HTTP.

a) (Filesystem)
On the filesystem, each image would be written to a file with a randomly generated filename (a timestamp or whatever). The directory grows, and over a million file objects will be archived, written, replaced, etc.

b) (Database)
The JPEG image data will be stored in a BLOB as part of a special table, which is linked by custid to the primary user table.

From my point of view, any outside process (which must be created, forked, have memory allocated, etc.) is a bad choice. I think it is generally better to support the postmaster in every way and set up some hardware RAID configurations.

>> My question: can I speed up my web application if I store my small
>> JPEG images inside my PostgreSQL database (on very large databases of
>> 1 GByte and above, without images at this time)?
>
> No. Otherwise the filesystem people would build their filesystems on
> top of PostgreSQL, not the other way around. Of course, if you want
> image updates to be part of a database transaction, then it might be
> worth storing them in the database.

Hmm, Oracle is going the other way. All file objects can be stored in the database if the DB has the iFS option (database filesystem and file server inside the database).

>> I hope someone can give me a tip or hint on where I can find some
>> useful information about this!
>
> Look into having a separate server (process or actual hardware) to
> handle requests for static text and images. Keep the Java server for
> actually processing data.

Thanks
My laptop reads an entire compiled Linux kernel tree (23,000 files totalling 250 MBytes) in about 1.5 seconds if the files are in cache. That's about 15,000 files/second. You think that's slow? If you want to read them in random order, you'll probably use something other than a laptop drive, but you get the idea.

The filesystem is reiser4. If you use ext2, you'll have a problem with many files in the same directory because I believe it uses a linear search, hence lookup time proportional to the number of files (ouch).

I once tried to put a million 1-KByte files in one directory. It was with reiserfs 3, and it did not seem troubled in the slightest. I believe it took some 10 minutes, but that was two years ago so I don't remember very well. NTFS took a day, that I do remember!

Out of curiosity I just tried to stuff 1 million 1 KB files into a directory on my laptop. It took a bit less than two minutes.

On Tue, 26 Apr 2005 11:34:45 +0200, apoc9009@yahoo.de <apoc9009@yahoo.de> wrote:

>> Which filesystems? I know ext2 used to have issues with many thousands
>> of files in one directory, but that was a directory scanning issue
>> rather than a file reading issue.
>
> a) (Filesystem)
> On the filesystem, each image would be written to a file with a
> randomly generated filename (a timestamp or whatever). The directory
> grows, and over a million file objects will be archived, written,
> replaced, etc.
>
> [...]
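The many-small-files experiment described above can be approximated with a short Python sketch. The function name time_small_files and the file naming scheme are assumptions made for illustration; the numbers it produces depend heavily on the filesystem, the disk and the page cache, so they are not comparable to the figures quoted in this thread.

```python
import os
import tempfile
import time


def time_small_files(directory, count, size=1024):
    """Write `count` files of `size` bytes into one directory, then read them back.

    Returns (write_seconds, read_seconds, total_bytes_read).
    """
    payload = b"x" * size

    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"{i:07d}.dat"), "wb") as f:
            f.write(payload)
    write_secs = time.perf_counter() - start

    # The files were just written, so this read pass is most likely
    # served from the OS cache rather than from disk.
    start = time.perf_counter()
    total = 0
    for name in os.listdir(directory):
        with open(os.path.join(directory, name), "rb") as f:
            total += len(f.read())
    read_secs = time.perf_counter() - start

    return write_secs, read_secs, total


if __name__ == "__main__":
    # Small default so the demo finishes quickly; raise the count to stress a filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        w, r, total = time_small_files(tmp, 1000)
        print(f"write {w:.3f}s, read {r:.3f}s, {total} bytes read")
```

Raising the count towards a million on the filesystem you actually plan to use gives a rough idea of whether directory size is a problem there.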
* apoc9009@yahoo.de <apoc9009@yahoo.de> wrote:

Hi,

> My next question is about BLOBs in my web application (using
> Tomcat 5 and JDBC, integrated into the J2EE application server JBoss).
>
> Filesystems with many filesystem objects can slow down performance
> when opening and reading data.

As others already pointed out, you probably meant: overcrowded directories can make some filesystems slow. For ext2 this is the case. reiserfs, by contrast, is designed to handle very large directories (in fact by using indices similar to those a database uses).

If your application is a typical web app, you will probably have this situation:

+ images get read quite often, while they get updated quite seldom
+ you don't want to use image content in queries (i.e. match against it)
+ the images will be transferred directly, without further processing
+ you can give the upload server and the download server access to a shared filesystem or synchronize their filesystems (e.g. with rsync)

Under these assumptions, I'd suggest using the filesystem directly. This should save some load, namely:

+ no transfer from postgres -> webserver necessary; the webserver can fetch files directly from the filesystem
+ no further processing (server-side application) necessary
+ backup and synchronization are quite trivial (good old fs tools)
+ clustering (using many image webservers) is quite trivial

As already mentioned, you have to choose the right filesystem or at least the right fs organization (e.g. working with an n-level hierarchy to keep directory sizes small and lookups fast).

An RDBMS can do this for you and so will save some implementation work, but I don't think it will be noticeably faster than a good fs-side implementation.

Of course there may be a lot of good reasons to put images into the database, e.g. if some clients work directly on db connections and all work (including image upload) should be done over the db link.
cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
---------------------------------------------------------------------
 Realtime Forex/Stock Exchange trading powered by PostgreSQL :))
                                            http://www.fxignal.net/
---------------------------------------------------------------------
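The n-level hierarchy mentioned in the message above could look roughly like the following Python sketch. It hashes the image id and uses the first few hex digits as directory names, so no single directory grows large. The function name image_path, the use of SHA-1, the ".jpg" suffix and the default of two levels with two hex digits each are all assumptions chosen for illustration, not something proposed in the thread.

```python
import hashlib
from pathlib import Path


def image_path(root, image_id, levels=2, width=2):
    """Map an image id to a nested path such as root/ab/cd/<image_id>.jpg.

    The directory names are taken from the hex digest of the id, so files
    spread evenly over 16 ** (levels * width) leaf directories.
    """
    digest = hashlib.sha1(str(image_id).encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, f"{image_id}.jpg")
```

With the defaults there are 65,536 leaf directories, so a million images average about 15 files per directory. Before writing a file, the caller would create the directories with path.parent.mkdir(parents=True, exist_ok=True).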
>> Filesystems with many filesystem objects can slow down performance
>> when opening and reading data.

On my laptop, lighttpd takes up to 15,000 hits PER SECOND on static 2-3 KB files (tested with apachebench 2). Apache is slower, of course: 3,000-4,000 hits per second, which is not that bad.

Using a dynamic script with images in the database, you have to account for query and transmission overhead, dynamic page overhead... mmm, I'd say that with a fast application server you could maybe get 200-300 images served per second from the database, and that's very optimistic. And then the database will crawl; it will be swamped by the incoming flow of useless requests... scalability will be awful...

Not to mention that browsers ask the server "has this image changed since the last time?" (a conditional GET with an If-Modified-Since header) and don't download the image again if it hasn't. The server just stat()s the file. Statting a file on any decent filesystem (e.g. XFS, Reiser, JFS) should take less than 10 microseconds if the information is in the cache. With images in the database, you'll have to query the database to check the date... more queries!

If you want to control download rights on files, you can still put the files on the filesystem (which is the right choice IMHO) and use a dynamic script to serve them. Even better, you could use lighttpd's authorized file download feature.

The only case where I see putting files in a database as interesting is if you want them to be part of a transaction. In that case, why not...
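The "has this image changed?" check described above boils down to one stat() call and a date comparison. Here is a minimal Python sketch of that logic, assuming the file's modification time is used as the Last-Modified value. The function name conditional_get and its return shape are hypothetical; a real web server such as lighttpd or Apache does this internally and handles more cases (ETags, ranges and so on).

```python
import os
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_get(path, if_modified_since=None):
    """Return (status, headers) for a static file.

    status is 304 if the file has not changed since the date in the
    If-Modified-Since header value, otherwise 200.
    """
    # HTTP dates have one-second resolution, so drop the fractional part.
    mtime = int(os.stat(path).st_mtime)
    headers = {"Last-Modified": formatdate(mtime, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without zone information as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if mtime <= since.timestamp():
                return 304, headers

    return 200, headers
```

With images stored in the database, the equivalent check needs a query for the row's modification date on every such request, which is the extra load the message above warns about.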