Thread: BLOBs bypassing the OS Filesystem for better Image loading speed?

BLOBs bypassing the OS Filesystem for better Image loading speed?

From: "apoc9009@yahoo.de"
Hi all again,

My next question concerns BLOBs in my web application (Tomcat 5 with
JDBC, deployed in the J2EE application server JBoss).

Filesystems containing many objects can become slow at opening
and reading data.

My question:
Can I speed up my web application by storing my small JPEG images
inside my PostgreSQL database (which is already very large, over 1 GB,
and so far contains no images)?

I hope someone can give me a tip or a hint on where I can find
some useful information about this!

Thanks
Josh




Re: BLOBs bypassing the OS Filesystem for better Image loading speed?

From: Richard Huxton
apoc9009@yahoo.de wrote:
> Hi all again,
>
> My next question concerns BLOBs in my web application (Tomcat 5 with
> JDBC, deployed in the J2EE application server JBoss).
>
> Filesystems containing many objects can become slow at opening
> and reading data.

Which filesystems? I know ext2 used to have issues with many-thousands
of files in one directory, but that was a directory scanning issue
rather than file reading.

> My question:
> Can I speed up my web application by storing my small JPEG images
> inside my PostgreSQL database (which is already very large, over 1 GB,
> and so far contains no images)?

No. Otherwise the filesystem people would build their filesystems on top
of PostgreSQL not the other way around. Of course, if you want image
updates to be part of a database transaction, then it might be worth
storing them in the database.

> I hope someone can give me a tip or a hint on where I can find
> some useful information about this!

Look into having a separate server (process or actual hardware) to
handle requests for static text and images. Keep the Java server for
actually processing data.

--
   Richard Huxton
   Archonet Ltd

Re: BLOBs bypassing the OS Filesystem for better Image loading speed?

From: "apoc9009@yahoo.de"
> Which filesystems? I know ext2 used to have issues with many-thousands
> of files in one directory, but that was a directory scanning issue
> rather than file reading.

From my point of view, it is better to let one process perform the
operations on the PostgreSQL cluster's file structure than to bypass it
with a second process.

For example:
A user uploads some JPEG images over HTTP.

a) (Filesystem)
Each image is written to a file with a randomly generated filename
(timestamp or whatever), and the directory grows until over a million
file objects are being archived, written, replaced, etc.

b) (Database)
The JPEG data is stored in a BLOB in a dedicated table, linked to the
custid of the primary user table.

From my point of view, any outside process (which must be created,
forked, have memory allocated, etc.) is a bad choice. I think it is
generally better to support the postmaster in every way and set up a
hardware RAID configuration.
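For what it's worth, option (b) doesn't need the JDBC large-object API for small JPEGs; a plain bytea column is enough. A rough sketch (the `user_image` table and its columns are invented for illustration, not taken from this thread):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BlobUpload {

    // Quick sanity check: JPEG files always begin with the SOI marker 0xFF 0xD8.
    static boolean looksLikeJpeg(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xD8;
    }

    // Store an uploaded JPEG in a bytea column, linked to the custid of the
    // primary user table; runs inside whatever transaction conn has open.
    // Assumed schema: CREATE TABLE user_image (custid integer, img bytea);
    static void store(Connection conn, int custid, Path jpeg) throws Exception {
        byte[] bytes = Files.readAllBytes(jpeg);
        if (!looksLikeJpeg(bytes)) {
            throw new IllegalArgumentException("not a JPEG");
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO user_image (custid, img) VALUES (?, ?)")) {
            ps.setInt(1, custid);
            ps.setBytes(2, bytes); // bytea; large values are TOASTed by PostgreSQL
            ps.executeUpdate();
        }
    }
}
```

The upload then commits or rolls back together with the rest of the user's transaction, which is the main argument for this variant.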

>> My question:
>> Can I speed up my web application by storing my small JPEG images
>> inside my PostgreSQL database (which is already very large, over 1 GB,
>> and so far contains no images)?
>
>
> No. Otherwise the filesystem people would build their filesystems on
> top of PostgreSQL not the other way around. Of course, if you want
> image updates to be part of a database transaction, then it might be
> worth storing them in the database.

Hmm, Oracle is going the other way: all file objects can be stored
inside the database if it has the iFS option (a database filesystem and
file server inside the database).


>
>> I hope someone can give me a tip or a hint on where I can find
>> some useful information about this!
>
> Look into having a separate server (process or actual hardware) to
> handle requests for static text and images. Keep the Java server for
> actually processing data.


Thanks


Re: BLOBs bypassing the OS Filesystem for better Image loading speed?

From: PFC
    My laptop reads an entire compiled Linux kernel (23,000 files
totalling 250 MB) in about 1.5 seconds if they're in cache. That's about
15,000 files/second. You think that's slow? If you want to read them in
random order, you'll probably need something other than a laptop drive,
but you get the idea.

    The filesystem is reiser4.

    If you use ext2, you'll have a problem with many files in the same
directory, because I believe it uses a linear search, hence lookup time
proportional to the number of files (ouch). I once tried to put a million
1 KB files in a directory; that was with reiserfs 3, and it didn't seem to
feel anything close to molested. I believe it took some 10 minutes, but it
was two years ago, so I don't remember very well. NTFS took a day, that I
do remember! Out of curiosity I just tried to stuff a million 1 KB files
into a directory on my laptop; it took a bit less than two minutes.

On Tue, 26 Apr 2005 11:34:45 +0200, apoc9009@yahoo.de <apoc9009@yahoo.de>
wrote:




Re: BLOBs bypassing the OS Filesystem for better Image loading speed?

From: Enrico Weigelt
* apoc9009@yahoo.de <apoc9009@yahoo.de> wrote:

Hi,

> My next question concerns BLOBs in my web application (Tomcat 5 with
> JDBC, deployed in the J2EE application server JBoss).
>
> Filesystems containing many objects can become slow at opening
> and reading data.

As others have already pointed out, you probably meant: overcrowded
directories can make some filesystems slow. For ext2 this is the case.
Reiserfs, by contrast, is designed to handle very large directories
(in fact by using indices similar to those a database uses).

If your application is a typical web app, you will probably have
this situation:

+ images are read quite often, while they are updated quite seldom
+ you don't want to use image content in queries (i.e. match against it)
+ the images are transferred directly, without further processing
+ you can give the upload and the download server access to a shared
  filesystem, or synchronize their filesystems (e.g. rsync)

Under these assumptions, I'd suggest using the filesystem directly.
This should save some load, i.e.:

+ no transfer from postgres -> webserver and no further server-side
  processing necessary; the webserver can fetch files directly from
  the filesystem
+ backup and synchronization are quite trivial (good old fs tools)
+ clustering (using many image webservers) is quite trivial

As already mentioned, you've got to choose the right filesystem, or at
least the right fs organization (i.e. working with an n-level hierarchy
to keep directory sizes small and lookups fast).

An RDBMS can do this for you and so will save some implementation work,
but I don't think it will be noticeably faster than a good fs-side
implementation.
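That n-level hierarchy takes only a few lines. In this sketch, two directory levels are derived from a hash of the image id so that no single directory grows large (the depth, hash choice, and naming are illustrative assumptions, not a prescription from this thread):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ImagePath {

    // Two hex bytes per level: at most 256 entries per directory level,
    // so two levels spread a million files over ~65,536 leaf directories.
    static String pathFor(String imageId) throws Exception {
        byte[] d = MessageDigest.getInstance("MD5")
                .digest(imageId.getBytes(StandardCharsets.UTF_8));
        return String.format("%02x/%02x/%s.jpg",
                d[0] & 0xFF, d[1] & 0xFF, imageId);
    }

    public static void main(String[] args) throws Exception {
        // Two hash-derived directory levels, then the file itself.
        System.out.println(pathFor("cust42"));
    }
}
```

Because the path is a pure function of the id, the upload server and all image webservers agree on file locations without any coordination.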


Of course there may be a lot of good reasons to put images into the
database, e.g. if some clients work directly over db connections and
all work (including image upload) should be done over the db link.


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
---------------------------------------------------------------------
  Realtime Forex/Stock Exchange trading powered by PostgreSQL :))
                                            http://www.fxignal.net/
---------------------------------------------------------------------

>> Filesystems containing many objects can become slow at opening
>> and reading data.

    On my laptop, lighttpd takes up to 15,000 hits PER SECOND on static
2-3 KB files (tested with apachebench 2).
    Apache is slower, of course: 3,000-4,000 hits per second, which is not
that bad.
    Using a dynamic script with images in the database, you should account
for query and transmission overhead, dynamic page overhead... mmm, I'd say
using a fast application server you could maybe get 200-300 images served
per second from the database, and that's very optimistic. And then the
database will crawl; it will be disintegrated by the incoming flow of
useless requests... scalability will be awful.
    Not to mention that browsers ask the server "has this image changed
since last time?" (a conditional If-Modified-Since request) and don't
download it again if it hasn't. The server just stat()'s the file, and
statting a file on any decent filesystem (i.e. XFS, Reiser, JFS, etc.)
should take less than 10 microseconds if the information is in the cache.
With images in the database, you'll have to run a query just to check the
date... more queries!
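That stat-based check boils down to a one-line comparison. A sketch, assuming the If-Modified-Since header has already been parsed into epoch milliseconds (HTTP dates only have one-second resolution, hence the truncation):

```java
public class ConditionalGet {

    // Return true if the server can answer 304 Not Modified instead of
    // resending the image: the file has not changed since the date the
    // browser sent. A negative value means the header was absent.
    static boolean notModified(long fileMtimeMillis, long ifModifiedSinceMillis) {
        return ifModifiedSinceMillis >= 0
                && fileMtimeMillis / 1000 <= ifModifiedSinceMillis / 1000;
    }

    public static void main(String[] args) {
        long mtime = 1_114_500_000_123L; // file's last-modified time (illustrative)
        long since = 1_114_500_000_999L; // value parsed from the request header
        System.out.println(notModified(mtime, since) ? "304" : "200"); // prints 304
    }
}
```

On the filesystem, `fileMtimeMillis` comes from a single stat() call; with images in the database, answering the same question already costs a query.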

    If you want to control download rights on files, you can still put the
files on the filesystem (which is the right choice IMHO) and use a dynamic
script to serve them. Even better, you could use lighttpd's authorized
file download feature.

    The only case where putting files in a database looks interesting to
me is if you want them to be part of a transaction. In that case, why
not...