Thread: Filesystem vs. Postgres for images

Filesystem vs. Postgres for images

From
Michal Hlavac
Date:
Hello,

I am working on web portal. There are some ads. We have about 200 000
ads. Every ad have own directory called ID, where is 5 subdirectories
with various sizes of 5 images.

Filesystem is too slow. But I don't know, if I store these images into
postgres, performace will grow.

Second question is, what kind of hardware I need for storing in DB. Now
I have Intel(R) Pentium(R) 4 CPU 1.70GHz with 512MB RAM and 120GB HDD.

thanx for advices...

miso

Re: Filesystem vs. Postgres for images

From
"Development - multi.art.studio"
Date:
Hello,

 >Filesystem is too slow. But I don't know, if I store these images into
 >postgres, performace will grow.

but postres also stores its data on the filesystem.
maybe take a better FS like XFS (xfs is very nice and performes good),
imho other filesystems like reiser have some version-problems ;)
so storing images in postres as blob imho is not as fast as XFS, but
maybe you have to do some performance tests.
maybe you can store an index table for your images with path and
filenames - if did not so far, this should speed up your file-search.
also having a lot of ram, at least 1Gb for such a big portal, and scsi
is a good idea, if you want to boost it up, take a dual proz system with
scsi-raid and a lot of ram - and costs a lot of money also :((.
another tip is to activate proxy or httpd-cache and compression or other
caching stuff in php, java etc.

sorry for my broken english.
volker

Michal Hlavac wrote:

> Hello,
>
> I am working on web portal. There are some ads. We have about 200 000
> ads. Every ad have own directory called ID, where is 5 subdirectories
> with various sizes of 5 images.
>
> Filesystem is too slow. But I don't know, if I store these images into
> postgres, performace will grow.
>
> Second question is, what kind of hardware I need for storing in DB. Now
> I have Intel(R) Pentium(R) 4 CPU 1.70GHz with 512MB RAM and 120GB HDD.
>
> thanx for advices...
>
> miso
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>



Re: Filesystem vs. Postgres for images

From
Holger Klawitter
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

> imho other filesystems like reiser have some version-problems ;)

Oh please. Reiser is as unstable as postgres is slow - in other words, both
have to suffer prejudice which used to be true loooong ago. ;-)

In cases of large directories ext2/3 perform extremely bad (as in the original
post) So this guy will be better off with anything but ext2/3. That's why I
switched from ext2 to reiser ~2 years ago (without any problems since).

Mit freundlichem Gruß / With kind regards
    Holger Klawitter
- --
lists <at> klawitter <dot> de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQFAe8dP1Xdt0HKSwgYRAkLqAJ4vmqkDGkFYDL67aPMAK6qGAavEQgCfekvV
JCht52XAoXE8DrbXX24B8gc=
=XjOz
-----END PGP SIGNATURE-----


Re: Filesystem vs. Postgres for images

From
Cott Lang
Date:
On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
> Hello,
>
> I am working on web portal. There are some ads. We have about 200 000
> ads. Every ad have own directory called ID, where is 5 subdirectories
> with various sizes of 5 images.
>
> Filesystem is too slow. But I don't know, if I store these images into
> postgres, performace will grow.


Consider breaking your directories up, i.e.:

/ads/(ID % 1000)/ID

I use that for a system with several million images, works great. I
really don't think putting them in the database will do anything
positive for you. :)






Re: Filesystem vs. Postgres for images

From
Jeremiah Jahn
Date:
There has got to be some sort of standard way to do this. We have the
same problem where I work. Terabytes of images, but the question is
still sort of around "BLOBs or Files?" Our final decision was to use the
file system. We found that you didn't really gain anything by storing
the images in the DB, other than having one place to get the data from.
The file system approach is much easier to backup, because each image
can be archived separately as well as browsed by 3rd party tools.

-jj-


On Tue, 2004-04-13 at 07:40, Cott Lang wrote:
> On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
> > Hello,
> >
> > I am working on web portal. There are some ads. We have about 200 000
> > ads. Every ad have own directory called ID, where is 5 subdirectories
> > with various sizes of 5 images.
> >
> > Filesystem is too slow. But I don't know, if I store these images into
> > postgres, performace will grow.
>
>
> Consider breaking your directories up, i.e.:
>
> /ads/(ID % 1000)/ID
>
> I use that for a system with several million images, works great. I
> really don't think putting them in the database will do anything
> positive for you. :)
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
--
Jeremiah Jahn <jeremiah@cs.earlham.edu>


Re: Filesystem vs. Postgres for images

From
"Joshua D. Drake"
Date:
Hello,

No standard way that I know of :). We tend to use BLOBS because we can
have associated tables
with metadata about the images that can be searched etc.... Of course
you could that with the filesystem
as well but we find blobs easier.

I will say we tend to use BLOBS or Bytea.

J


Jeremiah Jahn wrote:

>There has got to be some sort of standard way to do this. We have the
>same problem where I work. Terabytes of images, but the question is
>still sort of around "BLOBs or Files?" Our final decision was to use the
>file system. We found that you didn't really gain anything by storing
>the images in the DB, other than having one place to get the data from.
>The file system approach is much easier to backup, because each image
>can be archived separately as well as browsed by 3rd party tools.
>
>-jj-
>
>
>On Tue, 2004-04-13 at 07:40, Cott Lang wrote:
>
>
>>On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
>>
>>
>>>Hello,
>>>
>>>I am working on web portal. There are some ads. We have about 200 000
>>>ads. Every ad have own directory called ID, where is 5 subdirectories
>>>with various sizes of 5 images.
>>>
>>>Filesystem is too slow. But I don't know, if I store these images into
>>>postgres, performace will grow.
>>>
>>>
>>Consider breaking your directories up, i.e.:
>>
>>/ads/(ID % 1000)/ID
>>
>>I use that for a system with several million images, works great. I
>>really don't think putting them in the database will do anything
>>positive for you. :)
>>
>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Re: Filesystem vs. Postgres for images

From
Jeremiah Jahn
Date:
I tried the bytea types, but the parsing done by the system on insert
etc. was so bad that it made it usable for me. Our solution is to keep
all of the metadata in the db plus an id and then a web service that
gets the image from the FS.

On Tue, 2004-04-13 at 09:05, Joshua D. Drake wrote:
> Hello,
>
> No standard way that I know of :). We tend to use BLOBS because we can
> have associated tables
> with metadata about the images that can be searched etc.... Of course
> you could that with the filesystem
> as well but we find blobs easier.
>
> I will say we tend to use BLOBS or Bytea.
>
> J
>
>
> Jeremiah Jahn wrote:
>
> >There has got to be some sort of standard way to do this. We have the
> >same problem where I work. Terabytes of images, but the question is
> >still sort of around "BLOBs or Files?" Our final decision was to use the
> >file system. We found that you didn't really gain anything by storing
> >the images in the DB, other than having one place to get the data from.
> >The file system approach is much easier to backup, because each image
> >can be archived separately as well as browsed by 3rd party tools.
> >
> >-jj-
> >
> >
> >On Tue, 2004-04-13 at 07:40, Cott Lang wrote:
> >
> >
> >>On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
> >>
> >>
> >>>Hello,
> >>>
> >>>I am working on web portal. There are some ads. We have about 200 000
> >>>ads. Every ad have own directory called ID, where is 5 subdirectories
> >>>with various sizes of 5 images.
> >>>
> >>>Filesystem is too slow. But I don't know, if I store these images into
> >>>postgres, performace will grow.
> >>>
> >>>
> >>Consider breaking your directories up, i.e.:
> >>
> >>/ads/(ID % 1000)/ID
> >>
> >>I use that for a system with several million images, works great. I
> >>really don't think putting them in the database will do anything
> >>positive for you. :)
> >>
> >>
> >>
> >>
> >>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 3: if posting/reading through Usenet, please send an appropriate
> >>      subscribe-nomail command to majordomo@postgresql.org so that your
> >>      message can get through to the mailing list cleanly
> >>
> >>
--
Jeremiah Jahn <jeremiah@cs.earlham.edu>


Re: Filesystem vs. Postgres for images

From
Christopher Petrilli
Date:
On Apr 13, 2004, at 9:40 AM, Jeremiah Jahn wrote:

> There has got to be some sort of standard way to do this. We have the
> same problem where I work. Terabytes of images, but the question is
> still sort of around "BLOBs or Files?" Our final decision was to use
> the
> file system. We found that you didn't really gain anything by storing
> the images in the DB, other than having one place to get the data from.
> The file system approach is much easier to backup, because each image
> can be archived separately as well as browsed by 3rd party tools.

This is a pretty "classic problem," of performance modeling.  While it
wasn't images, I worked on a system that had several million small
files (5-100K) that needed to be stored.  The performance bottleneck
was a couple of things, in storing them in the FS (the bottleneck is
similar in PostgreSQL):

1. Directory name lookups do not scale well, so keep the number of
files in a directory to a manageable number (100-500).
2. Retrieval time is limited not by disk bandwidth, but by I/O seek
performance. More spindles = more concurrent I/O in flight. Also, this
is where SCSI takes a massive lead with tag-command-queuing.

In our case, we ended up using a three-tier directory structure, so
that we could manage the number of files per directory, and then
because load was relatively even across the top 20 "directories", we
split them onto 5 spindle-pairs (i.e. RAID-1).  This is a place where
RAID-5 is your enemy. RAID-1, when implemented with read-balancing, is
a substantial performance increase.

Hope this helps.  Some of these things apply to PostgreSQL, except
until there's better manageability of TABLESPACE, and the ability to
split tables across multiple spaces, it's going to be hard to hit those
numbers.  This is a place where the "big databases" are better.  But
then, that's the top 5% of installs. Tradeoffs.

Chris
--
| Christopher Petrilli
| petrilli (at) amber.org


Re: Filesystem vs. Postgres for images

From
Anton Nikiforov
Date:
Michal Hlavac пишет:

> Hello,
>
> I am working on web portal. There are some ads. We have about 200 000
> ads. Every ad have own directory called ID, where is 5 subdirectories
> with various sizes of 5 images.
>
> Filesystem is too slow. But I don't know, if I store these images into
> postgres, performace will grow.
>
> Second question is, what kind of hardware I need for storing in DB. Now
> I have Intel(R) Pentium(R) 4 CPU 1.70GHz with 512MB RAM and 120GB HDD.
>
> thanx for advices...
>
> miso

Hello Miso.
I used to have the  same problem with web hosting and
storing/sorting/retreiving images for banner exchange and for user
"sites/pages".
My tests was done on FreeBSD 4.5 with postgreSQL 7.3.2 (or soething) and
with Mysql (i do not remember it's version.
And we found out that only storing of filenames in the database and
getting the actual binary data from filesystem giving some performance.

--
Best regads,
Anton Nikiforov


Attachment

Re: Filesystem vs. Postgres for images

From
"Mattias Kregert"
Date:
> Hi,
>
> > imho other filesystems like reiser have some version-problems ;)
>
> Oh please. Reiser is as unstable as postgres is slow - in other words, both
> have to suffer prejudice which used to be true loooong ago. ;-)
>
> In cases of large directories ext2/3 perform extremely bad (as in the original
> post) So this guy will be better off with anything but ext2/3. That's why I
> switched from ext2 to reiser ~2 years ago (without any problems since).
>
> Mit freundlichem Gruß / With kind regards
> Holger Klawitter
> - --

I use reiserfs, too. Large directories (hundreds of thousand files) does not slow down file retrival, and i never had
anyproblems with stability. 

/Mattias


Re: Filesystem vs. Postgres for images

From
"scott.marlowe"
Date:
On Tue, 13 Apr 2004, Christopher Petrilli wrote:

> 2. Retrieval time is limited not by disk bandwidth, but by I/O seek
> performance. More spindles = more concurrent I/O in flight. Also, this
> is where SCSI takes a massive lead with tag-command-queuing.
>
> In our case, we ended up using a three-tier directory structure, so
> that we could manage the number of files per directory, and then
> because load was relatively even across the top 20 "directories", we
> split them onto 5 spindle-pairs (i.e. RAID-1).  This is a place where
> RAID-5 is your enemy. RAID-1, when implemented with read-balancing, is
> a substantial performance increase.

Please explain why RAID 5 is so bad here.  I would think that on a not
very heavily updated fs, RAID-5 would be the functional equivalent of a
RAID 0 array with one fewer disks, wouldn't it?  Or is RAID 0 also a bad
idea (other than the unreliability of it) because it only puts the data on
one spindle, unlike RAID-1 which puts it on many.

In that case >2 drive RAID 1 setups might be a huge win.  The linux kernel
certainly supports them, and I think some RAID cards do too.

Just wondering.


Re: Filesystem vs. Postgres for images

From
Phil Endecott
Date:
> I am working on web portal. There are some ads. We have about 200 000
> ads. Every ad have own directory called ID, where is 5 subdirectories
> with various sizes of 5 images.
>
> Filesystem is too slow. But I don't know, if I store these images into
> postgres, performace will grow.

Certainly the problem you are experiencing is because you have 200,000
directories, and directory lookups are not scaling well.

I had a look at this a few weeks ago for an email storage application.
Using a filesystem with better directory lookup performance (? xfs,
resiserfs, jfs)is one obvious solution, as is storing the data in the
database.  If you want to use files in an ext2/3 filesystem, you need to
break up the directories into a hierarchy.

I did some web research trying to find numbers for how many entries you
can get away with in an ext2/3 filesystem before the lookup time starts
to bite.  I didn't find very much useful data.  The best answer I got
was "between 100 and 1000".  Since my identifiers are decimail numbers,
I had a choice of breaking them up into groups of two or three (i.e.
12/34/56/78 or 012/345/678).  I went for groups of two and it works
well.  Certainly this is not the limiting factor in the system as a
whole.

Looking back, I wonder if I should have gone for groups of three.  Is
the lookup time a function of the number of entries in the directory, or
the size (in bytes) of the directory?  Since my directory names are
short in this scheme, I get more directory entries per disk block.

One other thing to mention: have you turned off access time (atime)
logging for the filesystem? (man mount)

--Phil.






Re: Filesystem vs. Postgres for images

From
Christopher Petrilli
Date:
On Apr 13, 2004, at 11:27 AM, scott.marlowe wrote:

> On Tue, 13 Apr 2004, Christopher Petrilli wrote:
>
>> 2. Retrieval time is limited not by disk bandwidth, but by I/O seek
>> performance. More spindles = more concurrent I/O in flight. Also, this
>> is where SCSI takes a massive lead with tag-command-queuing.
>>
>> In our case, we ended up using a three-tier directory structure, so
>> that we could manage the number of files per directory, and then
>> because load was relatively even across the top 20 "directories", we
>> split them onto 5 spindle-pairs (i.e. RAID-1).  This is a place where
>> RAID-5 is your enemy. RAID-1, when implemented with read-balancing, is
>> a substantial performance increase.
>
> Please explain why RAID 5 is so bad here.  I would think that on a not
> very heavily updated fs, RAID-5 would be the functional equivalent of a
> RAID 0 array with one fewer disks, wouldn't it?  Or is RAID 0 also a
> bad
> idea (other than the unreliability of it) because it only puts the
> data on
> one spindle, unlike RAID-1 which puts it on many.
>
> In that case >2 drive RAID 1 setups might be a huge win.  The linux
> kernel
> certainly supports them, and I think some RAID cards do too.

The issue comes down to read and write strategies.  If your files are
bigger than the stripe size and begin to involve multiple drives, then
the rotational latency of each drive can come into play.  This is often
hidden under caching during those wonderful comparison reviews, but
when you're talking about near random distributed access of more
information than could fit in the cache, then you have to face the
rotational issues of drives.  Since the spindles are not locked
together, they drift apart in location, and you often end up with
worst-case latency in the drive subsystem.  Mirroring doesn't face
this, especially when you can distribute the READS across all the
drives.

For example, if you ran triplex RAID-0, meaning 3 copies of the data,
which is often done in large environments so that you can take one copy
offline for a backup, while maintaining 2 copies online, then you can
basically handle 3 reads for the cost of 1, increasing the number of
read ops you can handle.  This doesn't work with RAID-0, or RAID-5.

Chris
--
| Christopher Petrilli
| petrilli (at) amber.org


Re: Filesystem vs. Postgres for images

From
Anton Nikiforov
Date:
scott.marlowe пишет:

>On Tue, 13 Apr 2004, Christopher Petrilli wrote:
>
>
>
>>2. Retrieval time is limited not by disk bandwidth, but by I/O seek
>>performance. More spindles = more concurrent I/O in flight. Also, this
>>is where SCSI takes a massive lead with tag-command-queuing.
>>
>>In our case, we ended up using a three-tier directory structure, so
>>that we could manage the number of files per directory, and then
>>because load was relatively even across the top 20 "directories", we
>>split them onto 5 spindle-pairs (i.e. RAID-1).  This is a place where
>>RAID-5 is your enemy. RAID-1, when implemented with read-balancing, is
>>a substantial performance increase.
>>
>>
>
>Please explain why RAID 5 is so bad here.  I would think that on a not
>very heavily updated fs, RAID-5 would be the functional equivalent of a
>RAID 0 array with one fewer disks, wouldn't it?  Or is RAID 0 also a bad
>idea (other than the unreliability of it) because it only puts the data on
>one spindle, unlike RAID-1 which puts it on many.
>
>In that case >2 drive RAID 1 setups might be a huge win.  The linux kernel
>certainly supports them, and I think some RAID cards do too.
>
>Just wondering.
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 7: don't forget to increase your free space map settings
>
>
Hello All.
I'll try to explain the raid scheme
First of all the head movement takes 99% of all data retrival time in
case you would like to get a small block of data (actualy one FS block).
You need to move HDD's heads something like 4-20ms when reading of a
block of data (actualy one cilinder will hit hte disk's cache) takes
1000 times less time.
Now to the RAIDs:
It is true that the only RAID that allow increasing of record speed is
RAID0. That is why all database developers recomend RAID0+1 or RAID 10
(They are different, but it is not the topic here). So if you need
record speed - you know the way.
At most all RAIDs give you a read performance goal. The matter is when
RAID5 is slower than RAID1 (whatever else) is the matter of disk
subsystem planning and configuration.
If you have the FS block size is 4K, then all disk IO from the OS point
of view is reading 4k blocks.
While in the RAID you could have a block size configured to
4,8,16,32,64,128k.
Lets imagine three situations:
1. Raid bock size is 4k and we have 3 disks in RAID5
The controller will read data by blocks, so it could get 2 blocks at a
time (3rd disk stores redundancy information). The situation is exactly
like when using RAID1 with 2 disks.
2. Raid block is 128k, and we have 3 disks in the RAID5
The controller will read the whole block even if you have asked to read
only 4k (and you did, because of FS request size). And as you could see
124k will hit cache but will be useless.
But if you have files that of comparable size with the block or much
more in size than a block you will increase reading performance
drammaticaly (like for video files which were put on the disks
contineously and are being read block by block).

So, if you have some time try to "play" with your raid 5 and you will
see the differences when you change block size of you FS or RAID's
stripe size. But you will see that single disk writes data always faster
than RAID 5.

If you are talking about software raid (supported by the kernel) - it
will be always slower than hardware one (you will loose at least 30% of
your system bus and CPU power for calculations and internal RAID5 data
computing). With RAID 0/1 it is not so drammatical but remember that you
have RAID1 support in the kernel not for the productivity improvement of
IO but for redundancy. And software raids does not decrease your system
downtime.

--
Best regads,
Anton Nikiforov



Attachment

Re: Filesystem vs. Postgres for images

From
Alex
Date:
Hi,
is the file system approach really easier and faster? What if you need
to protect the image data e.g. you dont want users just to just dowload
the pictures directly from your website?

-a

Jeremiah Jahn wrote:

>There has got to be some sort of standard way to do this. We have the
>same problem where I work. Terabytes of images, but the question is
>still sort of around "BLOBs or Files?" Our final decision was to use the
>file system. We found that you didn't really gain anything by storing
>the images in the DB, other than having one place to get the data from.
>The file system approach is much easier to backup, because each image
>can be archived separately as well as browsed by 3rd party tools.
>
>-jj-
>
>
>On Tue, 2004-04-13 at 07:40, Cott Lang wrote:
>
>
>>On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
>>
>>
>>>Hello,
>>>
>>>I am working on web portal. There are some ads. We have about 200 000
>>>ads. Every ad have own directory called ID, where is 5 subdirectories
>>>with various sizes of 5 images.
>>>
>>>Filesystem is too slow. But I don't know, if I store these images into
>>>postgres, performace will grow.
>>>
>>>
>>Consider breaking your directories up, i.e.:
>>
>>/ads/(ID % 1000)/ID
>>
>>I use that for a system with several million images, works great. I
>>really don't think putting them in the database will do anything
>>positive for you. :)
>>
>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 3: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>>



Re: Filesystem vs. Postgres for images

From
Jeremiah Jahn
Date:
Your code is retrieving the file from the file system. It doesn't have
to be accessible from the web server at all. Our current design uses a
JDBC connection to the database for the metadata (digital
signature,path,name,file type, etc..) and a SOAP call to the same server
(but doesn't have to be) to retrieve/store the image data.
-jj-


On Wed, 2004-04-14 at 08:15, Alex wrote:
> Hi,
> is the file system approach really easier and faster? What if you need
> to protect the image data e.g. you dont want users just to just dowload
> the pictures directly from your website?
>
> -a
>
> Jeremiah Jahn wrote:
>
> >There has got to be some sort of standard way to do this. We have the
> >same problem where I work. Terabytes of images, but the question is
> >still sort of around "BLOBs or Files?" Our final decision was to use the
> >file system. We found that you didn't really gain anything by storing
> >the images in the DB, other than having one place to get the data from.
> >The file system approach is much easier to backup, because each image
> >can be archived separately as well as browsed by 3rd party tools.
> >
> >-jj-
> >
> >
> >On Tue, 2004-04-13 at 07:40, Cott Lang wrote:
> >
> >
> >>On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
> >>
> >>
> >>>Hello,
> >>>
> >>>I am working on web portal. There are some ads. We have about 200 000
> >>>ads. Every ad have own directory called ID, where is 5 subdirectories
> >>>with various sizes of 5 images.
> >>>
> >>>Filesystem is too slow. But I don't know, if I store these images into
> >>>postgres, performace will grow.
> >>>
> >>>
> >>Consider breaking your directories up, i.e.:
> >>
> >>/ads/(ID % 1000)/ID
> >>
> >>I use that for a system with several million images, works great. I
> >>really don't think putting them in the database will do anything
> >>positive for you. :)
> >>
> >>
> >>
> >>
> >>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 3: if posting/reading through Usenet, please send an appropriate
> >>      subscribe-nomail command to majordomo@postgresql.org so that your
> >>      message can get through to the mailing list cleanly
> >>
> >>
--
Jeremiah Jahn <jeremiah@cs.earlham.edu>


Re: Filesystem vs. Postgres for images

From
Steve Atkins
Date:
On Wed, Apr 14, 2004 at 10:15:51PM +0900, Alex wrote:
> Hi,
> is the file system approach really easier and faster? What if you need
> to protect the image data e.g. you dont want users just to just dowload
> the pictures directly from your website?

It can be much faster, if implemented correctly, to put the large
files directly on the filesystem. It makes it a little harder to cluster,
but it can significantly reduce DB overhead.

There's no issue with the users downloading images directly, as you
normally wouldn't mount them directly into the URL namespace. Instead
the URL would point to a script that would lookup the image in the
database, and check permissions. If the user is allowed to load the
image the script will close it's connection to the database, and start
shoveling bytes from the filesystem to the http connection. Most
decent web application platforms have some amount of support for this
sort of thing built in.

That has a number of other advantages too - it can take a long time
for a user to download a large file, and you really don't want the
thread handling them to tie up a database connection for all that
time.  If you're on a platform that supports nice things like
sendfile(2) you can even have the kernel do almost all the work.

Cheers,
  Steve

Re: Filesystem vs. Postgres for images

From
"Development - multi.art.studio"
Date:
hello all,

if somebody is interested in a script using blobs on pgsql and php,
leave me a mail at kalle@erdtrabant.de
this little skript can upload files to filesystem and directly into db,
release files from db to filesystem, and store files from filesystem to
database.
its tested with php 4.3 and postgres 7.1

greetings,
volker




Re: Filesystem vs. Postgres for images

From
"Development - multi.art.studio"
Date:
hi all,
maybe somewhat off-topic , but here it is....

http://www.erdtrabant.de/index.php?i=500200104

a little php-script to demonstrate how to store files as blobs into
postgres (tested with v7.1) as a base for testing etc., not very
beautiful script-style, but useable
feel free to download and change what and use where ever you want

thanks for looking
volker

Development - multi.art.studio wrote:

> hello all,
>
> if somebody is interested in a script using blobs on pgsql and php,
> leave me a mail at kalle@erdtrabant.de
> this little skript can upload files to filesystem and directly into
> db, release files from db to filesystem, and store files from
> filesystem to database.
> its tested with php 4.3 and postgres 7.1
>
> greetings,
> volker
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>



Re: Filesystem vs. Postgres for images

From
"Holger Hoffstaette"
Date:
On Tue, 13 Apr 2004 12:56:09 +0200, Holger Klawitter wrote:

> Oh please. Reiser is as unstable as postgres is slow - in other words,
> both have to suffer prejudice which used to be true loooong ago. ;-)

True.

> In cases of large directories ext2/3 perform extremely bad (as in the
> original post) So this guy will be better off with anything but ext2/3.

"Oh please."

- umount /dev/xxx
- tune2fs -O dir_index dev/xxx
- fsck -f -D /dev/hdxx
- mount /dev/xxx

> That's why I switched from ext2 to reiser ~2 years ago (without any
> problems since).

Time marches on. :-)

Holger