Thread: How to store fixed size images?
We have many small images (mostly of fixed size). How should we store them? There are two options:
1. Store the images in folders managed by the OS file system, and store only the path in PostgreSQL.
2. Store each image as a bytea column in PostgreSQL.
How do you usually store images?
Thanks!
peng
On Tuesday, 17 June 2014 at 03:10:57, sunpeng <bluevaley@gmail.com> wrote:
> We have many small size (most fixed size) images, how to store them? There are two options:
> 1. Store images in folders, managed by os file system, only store path in postgresql
> 2. Store image as bytea in postgresql
> How do you usually store images?
> Thanks!
The images we store might be quite large so we use OID (Large Objects) and use the java.sql.Blob API of the pgjdbc-ng driver to avoid having to deal with large byte-arrays.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
On Mon, Jun 16, 2014 at 6:10 PM, sunpeng <bluevaley@gmail.com> wrote:
> We have many small size (most fixed size) images, how to store them? There
> are two options:
> 1. Store images in folders, managed by os file system, only store path in
> postgresql
> 2. Store image as bytea in postgresql
> How do you usually store images?

I use method 1, because the library/modules I made use of only implemented that method. I'd prefer to use method 2, but not enough to write the code for doing it when there was existing code.

The problem with 1 is that you now have two streams of data to back up, and the data itself is no longer transactional with its metadata.

A potential problem with 2 is that it will run into trouble if any of the data is more than a small fraction of RAM. So the images must be "always small"; if they are just "usually small", that isn't good enough.

Another problem with bytea is the encoding issues. Good, up-to-date drivers will handle that for you (mostly) transparently, but there are lots of drivers that are not good, or not up to date.

Cheers,
Jeff
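Jeff's bytea encoding concern can be seen in PostgreSQL's default "hex" output format: a bytea value is rendered as a `\x`-prefixed hex string, and a driver that doesn't decode it hands your application that string instead of the original bytes. A minimal sketch in plain Python (no driver involved; the helper names are illustrative, not from any library) of round-tripping image bytes through that representation:

```python
# PostgreSQL's default bytea output format (since 9.0) is "hex": the raw
# bytes prefixed with \x and hex-encoded. Drivers that don't understand
# this format return a string like '\x89504e...' instead of real bytes.

def encode_bytea_hex(data: bytes) -> str:
    """Encode raw bytes the way PostgreSQL renders bytea in hex format."""
    return "\\x" + data.hex()

def decode_bytea_hex(text: str) -> bytes:
    """Decode a hex-format bytea literal back into raw bytes."""
    if not text.startswith("\\x"):
        raise ValueError("expected hex-format bytea ('\\x' prefix)")
    return bytes.fromhex(text[2:])

# A PNG file signature round-trips intact:
png_magic = b"\x89PNG\r\n\x1a\n"
literal = encode_bytea_hex(png_magic)
assert decode_bytea_hex(literal) == png_magic
```

A good driver does exactly this decoding for you; with an outdated one, you end up doing it by hand, which is the "encoding issue" Jeff describes.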
Thank you, Jeff!
peng
On Wed, Jun 18, 2014 at 12:15 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On 06/16/2014 08:10 PM, sunpeng wrote:
> We have many small size (most fixed size) images, how to store them? There are two options:
> 1. Store images in folders, managed by os file system, only store path in postgresql
> 2. Store image as bytea in postgresql
> How do you usually store images?
> Thanks!
>
> peng

I think it depends on how you are going to use them. I, for example, have lots of images that are served on a web page; after benchmarks I found it was faster to store them on the filesystem and let Apache serve them directly.

-Andy
On Thursday, June 19, 2014 10:21:56 AM Andy Colson wrote:
> On 06/16/2014 08:10 PM, sunpeng wrote:
> > We have many small size (most fixed size) images, how to store them? There
> > are two options: 1. Store images in folders, managed by os file system,
> > only store path in postgresql 2. Store image as bytea in postgresql
> > How do you usually store images?
> > Thanks!
> >
> > peng
>
> I think it depends on how you are going to use them. I, for example, have
> lots of images that are served on a web page; after benchmarks I found it
> was faster to store them on the filesystem and let Apache serve them directly.
>
> -Andy

That will always be the (much) faster option. There's basically no CPU overhead: the web server can tell the kernel to copy the image from the filesystem cache directly into a network buffer, and off it goes. Even Apache can serve at line speed like that.

It's a lot easier to manage the images if they're in the database, though, especially if you run off multiple web servers. If CPU overhead is actually an issue, you can eliminate most of the speed hit by sticking a caching proxy server like Varnish in front of your site, or by offloading the image serving to a pass-through CDN. Just make sure images get a new URL path if they change content.
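The "new URL path if they change content" advice is commonly implemented by embedding a content hash in the image URL, so a CDN or Varnish can cache each path forever. A minimal sketch, assuming this URL layout (the `/img` prefix and digest length are illustrative choices, not from the thread):

```python
import hashlib
import posixpath

def versioned_url(filename: str, data: bytes, prefix: str = "/img") -> str:
    """Build a cache-friendly URL that changes whenever the bytes change.

    The first 12 hex digits of a SHA-256 digest of the content are
    embedded in the path, so a cached copy of the old content can never
    be served for new bytes.
    """
    digest = hashlib.sha256(data).hexdigest()[:12]
    return posixpath.join(prefix, digest, filename)

old = versioned_url("logo.png", b"old image bytes")
new = versioned_url("logo.png", b"new image bytes")
assert old != new  # changed content yields a new URL path
```

With this scheme the cache entries for old content simply stop being requested; nothing ever needs to be invalidated.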
On Jun 19, 2014, at 11:21 AM, Andy Colson wrote:
> I think it depends on how you are going to use them. I, for example, have lots of images that are served on a web page; after benchmarks I found it was faster to store them on the filesystem and let Apache serve them directly.

I rarely store images like that locally now; I just toss them onto Amazon S3.

When I did have to store lots of images locally, I found this to be the best method:

1. The Postgres record for the image is given a unique and random hash as a hexdigest.
2. The image is saved onto the filesystem into a directory mapped by the hexdigest.

For example, there might be something like this:

Postgres:
  id  | filename  | hash
  001 | image.jpg | abcdef123

Filesystem:
  abc/def/123/abcdef123-image.jpg

nginx/apache rewrite rule:
  abcdef123-image.jpg -> abc/def/123/abcdef123-image.jpg

The reason for this has to do with the performance of various filesystems and issues with the distribution of digits in a sequence. It ties into Benford's Law ( http://en.wikipedia.org/wiki/Benford's_law ) as well.

A handful of filesystems exhibit decreased performance as the number of items in a directory increases. A few years ago, 1k-4k items was a safe max -- but at 10x that, some filesystems really slowed. I think most modern filesystems are still quick in the 5-10k range.

A hash has more characters and a more uniform distribution than a series of numbers or natural-language filenames. And if you group a hexdigest into triplets, you get 4096 max files/folders per directory, which is a decent sweet spot:

  16 * 16 * 16 = 4096

I haven't had to deal with this sort of stuff in almost 10 years now, but archiving content like this back then was a considerable improvement to filesystem performance and web serving.
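The sharding scheme described above can be sketched directly. The function names and the use of SHA-1 to produce the hexdigest are illustrative assumptions (the post only says "a unique and random hash"); the triplet layout matches the `abc/def/123/abcdef123-image.jpg` example:

```python
import hashlib
import os

def shard_path(hexdigest: str, filename: str, depth: int = 3) -> str:
    """Map a hexdigest to a nested path, three hex chars per level.

    Each level holds at most 16**3 = 4096 entries -- the sweet spot
    described in the post.
    """
    parts = [hexdigest[i * 3:(i + 1) * 3] for i in range(depth)]
    return os.path.join(*parts, f"{hexdigest}-{filename}")

def store_image(root: str, filename: str, data: bytes) -> str:
    """Write the image under its sharded path; return the relative path."""
    digest = hashlib.sha1(data).hexdigest()  # assumed hash choice
    rel = shard_path(digest, filename)
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)
    return rel  # store this path (or just the digest) in Postgres

# Matches the example from the post:
assert shard_path("abcdef123", "image.jpg") == \
    os.path.join("abc", "def", "123", "abcdef123-image.jpg")
```

Because the digest bytes are uniformly distributed, the 4096 buckets at each level fill evenly, avoiding the skew that natural filenames or sequential IDs would produce.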