Thread: Storing images in database for web applications
Hi all,
I'm wondering if anyone has experience of storing and retrieving images to and from a database? We currently have the problem of images being uploaded to a single gateway used by many companies, most of which run several websites. As it stands, once they upload an image, it then has to be rsync'd to the appropriate servers (3-way in some cases) or accessed through our image proxy.
But now we wish to redesign our various image upload systems into one system which will also make images available across various sites/companies, and therefore servers. So one solution is to store images in a database. What I'm wondering is whether this is an appropriate solution. What are the downsides? Could we expect severe performance hits? To help mitigate the issue of many sites continuously querying a database for images, we'd implement a caching system, so images would only be queried from the database around once an hour.
The benefits I can see are that images are managed in one place and easily accessible by all servers. The problem is putting everything in one place (all eggs in one basket), so if the server goes down, all sites lose their images once their cache has expired... unless we implemented a system that falls back to the cache if the connection fails, even if the cache has expired.
Any suggestions? Alternatives?
Thanks
Thom
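The stale-cache fallback Thom describes (serve the expired copy when the database is unreachable) can be sketched in a few lines. A minimal Python sketch; the class and parameter names are hypothetical, not from any real system in the thread:

```python
import time

class ImageCache:
    """TTL cache that serves stale entries when the backing store is down."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (stored_at, data)

    def get(self, key, fetch):
        """Return cached data, refreshing via fetch() once the TTL expires.

        If fetch() raises (e.g. the database is unreachable), fall back
        to the stale copy rather than failing the request.
        """
        now = time.time()
        cached = self.entries.get(key)
        if cached and now - cached[0] < self.ttl:
            return cached[1]          # fresh hit
        try:
            data = fetch()            # TTL expired: go back to the DB
        except Exception:
            if cached:                # stale, but better than nothing
                return cached[1]
            raise
        self.entries[key] = (now, data)
        return data
```

The key design point is that the expired entry is kept around instead of being evicted, so it is still available as a last resort when the refresh fails.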
On 27/11/2009 7:04 PM, Thom Brown wrote:
> But now we wish to redesign our various image upload systems into one
> system which will also make images available across various
> sites/companies, and therefore servers. So one solution is to store
> images in a database. What I'm wondering is whether this is an
> appropriate solution. What are the downsides? Could we expect severe
> performance hits?

It depends a bit on your needs. If you need facilities provided by the database, such as transactional updates of images and associated metadata, then it's probably a good choice. Otherwise, you'll be paying the costs associated with storing blobs in a database without gaining much.

There *are* costs to using a database over a network file system - mostly increased management. It's harder to do incremental backups, and harder to work with those incremental backups (see PITR and continuous archiving), than it is with a regular file system. Storage requirements may be slightly increased. Deletion of images requires additional cleanup work (VACUUM) to free the space. Etc.

With a database, you can't use the sendfile(...) system call or the like to transmit the data, so there's more work for the system when transmitting query results. It's also not generally transmitted as raw binary, but hex- or octal-encoded, which can be ugly to work with. You can request it as pure binary instead if you're using libpq directly, though.

Why not use an HTTP-based image server? It's pretty low overhead, is easily cached by commonly available tools, etc. Failing that, there are plenty of network file systems out there (NFSv4 and CIFS come to mind) that would do the job.

> To help mitigate the issue of many sites continuously querying a
> database for images, we'd implement a caching system, so images would
> only be queried from the database around once an hour.

Memcached?

--
Craig Ringer
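As an aside, the hex-encoded form Craig mentions is straightforward to undo by hand. A minimal Python sketch, assuming the server emits bytea in the hex output format (the `\x...` text form; the older escape format uses octal sequences and would need different handling):

```python
import binascii

def decode_bytea_hex(text):
    """Decode bytea text output in hex format (a '\\x' prefix followed
    by two hex digits per byte)."""
    if not text.startswith("\\x"):
        raise ValueError("not hex-format bytea output")
    return binascii.unhexlify(text[2:])

# The 8-byte PNG file signature as it would appear in a text-mode result:
png_sig = decode_bytea_hex("\\x89504e470d0a1a0a")
```

As Craig notes, requesting binary results through libpq (or a driver that exposes them) avoids this round trip entirely.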
Thom,

> I'm wondering if anyone has experience of storing and retrieving images
> to and from a database?

Yes. For a customer I have one application that has been running for ~8 years and stores images (and various other BLOBs) within a PostgreSQL database. It started with 7.something and now runs on 8.3; it has always used BYTEA. It is an intranet and extranet application, so it does not get slashdotted or reddited. Another application is a public website, also storing HTML (fragments) and images within a PostgreSQL database.

> The benefits I can see are that images are managed in one place and
> easily accessible by all servers. The problem is putting everything in
> one place (all eggs in one basket), so if the server goes down, all
> sites lose their images once their cache has expired... unless we
> implemented a system that falls back to the cache if the connection
> fails, even if the cache has expired.

Your analysis is correct. Other benefits are:

- no filesystem fuss (rights, sharing, locking, names (Windows / Unix))
- options for security: in my application, editing happens within a strongly firewalled VPN; the webserver on the big, bad internet only accesses the database, so the attack vectors are limited
- transactional saves: no problems with "partially saved images" or "images locked by whatever"
- MIME information etc. can be stored alongside the bytes
- reporting options

Drawbacks are:

- scaling is more challenging: static content on filesystems can be replicated (as you do); replicating a database is more difficult
- different performance characteristics than a filesystem (search is usually faster (index), but access to image contents goes through more layers on the way to the disc)
- backups of the database grow huge, and usually the dump of a file stored in the database is bigger than the file itself (usually, because the effects of compression are hard to judge beforehand)
- dump / restore times grow

Best wishes,

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon

- %s is too gigantic of an industry to bend to the whims of reality
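Harald's "transactional saves" point - the image bytes and their metadata commit together or not at all - is easy to illustrate. A sketch of the transaction shape, using Python's built-in sqlite3 as a stand-in so it runs anywhere; with PostgreSQL you would use a BYTEA column through a driver such as psycopg2, but the pattern is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        name      TEXT PRIMARY KEY,
        mime_type TEXT NOT NULL,   -- MIME info stored alongside the bytes
        data      BLOB NOT NULL
    )
""")

def save_image(conn, name, mime_type, data):
    # One transaction: either both the bytes and the metadata land,
    # or neither does - no "partially saved images".
    with conn:
        conn.execute(
            "INSERT INTO images (name, mime_type, data) VALUES (?, ?, ?)",
            (name, mime_type, data),
        )

save_image(conn, "logo.png", "image/png", b"\x89PNG\r\n\x1a\n...")
row = conn.execute("SELECT mime_type, length(data) FROM images").fetchone()
```

If anything inside the `with conn:` block raises, the whole insert rolls back, which is exactly the guarantee a plain filesystem does not give you.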
On Fri, Nov 27, 2009 at 6:04 AM, Thom Brown <thombrown@gmail.com> wrote:
> I'm wondering if anyone has experience of storing and retrieving images
> to and from a database? We currently have the problem of images being
> uploaded to a single gateway used by many companies, most of which run
> several websites. As it stands, once they upload an image, it then has
> to be rsync'd to the appropriate servers (3-way in some cases) or
> accessed through our image proxy.

For our customer image hosting service, we store the master copy of everything in the database and have an external process that pushes the (scaled) images to the front-end delivery service. Currently this is Amazon's CloudFront service. The URLs that our customers can use/give out all point to CloudFront, which is wicked fast and scalable.

We do not directly serve the images from the DB. That would just not scale.

What I'd do is put a shim layer in front of the HTTP service that gives out the image and looks for it in a cache. If not found, it fetches the image from the DB, and every subsequent request gets it from the local cache on the HTTP server. You could probably implement this shim as a 'file not found' error handler in Apache, so there would be no overhead on a cache hit.
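The shim described above - check a local file cache first, touch the database only on a miss - can be sketched as follows. The function and `fetch_from_db` callback are hypothetical names for illustration, not part of any real system in the thread:

```python
import os
import tempfile  # used only by the usage example below

def serve_image(cache_dir, name, fetch_from_db):
    """Serve an image from a local file cache, falling back to the DB.

    fetch_from_db is a hypothetical stand-in for the real database
    lookup; only a cache miss ever reaches it.
    """
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):              # cache hit: no DB round trip
        with open(path, "rb") as f:
            return f.read()
    data = fetch_from_db(name)            # cache miss: one DB query...
    with open(path, "wb") as f:           # ...then populate the cache
        f.write(data)
    return data
```

In the Apache arrangement Scott suggests, the web server serves the cached file directly and this code only runs on the 404 path, so a cache hit costs nothing extra.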
On 2009-11-27, Thom Brown <thombrown@gmail.com> wrote:
> I'm wondering if anyone has experience of storing and retrieving images
> to and from a database? We currently have the problem of images being
> uploaded to a single gateway used by many companies, most of which run
> several websites. As it stands, once they upload an image, it then has
> to be rsync'd to the appropriate servers (3-way in some cases) or
> accessed through our image proxy.
>
> [...]
>
> Any suggestions? Alternatives?

Some sort of broadcasting to share the images as and when they are uploaded, perhaps something based on NNTP or email.