Thread: Best Strategy for Large Number of Images
I don't currently use PostgreSQL, but I plan to migrate and I have a question about the best way/strategy for storing images. I have about 2 million images with a growth trend of around 1 million images per year. I plan to store in bytea format in an isolated table. Is this recommended? Is there another better way?
How big are the images on average?Hi,
I don't currently use PostgreSQL, but I plan to migrate and I have a question about the best way/strategy for storing images. I have about 2 million images with a growth trend of around 1 million images per year. I plan to store in bytea format in an isolated table. Is this recommended? Is there another better way?
On 12/15/21 10:45, Estevan Rech wrote:How big are the images on average?Hi,
I don't currently use PostgreSQL, but I plan to migrate and I have a question about the best way/strategy for storing images. I have about 2 million images with a growth trend of around 1 million images per year. I plan to store in bytea format in an isolated table. Is this recommended? Is there another better way?
350kbEm qua., 15 de dez. de 2021 às 14:50, Rob Sargent <robjsargent@gmail.com> escreveu:On 12/15/21 10:45, Estevan Rech wrote:How big are the images on average?Hi,
I don't currently use PostgreSQL, but I plan to migrate and I have a question about the best way/strategy for storing images. I have about 2 million images with a growth trend of around 1 million images per year. I plan to store in bytea format in an isolated table. Is this recommended? Is there another better way?
David J.
On 12/15/21 10:17 AM, Estevan Rech wrote: > I think about using it inside the database to facilitate the select and > generation of reports... > As long as the meta data about the images is stored in the database that is still possible. -- Adrian Klaver adrian.klaver@aklaver.com
On 12/15/21 10:48 AM, Estevan Rech wrote: > But PostgreSQL can handle it? Does it have good search performance? and > is memory usage feasible? Search on what? -- Adrian Klaver adrian.klaver@aklaver.com
But PostgreSQL can handle it? Does it have good search performance? and is memory usage feasible?
Hi,
I don't currently use PostgreSQL, but I plan to migrate and I have a question about the best way/strategy for storing images. I have about 2 million images with a growth trend of around 1 million images per year. I plan to store in bytea format in an isolated table. Is this recommended? Is there another better way?
We have an application which store images in the database using bytea fields. The next version of the software will store the images externally, with the filename in the database, because filesystems just inherently do a better job at storing lots of files than does a database.
Angular momentum makes the world go 'round.
My previous employer dealt with radiology images. These were in the file system with an image server as suggested up thread.The possibilities are known, but does anyone have experience with this scenario?
On 12/15/21 11:22, Estevan Rech wrote: > The possibilities are known, but does anyone have experience with this > scenario? It would help if you elaborated on the scenerio is? There is no questioning that you can store images in the database. The issues are what you want to do with them once they are there, the hardware(real or virtual) you have available, the user load on the database, etc? Providing at least an outline that answers the above would go a long way to getting some reasonable answers based on experience. -- Adrian Klaver adrian.klaver@aklaver.com
These images are saved and then used to generate a report.
Each item averages 10 photos and I have about 2 million photos currently, with an average growth of 1 million photos over the next year.
I think about using it within the database because of the ease of selecting records with the content of the images (when necessary).
I think my biggest concern is to continue to use this strategy for the next 2 years and encounter a PostgreSQL limitation or some failure and have to migrate database again or have significant loss of images. And on the contrary too, if I use it on disk and in 2 years I find some failure or file limit in the operating system (currently it's windows server 2016)...
Related to this is the backup service that seems to me that in a disaster, I can restore the database relatively quickly if it's in the database. On the disk, I believe that the restoration is much slower and I don't think it's reliable that all the images are copied/restored.
On 16 Dec 2021, at 7:12, Estevan Rech wrote: > I have an application that takes pictures of items and uses them as > evidence in a report. > > These images are saved and then used to generate a report. > Once you start talking reports and evidence I think audit trail and verification a.k.a. “chain of evidence”. With the images in the database they can be part of your information ecosystem, get a datestamp along with checksum and allthis included in the audit trail. All this can be done with the images in the file system and the database holds the pointers only… it just may not be as easyto convince the oppositions lawyers that it is tamper proof, except the best you can really get is tamper evident. 😉 > Each item averages 10 photos and I have about 2 million photos currently, > with an average growth of 1 million photos over the next year. > > I think about using it within the database because of the ease of selecting > records with the content of the images (when necessary). > That will be the same however you go. The selection will be on the context data you enter with the images. There has beenno mention of searching within the image itself, and this is not something PG can do “out of the box” anyway. > I think my biggest concern is to continue to use this strategy for the next > 2 years and encounter a PostgreSQL limitation or some failure and have to > migrate database again or have significant loss of images. And on the > contrary too, if I use it on disk and in 2 years I find some failure or > file limit in the operating system (currently it's windows server 2016)... > The total image data will end up somewhere less than 4Tb if your estimates continue to hold. That’s not going to stress PG.It’s not such a big deal on a modern home computer either, but I can’t make any informed comment on Windows anything.Realistically, once the data is in the database the whole can be moved to a more capable machine and/or OS if problemsstart to appear. PG will work wherever you deploy it. > Related to this is the backup service that seems to me that in a disaster, > I can restore the database relatively quickly if it's in the database. On > the disk, I believe that the restoration is much slower and I don't think > it's reliable that all the images are copied/restored. > If you have a reliable tested backup system for the database and don’t want to add to the workload then get the images intothe database. If there are reasons why you end up with the images outside the database then, of course, the backup processhas to be able to keep everything together and guarantee a full restore. Doable as well, but it does add some complexity. If I am reading this correctly the use case is one of receiving data in the form of photos and field reports, entering thereport items associated with the photos into the information system, generating a report (with embedded photos), and keepingrecords so disputes can be handled. This is not a high volume access to the images so there is no real need for optimumfilesystem speed to serve the images… keeping them in the database as bytea is perfectly workable and will work fordata volumes well beyond the current estimates. Gavan Schneider —— Gavan Schneider, Sodwalls, NSW, Australia Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat,plausible, and wrong. — H. L. Mencken, 1920
Adrian,I have an application that takes pictures of items and uses them as evidence in a report.
These images are saved and then used to generate a report.
Each item averages 10 photos and I have about 2 million photos currently, with an average growth of 1 million photos over the next year.
I think about using it within the database because of the ease of selecting records with the content of the images (when necessary).
I think my biggest concern is to continue to use this strategy for the next 2 years and encounter a PostgreSQL limitation or some failure and have to migrate database again or have significant loss of images. And on the contrary too, if I use it on disk and in 2 years I find some failure or file limit in the operating system (currently it's windows server 2016)...
Related to this is the backup service that seems to me that in a disaster, I can restore the database relatively quickly if it's in the database. On the disk, I believe that the restoration is much slower and I don't think it's reliable that all the images are copied/restored.
On Wed, Dec 15, 2021 at 02:45:15PM -0300, Estevan Rech wrote: > Hi, > > I don't currently use PostgreSQL, but I plan to migrate and I have a question > about the best way/strategy for storing images. I have about 2 million images > with a growth trend of around 1 million images per year. I plan to store in > bytea format in an isolated table. Is this recommended? Is there another better > way? You might want to read this: https://momjian.us/main/blogs/pgblog/2020.html#June_19_2020 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Wed, Dec 15, 2021 at 02:45:15PM -0300, Estevan Rech wrote:
> Hi,
>
> I don't currently use PostgreSQL, but I plan to migrate and I have a question
> about the best way/strategy for storing images. I have about 2 million images
> with a growth trend of around 1 million images per year. I plan to store in
> bytea format in an isolated table. Is this recommended? Is there another better
> way?
You might want to read this:
https://momjian.us/main/blogs/pgblog/2020.html#June_19_2020
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Do you need to edit those images? Or do you need to make sure they are not edited, and if they are, you want to know who did it?
If so, I'd say make a versioned external repo (a git repo will do), and have your application store a chain of SHA-1s of the file commits. If anyone wants to alter your images behind your back, they'll need to gain access to both the DB and the repo, and know how you pair the two information levels (how deeply paranoid you need to be is up to your business requirements).
If you just want to keep the images and no particular security is required, I'd say just store them in the DB. 1.000.000/year pictures are ~2/minute, if the flow is regular. As per your later posts they are not so big that they could not be managed, you don't seem to expect a lot of concurrent reads and doing it like this all you have to think of is ONE db backup/restore procedure. This might be more complex if you expect traffic peaks on the insert phase, of course.
Berto
Related to this is the backup service that seems to me that in a disaster, I can restore the database relatively quickly if it's in the database. On the disk, I believe that the restoration is much slower and I don't think it's reliable that all the images are copied/restored.
I think about using it inside the database to facilitate the select and generation of reports...
Confidential
We have a large B2B application used in marketing management – heavily focused on images/video – no binaries ever stored in DB – we have all the metadata in the DB and binaries in scalable folder structure. Serving up images from the DB is not very scalable. I am not sure what you will do with the binary data that it is needed in the DB?
Z
From: Estevan Rech <softrech@gmail.com>
Sent: 15 December 2021 17:45
To: pgsql-general@lists.postgresql.org
Subject: Best Strategy for Large Number of Images
Hi,
I don't currently use PostgreSQL, but I plan to migrate and I have a question about the best way/strategy for storing images. I have about 2 million images with a growth trend of around 1 million images per year. I plan to store in bytea format in an isolated table. Is this recommended? Is there another better way?
How is this folder structure like 10,000 folders? and the backup of it, how long does it take?
Attachment
> folder structure
"Given a Who's On First ID its (relative) URI can be derived by splitting the ID in to 3-number chunks representing nested subdirectories, followed by filename consisting of the ID followed by .geojson. For example the ID for Montréal is 101736545 which becomes: 101/736/545/101736545.geojson"
"As of this writing it remains clear that this approach (lots of tiny files parented by lots of nested directories) can be problematic. We may be forced to choose another approach, like fewer subdirectories but nothing has been decided and anything we do will be backwards compatible." ( from https://whosonfirst.org/data/principles/ )
How is this folder structure like 10,000 folders? and the backup of it, how long does it take?