Re: Best Strategy for Large Number of Images - Mailing list pgsql-general

From Gavan Schneider
Subject Re: Best Strategy for Large Number of Images
Date
Msg-id AE9AABF7-7CEE-4D85-9343-4DCFC38A3880@pendari.org
In response to Re: Best Strategy for Large Number of Images  (Estevan Rech <softrech@gmail.com>)
List pgsql-general
On 16 Dec 2021, at 7:12, Estevan Rech wrote:

> I have an application that takes pictures of items and uses them as
> evidence in a report.
>
> These images are saved and then used to generate a report.
>
Once you start talking reports and evidence I think audit trail and verification, a.k.a. “chain of evidence”.
With the images in the database they can be part of your information ecosystem: each image gets a datestamp along with a
checksum, and all of this is included in the audit trail.
All of this can also be done with the images in the file system and the database holding only the pointers… it just may
not be as easy to convince the opposition's lawyers that it is tamper proof, though the best you can really get is
tamper evident. 😉
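As a sketch of the datestamp-plus-checksum idea, assuming the audit fields are computed in the application before the image is loaded (the function and field names here are illustrative, not from the thread):

```python
import hashlib
from datetime import datetime, timezone

def audit_record(image_bytes: bytes) -> dict:
    """Build the audit fields to store alongside an image:
    a SHA-256 checksum and a UTC load timestamp."""
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }

rec = audit_record(b"example image bytes")
```

Recomputing the checksum later and comparing it with the stored value is what makes the record tamper evident.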

> Each item averages 10 photos and I have about 2 million photos currently,
> with an average growth of 1 million photos over the next year.
>
> I think about using it within the database because of the ease of selecting
> records with the content of the images (when necessary).
>
That will be the same however you go. The selection will be on the context data you enter with the images. There has
been no mention of searching within the image itself, and this is not something PG can do “out of the box” anyway.

> I think my biggest concern is to continue to use this strategy for the next
> 2 years and encounter a PostgreSQL limitation or some failure and have to
> migrate database again or have significant loss of images. And on the
> contrary too, if I use it on disk and in 2 years I find some failure or
> file limit in the operating system (currently it's windows server 2016)...
>
The total image data will end up somewhere less than 4 TB if your estimates continue to hold. That’s not going to stress
PG. It’s not such a big deal on a modern home computer either, but I can’t make any informed comment on Windows
anything. Realistically, once the data is in the database the whole lot can be moved to a more capable machine and/or OS
if problems start to appear. PG will work wherever you deploy it.
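As a back-of-envelope check of that estimate, assuming an average photo size of about 1 MB (a figure not stated in the thread, but consistent with 4 million photos totalling under 4 TB):

```python
# Photo counts from the original post: ~2 million now,
# growing by ~1 million per year over the next two years.
current_photos = 2_000_000
growth_per_year = 1_000_000
years = 2

total_photos = current_photos + growth_per_year * years  # 4 million

# Assumed average size per photo (illustrative, ~1 MB).
avg_photo_bytes = 1_000_000

total_bytes = total_photos * avg_photo_bytes
total_tb = total_bytes / 1e12  # roughly 4 TB at this average size
```

If the real average is closer to 500 kB the total halves; either way it stays well within what a single PG instance handles comfortably.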

> Related to this is the backup service that seems to me that in a disaster,
> I can restore the database relatively quickly if it's in the database. On
> the disk, I believe that the restoration is much slower and I don't think
> it's reliable that all the images are copied/restored.
>
If you have a reliable tested backup system for the database and don’t want to add to the workload then get the images
into the database. If there are reasons why you end up with the images outside the database then, of course, the backup
process has to be able to keep everything together and guarantee a full restore. Doable as well, but it does add some
complexity.

If I am reading this correctly, the use case is one of receiving data in the form of photos and field reports, entering
the report items associated with the photos into the information system, generating a report (with embedded photos), and
keeping records so disputes can be handled. This is not high-volume access to the images, so there is no real need for
optimum filesystem speed to serve them… keeping them in the database as bytea is perfectly workable and will work
for data volumes well beyond the current estimates.
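A minimal sketch of what such a bytea table and its parameterised insert might look like, assuming a driver such as psycopg2 (which adapts Python bytes to bytea automatically); the table and column names are purely illustrative:

```python
import hashlib

# Hypothetical schema: image bytes plus the audit fields
# discussed above. Large bytea values are TOASTed by PG,
# so the row itself stays small.
DDL = """
CREATE TABLE item_photo (
    photo_id   bigserial   PRIMARY KEY,
    item_id    bigint      NOT NULL,
    image      bytea       NOT NULL,
    sha256     text        NOT NULL,
    loaded_at  timestamptz NOT NULL DEFAULT now()
);
"""

INSERT = """
INSERT INTO item_photo (item_id, image, sha256)
VALUES (%s, %s, %s);
"""

def insert_params(item_id: int, image_bytes: bytes) -> tuple:
    """Parameter tuple for the INSERT above: the checksum is
    computed at load time so it can be verified after any restore."""
    return (item_id, image_bytes,
            hashlib.sha256(image_bytes).hexdigest())
```

With this shape a restore can be validated simply by recomputing each image's checksum and comparing it with the stored sha256 column.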

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem —
neat, plausible, and wrong.
— H. L. Mencken, 1920


