Re: How to store "blobs" efficiently for small and large sizes, with random access - Mailing list pgsql-general

From: Andreas Joseph Krogh
Subject: Re: How to store "blobs" efficiently for small and large sizes, with random access
Msg-id: VisenaEmail.8c.b06498274ef2e53f.183f001419c@visena.app.internal.visena.net
In response to: Re: How to store "blobs" efficiently for small and large sizes, with random access (Dominique Devienne <ddevienne@gmail.com>)
Responses: Re: How to store "blobs" efficiently for small and large sizes, with random access
List: pgsql-general
On Wednesday, October 19, 2022 at 13:21:38, Dominique Devienne <ddevienne@gmail.com> wrote:
On Wed, Oct 19, 2022 at 1:00 PM Andreas Joseph Krogh <andreas@visena.com> wrote:
> Ok, just something to think about;

Thank you. I do appreciate the feedback.

> Will your database grow beyond 10TB with blobs?

The largest internal store I've seen (for the subset of data that goes in the DB) is shy of 3TB. But we are an ISV; it's our clients who operate at truly massive data scale, and they don't share the exact scale of their proprietary data with me...

> If so, try to calculate how long it takes to restore and comply with your SLA,
> and how long it would have taken to restore without the blobs.

Something I don't quite get is why backup is somehow no longer needed if the large blobs are external? I.e., are you saying backups are so much worse in PostgreSQL than with the FS? I'm curious now.

I'm not saying you don't need backup (or redundancy) of other systems holding blobs, but moving them out of the RDBMS lets you restore the DB to a consistent state, and serve clients again, faster. In my experience it's quite unlikely that your (redundant) blob store needs crash recovery at the same time your DB does. The same goes for PITR, needed because of some logical error (like a client deleting data they shouldn't have), which is much faster without blobs in the DB and doesn't affect the blob store at all (if you have a smart insert/update/delete policy there).
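(For a rough sense of scale on the restore argument: restoring a 10TB dump at a sustained 500 MB/s takes about 20,000 seconds, roughly 5.5 hours, for the raw copy alone, before indexes and constraints are rebuilt, while a metadata-only database in the tens of GB is back in minutes.)

A minimal sketch of what such a split and delete policy could look like; the schema, the names, and the 7-day PITR window are illustrative assumptions, not something from this thread:

-- Small metadata rows stay in PostgreSQL and are covered by
-- pg_dump/PITR; the payload itself lives in an external blob store,
-- addressed by object_key.
CREATE TABLE blob_meta (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    object_key  text        NOT NULL UNIQUE, -- key/path in the blob store
    byte_size   bigint      NOT NULL,
    sha256      bytea       NOT NULL,        -- integrity check of the payload
    created_at  timestamptz NOT NULL DEFAULT now(),
    deleted_at  timestamptz                  -- soft delete, see below
);

-- "Smart delete policy": never remove store objects synchronously.
-- Deleting a blob only marks the row...
UPDATE blob_meta SET deleted_at = now() WHERE id = $1;

-- ...and a background job purges store objects only once deleted_at
-- falls outside the PITR window, so any point-in-time restore of the
-- DB still finds every blob its rows reference.
SELECT object_key
FROM   blob_meta
WHERE  deleted_at < now() - interval '7 days';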


Also, managing the PostgreSQL server will mostly be the client's own concern. We are not into SaaS here. As hinted above, the truly massive data is already not in the DB: it is used by different systems and processed down to the GB-sized inputs from which all the data put in the DB is generated. It's a scientific, data-heavy environment, and one where security of the data is paramount, for contractual and legal reasons. Files make that harder IMHO.

Anyway, this is straying from the main theme of this post, I'm afraid. Hopefully we can come back to the main one too. --DD

There's a reason “everybody” advises moving blobs out of the DB, I've learned.


--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
