Re: How to store "blobs" efficiently for small and large sizes, with random access - Mailing list pgsql-general

From esconsult1@gmail.com
Subject Re: How to store "blobs" efficiently for small and large sizes, with random access
Date
Msg-id 0BE00147-79F4-45E8-BDEF-E809BCFAA668@gmail.com
In response to Re: How to store "blobs" efficiently for small and large sizes, with random access  (Dominique Devienne <ddevienne@gmail.com>)
List pgsql-general
We had the same thought of storing the blobs inside LOs as well, many years ago.

But ultimately chose cloud storage and stored a pointer in the database instead.

Now that we are approaching a terabyte of just normal data, I don't regret this decision one bit. Just handling backups and storage is already a chore.

Data in S3-compatible storage is very easy to protect in numerous ways.

We have one set of code responsible for uploading, downloading and deleting the files themselves.

One downside? Occasionally an S3 delete fails, and now and again a file or two gets orphaned. But in 11 years we've never failed to find a file pointed to from our attachments table.
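Those orphans are easy to clean up with a periodic reconciliation pass: list the keys in the bucket, list the keys referenced by the attachments table, and delete whatever storage holds that the table no longer references. A minimal sketch, with hypothetical keys and in-memory sets standing in for the real S3 listing and table scan:

```python
def find_orphans(stored_keys, referenced_keys):
    """Keys present in object storage but absent from the attachments
    table are orphans left behind by failed deletes."""
    return set(stored_keys) - set(referenced_keys)

# Hypothetical data standing in for a bucket listing and a table scan.
stored = {"a/1.pdf", "a/2.pdf", "b/3.png"}
referenced = {"a/1.pdf", "b/3.png"}

# a/2.pdf is no longer referenced, so it is safe to delete from storage.
assert find_orphans(stored, referenced) == {"a/2.pdf"}
```

Note the asymmetry matters: keys referenced by the table but missing from storage would be real data loss, which is why that direction never happening in 11 years is the important guarantee.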

We also store only pathnames/base names, so we can easily move storage providers if we decide to go on-prem.
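Storing only the relative path means the provider-specific part lives in one configuration value, not in every row. A minimal sketch of that idea (the base URLs and paths here are hypothetical, not from the original post):

```python
from urllib.parse import urljoin

# Hypothetical provider bases: switching providers means changing one
# setting, not rewriting every row of the attachments table.
S3_BASE = "https://bucket.s3.example.com/"
ONPREM_BASE = "https://files.internal.example.com/"

def object_url(base, relative_path):
    """Compose the full URL from the provider base and the stored path."""
    return urljoin(base, relative_path)

# The same stored value resolves against whichever provider is active.
assert object_url(S3_BASE, "attachments/2022/report.pdf") == \
    "https://bucket.s3.example.com/attachments/2022/report.pdf"
assert object_url(ONPREM_BASE, "attachments/2022/report.pdf") == \
    "https://files.internal.example.com/attachments/2022/report.pdf"
```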

There is absolutely no upside to storing files in the DB if you anticipate any kind of growth or significant volume.

Ericson Smith
CTO
Travel Agency Tribes

Sent from my iPhone

On 19 Oct 2022, at 7:01 PM, Dominique Devienne <ddevienne@gmail.com> wrote:


On Wed, Oct 19, 2022 at 1:38 PM Andreas Joseph Krogh <andreas@visena.com> wrote:
There's a reason "everybody" advises moving blobs out of the DB, I've learned.

I get that. I really do. But the alternative has some real downsides too,
especially around security, as I already mentioned. That's why I'd like,
if possible, to get input on the technical questions in my initial post.

That's not to say we wouldn't ultimately move the big blobs outside the DB.
But given how much that would complicate the project, I do believe it is better
to do it as a second step, once the full system is up and running and testing at
scale has actually been performed.

We've already moved another kind of data to PostgreSQL, from SQLite DBs (thousands of them) this time,
and ported the sharding done on the SQLite side "as-is" to PostgreSQL (despite TOAST).
So far, so good: good ingestion rates, and decent runtime access to the data too,
in the admittedly limited testing we've done so far.

Now we need to move this other kind of data, from proprietary DB-like files this time (thousands of them too),
to finish our system and finally be able to test the whole thing in earnest, at (our limited internal) scale.

So you see, I'm not completely ignoring your advice.

But for now, I'm inquiring as to the *best* way to put that data *in* PostgreSQL,
with the requirements / constraints I've listed in the first post.
It may indeed be a bad idea long term. But let's make the most of it for now.
Does that make sense? Am I being unreasonable here? --DD
