Re: Storing files: 2.3TBytes, 17M file count - Mailing list pgsql-general

From: Stuart Bishop
Subject: Re: Storing files: 2.3TBytes, 17M file count
Date:
Msg-id: CADmi=6NU4Q98_CxmgcmgGpJ9909KC+jurNJbCrKXZu8keGJE5A@mail.gmail.com
In response to: Re: Storing files: 2.3TBytes, 17M file count (Thomas Güttler <guettliml@thomas-guettler.de>)
List: pgsql-general
On 29 November 2016 at 16:50, Thomas Güttler <guettliml@thomas-guettler.de> wrote:

> On 29.11.2016 at 01:52, Mike Sofen wrote:
>
>> From: Thomas Güttler   Sent: Monday, November 28, 2016 6:28 AM
>>
>> ...I have 2.3TBytes of files. File count is 17M.
>>
>> Since we already store our structured data in postgres, I think about storing the files in PostgreSQL, too.
>>
>> Is it feasible to store files in PostgreSQL?
>
> I guess I will use some key-to-blob store like S3. AFAIK there are open source S3 implementations available.
>
> Thank you all for your feedback!
>
> Regards, Thomas

I have a similar setup: about 20TB of data in over 60 million files. It might be possible to store that in PG, but I think it would be a huge headache that is easily avoided. Our files are GPG-encrypted and backed up offsite to S3, with lifecycle rules migrating them on to Glacier storage. A tool like boto lets you sync things easily to S3, and maybe directly to Glacier, and there are alternatives out there.
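
To make that concrete, here is a minimal sketch of the encrypt-and-upload step. It assumes the newer boto3 rather than classic boto, and the bucket name, GPG recipient and key layout are placeholders rather than a description of our actual setup:

import subprocess

import boto3

s3 = boto3.client("s3")

def backup_file(path, bucket="my-backup-bucket"):
    # Encrypt with GPG before the file leaves the machine; the
    # recipient here is a placeholder key.
    encrypted = path + ".gpg"
    subprocess.check_call(
        ["gpg", "--encrypt", "--recipient", "backup@example.com",
         "--output", encrypted, path])
    # Key the object by its path so restores are predictable.
    s3.upload_file(encrypted, bucket, encrypted)

# The Glacier migration is not per-file code at all: it is a one-off
# lifecycle rule on the bucket, e.g. transition objects after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    })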

If your rsync is taking too long, syncing to S3 will be even worse, though. If that is your bottleneck, then you need to fix it, probably by knowing which files have changed and resyncing only those, for example by using timestamps from the database or by storing 'incoming' files in a separate area from your 'archive'. Once you have this sorted, you can run your backups every few minutes and reduce your potential data loss.
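
As a rough sketch of the timestamp approach (the table and column names here are invented, and it assumes psycopg2 alongside boto3):

import boto3
import psycopg2

def sync_changed_since(last_sync, bucket="my-backup-bucket"):
    """Push only the files the database says changed since the last run."""
    s3 = boto3.client("s3")
    conn = psycopg2.connect("dbname=mydb")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT path FROM files WHERE modified_at > %s",
            (last_sync,))
        for (path,) in cur:
            # Uploading only the changed files keeps each run short
            # enough to schedule every few minutes.
            s3.upload_file(path, bucket, path)
    conn.close()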
 

--
