Re: Storing files: 2.3TBytes, 17M file count - Mailing list pgsql-general

From Jerome Wagner
Subject Re: Storing files: 2.3TBytes, 17M file count
Date
Msg-id CA+=V_fMuZ650wfCxmYGrK1YmyFTVkWk5OEi-a7vVY-eZhrnKeQ@mail.gmail.com
In response to Re: Storing files: 2.3TBytes, 17M file count  (Thomas Güttler <guettliml@thomas-guettler.de>)
List pgsql-general
Two other options that you may want to look at:

- cephfs

This has nothing to do with Postgres, but it is a distributed filesystem that handles very large numbers of files (think next-generation NFS).
I haven't tried it myself yet, but the project has reached a "stable" milestone for the distributed filesystem.

- FUSE-based Postgres storage

this can:
 - relieve the issue of "pg_largeobject" being a system table (tablespace, ..)
 - give you fs-like access to your data

The closest thing I have seen to this so far is https://github.com/andreasbaumann/pgfuse, which would probably need some tinkering.
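To make the bytea alternative concrete: a plain table with a bytea column already gives you key-based storage outside of pg_largeobject, and a FUSE layer would essentially wrap calls like the ones below. This is only a minimal sketch, assuming psycopg2 and a made-up file_store table, not anything pgfuse actually does:

import psycopg2

# Connection string is an assumption; adjust for your setup.
conn = psycopg2.connect("dbname=files")

with conn, conn.cursor() as cur:
    # A regular user table, so it can live in any tablespace,
    # unlike the pg_largeobject system table.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS file_store (
            path    text PRIMARY KEY,
            content bytea NOT NULL
        )
    """)

def put_file(path, local_path):
    # Read the local file and upsert its bytes under the given key.
    with open(local_path, "rb") as f:
        data = f.read()
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO file_store (path, content) VALUES (%s, %s) "
            "ON CONFLICT (path) DO UPDATE SET content = EXCLUDED.content",
            (path, psycopg2.Binary(data)),
        )

def get_file(path):
    # Return the stored bytes, or None if the key is unknown.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT content FROM file_store WHERE path = %s", (path,))
        row = cur.fetchone()
        return bytes(row[0]) if row else None

Because file_store is an ordinary table, tablespaces, backups and replication work exactly as for the rest of your data, which is the part that pg_largeobject, being a system table, makes awkward.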





On Tue, Nov 29, 2016 at 10:50 AM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:


On 29.11.2016 at 01:52, Mike Sofen wrote:
From: Thomas Güttler   Sent: Monday, November 28, 2016 6:28 AM

...I have 2.3TBytes of files. File count is 17M

Since we already store our structured data in Postgres, I am thinking about storing the files in PostgreSQL, too.

Is it feasible to store files in PostgreSQL?

-------

I am doing something similar, but in reverse.  The legacy MySQL databases I'm converting into a modern Postgres data
model have very large genomic strings stored in 3 separate columns.  Out of the 25 TB of legacy data storage (in 800
dbs across 4 servers, about 22b rows), those 3 columns consume 90% of the total space, and they are just used for
reference, never used in searches or calculations.  They range from 1 KB to several MB.



Since I am collapsing all 800 dbs into a single PG db, being very smart about storage was critical.  Since we're also
migrating everything to AWS, we're placing those 3 strings (per row) into a single JSON document and storing the
document in S3 bins, with the pointer to the file being the globally unique PK for the row…super simple.  The app tier
knows to fetch the data from the db and the large-string JSON from the S3 bins.  The retrieval time is surprisingly fast;
this is all real-time web app stuff.



This is a model that could work for anyone dealing with large objects (text or binary).  The nice part is that the original
25 TB of data storage drops to 5 TB, a much more manageable number, allowing for significant growth, which is on the horizon.
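A rough sketch of that pattern (not Mike's actual code; the bucket name, key scheme and column names below are invented, and it assumes boto3 with credentials already configured):

import json
import boto3

s3 = boto3.client("s3")          # pass endpoint_url=... here for an S3-compatible store
BUCKET = "large-strings"         # hypothetical bucket name

def offload_strings(pk, col1, col2, col3):
    # Pack the three large strings into one JSON document stored under the row's PK.
    doc = json.dumps({"col1": col1, "col2": col2, "col3": col3})
    s3.put_object(Bucket=BUCKET, Key=str(pk), Body=doc.encode("utf-8"))

def fetch_strings(pk):
    # The relational row only carries the PK; the heavy payload comes back from S3.
    obj = s3.get_object(Bucket=BUCKET, Key=str(pk))
    return json.loads(obj["Body"].read())

The same two calls also work against open source S3-compatible stores by pointing boto3.client at their endpoint_url.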

Thank you Mike for your feedback.

Yes, I think I will drop my idea. Encoding binary (the file content) to text and decoding it back to binary makes no sense. I was not aware that this would be needed.

I guess I will use some key-to-blob store like S3. AFAIK there are open source S3 implementations available.

Thank you all for your feedback!

 Regards, Thomas





--
Thomas Guettler http://www.thomas-guettler.de/


