Re: Storing files: 2.3TBytes, 17M file count - Mailing list pgsql-general

From Chris Travers
Subject Re: Storing files: 2.3TBytes, 17M file count
Date
Msg-id CAKt_Zfv-OZVDQGtxSOD=O4hSGSc2_BHoGxLkAAQpQRH549-0yA@mail.gmail.com
In response to Storing files: 2.3TBytes, 17M file count  (Thomas Güttler <guettliml@thomas-guettler.de>)
List pgsql-general


On Mon, Nov 28, 2016 at 3:28 PM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:
Hi,

PostgreSQL is rock solid and one of the most reliable parts of our toolchain.

   Thank you

Up to now, we don't store files in PostgreSQL.

I was told that you must not do this... But that was 20 years ago.


I have 2.3 TBytes of files. The file count is 17M.

Up to now we use rsync (via rsnapshot) to back up our data.

But it takes longer and longer for rsync to detect the changes.
Rsync checks many files, yet only very few of them actually change
on a given day; more than 99.9% don't.

Since we already store our structured data in PostgreSQL, I am thinking
about storing the files in PostgreSQL, too.

What is the current state of the art?

Is it feasible to store files in PostgreSQL?

Are there already projects which use PostgreSQL as storage backend?

My hope is that it would be easier to back up only the files which changed.


There is a tradeoff. On the one hand, as you note, it is easier to back things up if you are storing the files in PostgreSQL. Now, I have *not* looked at how this would work with binary-format transfer, so that might be different, but in most cases I have looked at, the downside is in the encoding and decoding.
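
(To make the backup upside concrete before getting to the downside: below is a rough sketch, in Python with psycopg2, of the kind of bytea table and incremental-export query that makes "only the changed files" backups straightforward. The table and column names, and the cutoff timestamp, are made up for illustration.)

    # Hypothetical sketch (Python + psycopg2): a bytea table with an
    # "updated_at" column, and the incremental-export query a daily
    # backup job could run.  Table and column names are made up.
    import psycopg2

    conn = psycopg2.connect("dbname=filedb")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS file_store (
                id         bigserial PRIMARY KEY,
                path       text UNIQUE NOT NULL,
                data       bytea NOT NULL,
                updated_at timestamptz NOT NULL DEFAULT now()
            )
        """)
        # Only rows touched since the previous backup run.
        cur.execute(
            "SELECT path, data FROM file_store WHERE updated_at > %s",
            ("2016-11-27",),  # placeholder: timestamp of the last backup
        )
        for path, data in cur:
            pass  # write bytes(data) out to the backup target here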

If files come across as hexadecimal, you are already transferring twice as many bytes as the file takes up on disk. Then the driver-based encoding and decoding typically copies the data in the process, meaning that you end up with many times the file's size used in RAM. When I tested this in Perl, it was common for 8x the size of the file to be used in RAM in the course of decoding and sending it on. Driver, framework, and encoding may affect this, however.
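
(A small illustration of that first overhead; the input file name here is just a placeholder.)

    # Quick illustration of the transfer overhead: PostgreSQL's hex
    # bytea format is "\x" plus two hex digits per byte, so a bit over
    # twice the raw file size goes over the wire in text mode.
    import binascii

    raw = open("example.bin", "rb").read()       # placeholder input file
    hex_form = b"\\x" + binascii.hexlify(raw)    # what a text-format transfer carries
    print(len(raw), len(hex_form))               # hex_form is ~2x len(raw)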

Now, depending on what you are doing, that may not be a problem. It sounds like you have a large number of files, and they are up to a number of MB in size. Since that memory usage would be short-lived, it may not be a problem, but I cannot say for you whether it is or not.
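
(If the per-file memory use does turn out to matter, one way people keep it bounded is PostgreSQL's large object facility, which can be read and written in chunks. A rough psycopg2 sketch, with arbitrary file names and chunk size:)

    # Sketch: stream a file into / out of a PostgreSQL large object in
    # 1 MB chunks via psycopg2, so the whole file never sits in RAM.
    import psycopg2

    conn = psycopg2.connect("dbname=filedb")

    # Write side: oid=0 asks the server to create a new large object.
    lob = conn.lobject(0, "wb")
    with open("big_input.bin", "rb") as src:     # placeholder source file
        while True:
            chunk = src.read(1024 * 1024)
            if not chunk:
                break
            lob.write(chunk)
    oid = lob.oid
    lob.close()
    conn.commit()        # large objects must be used inside a transaction

    # Read side: stream it back out chunk by chunk.
    lob = conn.lobject(oid, "rb")
    with open("big_output.bin", "wb") as dst:
        while True:
            chunk = lob.read(1024 * 1024)
            if not chunk:
                break
            dst.write(chunk)
    lob.close()
    conn.commit()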

So be aware of the tradeoff and decide appropriately.

Regards,
   Thomas Güttler


Related question on the rsnapshot mailing list:
https://sourceforge.net/p/rsnapshot/mailman/rsnapshot-discuss/thread/57A1A2F3.5090409@thomas-guettler.de/



--
Thomas Guettler http://www.thomas-guettler.de/





--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
