Re: Storing files: 2.3TBytes, 17M file count - Mailing list pgsql-general

From Eduardo Morras
Subject Re: Storing files: 2.3TBytes, 17M file count
Date
Msg-id 20161128203749.3e8002605dcb6406af48375d@yahoo.es
In response to Storing files: 2.3TBytes, 17M file count  (Thomas Güttler <guettliml@thomas-guettler.de>)
List pgsql-general
On Mon, 28 Nov 2016 15:28:28 +0100
Thomas Güttler <guettliml@thomas-guettler.de> wrote:

> Hi,
>
> Up to now, we don't store files in PostgreSQL.
>
> I was told, that you must not do this .... But this was 20 years ago.
>
>
> I have 2.3TBytes of files. File count is 17M
>
> Up to now we use rsync (via rsnapshot) to backup our data.
>
> But it takes longer and longer for rsync to detect
> the changes. Rsync checks many files. But daily only
> very few files really change. More than 99.9% don't.
>
> Since we already store our structured data in postgres, I think
> about storing the files in PostgreSQL, too.
>
> What is the current state of the art?
>
> Is it feasible to store file in PostgreSQL?

Yes and no. It is feasible, but it adds another level of indirection
and is slower than a pure filesystem solution.

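By default rsync does not hash every file. Its "quick check" compares
file size and last modification time between source and destination,
and skips the file if both match. Only with -c/--checksum does it
compute and compare a checksum of each file, which means reading the
full content on both sides. In either mode rsync still has to stat
every file on each run.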
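
To make the per-file decision concrete, here is a minimal Python sketch.
It is an illustration only, not rsync's actual code: the function names
needs_copy and file_digest are hypothetical, SHA-256 is an arbitrary
choice of hash, and modification times are compared at whole-second
resolution as a simplifying assumption.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's content in chunks (SHA-256 is an arbitrary choice here)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_copy(src, dst, use_checksum=False):
    """Decide whether src should be copied over dst.

    Default mode: compare only size and modification time (cheap, metadata only).
    Checksum mode: compare size and content hash (reads both files fully).
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True
    if use_checksum:
        return file_digest(src) != file_digest(dst)
    # Assumption: whole-second mtime resolution is good enough for this sketch.
    return int(s.st_mtime) != int(d.st_mtime)
```

Even in the cheap mode, every one of the 17M files needs a stat() call
on both sides, so the run time grows with the file count and not with
the number of changed files.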

> Are there already projects which use PostgreSQL as storage backend?
>
> I have the hope, that it would be easier to backup only the files
> which changed.

Rsync already tries to back up only the files that changed. There are
other tools such as cpdup, which is similar to rsync; I don't know
whether it has been ported to Linux.

You could also use a p2p system. Unlike ftp, rsync, etc., these store a
full tree hash (often a Tiger Tree Hash) of the file content to allow
copying from multiple peers.
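
As a rough sketch of the tree hash idea, the Python below hashes
fixed-size blocks into leaves and combines them pairwise up to a single
root. Several things here are assumptions for illustration: SHA-256
stands in for Tiger (which Python's hashlib does not provide), the
1024-byte block size and the 0x00/0x01 prefixes are chosen for the
example, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed leaf block size for this sketch


def leaf_hashes(data):
    """Hash each fixed-size block of data; an empty input yields one empty leaf."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    # A 0x00 prefix marks leaf hashes so they cannot be confused with inner nodes.
    return [hashlib.sha256(b"\x00" + block).digest() for block in blocks]


def tree_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # A 0x01 prefix marks inner nodes.
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


def changed_blocks(old_leaves, new_leaves):
    """Return indices of blocks that differ, including blocks added or removed."""
    n = max(len(old_leaves), len(new_leaves))
    return [
        i for i in range(n)
        if i >= len(old_leaves) or i >= len(new_leaves) or old_leaves[i] != new_leaves[i]
    ]
```

Comparing two roots tells you whether a file changed at all; comparing
the leaf lists tells you which blocks changed, so each block can be
verified and fetched independently.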

> Regards,
>     Thomas Güttler
>
>
> Related question at rsnapshot mailing list:
> https://sourceforge.net/p/rsnapshot/mailman/rsnapshot-discuss/thread/57A1A2F3.5090409@thomas-guettler.de/
> --
> Thomas Guettler http://www.thomas-guettler.de/
>

---   ---
Eduardo Morras <emorrasg@yahoo.es>

