Re: > 16TB worth of data question - Mailing list pgsql-general

From Magnus Naeslund(f)
Subject Re: > 16TB worth of data question
Msg-id 086a01c30842$1af0fa90$f80c0a0a@mnd
In response to > 16TB worth of data question  (Jeremiah Jahn <jeremiah@cs.earlham.edu>)
List pgsql-general
Jeremiah Jahn <jeremiah@cs.earlham.edu> wrote:
> I have a system that will store about 2TB+ of images per year in a PG
> database. Linux unfortunately has the 16TB limit for 32-bit systems. Not
> really sure what should be done here. Would life be better if we didn't
> store the images as BLOBs, and instead came up with some complicated way
> to only store their location in the database, or is there some way to
> have postgres handle this somehow? What are other people out there doing
> about this sort of thing?
>

I have some experience in this matter.
We've built a system that today holds about 5 million images, roughly
2.5 TB. It keeps the metadata in a PostgreSQL database so we can do
complex searches, and the database stores only a filename in _relative_
form, like this:

11/2/33

and then the application server handling requests maps it into a real
file:

/storage/links/11/2/33.jpg

The links directory is just an indirection from the real storage
location, so "11" in this case can be /storage/nfs/server1/11.
Flexibility is the big win here: I can split partitions up any way I
want and don't have to care about any limits, except maybe the "links"
directory getting too big. But since this is our application, we know
that when the database is complete (20 million+ images) it will still
have only about 4000 directory entries in "links".
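To make the indirection concrete, here is a minimal sketch of how that mapping could look. The symlink layout, function names, and the /storage/links mount point are assumptions based on the example paths above, not something from our actual code:

    import os

    STORAGE_ROOT = "/storage/links"   # assumed mount point from the example paths

    def relative_to_physical(relpath):
        """Map the relative name stored in the database (e.g. '11/2/33')
        to the physical file the application server should read."""
        return os.path.join(STORAGE_ROOT, relpath + ".jpg")

    def point_links_entry(top_dir, real_location):
        """Point a top-level 'links' entry (e.g. '11') at its real storage
        location (e.g. '/storage/nfs/server1/11'), so storage can be moved
        or split without touching the database."""
        os.symlink(real_location, os.path.join(STORAGE_ROOT, top_dir))

So relative_to_physical("11/2/33") gives "/storage/links/11/2/33.jpg", and moving a partition only means re-pointing one symlink.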

The blob approach also has drawbacks from other points of view.
It makes the database less manageable, it gets in the way of lazy
mirroring backups of the images (we use rsync), and the database dumps
would presumably be huge? (I don't know how blobs are handled in
pg_dump.)

If it's necessary to be able to serve this via PostgreSQL, I would write
a server-side function that fetches the image from the filesystem and
sends it over, maybe using the bytea encoding?
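The same idea can just as well live in the application layer. Here is a rough sketch of a fetch helper; the table and column names are hypothetical, and the psycopg2 usage is just one way to parameterize the lookup, not how our system actually does it:

    import psycopg2

    def fetch_image(image_id):
        """Look up the relative path in the metadata database, then read
        the image bytes from the filesystem (to be returned as bytea,
        or streamed over HTTP)."""
        conn = psycopg2.connect("dbname=images")   # hypothetical database name
        try:
            with conn.cursor() as cur:
                # 'images' table and 'relpath' column are assumptions
                cur.execute("SELECT relpath FROM images WHERE id = %s", (image_id,))
                row = cur.fetchone()
                if row is None:
                    return None
                relpath = row[0]                    # e.g. '11/2/33'
            with open("/storage/links/%s.jpg" % relpath, "rb") as f:
                return f.read()
        finally:
            conn.close()

The database stays small and searchable, and the bulk image data never passes through it unless you explicitly ask for it.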

I would never go for the all-in-database approach for anything big.

Cheers
Magnus

