Re: Experience wanted: low-cost Photo-Archive in TB range - Mailing list pgsql-admin

From Jamie Lawrence
Subject Re: Experience wanted: low-cost Photo-Archive in TB range
Date
Msg-id 20031128143157.GB24082@clueinc.net
In response to Re: Experience wanted: low-cost Photo-Archive in TB range  (Marek Florianczyk <franki@tpi.pl>)
List pgsql-admin
> Hello
>
> We are planning and offering a low-cost Photo-Archive in TB range. Is
> there any experience available with that amount of Data (1-3 TB)?
> What kind of machines (CPU, RAM)? What kind of I/O-System? What kind of
> Network (100Mbit Ethernet, Fibre, etc.).
> What kind of backup-system? How to backup that amount of data, etc.

Well, one can generalize a bit.

You don't talk about the economics at all, so I can't help you there.

Assuming one is providing access to small chunks of data (generally only
a meg or two each) via metadata about them (account, name, flag,
permissions, maybe other things), a few general points apply.

Best practice for large storage arrays is a proprietary SAN storage
critter of some sort: EMC, NetApp, etc. Since you're asking this on a
Postgres list, I assume you don't want to go that route. Clustering
several machines to provide NFS/Coda/AFS could work. Then you don't care
so much about the machines themselves, and can buy cheap, and a lot
of them.

The network is trickier. What sort of access patterns do you expect? I
get the feeling you expect lots of access. So, partition storage from
delivery. Delivery should be straightforward - web content delivery isn't
that hard. At that point you can talk about the network - what do you
expect to handle?  Where do you need caches? How do you handle failover?

Backup is business specific. This is a recovery question - what is the
damage if data is lost vs. what are you willing to spend? There really
isn't a best practice here, because cost gets in the way.
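
For a rough sense of scale on the backup and network questions, here is
a back-of-the-envelope sketch in Python. It only computes raw link
transfer time for one full copy of the archive. The function name and
the `efficiency` parameter are illustrative assumptions of mine, not
anything from the original question, and disk or tape throughput is
ignored entirely.

```python
def transfer_hours(terabytes, link_mbit_per_s, efficiency=1.0):
    """Hours needed to move `terabytes` (decimal TB) over one link.

    `efficiency` is an assumed fraction of the nominal line rate that is
    actually achieved; 1.0 means the theoretical best case.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Archive sizes and link speeds taken from the ranges in the question.
    for tb in (1, 3):
        for mbit in (100, 1000):
            hours = transfer_hours(tb, mbit)
            print(f"{tb} TB over {mbit} Mbit/s: {hours:.1f} h")
```

At the nominal line rate, a full copy of 1 TB over 100 Mbit Ethernet
works out to roughly 22 hours, and 3 TB to roughly 67 hours. Real
throughput will be lower, so this is a floor on the backup window, not
an estimate of it.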

As far as structuring the data, well, you don't ask about that, and
don't talk about the application at all. Blobs might be in order; you
might want to manage metadata in the database and files on a filesystem;
it depends on what you want to do with it.
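
To make the "metadata in the database, files on a filesystem" option
concrete, here is a minimal sketch in Python. Everything in it is
hypothetical: the table and column names, the hash-based directory
layout, and the helper functions are mine. It uses the standard-library
`sqlite3` module purely as a stand-in so the example runs on its own;
the same split applies with Postgres holding the metadata table.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalog(db_path=":memory:"):
    """Open the metadata catalog (SQLite here as a stand-in database)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS photo (
               id         INTEGER PRIMARY KEY,
               account    TEXT NOT NULL,
               name       TEXT NOT NULL,
               sha256     TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               rel_path   TEXT NOT NULL,
               UNIQUE (account, name)
           )"""
    )
    return conn


def store_photo(conn, root, account, name, data):
    """Write the bytes under `root` and record metadata in the catalog."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan files out over two directory levels so that no single
    # directory ends up holding millions of entries.
    rel_path = Path(digest[:2]) / digest[2:4] / digest
    target = Path(root) / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO photo (account, name, sha256, size_bytes, rel_path)"
            " VALUES (?, ?, ?, ?, ?)",
            (account, name, digest, len(data), rel_path.as_posix()),
        )
    return digest


def load_photo(conn, root, account, name):
    """Look the file up by metadata, then read it from the filesystem."""
    row = conn.execute(
        "SELECT rel_path FROM photo WHERE account = ? AND name = ?",
        (account, name),
    ).fetchone()
    if row is None:
        raise KeyError((account, name))
    return (Path(root) / row[0]).read_bytes()
```

The database stays small and queryable because it holds only rows of
metadata, while the bulk data lives on whatever storage sits behind
`root` (local disk, NFS, and so on). The trade-off is that the files
and the catalog have to be backed up consistently with each other.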

If you want to provide more info, I'll chat with you - I've done large
storage networks before. You have to provide more info to get better
answers. I get a vague hint that you're not going to spend much money on
it, and that's a mistake. Reliable terabyte-scale installations are
expensive.

-j



--
Jamie Lawrence                                        jal@jal.org
If it ain't broke, let me have a shot at it.


