Thread: clustering and denormalizing with PostgreSQL?

clustering and denormalizing with PostgreSQL?

From
Chirag Patel
Date:
My application is designed in such a way that the amount of data is expected to be enormous (in the terabytes). I would like to cluster the data and have the luxury of being able to slice up (denormalize) the database tables by user. In other words, users 1-50 and user 51-100 can be divided so that queries across users is not needed.
 
Does PostgresSQL or any third party add-ons allow this scaling to automatically occur? When I say automatically, I mean:
1.       New users are routed to a different database when the current database reaches a pre-defined capacity
2.       Denormalization logic is provided so that the querying across databases is transparent to the user
 
I’ve heard this feature described as “federation” or “horizontal scaling” in the O'Reilly book “Building Scalable Web Sites”
 
I’ve never done clustering  before so I’m clueless. Any ideas or better suggestions?
 
Thanks
Chirag


Re: clustering and denormalizing with PostgreSQL?

From
Jon Sime
Date:
Chirag Patel wrote:
> Does PostgresSQL or any third party add-ons allow this scaling to
> automatically occur? When I say automatically, I mean:
>
> 1.       New users are routed to a different database when the current
> database reaches a pre-defined capacity

No.

> 2.       Denormalization logic is provided so that the querying across
> databases is transparent to the user

That is not denormalization.

> I’ve heard this feature described as “federation” or “horizontal
> scaling” in the O'Reilly book “Building Scalable Web Sites”
>
> I’ve never done clustering  before so I’m clueless. Any ideas or better
> suggestions?

It seems you may be confusing several different terms. Nothing you
mentioned involves denormalization. That is a process (hopefully a
conscious and justifiable one!) which reduces the highest Normal Form to
which your data conforms.

Partitioning (which you didn't mention, but is most likely pretty
relevant when talking about terabytes of data) is a feature which stores
the actual data of a logical table into any number of physical tables
(within the same database), each of which store an explicitly defined
subset of the data.

Horizontal scaling is merely a general concept which, in its simplest
form, means "adding more nodes" as opposed to vertical scaling which
(very roughly) means making an individual node as powerful as possible.
The two are not mutually exclusive (you could add more nodes, and make
each one as powerful as possible -- if you've got the money), and the
concept is not limited to databases.

Federation typically refers to a situation where you have multiple
datastores (which can be different databases on one machine, entirely
different servers, various mixtures of storage formats, etc.) which each
store and are responsible for subsets of your data, with a middleware
layer on top of them that mediates access and updates to the underlying
and disparate repositories. A certain software package also uses the
term to refer to individual non-local tables stored on a single remote
server which are presented as if they were local tables.

Clustering generally indicates a setup where data is managed across a
set of two or more nodes, each node either storing copies of all the
data locally, or at least able to access the non-local portions
transparently from another source when necessary. Each node in the
cluster should be able to present a consistent version of the entire
dataset to connected clients (i.e. if you connect to two different nodes
and issue the same query at the same time to each, you would get the
same data back from both). The issue of propagating all changes to the
data from one node to all others in a fail-safe and consistent way makes
clustering generally very complicated. Specifically in the context of an
index, clustering can also mean ordering the data in a table based on
the values of an indexed column.

-Jon

--
Senior Systems Developer
Media Matters for America
http://mediamatters.org/