[Q] Cluster design for geographically separated dbs - Mailing list pgsql-general

From V S P
Subject [Q] Cluster design for geographically separated dbs
Date
Msg-id 1236459827.25013.1304185549@webmail.messagingengine.com
Whole thread Raw
In response to Re: VACUUM  (John R Pierce <pierce@hogranch.com>)
Responses Re: [Q] Cluster design for geographically separated dbs
List pgsql-general
Hello,
I am designing a db to hold often changed user data.

I just wanted to write down what I am thinking and ask people on the
list to comment
if they have any experiences in that area.

My thought is to have
say west-cost and east-cost data center and each user will
go to either East Coast or West Coast

and then within each Coast, I would want to partition by Hash on a user
id.

I am reviewing the Skype paper on the subject


http://kaiv.wordpress.com/2007/07/27/postgresql-cluster-partitioning-with-plproxy-part-i/


And wanted to ask what would be the main challenges I am facing with --
from the experience of the users on this list.

Especially I am not sure how to for example manage 'overlapping unique
IDs' data.

First, say I have a user who is trying to register with the same ID as
somebody
else only in a different data center -- that means that I always have to
check
first in each datacenter if ID exists.  Then based on his/her IP address
I decide what data center is closest (but IP addresses are often not a
good indication of geographical location of the user either, so I will
give them a 'manual' select option)

Then if I have say 'BIG' serial in my tables, but since there is more
than one database -- the 'big-serial' in one database can well overlap
it in another database.

So if I have any tables that must contain data from different databases
-- I have to add something else to the 'foreign' key -- besides the
reference to the big serial. And so on...

Right now - on paper, I am just having quite a few 'extra' fields in my
tables just to support 'UNiqueness' of the record across clusters.

I am not sure if I am doing it the right way (because then I also have
to at some point in time 'Defgrament' the IDs (as the data with
BIGserial keys can be deleted).

It looks to me that If I design things to take advantage of  Skype's
plproxy -- I will be able to leverage, what appears to be, a relatively
easy way to get data between databases (for reports that span clusters).


thanks in advance for any comments,
Vlad
--
  V S P
  toreason@fastmail.fm

--
http://www.fastmail.fm - Email service worth paying for. Try it for free


pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: VACUUM
Next
From: Willy-Bas Loos
Date:
Subject: open up firewall from "anywhere" to postgres ports?