Re: Database design: Data synchronization - Mailing list pgsql-general

From Decibel!
Subject Re: Database design: Data synchronization
Date
Msg-id D929C937-343F-416A-87BF-959EC730BEED@decibel.org
Whole thread Raw
In response to Database design: Data synchronization  (David <wizzardx@gmail.com>)
List pgsql-general
On Jun 18, 2008, at 7:07 AM, David wrote:
> - Many foreign keys weren't enforced
>
> - Some fields needed special treatment (eg: should be unique, or
> behave like a foreign key ref, even if db schema doesn't specify it.
> In other cases they need to be updated during the migration).
>
> - Most auto-incrementing primary keys (and related foreign key
> references) needed to be updated during migration, because they are
> already used in the destination database for other records.
>
> - Many tables are undocumented, some fields have an unknown purpose
>
> - Some tables didn't have fields that can be used as a 'natural' key
> for the purpose of migration (eg: tables which only exist to link
> together other tables, or tables where there are duplicate records).
>
> I wrote a Python script (using SQLAlchemy and Elixir) to do the above
> for our databases.
>
> Are there any existing migration tools which could have helped with
> the above? (it would have required a *lot* of user help).
>
> Are there recommended ways of designing tables so that synchronization
> is easier?
>
> The main thing I've read about is ensuring that all records have a
> natural key of some kind, eg GUID. Also, your migration app needs to
> have rules for conflict resolution.


Well, it sounds like you've got a good list of what NOT to do. The
first step is to make sure that you have a good database design,
outside of replication considerations. Most tables should have
natural unique keys; make sure you have FK's, documment things (see
the COMMENT ON command), etc. If you have low data quality to start
with, spreading that all over is just going to make things worse.

For the actual replication, there isn't really a multi-master
solution for Postgres. Your best bet is to try and design the system
so that you don't have conflicts (ie: if you have a bunch of branch
offices, each one is responsible for their own data). You can then
build something akin to multi-master using londiste and pgq.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Attachment

pgsql-general by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Losing data
Next
From: Decibel!
Date:
Subject: Re: pg_locks "at-a-glance" view