Re: syncing - between databases - Mailing list pgsql-sql

From: Steven Crandell
Subject: Re: syncing - between databases
Msg-id: CALvesgkzc1a0kkJUeyAJxxdywafjjkLAHsbUeapQ8fSCrYauKA@mail.gmail.com
In response to: Re: syncing - between databases (John Fabiani <johnf@jfcomputer.com>)
List: pgsql-sql
Having all the dblink destinations on the same server makes this a much more viable option, since the chances of a dblink update failing are greatly reduced. That said, here's a rundown of the queue system I described, with some specificity toward your situation.
You can add a flag to each customer row that denotes whether or not the data has been synced to the other tables,
ALTER TABLE customer ADD synced BOOLEAN DEFAULT FALSE; -- watch out for how this default might affect existing customers
and then have your sync process flip the flag to true when it has safely written the data to all other tables.
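In practice that just means selecting the unsynced rows and flipping the flag once the remote writes succeed, along these lines (just a sketch; it assumes customer has an id primary key, and foo/bar/baz stand in for your real columns):
SELECT id, foo, bar, baz FROM customer WHERE NOT synced; -- rows still pending
-- ...write each pending row to the other databases here...
UPDATE customer SET synced = TRUE WHERE id = 42; -- only after every write succeeded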
Alternatively, you can store the data that needs to be synced in a separate table if making changes to the customer table isn't a viable option.
CREATE TABLE customer_queue (LIKE customer); -- start from the same columns as customer
ALTER TABLE customer_queue ADD created_time TIMESTAMP DEFAULT now();
ALTER TABLE customer_queue ADD operation CHAR(1); -- 'i' = insert, 'u' = update, 'd' = delete
ALTER TABLE customer_queue ADD processed_time TIMESTAMP;
ALTER TABLE customer_queue ADD processed BOOLEAN DEFAULT FALSE;
......or something similar
If patching the application to write new/updated customer data to the customer_queue table (in addition to or instead of the customer table) is out of scope, you could populate it via trigger.
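A minimal sketch of such a trigger (assuming customer_queue mirrors customer's columns, again stood in for by foo/bar/baz):
CREATE OR REPLACE FUNCTION enqueue_customer() RETURNS trigger AS $$
BEGIN
    -- copy the changed row into the queue, recording the operation type
    INSERT INTO customer_queue (foo, bar, baz, operation)
    VALUES (NEW.foo, NEW.bar, NEW.baz, lower(substr(TG_OP, 1, 1)));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_enqueue
AFTER INSERT OR UPDATE ON customer
FOR EACH ROW EXECUTE PROCEDURE enqueue_customer();
Deletes would need a similar branch that records OLD instead of NEW, since NEW is null for a DELETE.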
Once you have a data structure that stores your customer data along with the metadata capturing whether each row has been safely synced out to the other tables, it's just a matter of writing a script that reads your sync metadata (the queue table, or your customer table WHERE NOT synced) and processes all pending rows.
SELECT foo, bar, baz FROM customer_queue WHERE NOT processed ORDER BY created_time; -- FIFO
The script should be able to verify that a given row was safely written to all destinations before setting processed to true.
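With the dblink extension installed, that processing loop might look roughly like this (just a sketch; the connection strings, the queue's id column, and the foo/bar/baz columns are all placeholders, and only the insert case is shown):
DO $$
DECLARE
    q record;
    dest text;
BEGIN
    FOR q IN SELECT id, foo, bar, baz FROM customer_queue
             WHERE NOT processed ORDER BY created_time -- FIFO
    LOOP
        -- push the row to every destination; if any dblink_exec fails, the
        -- exception rolls this run back, so nothing is wrongly marked processed
        FOREACH dest IN ARRAY ARRAY['dbname=db2', 'dbname=db3']
        LOOP
            PERFORM dblink_exec(dest, format(
                'INSERT INTO customer (foo, bar, baz) VALUES (%L, %L, %L)',
                q.foo, q.bar, q.baz));
        END LOOP;
        -- mark the row done only after every destination accepted it
        UPDATE customer_queue
        SET processed = TRUE, processed_time = now()
        WHERE id = q.id;
    END LOOP;
END;
$$;
A real version would catch the exception, log or alert on it, and move on, so that one bad row doesn't block the rest of the queue.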
Anyway, that's one of many ways to accomplish this, and it's surely far from the best, but I hope it's helpful.
regards
-steve
On Sun, May 13, 2012 at 1:01 PM, John Fabiani <johnf@jfcomputer.com> wrote:
All on the same cluster (only one server). Although it would be nice to
have only one table, there are real business reasons to dup the databases.
I am interested in how you dealt with a queue table. Would you take a little
time to describe the way it worked?
Johnf
On Saturday, May 12, 2012 08:53:52 PM you wrote:
> Are these 5 databases on different servers and at different locations or
> are they on the same local cluster?
> If they are all on the same local cluster you may want to rethink how you
> are storing customer data. The design you describe seems redundant.
>
> If you are dealing with multiple servers (and perhaps business rules that
> require duplicate, writable user tables at each location?) then your plan
> needs to account for network failure. A synchronous cross-network dblink
> trigger mechanism left to its own devices will eventually fail and you will
> be left with inconsistent data. Nothing wrong with dblink but you need to
> build in some error handling.
>
> I've built systems that accomplished similar things by writing data to a
> queue table (in addition to your local master customer table) which is then
> reconciled/synced out to other nodes by a periodic script that
> is able to deal with or alert on locking/dupe key/network and other errors
> that keep it from properly syncing a row to all other nodes. This
> introduces added durability to your sync mechanism but also introduces some
> lag time. Pick your poison.
>
> -steve
>
> On Sat, May 12, 2012 at 7:28 AM, John Fabiani <johnf@jfcomputer.com> wrote:
> > I need to maintain a synced table across several databases. For
> > example, I have a customer table in 5 databases. If a user of any of
> > the databases inserts a new customer, I need to insert the new record
> > into the other four databases. But the question is updates and deletes.
> >
> > I can use a trigger and dblink to update the other databases when the
> > action is an insert, because in each of the other databases I don't
> > have to worry about a locked record. But what happens if one user is
> > updating a customer at the same moment that a different user in a
> > different database is updating the same customer? Can a race
> > condition occur?
> >
> > I was thinking I could create a master database and have all the other
> > databases use dblink to execute the master trigger.
> >
> >
> > Any advice would be helpful,
> >
> > Johnf
> >