Thread: syncing - between databases

syncing - between databases

From
John Fabiani
Date: Sat, May 12, 2012 7:28 AM
I need to maintain a synced table across several databases.  For example, I
have a customer table in 5 databases.  If a user of any of the databases
inserts a new customer, I need to insert the new record into the other four
databases.  The question is what to do about updates and deletes.

I can use a trigger and dblink to update the other databases when the action
is an insert, because in each of the other databases I don't have to worry
about a locked record.  But what happens if a user is updating a customer at
the same moment as a different user in a different database is updating the
same customer?  Can a race condition occur?

I was thinking I could create a master database and have all the other
databases use dblink to execute the master trigger.


Any advice would be helpful,

Johnf


Re: syncing - between databases

From
Steven Crandell
Date: Sat, May 12, 2012 8:53 PM
Are these 5 databases on different servers and at different locations, or are they on the same local cluster?
If they are all on the same local cluster, you may want to rethink how you are storing customer data.  The design you describe seems redundant.

If you are dealing with multiple servers (and perhaps business rules that require duplicate, writable user tables at each location?), then your plan needs to account for network failure.  A synchronous cross-network dblink trigger mechanism left to its own devices will eventually fail, and you will be left with inconsistent data.  There's nothing wrong with dblink, but you need to build in some error handling.

I've built systems that accomplished similar things by writing data to a queue table (in addition to your local master customer table), which is then reconciled/synced out to the other nodes by a periodic script that is able to deal with, or alert on, the locking, duplicate-key, network, and other errors that keep it from properly syncing a row to all other nodes.  This adds durability to your sync mechanism but also introduces some lag time.  Pick your poison.

-steve

Re: syncing - between databases

From
John Fabiani
Date: Sun, May 13, 2012 1:01 PM
All on the same cluster (only one server).  Although it would be nice to
have only one table, there are real business reasons to duplicate the databases.

I am interested in how you dealt with a queue table.  Would you take a little
time to describe how it worked?

Johnf



Re: syncing - between databases

From
Steven Crandell
Date: Mon, May 14, 2012 11:02 AM
Having all the dblink destinations on the same server makes dblink a much more viable option, since the chances of the dblink update failing are greatly reduced.  That said, here's a rundown of the queue system I described, with some specifics for your situation.

You can add a flag to each customer row that denotes whether or not the data has been synced to the other tables:

ALTER TABLE customer ADD synced BOOLEAN DEFAULT FALSE;  -- watch out for how this default might affect existing customers

and then have your sync process flip the flag to true when it has safely written the data to all other tables.
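For example, assuming id is the customer table's key column, the flip for a confirmed row is just:

UPDATE customer SET synced = TRUE WHERE id = 42;  -- 42 = the row just written to all other tables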
Alternatively, if making changes to the customer table isn't a viable option, you can store the data that needs to be synced in a separate table:

CREATE TABLE customer_queue (LIKE customer);
ALTER TABLE customer_queue ADD created_time TIMESTAMP DEFAULT now();
ALTER TABLE customer_queue ADD operation CHAR(1);  -- 'i' = insert, 'u' = update, etc.
ALTER TABLE customer_queue ADD processed_time TIMESTAMP;
ALTER TABLE customer_queue ADD processed BOOLEAN DEFAULT FALSE;
...or something similar.

If patching the application to write new/updated customer data to the customer_queue table (in addition to, or instead of, the customer table) is out of scope, you could populate it via a trigger, along the lines of the sketch below.
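A rough, untested sketch of such a trigger (it assumes customer_queue has exactly the customer columns followed by the four metadata columns above, and uses 'd' for deletes):

CREATE OR REPLACE FUNCTION customer_queue_capture() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- queue the old row so the delete can be replayed on the other databases
        INSERT INTO customer_queue SELECT OLD.*, now(), 'd', NULL, FALSE;
        RETURN OLD;
    END IF;
    INSERT INTO customer_queue
        SELECT NEW.*, now(),
               CASE TG_OP WHEN 'INSERT' THEN 'i' ELSE 'u' END,
               NULL, FALSE;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_queue_capture
    AFTER INSERT OR UPDATE OR DELETE ON customer
    FOR EACH ROW EXECUTE PROCEDURE customer_queue_capture();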

Once you have a data structure that stores your customer data plus the metadata capturing whether each row has been safely synced out to the other tables, it's just a matter of writing a script that reads your sync metadata (the queue table, or your customer table where not synced) and processes all rows that are pending:

SELECT foo, bar, baz FROM customer_queue WHERE NOT processed ORDER BY created_time;  -- FIFO

The script should verify that a given row was safely written to all destinations before setting processed to true.
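To make that concrete, here is a rough, untested sketch of such a script as a plpgsql function (runnable from cron), pushing rows to the sibling databases via dblink.  The database names and the id/name customer columns are placeholders, and only the update case is shown; a real version would branch on the operation column:

-- requires the dblink module in the database where this runs
CREATE OR REPLACE FUNCTION process_customer_queue() RETURNS void AS $$
DECLARE
    q  customer_queue%ROWTYPE;
    db TEXT;
BEGIN
    FOR q IN SELECT * FROM customer_queue
              WHERE NOT processed
              ORDER BY created_time          -- FIFO, as above
    LOOP
        BEGIN
            -- push to every sibling database on this server; names are placeholders
            FOR db IN SELECT unnest(ARRAY['custdb2', 'custdb3', 'custdb4', 'custdb5'])
            LOOP
                -- only the update ('u') case shown; inserts and deletes are analogous
                PERFORM dblink_exec('dbname=' || db,
                    'UPDATE customer SET name = ' || quote_literal(q.name) ||
                    ' WHERE id = ' || q.id);
            END LOOP;
            -- mark processed only after every destination succeeded
            UPDATE customer_queue
               SET processed = TRUE, processed_time = now()
             WHERE id = q.id AND created_time = q.created_time;
        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'sync of customer % failed: %', q.id, SQLERRM;
            EXIT;  -- stop so a later change cannot overtake this one
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;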

Anyway, that's one of many ways to accomplish this, and it's surely far from the best, but I hope it's helpful.

regards
-steve


Re: syncing - between databases

From
Trinath Somanchi
Date: Mon, May 14, 2012 17:22
Hi-

You can use PgCluster with Slony-II for this type of requirement.




--
Regards,
----------------------------------------------
Trinath Somanchi,
+91 9866 235 130

Re: syncing - between databases

From
Devrim GÜNDÜZ
Date: Mon, May 14, 2012 5:23 PM
On Mon, 2012-05-14 at 17:22 +0530, Trinath Somanchi wrote:
> You can use PgCluster with Slony-II for this type of requirements.

PGCluster is a dead project.
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: syncing - between databases

From
Trinath Somanchi
Date: Mon, May 14, 2012 17:25
True, but it has its own stable version that is capable of handling this functionality.





--
Regards,
----------------------------------------------
Trinath Somanchi,
+91 9866 235 130

Re: syncing - between databases

From
Devrim GÜNDÜZ
Date:
On Mon, 2012-05-14 at 17:25 +0530, Trinath Somanchi wrote:
> True, But it has its own stable version capable to handle this
> functionality

PGCluster was not a stable project.
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz