Thread: Home-brewed table synchronization

Home-brewed table synchronization

From
Michael A Nachbaur
Date:
Hello everyone,

While I still have plans to do some more work on RServ, it is apparent to me 
that I need a more immediate solution.  I'm not replicating my entire 
dataset, but rather just some "summary" tables that are maintained by stored 
procedures. This means RServ is an iffy proposition at best for me right now.

So, I'm looking at synchronizing 4 tables from one master database to several 
child databases. I'm thinking of doing the following with DBD::Multiplex:

DELETE FROM TableA;
INSERT INTO TableA (..) VALUES (...);
....

on all the child databases, but I'm not sure what kind of impact this would 
have on my servers.  My impression is that this would hammer the indexes, and 
might blow any memory optimization out the window.  Only a few records in my 
dataset will change from time-to-time, but just the process of determining 
what is different may take more effort than simply rebuilding.

What are your thoughts?  These tables will probably only have a maximum of 
10,000 rows in total, but I'm going to have to synchronize once per minute; as 
a result, I wouldn't like this to take any longer than about 10 seconds.
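For concreteness, the rebuild would look something like this on each child 
(column names here are just illustrative); wrapping it all in a transaction 
at least keeps readers from ever seeing a half-empty table:

```sql
BEGIN;
DELETE FROM TableA;
-- one INSERT per row pulled from the master, e.g.:
INSERT INTO TableA (id, name, total) VALUES (1, 'example', 42);
-- ...
COMMIT;

-- Rebuilding the indexes afterwards reclaims the bloat that a full
-- delete/insert cycle leaves behind:
REINDEX TABLE TableA;
```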

-- 
/* Michael A. Nachbaur <mike@nachbaur.com> * http://nachbaur.com/pgpkey.asc */

"Out," he said. People who can supply that amount of firepower don't need to 
supply verbs as well.



Re: Home-brewed table synchronization

From
Cliff Wells
Date:
On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:

> So, I'm looking at synchronizing 4 tables from one master database to several 
> child databases. I'm thinking of doing the following with DBD::Multiplex:
> 
> DELETE FROM TableA;
> INSERT INTO TableA (..) VALUES (...);
> ....
> 
> on all the child databases, but I'm not sure what kind of impact this would 
> have on my servers.  My impression is that this would hammer the indexes, and 
> might blow any memory optimization out the window.  Only a few records in my 
> dataset will change from time-to-time, but just the process of determining 
> what is different may take more effort than simply rebuilding.

Keep a timestamp associated with each record.  Only update the records
with timestamps later than your last sync.
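Something along these lines, say (the column name is illustrative, and the 
cutoff is shown as a literal where you'd substitute the time of the last run):

```sql
-- Add a last-modified column, defaulted on insert:
ALTER TABLE TableA ADD COLUMN last_modified timestamp DEFAULT now();

-- Make sure updates touch it too (via a trigger or your stored
-- procedures), then each sync only has to pull:
SELECT * FROM TableA WHERE last_modified > '2003-07-09 14:14:00';
```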

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726  (800) 735-0555



Re: Home-brewed table synchronization

From
Michael A Nachbaur
Date:
On Wednesday 09 July 2003 02:28 pm, Cliff Wells wrote:
> On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:
> > So, I'm looking at synchronizing 4 tables from one master database to
> > several child databases. I'm thinking of doing the following with
> > DBD::Multiplex:
> >
> > DELETE FROM TableA;
> > INSERT INTO TableA (..) VALUES (...);
> > ....
> >
> > on all the child databases, but I'm not sure what kind of impact this
> > would have on my servers.  My impression is that this would hammer the
> > indexes, and might blow any memory optimization out the window.  Only a
> > few records in my dataset will change from time-to-time, but just the
> > process of determining what is different may take more effort than simply
> > rebuilding.
>
> Keep a timestamp associated with each record.  Only update the records
> with timestamps later than your last sync.

I'm dealing with an existing database structure; although I can change it, 
doing so would have a lot of impact on the rest of my infrastructure.  If I 
can find a way of doing this without resorting to timestamps, I'd much rather 
do it that way.

-- 
/* Michael A. Nachbaur <mike@nachbaur.com> * http://nachbaur.com/pgpkey.asc */

"Rome wasn't burned in a day. "



Re: Home-brewed table synchronization

From
Jason Earl
Date:
Michael A Nachbaur <mike@nachbaur.com> writes:

> On Wednesday 09 July 2003 02:28 pm, Cliff Wells wrote:
>> On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:
>> > So, I'm looking at synchronizing 4 tables from one master database to
>> > several child databases. I'm thinking of doing the following with
>> > DBD::Multiplex:
>> >
>> > DELETE FROM TableA;
>> > INSERT INTO TableA (..) VALUES (...);
>> > ....
>> >
>> > on all the child databases, but I'm not sure what kind of impact
>> > this would have on my servers.  My impression is that this would
>> > hammer the indexes, and might blow any memory optimization out
>> > the window.  Only a few records in my dataset will change from
>> > time-to-time, but just the process of determining what is
>> > different may take more effort than simply rebuilding.
>>
>> Keep a timestamp associated with each record.  Only update the records
>> with timestamps later than your last sync.
>
> I'm dealing with an existing database structure that, though I can
> change it, has a lot of impact on the rest of my infrastructure.  If
> I can find a way of doing this without resorting to timestamps, I'd
> much rather do it that way.

Would it be possible to add another table (changelog) that contained
the primary key of the record that has changed and a timestamp?  The
changelog table could be maintained via triggers (on update, insert,
or delete).  Your synchronization software could then be relatively
simple.  It would simply need to check your changelog table for rows
that have changed.
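Something like this, say (all the names here are only a sketch):

```sql
CREATE TABLE changelog (
    tablename  varchar(32),
    pkey       varchar(1024),
    changed_at timestamp DEFAULT now()
);

-- Each sync then reduces to one query, with the cutoff being the
-- time of the previous run:
SELECT tablename, pkey FROM changelog
 WHERE changed_at > '2003-07-09 14:14:00';
```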

Jason


Re: Home-brewed table synchronization

From
Raj Mathur
Date:

>>>>> "Michael" == Michael A Nachbaur <Michael> writes:

   Michael> On Wednesday 09 July 2003 02:28 pm, Cliff Wells wrote:
   >> Keep a timestamp associated with each record.  Only update the
   >> records with timestamps later than your last sync.

   Michael> I'm dealing with an existing database structure that,
   Michael> though I can change it, has a lot of impact on the rest
   Michael> of my infrastructure.  If I can find a way of doing this
   Michael> without resorting to timestamps, I'd much rather do it
   Michael> that way.
 

Had the same issue, so I made a separate table to store
meta-information about what records have been changed in my master
tables.  Note that I do not store the actual change, just which record
was affected and the nature of the change: delete, insert or update.

The deltas table is called, strangely enough, ``delta'', and here's
the code that manages it, for a table called pdetail (keys: package,
pool, timeslot):

-- Table pdetail

create or replace function pdetail_update_delta()
    returns opaque
    as '
declare
    mykeyval varchar(1024) ;
    upd      char(1) ;
begin
    if TG_OP = ''UPDATE'' then
        upd := ''U'' ;
        mykeyval := OLD.package || ''|'' || OLD.pool || ''|''
                 || OLD.timeslot ;
    end if ;
    if TG_OP = ''INSERT'' then
        upd := ''I'' ;
        mykeyval := NEW.package || ''|'' || NEW.pool || ''|''
                 || NEW.timeslot ;
    end if ;
    if TG_OP = ''DELETE'' then
        upd := ''D'' ;
        mykeyval := OLD.package || ''|'' || OLD.pool || ''|''
                 || OLD.timeslot ;
        execute ''delete from delta where relation=''''''
             || TG_RELNAME || '''''' and keyval=''''''
             || mykeyval || '''''';'' ;
    end if ;
    insert into delta ( relation , keyval , timestamp , what )
        values ( ''pdetail'', mykeyval , now () , upd ) ;
    if TG_OP = ''UPDATE'' or TG_OP = ''INSERT'' then
        return NEW ;
    end if ;
    if TG_OP = ''DELETE'' then
        return OLD ;
    end if ;
end ;
' language plpgsql ;

create trigger pdetail_update_delta_trigger
    after update on pdetail
    for each row
    execute procedure pdetail_update_delta() ;

create trigger pdetail_insert_delta_trigger
    after insert on pdetail
    for each row
    execute procedure pdetail_update_delta() ;

create trigger pdetail_delete_delta_trigger
    before delete on pdetail
    for each row
    execute procedure pdetail_update_delta() ;


Table delta itself looks like this:

create table delta
(
    relation   varchar(32) ,    -- Table name to which update was made
    keyval     varchar(1024) ,  -- Key value of the updated record
    timestamp  timestamp without time zone default now() ,  -- When
    what       char(1) check (what = 'U' or what = 'D' or what = 'I') ,

    primary key ( relation , keyval , timestamp )
) ;
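The sync job that consumes this is then straightforward; roughly like the 
following (the cutoff timestamp is shown as a literal for illustration; in 
practice it would be the time of the previous run):

```sql
-- Everything recorded since the last run, oldest first:
SELECT relation, keyval, what
  FROM delta
 WHERE timestamp > '2003-07-09 14:14:00'
 ORDER BY timestamp;

-- After replaying those changes against the children, clear out
-- what was processed:
DELETE FROM delta WHERE timestamp <= '2003-07-09 14:14:00';
```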

I'm not much experienced with PostgreSQL, so I would appreciate any tips
the masters can give for improving the plpgsql code.  However, it works
for me as it is at the moment.

You are free to use this code under the terms of the GNU GPL.

Regards,

-- Raju
-- 
Raj Mathur                raju@kandalaya.org      http://kandalaya.org/
GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
                     It is the mind that moves