Thread: Home-brewed table synchronization
Hello everyone,

While I still have plans to do some more work on RServ, it is apparent to me
that I need a more immediate solution. I'm not replicating my entire dataset,
but rather just some "summary" tables that are maintained by stored
procedures. This makes RServ an iffy proposition at best for me right now.

So, I'm looking at synchronizing 4 tables from one master database to several
child databases. I'm thinking of doing the following with DBD::Multiplex:

DELETE FROM TableA;
INSERT INTO TableA (..) VALUES (...);
....

on all the child databases, but I'm not sure what kind of impact this would
have on my servers. My impression is that this would hammer the indexes, and
might blow any memory optimization out the window. Only a few records in my
dataset will change from time to time, but just the process of determining
what is different may take more effort than simply rebuilding. What are your
thoughts?

These tables will probably only have a maximum of 10,000 rows in total, but
I'm going to have to synchronize once per minute; as a result, I wouldn't
like this to take any longer than about 10 seconds.

-- 
/* Michael A. Nachbaur <mike@nachbaur.com>
 * http://nachbaur.com/pgpkey.asc */
"Out," he said. People who can supply that amount of firepower don't need to
supply verbs as well.
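[Editor's note: the full-rebuild approach above can be made much less painful
by wrapping the DELETE and the re-INSERTs in a single transaction, so readers
on the child never see a half-empty table. A minimal sketch in Python, with
SQLite standing in for the real child databases; the `id`/`val` columns and
the `full_refresh` helper are illustrative assumptions, not code from the
thread:]

```python
import sqlite3

def full_refresh(child_db, rows):
    # Rebuild TableA on a child from a full snapshot of the master.
    # Doing the DELETE and re-INSERT inside one transaction means readers
    # see either the old contents or the new ones, never an empty table.
    with child_db:  # opens a transaction, commits on success
        child_db.execute("DELETE FROM TableA")
        child_db.executemany(
            "INSERT INTO TableA (id, val) VALUES (?, ?)", rows
        )

# Demo with an in-memory child database
child = sqlite3.connect(":memory:")
child.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, val TEXT)")
full_refresh(child, [(1, "a"), (2, "b")])
full_refresh(child, [(1, "a"), (2, "B"), (3, "c")])  # next one-minute cycle
print(child.execute("SELECT count(*) FROM TableA").fetchone()[0])  # 3
```

[Even so, at 10,000 rows per minute this still rewrites every index entry on
each cycle, which is the cost the poster is worried about.]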
On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:
> So, I'm looking at synchronizing 4 tables from one master database to
> several child databases. I'm thinking of doing the following with
> DBD::Multiplex:
>
> DELETE FROM TableA;
> INSERT INTO TableA (..) VALUES (...);
> ....
>
> on all the child databases, but I'm not sure what kind of impact this
> would have on my servers. My impression is that this would hammer the
> indexes, and might blow any memory optimization out the window. Only a
> few records in my dataset will change from time to time, but just the
> process of determining what is different may take more effort than
> simply rebuilding.

Keep a timestamp associated with each record. Only update the records with
timestamps later than your last sync.

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726  (800) 735-0555
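[Editor's note: Cliff's timestamp approach can be sketched as follows, again
in Python with SQLite standing in for both servers. The `updated_at` column,
the `id`/`val` columns, and the `INSERT OR REPLACE` upsert are assumptions
for the demo, not part of the original schema:]

```python
import sqlite3

def incremental_sync(master, child, last_sync):
    # Pull only rows whose (assumed) updated_at column is newer than the
    # last successful sync, and upsert them into the child.
    changed = master.execute(
        "SELECT id, val, updated_at FROM TableA WHERE updated_at > ?",
        (last_sync,),
    ).fetchall()
    with child:  # one transaction per sync cycle
        child.executemany(
            "INSERT OR REPLACE INTO TableA (id, val, updated_at) "
            "VALUES (?, ?, ?)",
            changed,
        )
    return len(changed)

master = sqlite3.connect(":memory:")
child = sqlite3.connect(":memory:")
for db in (master, child):
    db.execute(
        "CREATE TABLE TableA (id INTEGER PRIMARY KEY, val TEXT, updated_at REAL)"
    )
master.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)", [(1, "a", 100.0), (2, "b", 200.0)]
)
n = incremental_sync(master, child, last_sync=150.0)
print(n)  # 1: only row 2 is newer than the last sync
```

[One caveat the thread goes on to address: a bare timestamp column cannot
signal deletions, since a deleted row leaves no timestamp behind to compare.]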
On Wednesday 09 July 2003 02:28 pm, Cliff Wells wrote:
> On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:
> > [...]
>
> Keep a timestamp associated with each record. Only update the records
> with timestamps later than your last sync.

I'm dealing with an existing database structure that, though I can change
it, has a lot of impact on the rest of my infrastructure. If I can find a
way of doing this without resorting to timestamps, I'd much rather do it
that way.

-- 
/* Michael A. Nachbaur <mike@nachbaur.com>
 * http://nachbaur.com/pgpkey.asc */
"Rome wasn't burned in a day."
Michael A Nachbaur <mike@nachbaur.com> writes:
> On Wednesday 09 July 2003 02:28 pm, Cliff Wells wrote:
>> On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:
>> > [...]
>>
>> Keep a timestamp associated with each record. Only update the records
>> with timestamps later than your last sync.
>
> I'm dealing with an existing database structure that, though I can
> change it, has a lot of impact on the rest of my infrastructure. If
> I can find a way of doing this without resorting to timestamps, I'd
> much rather do it that way.

Would it be possible to add another table (changelog) that contained the
primary key of the record that has changed and a timestamp? The changelog
table could be maintained via triggers (on update, insert, or delete).

Your synchronization software could then be relatively simple. It would
simply need to check your changelog table for rows that have changed.

Jason
>>>>> "Michael" == Michael A Nachbaur <Michael> writes:

Michael> On Wednesday 09 July 2003 02:28 pm, Cliff Wells wrote:
>> On Wed, 2003-07-09 at 14:14, Michael A Nachbaur wrote:
>> > [...]
>>
>> Keep a timestamp associated with each record. Only update the
>> records with timestamps later than your last sync.

Michael> I'm dealing with an existing database structure that, though I
Michael> can change it, has a lot of impact on the rest of my
Michael> infrastructure. If I can find a way of doing this without
Michael> resorting to timestamps, I'd much rather do it that way.

Had the same issue, so I made a separate table to store meta-information
about which records have been changed in my master tables. Note that I do
not store the actual change, just which record was affected and the nature
of the change: delete, insert or update.
The deltas table is called, strangely enough, ``delta'', and here's the code
that manages it, for a table called pdetail (keys: package, pool, timeslot):

-- Table pdetail
create or replace function pdetail_update_delta() returns opaque as '
    declare
        mykeyval varchar(1024) ;
        upd char(1) ;
    begin
        if TG_OP = ''UPDATE'' then
            upd := ''U'' ;
            mykeyval := OLD.package || ''|'' || OLD.pool || ''|'' || OLD.timeslot ;
        end if ;
        if TG_OP = ''INSERT'' then
            upd := ''I'' ;
            mykeyval := NEW.package || ''|'' || NEW.pool || ''|'' || NEW.timeslot ;
        end if ;
        if TG_OP = ''DELETE'' then
            upd := ''D'' ;
            mykeyval := OLD.package || ''|'' || OLD.pool || ''|'' || OLD.timeslot ;
            execute ''delete from delta where relation='''''' || TG_RELNAME
                || '''''' and keyval='''''' || mykeyval || '''''';'' ;
        end if ;
        insert into delta ( relation , keyval , timestamp , what )
            values ( ''pdetail'' , mykeyval , now() , upd ) ;
        if TG_OP = ''UPDATE'' or TG_OP = ''INSERT'' then
            return NEW ;
        end if ;
        if TG_OP = ''DELETE'' then
            return OLD ;
        end if ;
    end ;
' language plpgsql ;

create trigger pdetail_update_delta_trigger after update on pdetail
    for each row execute procedure pdetail_update_delta() ;
create trigger pdetail_insert_delta_trigger after insert on pdetail
    for each row execute procedure pdetail_update_delta() ;
create trigger pdetail_delete_delta_trigger before delete on pdetail
    for each row execute procedure pdetail_update_delta() ;

Table delta itself looks like this:

create table delta (
    relation varchar(32) ,    -- Table name to which update was made
    keyval varchar(1024) ,    -- Key value of the updated record
    timestamp timestamp without time zone default now() ,  -- When
    what char(1) check (what = 'U' or what = 'D' or what = 'I') ,
    primary key ( relation , keyval , timestamp )
) ;

Not much experienced with PgSQL, so would appreciate any tips the masters
can give for improving the plpgsql code. However, it works for me as it is
at the moment. You are free to use this code under the terms of the GNU GPL.
Regards,

-- 
Raju

-- 
Raj Mathur  raju@kandalaya.org  http://kandalaya.org/
GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
                      It is the mind that moves
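[Editor's note: the delta table above records only the key and the kind of
change, so the sync job must re-read current values from the master for
inserts and updates, and simply delete the key for deletes. A sketch of that
consumer side, in Python with SQLite standing in for both servers; the `val`
column, the single pre-joined keyval string, and the `apply_deltas` helper
are assumptions for the demo:]

```python
import sqlite3

def apply_deltas(master, child, last_sync):
    # Replay delta rows newer than last_sync against a child database.
    # For 'I' and 'U' we copy the current master row; for 'D' we just
    # remove the key on the child.
    deltas = master.execute(
        "SELECT keyval, what FROM delta "
        "WHERE relation = 'pdetail' AND timestamp > ?",
        (last_sync,),
    ).fetchall()
    with child:  # one transaction per sync cycle
        for keyval, what in deltas:
            if what == "D":
                child.execute("DELETE FROM pdetail WHERE keyval = ?", (keyval,))
            else:  # 'I' or 'U'
                row = master.execute(
                    "SELECT keyval, val FROM pdetail WHERE keyval = ?",
                    (keyval,),
                ).fetchone()
                if row is not None:  # row may have been deleted since
                    child.execute(
                        "INSERT OR REPLACE INTO pdetail (keyval, val) "
                        "VALUES (?, ?)",
                        row,
                    )
    return len(deltas)

master = sqlite3.connect(":memory:")
child = sqlite3.connect(":memory:")
master.execute("CREATE TABLE pdetail (keyval TEXT PRIMARY KEY, val TEXT)")
master.execute(
    "CREATE TABLE delta (relation TEXT, keyval TEXT, timestamp REAL, what TEXT)"
)
child.execute("CREATE TABLE pdetail (keyval TEXT PRIMARY KEY, val TEXT)")
master.execute("INSERT INTO pdetail VALUES ('p1|a|1', 'x')")
master.execute("INSERT INTO delta VALUES ('pdetail', 'p1|a|1', 10.0, 'I')")
master.execute("INSERT INTO delta VALUES ('pdetail', 'gone|b|2', 11.0, 'D')")
n = apply_deltas(master, child, last_sync=0.0)
print(n)  # 2 delta rows replayed; child ends up with 1 live row
```

[Since only changed keys are touched, each one-minute cycle does work
proportional to the handful of changed rows, not the full 10,000.]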