Thread: preserving data after updates
Is there a canonical form that db schema designers use to save changes to the data in their databases?

For example, given a table with rows of data, if I UPDATE a field in a row, the previous value is lost. If I wanted to track the changes to my data over time, it occurs to me that I could:

1) copy the whole row of data using the new value, thus leaving the old row intact in the db for fishing expeditions, posterity, &c.
   -- awfully wasteful, especially with binary data

2) enter a new row that contains only the new data fields, requiring building a full set of data through heavy lifting and multiple queries through 'n' number of old rows
   -- overly complex query design, probably leading to errors

3) create a new table that tracks changes
   -- the table is either wide enough to mirror all columns in the working table, or uses generic columns and API tricks to parse token-pair strings, ...

4) other?

Thanks
Scott
I do option 3. A generic trigger writes the table name, field name, type of change (insert/update/delete), and the old and new values for the columns that were affected. It is kind of a hog, but it works very well. I have cron delete old entries so it doesn't eat my whole disk.

I haven't tried to get it to give up the data in the same representation as the source table, but it shouldn't be too hard with a set-returning function, such that you tell it the table name and a timestamp and it returns records as they existed at that time. I usually just query it directly to "see what happened".
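The poster did not include the trigger itself, so here is a minimal sketch of that kind of generic audit trigger on a reasonably recent PostgreSQL. It logs whole-row snapshots rather than per-column changes, and every name in it (audit_log, audit_row, the dm_user example table it is attached to) is an assumption, not the original code:

CREATE TABLE audit_log (
    log_id      SERIAL PRIMARY KEY,
    table_name  TEXT      NOT NULL,
    operation   TEXT      NOT NULL,          -- 'INSERT' / 'UPDATE' / 'DELETE'
    old_row     TEXT,                        -- textual snapshot of OLD
    new_row     TEXT,                        -- textual snapshot of NEW
    logged_at   TIMESTAMP NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION audit_row() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, new_row)
             VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, old_row, new_row)
             VALUES (TG_TABLE_NAME, TG_OP, OLD::text, NEW::text);
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO audit_log (table_name, operation, old_row)
             VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Attach the same function to any table you want audited:
CREATE TRIGGER dm_user_audit
    AFTER INSERT OR UPDATE OR DELETE ON dm_user
    FOR EACH ROW EXECUTE PROCEDURE audit_row();

-- The cron cleanup mentioned above can then be as simple as:
-- DELETE FROM audit_log WHERE logged_at < now() - interval '90 days';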
I use a modified form of option 3 with an ON UPDATE RULE... the update rule copies the row to an inherited table...

CREATE TABLE dm_user (

    id                  SERIAL NOT NULL PRIMARY KEY,

    lu_user_type        INTEGER NOT NULL REFERENCES lu_user_type(id),
    dm_user_address     INTEGER NOT NULL DEFAULT 0,
    dm_user_email       INTEGER NOT NULL DEFAULT 0,

    f_name              VARCHAR(50) NOT NULL,
    m_name              VARCHAR(50) NOT NULL,
    l_name              VARCHAR(50) NOT NULL,

    uname               VARCHAR(20) NOT NULL,
    upwd                VARCHAR(20) NOT NULL,
    pwd_change_reqd     BOOLEAN DEFAULT FALSE,
    login_allowed       BOOLEAN DEFAULT TRUE,
    lost_passwd         BOOLEAN DEFAULT FALSE,

    create_dt           TIMESTAMP NOT NULL DEFAULT NOW(),
    change_dt           TIMESTAMP NOT NULL DEFAULT NOW(),
    change_id           INTEGER NOT NULL DEFAULT 0,
    active_flag         BOOLEAN NOT NULL DEFAULT TRUE

) WITH OIDS;

CREATE TABLE dm_user_history (

    history_id          SERIAL NOT NULL PRIMARY KEY,
    hist_create_dt      TIMESTAMP NOT NULL DEFAULT NOW()

) INHERITS (dm_user);

CREATE RULE dm_user_upd_history AS ON UPDATE TO dm_user
    DO INSERT INTO dm_user_history SELECT * FROM dm_user WHERE id = old.id;

CREATE RULE dm_user_nodelete AS ON DELETE TO dm_user
    DO INSTEAD UPDATE dm_user SET active_flag = FALSE WHERE id = old.id;
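A couple of hedged usage notes on the schema above (not part of the original post; the id value is just an example). Because dm_user_history INHERITS dm_user, a plain SELECT on dm_user also returns the archived copies, so ONLY is needed to see just the live rows:

SELECT * FROM ONLY dm_user WHERE id = 42;     -- current row only

SELECT * FROM dm_user_history                 -- every archived version of that row,
 WHERE id = 42                                -- oldest change first
 ORDER BY hist_create_dt;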
> I use a modified form of option 3 with an ON UPDATE RULE.... the update rule > copies the row to an inherited table... I just gotta say that THAT is one COOL use of PG inheritance! Do you find that it works well and is robust and all the good stuff it seems like would be the case? -- Berend
Door number 3. Thanks for the responses and terrific suggestions!

Scott
Berend Tober wrote:
> I just gotta say that THAT is one COOL use of PG inheritance! Do you find
> that it works well and is robust and all the good stuff it seems like
> would be the case?

[GP->] Thank you!

[GP->] I find it VERY effective and completely transparent to both the programmer and the end-user... I don't use it on ALL of the tables in a given schema... ONLY the tables where end-users can manipulate / change data...

[GP->] What it boils down to is that I can use it as a sort of virtual "rollback" system by querying the inherited table and updating the parent table with an original value from the child -- of course, this results in another change to the child, but that too can be undone...
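A rough sketch of that virtual rollback against the dm_user / dm_user_history pair from the earlier post; the id, history_id, and the choice of column to restore are made-up examples:

-- Restore a user's surname to the value captured in one archived version.
UPDATE dm_user
   SET l_name    = h.l_name,
       change_dt = NOW()
  FROM dm_user_history h
 WHERE dm_user.id = 42
   AND h.id = dm_user.id
   AND h.history_id = 7;        -- the archived version to roll back to

-- This UPDATE fires the dm_user_upd_history rule again, so the value being
-- overwritten is archived too -- which is why the rollback can itself be undone.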
It's called a "temporal database". Usually it's intended for medical or police databases where you need hindsight, i.e. if today is 31/12/2005, what did we know on 20/12/2005? For example, a doctor appearing in court and required to testify to what he knew on 20/12/2005. Very cool.

It would be nice if PostgreSQL had a switch that could turn it into a temporal database.

Regards,
tzahi.
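As a hedged illustration, an "as of" lookup of that sort against the dm_user / dm_user_history pair shown earlier in the thread could look like this (the id, columns, and date are illustrative only):

-- State of user 42 as of 2005-12-20: the oldest archived copy made *after*
-- that date is the version that was current on that date...
SELECT id, f_name, l_name, change_dt
  FROM dm_user_history
 WHERE id = 42
   AND hist_create_dt > '2005-12-20'
 ORDER BY hist_create_dt
 LIMIT 1;

-- ...and if no such archived copy exists, the row has not changed since then,
-- so the live row is the answer:
SELECT id, f_name, l_name, change_dt
  FROM ONLY dm_user
 WHERE id = 42;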
On Sat, Mar 05, 2005 at 03:46:53AM +0200, Tzahi Fadida wrote:
> It's called a "temporal database". Usually it's intended for medical or
> police databases where you need hindsight [...]
> It would be nice if PostgreSQL had a switch that could turn it into a
> temporal database.

It used to be built in, a long time ago. It has since been moved to the contrib module "timetravel". It does historical queries and stuff...
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
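For reference, a rough sketch of how the contrib timetravel trigger is wired up, going from memory of the old contrib documentation -- the table and column names are made up, and details may differ between versions:

-- The audited table carries two abstime columns for the validity interval.
CREATE TABLE emp (
    emp_name    TEXT NOT NULL,
    salary      INTEGER,
    date_on     ABSTIME,        -- when this version became current
    date_off    ABSTIME         -- 'infinity' while it is still current
);

CREATE TRIGGER emp_timetravel
    BEFORE INSERT OR DELETE OR UPDATE ON emp
    FOR EACH ROW
    EXECUTE PROCEDURE timetravel(date_on, date_off);

-- An as-of query then just filters on the validity interval:
SELECT * FROM emp
 WHERE date_on <= '2005-12-20' AND date_off > '2005-12-20';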
> It's called a "temporal database". Usually it's intended for medical or
> police databases where you need hindsight, i.e. if today is 31/12/2005,
> what did we know on 20/12/2005? For example, a doctor appearing in court
> and required to testify to what he knew on 20/12/2005.

What he "can have known", not "what he knew".

With GnuMed we are running a trigger-based auditing solution which so far works nicely.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Greg Patnude wrote:
> Yeah… this is where the inheritance model gets a little funky… What do
> you have "SQL_INHERITANCE" set to when you dump the database? I've never
> tested this so I don't know if it makes a difference being on or off when
> you dump a table… You might try it and compare the two versions of the
> DDL for your inherited tables…

I set SQL_INHERITANCE to OFF because I have lots of existing queries in an application that do not include the "ONLY" option. I did try setting it back to the default ON, and the problem remained.

> Note: PostgreSQL recommends leaving SQL_INHERITANCE at "ON" and using the
> keyword "ONLY".
>
> I've seen that before… The problem is that pg_dump creates the
> person_history table as a standalone table (look at the DDL) with the
> keyword "INHERITS" – my gut feeling is that this is probably a bug in
> pg_dump – I don't think pg_dump really knows how to dump just the
> additional fields specified in an inherited table, so it dumps the actual
> definition it finds in the system catalogs…
>
> If you poke around in pg_catalog, you'll find that the catalog definition
> is a combination of pointers to the parent table and any additional
> fields, constraints, rules, etc. you defined when you created the
> inherited table.
>
> My work-around has been to drop and recreate the history tables using the
> "original" SQL I used to create the inherited table in the first place…
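To illustrate what the ONLY keyword and the SQL_INHERITANCE setting change, here is a small hedged example using the dm_user tables from earlier in the thread. It assumes a PostgreSQL of that era; the sql_inheritance setting was removed in later releases:

-- With sql_inheritance = on (the default), a query on the parent also scans
-- children, so history rows show up unless ONLY is used:
SET sql_inheritance TO on;
SELECT count(*) FROM dm_user;         -- live rows + archived copies
SELECT count(*) FROM ONLY dm_user;    -- live rows only

-- With sql_inheritance = off, the ONLY behaviour becomes the default and the
-- trailing * is needed to include children:
SET sql_inheritance TO off;
SELECT count(*) FROM dm_user;         -- live rows only
SELECT count(*) FROM dm_user*;        -- live rows + archived copies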