Thread: preserving data after updates

preserving data after updates

From
Scott Frankel
Date:
Is there a canonical form that db schema designers use
to save changes to the data in their databases?

For example, given a table with rows of data, if I UPDATE
a field in a row, the previous value is lost.  If I wanted to
track the changes to my data over time, it occurs to me that
I could,

1) copy the whole row of data using the new value, thus
      leaving the old row intact in the db for fishing expeditions,
      posterity, &c.
      -- awfully wasteful, especially with binary data

2) enter a new row that contains only new data fields, requiring
      building a full set of data through heavy lifting and multiple
queries
      through 'n' number of old rows
      -- overly complex query design probably leading to errors

3) create a new table that tracks changes
      -- the table is either wide enough to mirror all columns in
          the working table, or uses generic columns and API tricks to
          parse token pair strings, ...

4) other?

Thanks
Scott


Re: preserving data after updates

From
"Ian Harding"
Date:
I do option 3.  A generic trigger writes the table name, field name, type
of change (insert/update/delete), and the old and new values for the columns
that were affected.  It is kind of a hog, but it works very well.  I
have cron delete old entries so it doesn't eat my whole disk.

I haven't tried to get it to give up the data in the same representation
as the source table, but it shouldn't be too hard with a set returning
function, such that you tell it the table name and timestamp and it
returns records as they existed at that time.  I usually just query it
directly to "see what happened".
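Ian's actual trigger isn't shown, but a minimal per-column sketch of the idea might look like the following. All names here (audit_log, widget, name) are hypothetical, and a real generic version would loop over every changed column rather than hard-coding one:

```sql
-- Hypothetical sketch of option 3: a trigger logs the table name, field
-- name, kind of change, and the old and new values into one generic table.
CREATE TABLE audit_log (
    logged_at  timestamp NOT NULL DEFAULT now(),
    table_name text NOT NULL,
    field_name text NOT NULL,
    operation  text NOT NULL,      -- 'INSERT', 'UPDATE', or 'DELETE'
    old_value  text,
    new_value  text
);

CREATE OR REPLACE FUNCTION audit_widget() RETURNS trigger AS $$
BEGIN
    IF OLD.name IS DISTINCT FROM NEW.name THEN
        INSERT INTO audit_log (table_name, field_name, operation,
                               old_value, new_value)
        VALUES (TG_TABLE_NAME, 'name', TG_OP, OLD.name, NEW.name);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widget_audit AFTER UPDATE ON widget
    FOR EACH ROW EXECUTE PROCEDURE audit_widget();
```

The cron cleanup Ian mentions would then just be a periodic
`DELETE FROM audit_log WHERE logged_at < now() - interval '90 days';`
(interval chosen arbitrarily).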


>>> Scott Frankel <leknarf@pacbell.net> 03/03/05 3:51 PM >>>


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


Re: preserving data after updates

From
"Greg Patnude"
Date:
I use a modified form of option 3 with an ON UPDATE RULE.... the update rule
copies the row to an inherited table...

CREATE TABLE dm_user (

       id SERIAL NOT NULL PRIMARY KEY,

       lu_user_type INTEGER NOT NULL REFERENCES lu_user_type(id),
       dm_user_address INTEGER NOT NULL DEFAULT 0,
       dm_user_email INTEGER NOT NULL DEFAULT 0,

       f_name VARCHAR(50) NOT NULL,
       m_name VARCHAR(50) NOT NULL,
       l_name VARCHAR(50) NOT NULL,

       uname VARCHAR(20) NOT NULL,
       upwd VARCHAR(20) NOT NULL,
       pwd_change_reqd BOOLEAN DEFAULT FALSE,
       login_allowed BOOLEAN DEFAULT TRUE,
       lost_passwd BOOLEAN DEFAULT FALSE,

       create_dt TIMESTAMP NOT NULL DEFAULT NOW(),
       change_dt TIMESTAMP NOT NULL DEFAULT NOW(),
       change_id INTEGER NOT NULL DEFAULT 0,
       active_flag BOOLEAN NOT NULL DEFAULT TRUE

) WITH OIDS;


CREATE TABLE dm_user_history (

       history_id SERIAL NOT NULL PRIMARY KEY,
       hist_create_dt TIMESTAMP NOT NULL DEFAULT NOW()

) INHERITS (dm_user);

CREATE RULE dm_user_upd_history AS ON UPDATE TO dm_user DO INSERT INTO
dm_user_history SELECT * FROM dm_user WHERE id = old.id;

CREATE RULE dm_user_nodelete AS ON DELETE TO dm_user DO INSTEAD UPDATE
dm_user SET active_flag = FALSE WHERE id = old.id;
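One wrinkle with this scheme, which the SQL_INHERITANCE discussion later in this thread touches on: because dm_user_history INHERITS dm_user, a plain SELECT on dm_user also returns the history rows, so queries meant for the live table need the ONLY keyword. A sketch (id 42 is an arbitrary example):

```sql
-- Queries against the parent also scan inherited children, so history
-- rows show up unless ONLY is used:
SELECT * FROM ONLY dm_user WHERE id = 42;     -- just the live row

-- The prior versions of that row, oldest first:
SELECT * FROM dm_user_history WHERE id = 42 ORDER BY hist_create_dt;
```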



"Scott Frankel" <leknarf@pacbell.net> wrote in message
news:bd02bff5561d8b271301ba10bafca105@pacbell.net...



Re: preserving data after updates

From
"Berend Tober"
Date:
> I use a modified form of option 3 with an ON UPDATE RULE.... the update rule
> copies the row to an inherited table...

I just gotta say that THAT is one COOL use of PG inheritance! Do you find that
it works well and is robust and all the good stuff it seems like would be the
case?


-- Berend


Re: preserving data after updates

From
Scott Frankel
Date:
Door number 3.

Thanks for the responses and terrific suggestions!
Scott



On Mar 4, 2005, at 8:28 AM, Greg Patnude wrote:


Re: preserving data after updates

From
"Greg Patnude"
Date:

-----Original Message-----
From: Berend Tober [mailto:btober@seaworthysys.com]
Sent: Friday, March 04, 2005 8:47 AM
To: Greg Patnude
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] preserving data after updates

> I use a modified form of option 3 with an ON UPDATE RULE.... the update
> rule copies the row to an inherited table...

I just gotta say that THAT is one COOL use of PG inheritance! Do you find
that it works well and is robust and all the good stuff it seems like
would be the case?


-- Berend

[GP->] Thank you... !

[GP->]  I find it VERY effective and completely transparent to both the
programmer and the end-user... I don't use it on ALL of the tables in a
given schema... ONLY the tables where end-users can manipulate / change
data...
[GP->]
What it boils down to is that I can use it as a sort of virtual "rollback"
system by querying the inherited table and updating the parent table with an
original value from the child -- of course, this results in another change
to the child, but it can also be undone...
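The virtual rollback Greg describes might look something like this (the column and id are hypothetical; ONLY keeps the UPDATE off the inherited history table, which carries the same id values as the parent):

```sql
-- Restore the most recent historical value of uname for user 42.
UPDATE ONLY dm_user
   SET uname = h.uname
  FROM (SELECT uname
          FROM dm_user_history
         WHERE id = 42
         ORDER BY hist_create_dt DESC
         LIMIT 1) AS h
 WHERE dm_user.id = 42;
```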

Re: preserving data after updates

From
Tzahi Fadida
Date:
It's called a "temporal database".
Usually it's intended for medical or police databases where
you need hindsight, i.e., if today is 31/12/2005, what did we know at
20/12/2005?  For example, for a doctor appearing in court and required
to testify to what he knew at 20/12/2005.
Very cool.
It would be nice if PostgreSQL could have a switch that
could turn it into a temporal database.

Regards,
    tzahi.
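A hand-rolled version of what Tzahi describes usually stamps every row version with the interval during which it was believed current; the "what did we know on a given date" question then becomes a range test. The table and columns below are hypothetical:

```sql
-- Each version of a record carries the interval during which it was
-- the current belief; superseded versions get valid_to closed off.
CREATE TABLE patient_record (
    patient_id integer   NOT NULL,
    diagnosis  text,
    valid_from timestamp NOT NULL DEFAULT now(),
    valid_to   timestamp NOT NULL DEFAULT 'infinity'
);

-- What did we know at 20/12/2005?
SELECT patient_id, diagnosis
  FROM patient_record
 WHERE valid_from <= '2005-12-20'
   AND valid_to   >  '2005-12-20';
```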

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Scott Frankel
> Sent: Friday, March 04, 2005 1:51 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] preserving data after updates



Re: preserving data after updates

From
Martijn van Oosterhout
Date:
On Sat, Mar 05, 2005 at 03:46:53AM +0200, Tzahi Fadida wrote:
> It would be nice if postgreSQL could have a switch that
> could turn it into a temporal database.

It used to be built in, a long time ago. It's since been moved to the
contrib module "timetravel". It does historical queries and such...

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: preserving data after updates

From
Karsten Hilbert
Date:
> for example, for a doctor appearing at court and required to testify
> what he knew at 20/12/2005.
What he "can have known", not "what he knew".

With GnuMed we are running a trigger based auditing solution
which so far works nicely.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: preserving data after updates

From
Berend Tober
Date:
Greg Patnude wrote:

> Yeah… this is where the inheritance model gets a little funky… What do
> you have “SQL_INHERITANCE” set to when you dump the database? I’ve
> never tested this so I don’t know if it makes a difference being on or
> off when you dump a table…. You might try it and compare the two
> versions of the DDL for your inherited tables…
>

I set SQL_INHERITANCE to OFF because I have lots of existing queries in
an application that do not include the "ONLY" option. I did try setting
it back to the default ON, and the problem remained.

> Note: PostgreSQL recommends leaving SQL_INHERITANCE at “ON” and using
> the keyword “ONLY”
>
> I’ve seen that before… The problem is that pg_dump creates the
> person_history table as a standalone table (look at the DDL) with the
> keyword “INHERITS” – My gut feeling is that this is probably a bug in
> pg_dump – I don’t think pg_dump really knows how to dump just the
> additional fields specified in an inherited table so it dumps the
> actual definition it finds in the system catalogs…
>
> If you poke around in pg_catalog, you’ll find that the catalog
> definition is a combination of pointers to the parent table and any
> additional fields, constraints, rules, etc you defined when you
> created the inherited table.
>
> My work-around has been to drop and recreate the history tables using
> the “original” SQL I used to create the inherited table in the first
> place…
>