Thread: Implementing a change log

Implementing a change log

From
Michael Schuerig
Date:
In my current project I have a customer requirement for implementing a
change log. This is not just for auditing purposes, rather it is meant
to be accessible by users so they can get an overview of the change
history of an object. The entire data set is not big, I'm expecting
considerably less than 50.000 records. Changes are only made by about
30 human users.

Queries related to change history are (a) for all changes during a
specific time interval and (b) for all changes to a particular record.

My original intention was to keep two sets of tables. The first
containing only the working set of current records. The second
containing all prior versions. I haven't experimented with such a setup
yet and I'm wondering if it is even necessary. The alternative being to
keep only a single set of tables.

Can anyone relate their experiences with such a thing? Which approaches
should I take into consideration?

Michael

--
Michael Schuerig                       Face reality and stare it down
mailto:michael@schuerig.de        --Jethro Tull, Silver River Turning
http://www.schuerig.de/michael/

Re: Implementing a change log

From
Mike Rylander
Date:
On 9/18/05, Michael Schuerig <michael@schuerig.de> wrote:
>
> In my current project I have a customer requirement for implementing a
> change log. This is not just for auditing purposes, rather it is meant
> to be accessible by users so they can get an overview of the change
> history of an object. The entire data set is not big, I'm expecting
> considerably less than 50.000 records. Changes are only made by about
> 30 human users.
>

Will you need to tell who made what changes?  If so, you'll probably
want each user to be a fully fledged Postgres users.

> Queries related to change history are (a) for all changes during a
> specific time interval and (b) for all changes to a particular record.
>
> My original intention was to keep two sets of tables. The first
> containing only the working set of current records. The second
> containing all prior versions. I haven't experimented with such a setup
> yet and I'm wondering if it is even necessary. The alternative being to
> keep only a single set of tables.
>

Keeping a single set of tables can get pretty complex.  You'd need to
mask each table with a view and a set of rules for working with only
the newest version of each record.  I always set up as separate set of
audit tables to record the old versions of each row.

> Can anyone relate their experiences with such a thing? Which approaches
> should I take into consideration?
>

I blogged about my most recent incarnation of "audit tables" here:
http://open-ils.org/blog/?p=28 .  We don't use Postgres users (we have
2 million), but it would be trivial to modify what I've done there to
work with real PG users or any other particulars of your environment.

Hope that helps!

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Implementing a change log

From
Michael Schuerig
Date:
On Monday 19 September 2005 01:29, Mike Rylander wrote:
> On 9/18/05, Michael Schuerig <michael@schuerig.de> wrote:
> > In my current project I have a customer requirement for
> > implementing a change log. This is not just for auditing purposes,
> > rather it is meant to be accessible by users so they can get an
> > overview of the change history of an object. The entire data set is
> > not big, I'm expecting considerably less than 50.000 records.
> > Changes are only made by about 30 human users.
>
> Will you need to tell who made what changes?  If so, you'll probably
> want each user to be a fully fledged Postgres users.

Yes, but that's a thing I can easily handle at application/framework
level, whereas using multiple Pg users would be difficult.

[single set vs. working set and audit set]

> Keeping a single set of tables can get pretty complex.  You'd need to
> mask each table with a view and a set of rules for working with only
> the newest version of each record.  I always set up as separate set
> of audit tables to record the old versions of each row.
>
> > Can anyone relate their experiences with such a thing? Which
> > approaches should I take into consideration?
>
> I blogged about my most recent incarnation of "audit tables" here:
> http://open-ils.org/blog/?p=28 .  We don't use Postgres users (we
> have 2 million), but it would be trivial to modify what I've done
> there to work with real PG users or any other particulars of your
> environment.

Thanks. I'll have to read up on PL/PgSQL to understand your code, but
that's something I intended to do anyway.

Michael

--
Michael Schuerig                       All good people read good books
mailto:michael@schuerig.de                Now your conscience is clear
http://www.schuerig.de/michael/ --Tanita Tikaram, Twist In My Sobriety

Re: Implementing a change log

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> My original intention was to keep two sets of tables. The first
> containing only the working set of current records. The second
> containing all prior versions. I haven't experimented with such a setup
> yet and I'm wondering if it is even necessary. The alternative being to
> keep only a single set of tables.

> Can anyone relate their experiences with such a thing? Which approaches
> should I take into consideration?

I like the multi-table approach; I use a schema named "audit" that contains
a copy of some of the important tables (sans constraints). The nice part is
that I can use the exact same table name, which makes things easier. A few
extra columns on each audit table track who made the change, what type it
was (insert, update, or delete [trigger event]), and the time of the change
[default timestamptz]. Throw in some triggers and you're done.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200509192258
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFDL3txvJuQZxSWSsgRArxSAJ9z4v+pRjULrBg4AiyD4jw7iHpE2wCg/qa0
UwTQQdH4CVfs97l4OgLUATY=
=Yap5
-----END PGP SIGNATURE-----



Re: Implementing a change log

From
Berend Tober
Date:
Greg Sabino Mullane wrote:

>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>
>
>
>>My original intention was to keep two sets of tables. The first
>>containing only the working set of current records. The second
>>containing all prior versions. I haven't experimented with such a setup
>>yet and I'm wondering if it is even necessary. The alternative being to
>>keep only a single set of tables.
>>
>>
>
>
>
>>Can anyone relate their experiences with such a thing? Which approaches
>>should I take into consideration?
>>
>>
>
>I like the multi-table approach; I use a schema named "audit" that contains
>a copy of some of the important tables (sans constraints). The nice part is
>that I can use the exact same table name, which makes things easier. A few
>extra columns on each audit table track who made the change, what type it
>was (insert, update, or delete [trigger event]), and the time of the change
>[default timestamptz]. Throw in some triggers and you're done.
>
>
>
There was a very exciting discussion of this last May, in which Greg
Patnude suggested the most insightful, and in hindsight obviously
appropriate, use of table inheritance ever (IMHO). I slightly refined
the idea and posted documentation comments at the time. See "User
Comments" at

"http://www.postgresql.org/docs/8.0/interactive/tutorial-inheritance.html"

for something that should set you afire.


Re: Implementing a change log

From
Berend Tober
Date:
Berend Tober wrote:

> ...See "User Comments" at
>
> "http://www.postgresql.org/docs/8.0/interactive/tutorial-inheritance.html"
>
>
> for something that should set you afire.

And, commenting on my own post, try this cool function:

/*
The following is based on suggestion by Mike Rylander posted on
Postgresql-General
Sun, 18 Sep 2005 23:29:51 +0000

Rylander's original suggestion employed a trigger and tracked
only row updates. My implementation makes use of rules and
handles both updates and deletions.
*/

\o output.txt
\set ON_ERROR_STOP OFF

DROP SCHEMA auditor CASCADE;
DROP SCHEMA test CASCADE;

\set ON_ERROR_STOP ON

-- Create a schema to contain all of our audit tables and the creator
function
CREATE SCHEMA auditor;

CREATE OR REPLACE FUNCTION auditor.create_auditor(name, name)
  RETURNS bool AS
'
BEGIN
    -- This is the function that does the heavy lifting of creating
audit tables
    -- and the triggers that will populate them.

    -- Create the audit table: auditor.{schema}_{table}
    EXECUTE \'
        CREATE TABLE auditor.\' || $1 || \'_\' || $2 || \' (
            update_action VARCHAR(6) NOT NULL,
            update_date TIMESTAMP NOT NULL DEFAULT NOW(),
            update_user NAME NOT NULL DEFAULT CURRENT_USER
        ) INHERITS (\' || $1 || \'.\' || $2 || \') WITHOUT OIDS;
    \';

    EXECUTE \'
        CREATE RULE \'|| $2 ||\'_ru AS ON UPDATE TO \'|| $1 ||\'.\'|| $2
||\'
        DO INSERT INTO auditor.\'|| $1 ||\'_\'|| $2 ||\'
        SELECT OLD.*, \'\'UPDATE\'\';
    \';

    EXECUTE \'
        CREATE RULE \'|| $2 ||\'_rd AS ON DELETE TO \'|| $1 ||\'.\'|| $2
||\'
        DO INSERT INTO auditor.\'|| $1 ||\'_\'|| $2 ||\'
        SELECT OLD.*, \'\'DELETE\'\';
    \';

    RETURN TRUE;
END;
'
  LANGUAGE 'plpgsql' VOLATILE;



/* BEGIN EXAMPLE */

CREATE SCHEMA test AUTHORIZATION postgres;

-- This option makes it unnecessary to use the "ONLY" keyword in your
SELECT and UPDATE statements.
\set SQL_INHERITANCE TO OFF;

\set search_path = test, pg_catalog;
\set default_with_oids = false;

CREATE TABLE test.person (
    first_name character varying(24),
    last_name character varying(24),
    gender character(1),
    marital_status character(1)
) WITHOUT OIDS;

INSERT INTO test.person VALUES ('Charlie', 'Bucket', 'M', 'S');
INSERT INTO test.person VALUES ('Grandpa', 'Joe', 'M', NULL);
INSERT INTO test.person VALUES ('Veruca', 'Salt', NULL, 'S');
INSERT INTO test.person VALUES ('Augustus', 'Gloop', 'M', 'S');
INSERT INTO test.person VALUES ('Micheal', 'Teevee', 'M', 'S');
INSERT INTO test.person VALUES ('Violet', 'Beaureguard', 'M', 'S');

SELECT auditor.create_auditor('test', 'person');

UPDATE test.person set marital_status = 'M' WHERE last_name = 'Joe';
SELECT * FROM auditor.test_person;
/*
 first_name | last_name | gender | marital_status | update_action
|        update_date         | update_user
------------+-----------+--------+----------------+---------------+----------------------------+-------------
 Grandpa    | Joe       | M      |                | UPDATE        |
2005-09-20 03:26:23.063965 | postgres
(1 row)
*/

UPDATE test.person set first_name = 'Joe', last_name = 'Bucket' WHERE
last_name = 'Joe';
SELECT * FROM auditor.test_person;
/*
 first_name | last_name | gender | marital_status | update_action
|        update_date         | update_user
------------+-----------+--------+----------------+---------------+----------------------------+-------------
 Grandpa    | Joe       | M      |                | UPDATE        |
2005-09-20 03:26:23.063965 | postgres
 Grandpa    | Joe       | M      | M              | UPDATE        |
2005-09-20 03:26:23.13654  | postgres
(2 rows)
*/

UPDATE test.person set gender = 'F' WHERE last_name = 'Salt';
SELECT * FROM auditor.test_person;
/*
 first_name | last_name | gender | marital_status | update_action
|        update_date         | update_user
------------+-----------+--------+----------------+---------------+----------------------------+-------------
 Grandpa    | Joe       | M      |                | UPDATE        |
2005-09-20 03:26:23.063965 | postgres
 Grandpa    | Joe       | M      | M              | UPDATE        |
2005-09-20 03:26:23.13654  | postgres
 Veruca     | Salt      |        | S              | UPDATE        |
2005-09-20 03:26:23.175714 | postgres
(3 rows)
*/

DELETE FROM test.person WHERE last_name = 'Salt';
SELECT * FROM auditor.test_person;
/*
 first_name | last_name | gender | marital_status | update_action
|        update_date         | update_user
------------+-----------+--------+----------------+---------------+----------------------------+-------------
 Grandpa    | Joe       | M      |                | UPDATE        |
2005-09-20 03:26:23.063965 | postgres
 Grandpa    | Joe       | M      | M              | UPDATE        |
2005-09-20 03:26:23.13654  | postgres
 Veruca     | Salt      |        | S              | UPDATE        |
2005-09-20 03:26:23.175714 | postgres
 Veruca     | Salt      | F      | S              | DELETE        |
2005-09-20 03:26:23.201887 | postgres
(4 rows)
*/

/* END EXAMPLE */


Re: Implementing a change log

From
Mike Rylander
Date:
On 9/20/05, Berend Tober <btober@seaworthysys.com> wrote:
> /*
> The following is based on suggestion by Mike Rylander posted on
> Postgresql-General
> Sun, 18 Sep 2005 23:29:51 +0000
>
> Rylander's original suggestion employed a trigger and tracked
> only row updates. My implementation makes use of rules and
> handles both updates and deletions.
> */

I'm glad that was inspirational for you.  Not to nit pick, but my
trigger based version did in fact track deletions:

CREATE TRIGGER audit_$$ || sch || $$_$$ || tbl || $$_update_trigger
    AFTER UPDATE OR DELETE ON $$ || sch || $$.$$ || tbl || $$ FOR EACH ROW
              ^^^^^^^
    EXECUTE PROCEDURE auditor.audit_$$ || sch || $$_$$ || tbl || $$_func ();

You may want to consider using the LIKE style of table copying, as it
strips all constraints from the new table.  It's safer IMHO, as this
way you wouldn't have to worry about the primary key being propagated
to the new table (and accidentally forgetting to remove it).

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Implementing a change log

From
Berend Tober
Date:
Mike Rylander wrote:

>On 9/20/05, Berend Tober <btober@seaworthysys.com> wrote:
>
>
>>/*
>>The following is based on suggestion by Mike Rylander posted on
>>Postgresql-General
>>Sun, 18 Sep 2005 23:29:51 +0000
>>
>>Rylander's original suggestion employed a trigger and tracked
>>only row updates. My implementation makes use of rules and
>>handles both updates and deletions.
>>*/
>>
>>
>
>I'm glad that was inspirational ...
>
>

That was indeed pretty cool.

>...did in fact track deletions:
>
>

Guess I was too excited to actually read the whole thing more closely
once I grasped the direction you were going!!

>You may want to consider using the LIKE style of table copying, as it
>strips all constraints from the new table.  It's safer IMHO, as this
>way you wouldn't have to worry about the primary key being propagated
>to the new table (and accidentally forgetting to remove it).
>
>
I'm glad you pointed that out because you reminded me that when I tried
the original idea from Greg Patnude in Mar 2005 using inheritance, I did
indeed run into a problem with constraints. The problem there I think
was that I had a check constraint on the table for which I created the
audit log table, but the check constraint was defined in a different
schema than the original table. Something about the way inheritance
table creation works found this a problematic situation. I'll have to
revisit that and see if using LIKE overcomes that problem.

I guess I originally thought using INHERIT rather than LIKE was that,
having the audit history, I might at some point present a select view
across both the base and descendant tables or something ("...if you
record it, they (PHB's) will eventually ask for a report on it..."), but
I haven't actually had an implementation where such an audit history
table was actually required in production -- I'm just exercising the
functionality and exploring the quirks in order to be prepared for when
such a requirement is actually promulgated.

Any other significant distinquishing features of INHERIT verses LIKE for
this kind of use that you (or others) can think of?


Re: Implementing a change log

From
Michael Schuerig
Date:
On Tuesday 20 September 2005 08:44, Berend Tober wrote:
> Greg Sabino Mullane wrote:
> >-----BEGIN PGP SIGNED MESSAGE-----
> >Hash: SHA1

[MS: on audit tables]
> >>Can anyone relate their experiences with such a thing? Which
> >> approaches should I take into consideration?
> >
> >I like the multi-table approach; I use a schema named "audit" that
> > contains a copy of some of the important tables (sans constraints).
> > The nice part is that I can use the exact same table name, which
> > makes things easier. A few extra columns on each audit table track
> > who made the change, what type it was (insert, update, or delete
> > [trigger event]), and the time of the change [default timestamptz].
> > Throw in some triggers and you're done.
>
> There was a very exciting discussion of this last May, in which Greg
> Patnude suggested the most insightful, and in hindsight obviously
> appropriate, use of table inheritance ever (IMHO). I slightly refined
> the idea and posted documentation comments at the time. See "User
> Comments" at
>
> "http://www.postgresql.org/docs/8.0/interactive/tutorial-inheritance.
>html"

Looks interesting, but I'm hesitant to use the approach as I don't (yet)
understand it well enough. Also, I'm using the Rails framework and I
don't think it would play well with too much intelligence in the DB.

The consequences are that for my current project I'll do the change
logging at application level. However, this doesn't preclude learning
more about programming PgSQL and possibly writing an extension for
Rails latter on.

Regarding the former, learning about programming on the DB server, my
impression is that documentation is scarce. There is reference
documentation, of course, and the book by Douglas & Douglas devotes a
few chapters to the topic. Still, coverage appears pretty introductory.
In comparison, there's a wealth of books on everything above the
database: languages, design and architecture, frameworks. Literature on
database-server programming seems to be tied to specific products,
mostly Oracle. I'd like to be wrong, of course. What are good ways to
learn about these things in a free database world?

Michael

--
Michael Schuerig                  Failures to use one's frontal lobes
mailto:michael@schuerig.de        can result in the loss of them.
http://www.schuerig.de/michael/   --William H. Calvin

Re: Implementing a change log

From
Mike Rylander
Date:
On 9/20/05, Berend Tober <btober@seaworthysys.com> wrote:

[snip]

> I guess I originally thought using INHERIT rather than LIKE was that,
> having the audit history, I might at some point present a select view
> across both the base and descendant tables or something ("...if you
> record it, they (PHB's) will eventually ask for a report on it..."), but
> I haven't actually had an implementation where such an audit history
> table was actually required in production -- I'm just exercising the
> functionality and exploring the quirks in order to be prepared for when
> such a requirement is actually promulgated.
>

I can see your point.  You could use a UNION ALL view to combine the
main table with the audit table, though.

> Any other significant distinquishing features of INHERIT verses LIKE for
> this kind of use that you (or others) can think of?
>

I would personally still go with LIKE simply for the CONSTRAINT
stripping since the audit table will, by definition, have duplicates
for the main table's primary key.  In fact after looking at the
documentation more closely it seems that all child table must contain
all CHECK constraints from the base table when using INHERITS.  If you
decided to add a CHECK constraint to the base table at some future
time then you might have to modify the data in the audit table to
match it (big no-no!).

Read the INHERITS and LIKE sections of this* closely to see what I'm
talking about.

However, INHERITS may be useful for the audit table.  If the base
table is updated very frequently you could set up RULE based
partitioning for the audit table.  Then you would have the option of
archiving and dropping older sections of the audit table without
affecting the on-disk layout of the rest of the audit data by creating
holes in the audit table (or having to CLUSTER the table).


* http://www.postgresql.org/docs/8.0/interactive/sql-createtable.html

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org