Still struggling with history tables - Mailing list pgsql-sql

From Ken Winter
Subject Still struggling with history tables
Date
Msg-id 004c01c61bdf$569b8ed0$6603a8c0@kenxp
Whole thread Raw
In response to Re: Rule causes baffling error  (Richard Huxton <dev@archonet.com>)
Responses Re: Still struggling with history tables  (Achilleus Mantzios <achill@matrix.gatewaynet.com>)
List pgsql-sql
Friends ~

I'm still trying to implement a solution to the requirement to keep a
complete history of data changes to a "person" table.  (See earlier
correspondence below.)  I'm trying for a variant of the architecture
suggested by Richard Huxton (also below).  In my variant, I have this
"identifier" table, carrying the id and invariant info about each person:

/*==============================================================*/
/* Table: person_i                                              */
/*==============================================================*/
create table person_i (
idi                  BIGSERIAL            not null,
date_of_birth        DATE                 null,
constraint PK_person_i_key_1 primary key (idi)
)
;

And then I have this "history" table, carrying the time-varying info on
which I want to keep a complete history:

/*==============================================================*/
/* Table: person_h                                              */
/*==============================================================*/
create table person_h (
idh                  INT8                 not null,
start_date           DATE                 not null default 'now()',
end_date             DATE                 null,
name                 VARCHAR(255)         null,
constraint PK_person_h_key_1 primary key (idh, start_date),
constraint fk_reference_6 foreign key (idh)  references person_i (idi)on delete restrict on update restrict
)
;

Triggers are in place on the "person_h" table so that when an app does an
update, the current h record is expired (with its old data) and a new record
(wuth the updated data)is inserted and made effective "now".  What I'm now
trying to build is this view:

/*==============================================================*/
/* View: person                                                 */
/*==============================================================*/
create view person as
select idi, date_of_birth, start_date, end_date, name
from person_i i, person_h h
where i.idi = h.idh;

I want to enable users (and apps) who need to deal only with current data to
be able to treat "person" as a real table (i.e. to write to it as well as
read from it).  Specifically, the requirements are that when a user does:

.    Insert - The system inserts a record into the i table and the first
record in the h table.
.    Select - The system returns attributes of i and h tables (not
duplicating the identifier columns).
.    Update - The system allows updating of i attributes
(update-in-place, not creating a new history record) and h attributes
(creating a new history record).
.    Delete - The system deletes the i record and all of its h records.

I'm stuck on how to implement the "insert" action, which I thought would be
simple.  The problem is this:  The i table id is of type BIGSERIAL, i.e.
sequence-assigned.  I've tried writing the following rule to get both the i
record and the first h record inserted:

CREATE RULE ru AS
ON INSERT TO person
DO INSTEAD (INSERT INTO person_i DEFAULT VALUES; INSERT INTO person_h (idh) VALUES (NEW.idi)
);

I thought this would grab the sequence-assigned value of person_i.idi to put
into person_h.idh (this trick works in trigger functions), but instead it
just assigns Null to person_h.idh, and the transaction fails with a "not
null" violation on person_h.idh.  And now that I look at the documentation
(http://www.postgresql.org/docs/7.4/static/rules-update.html), it says that
a column not assigned in the invoking query "is replaced by a null value
(for an INSERT)" in the NEW pseudo-record.  Bummer.  Triggers behave nicely,
but rules don't.

I'd be willing to do it with a trigger function instead, but I can't attach
a trigger to a view.

I considered doing it with a trigger function on the person_i table, but I
don't know how that could be made to cause an insert of the person_h table
record - and the assignment of h table values such as "name" from the app's
query.

Suggestions?

~ TIA
~ Ken

> -----Original Message-----
> From: Richard Huxton [mailto:dev@archonet.com]
> Sent: Tuesday, December 20, 2005 4:16 AM
> To: Ken Winter
> Cc: 'PostgreSQL pg-sql list'
> Subject: Re: [SQL] Rule causes baffling error
>
> Ken Winter wrote:
> > Richard ~
> >
> > Let me zoom out for a moment, for the bigger picture.
> >
> > As you have inferred, what I'm trying to do is develop a history-
> preserving
> > table ("my_data" in the example that started this thread).  *Most* user
> > programs would see and manipulate this table as if it contained only the
> > current rows (marked by effective_date_and_time <= 'now' and
> > expiration_date_and_time = 'infinity').
> >
> > When these programs do an INSERT, I need automatic actions that set the
> > expiration and date timestamps to 'now' and 'infinity'; when they do an
> > UPDATE, I need automatic actions that save the old data in a history
> record
> > and expire it as of 'now' and the new data in a record that's effective
> > 'now' and expires at 'infinity'; when they do a DELETE, I need an
> automatic
> > action to expire the target record as of 'now' rather than actually
> deleting
> > it.
...
>
> > However, I also need certain maintenance programs, designed to enable
> > certain users to correct inaccurately entered data.  These need to be
> able
> > to "rewrite history" by doing actions against "my_data" without these
> > automatic actions occurring.  It may prove advisable to provide some
> > automatic actions for these programs too, but they definitely won't be
> the
> > actions described above.  If the above actions were implemented as
> triggers,
> > all the ways I could think of to conditionally disable them (and
> possibly
> > replace them with other actions) seemed architecturally very klunky.
> That's
> > when I decided I needed the "my_data_now" view, and from that I inferred
> > (apparently correctly) that the actions would have to be implemented as
> > rewrite rules.
>
> > The cascading problem was solvable.  But the solution was a bit hard to
> > reach because the user-invoked UPDATE action triggered both an INSERT
> and an
> > UPDATE on the same table (and user DELETE triggered an UPDATE), and so
> one
> > had to take into account that all of these triggered actions would cause
> > their triggers to fire again.  Not a deal-killer, but the solution felt
> > brittle.
> >
> > Yes, I did consider having a "live" table and a separate "history"
> table.
> > The killer of that idea was my inability to find a way to implement
> foreign
> > keys that could refer to both tables and that could follow a record when
> it
> > was moved from "live" to "history".  Much of the history I'm trying to
> > preserve is not in the "my_data" table; it's in related tables that
> refer to
> > it.  I presumably could do this by not declaring the FKs to PostgreSQL,
> and
> > implementing the necessary referential integrity with triggers, but -
> well,
> > in a word, yuck.
>
> If you're going to do this with multiple tables you actually need (at
> least) three. For example, if you had different versions of e.g.
> documents being stored you would want:
>   document - invariants: the id, perhaps document-type.
>              FKeys link to this.
>              A row is only deleted from here if all live+history
>              is also deleted.
>   document_live - the one that gets edited.
>                   1:1 relationship with document if still live
>   document_hist - with timestamps. N:1 with document
>
> Have a google for Temporal Databases too - there's a lot of thinking
> been done about this.
>
> >
> > As it happens, I have found a rewrite of my UPDATE rule that works, so
> my
> > immediate need is past.  FYI, the old update rule was:
> >
> [snip]
>
> > The relevant change is that I'm now expiring the record with the old
> data
> > and inserting the one with the new data, rather than vice versa.  I
> still
> > don't know why the old rule didn't work and this one does, but hey,
> > whatever.  Another advantage of the new one is that I don't have to re-
> point
> > foreign keys that were already pointed to the record containing the old
> > data, because that record stays in place.
> >
> > (The other change, adding the lines
> >       AND effective_date_and_time <= CURRENT_TIMESTAMP
> >       AND expiration_date_and_time >= CURRENT_TIMESTAMP;
> > to the UPDATE, was necessary to keep updates to the "my_data_now" from
> > updating the expired rows as well.)
>
> Make sure you test it with inserts/updates of multiple rows too.
>
> --
>    Richard Huxton
>    Archonet Ltd




pgsql-sql by date:

Previous
From: "George Pavlov"
Date:
Subject: non-equi self-join optimization
Next
From: Ivan Steganov
Date:
Subject: Matching several rows