Still struggling with history tables - Mailing list pgsql-sql
From | Ken Winter |
---|---|
Subject | Still struggling with history tables |
Date | |
Msg-id | 004c01c61bdf$569b8ed0$6603a8c0@kenxp Whole thread Raw |
In response to | Re: Rule causes baffling error (Richard Huxton <dev@archonet.com>) |
Responses |
Re: Still struggling with history tables
|
List | pgsql-sql |
Friends ~ I'm still trying to implement a solution to the requirement to keep a complete history of data changes to a "person" table. (See earlier correspondence below.) I'm trying for a variant of the architecture suggested by Richard Huxton (also below). In my variant, I have this "identifier" table, carrying the id and invariant info about each person: /*==============================================================*/ /* Table: person_i */ /*==============================================================*/ create table person_i ( idi BIGSERIAL not null, date_of_birth DATE null, constraint PK_person_i_key_1 primary key (idi) ) ; And then I have this "history" table, carrying the time-varying info on which I want to keep a complete history: /*==============================================================*/ /* Table: person_h */ /*==============================================================*/ create table person_h ( idh INT8 not null, start_date DATE not null default 'now()', end_date DATE null, name VARCHAR(255) null, constraint PK_person_h_key_1 primary key (idh, start_date), constraint fk_reference_6 foreign key (idh) references person_i (idi)on delete restrict on update restrict ) ; Triggers are in place on the "person_h" table so that when an app does an update, the current h record is expired (with its old data) and a new record (wuth the updated data)is inserted and made effective "now". What I'm now trying to build is this view: /*==============================================================*/ /* View: person */ /*==============================================================*/ create view person as select idi, date_of_birth, start_date, end_date, name from person_i i, person_h h where i.idi = h.idh; I want to enable users (and apps) who need to deal only with current data to be able to treat "person" as a real table (i.e. to write to it as well as read from it). Specifically, the requirements are that when a user does: . Insert - The system inserts a record into the i table and the first record in the h table. . Select - The system returns attributes of i and h tables (not duplicating the identifier columns). . Update - The system allows updating of i attributes (update-in-place, not creating a new history record) and h attributes (creating a new history record). . Delete - The system deletes the i record and all of its h records. I'm stuck on how to implement the "insert" action, which I thought would be simple. The problem is this: The i table id is of type BIGSERIAL, i.e. sequence-assigned. I've tried writing the following rule to get both the i record and the first h record inserted: CREATE RULE ru AS ON INSERT TO person DO INSTEAD (INSERT INTO person_i DEFAULT VALUES; INSERT INTO person_h (idh) VALUES (NEW.idi) ); I thought this would grab the sequence-assigned value of person_i.idi to put into person_h.idh (this trick works in trigger functions), but instead it just assigns Null to person_h.idh, and the transaction fails with a "not null" violation on person_h.idh. And now that I look at the documentation (http://www.postgresql.org/docs/7.4/static/rules-update.html), it says that a column not assigned in the invoking query "is replaced by a null value (for an INSERT)" in the NEW pseudo-record. Bummer. Triggers behave nicely, but rules don't. I'd be willing to do it with a trigger function instead, but I can't attach a trigger to a view. I considered doing it with a trigger function on the person_i table, but I don't know how that could be made to cause an insert of the person_h table record - and the assignment of h table values such as "name" from the app's query. Suggestions? ~ TIA ~ Ken > -----Original Message----- > From: Richard Huxton [mailto:dev@archonet.com] > Sent: Tuesday, December 20, 2005 4:16 AM > To: Ken Winter > Cc: 'PostgreSQL pg-sql list' > Subject: Re: [SQL] Rule causes baffling error > > Ken Winter wrote: > > Richard ~ > > > > Let me zoom out for a moment, for the bigger picture. > > > > As you have inferred, what I'm trying to do is develop a history- > preserving > > table ("my_data" in the example that started this thread). *Most* user > > programs would see and manipulate this table as if it contained only the > > current rows (marked by effective_date_and_time <= 'now' and > > expiration_date_and_time = 'infinity'). > > > > When these programs do an INSERT, I need automatic actions that set the > > expiration and date timestamps to 'now' and 'infinity'; when they do an > > UPDATE, I need automatic actions that save the old data in a history > record > > and expire it as of 'now' and the new data in a record that's effective > > 'now' and expires at 'infinity'; when they do a DELETE, I need an > automatic > > action to expire the target record as of 'now' rather than actually > deleting > > it. ... > > > However, I also need certain maintenance programs, designed to enable > > certain users to correct inaccurately entered data. These need to be > able > > to "rewrite history" by doing actions against "my_data" without these > > automatic actions occurring. It may prove advisable to provide some > > automatic actions for these programs too, but they definitely won't be > the > > actions described above. If the above actions were implemented as > triggers, > > all the ways I could think of to conditionally disable them (and > possibly > > replace them with other actions) seemed architecturally very klunky. > That's > > when I decided I needed the "my_data_now" view, and from that I inferred > > (apparently correctly) that the actions would have to be implemented as > > rewrite rules. > > > The cascading problem was solvable. But the solution was a bit hard to > > reach because the user-invoked UPDATE action triggered both an INSERT > and an > > UPDATE on the same table (and user DELETE triggered an UPDATE), and so > one > > had to take into account that all of these triggered actions would cause > > their triggers to fire again. Not a deal-killer, but the solution felt > > brittle. > > > > Yes, I did consider having a "live" table and a separate "history" > table. > > The killer of that idea was my inability to find a way to implement > foreign > > keys that could refer to both tables and that could follow a record when > it > > was moved from "live" to "history". Much of the history I'm trying to > > preserve is not in the "my_data" table; it's in related tables that > refer to > > it. I presumably could do this by not declaring the FKs to PostgreSQL, > and > > implementing the necessary referential integrity with triggers, but - > well, > > in a word, yuck. > > If you're going to do this with multiple tables you actually need (at > least) three. For example, if you had different versions of e.g. > documents being stored you would want: > document - invariants: the id, perhaps document-type. > FKeys link to this. > A row is only deleted from here if all live+history > is also deleted. > document_live - the one that gets edited. > 1:1 relationship with document if still live > document_hist - with timestamps. N:1 with document > > Have a google for Temporal Databases too - there's a lot of thinking > been done about this. > > > > > As it happens, I have found a rewrite of my UPDATE rule that works, so > my > > immediate need is past. FYI, the old update rule was: > > > [snip] > > > The relevant change is that I'm now expiring the record with the old > data > > and inserting the one with the new data, rather than vice versa. I > still > > don't know why the old rule didn't work and this one does, but hey, > > whatever. Another advantage of the new one is that I don't have to re- > point > > foreign keys that were already pointed to the record containing the old > > data, because that record stays in place. > > > > (The other change, adding the lines > > AND effective_date_and_time <= CURRENT_TIMESTAMP > > AND expiration_date_and_time >= CURRENT_TIMESTAMP; > > to the UPDATE, was necessary to keep updates to the "my_data_now" from > > updating the expired rows as well.) > > Make sure you test it with inserts/updates of multiple rows too. > > -- > Richard Huxton > Archonet Ltd