Looking for advice on working with revisions - Mailing list pgsql-general

From Ayo
Subject Looking for advice on working with revisions
Date
Msg-id a0dbb3a40912050230x47f59bd8p4653ff56bad76d32@mail.gmail.com
Whole thread Raw
Responses Re: Looking for advice on working with revisions  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-general
Hello PostgreSQL users and developers,

I have the following database structure to store information about (game)
releases. As this information is contributed and edited by users, all changes
to the release information are recorded in the form of 'revisions'.

 -- this is a simplified representation of the database, the actual tables have
 -- more columns, and this SQL won't really work due to circular references

 -- each row is a (game) release
 CREATE TABLE releases (
   id SERIAL PRIMARY KEY,
   -- pointer to the current visible revision
   current integer REFERENCES releases_rev (id)
   -- (some other columns not related to my question)
 );

 -- each row is a revision (not necessarily a revision of a release, can
 -- be of other database entries as well)
 CREATE TABLE revisions (
   id SERIAL PRIMARY KEY,
   -- the type tells of what kind of entry this revision is of, in this example
   -- this is always of type 'release'
   type char(1) NOT NULL, -- (actually an ENUM, but that doesn't
matter for this example)
   uid integer NOT NULL, -- user who made the change
   date timestamp with timezone NOT NULL DEFAULT NOW()
 );

 -- each row is a revision of a release
 CREATE TABLE releases_rev (
   id integer NOT NULL PRIMARY KEY REFERENCES revisions (id),
   -- pointer back to the release, so we know for which release this revision is
   rid integer NOT NULL REFERENCES releases (id),
   -- some data columns
   title varchar(250) NOT NULL,
   release_date date
 );

 -- each release can be in multiple languages, so separate table from
releases_rev
 CREATE TABLE releases_lang (
   revision integer NOT NULL REFERENCES releases_rev (id),
   language char(2),
   PRIMARY KEY(revision, language)
 );
 -- and some more tables similar to releases_lang


Inserting a new release and its first revision is done as explained with the
following pseudocode:

 $rid = INSERT INTO releases DEFAULT VALUES RETURNING id;
 $rev = INSERT INTO revisions (type, uid) VALUES('r', $uid) RETURNING id;
 INSERT INTO releases_rev VALUES($rev, $rid, $title, $date);
 INSERT INTO releases_lang VALUES ($rev, $lang1), ($rev, $lang2);
 UPDATE releases SET current = $rev;


Inserting a new revision of an existing release is quite similar, except that
there is no INSERT INTO query for the releases table, and $rid is already known.

This system has worked quite well, but has one problem: every client that wants
to insert a new revision will have to manually copy over all releases_rev
columns and releases_lang rows of the previous revision, even if all they want
to change is just one single column. This in itself is not a problem, but
whenever a new column or table is added to the database, all clients will have
to be updated again so that they copy over the new information correctly, which
is very prone to errors.

In order to solve this I am thinking of adding a level of abstraction within
the database by using stored procedures that will generate temporary tables and
fill those with information about a previous revision. These temporary tables
will then not include any information about revisions, and can thus be modified
as if there were no revisions at all. For example, creating a new revision of a
release to change the title field and removing a language could be done as
such:

 -- creates the temporary tables and fills it with information from
revision $rev
 SELECT edit_release_init($rev);
 -- the edit_* are the temporary tables, which only have rows for the
selected release
 UPDATE edit_releases SET title = $new_title;
 DELETE FROM edit_releases_lang WHERE language = $language_to_remove;
 -- reads information from the temporary tables and inserts a new revision
 -- (using the procedure described above)
 SELECT edit_release_finish($uid);


All other release information will then be copied over automatically, and the
client will only have to know about the columns and tables it works with. Only
the edit_release_init() and edit_release_finished() functions will need to be
modified when adding a new table or column to the database.

While this would probably work, it will add a performance overhead of creating
and using temporary tables, and requires each client to call a function before
and after inserting a new revision. I'm wondering whether there's are nicer or
cleaner approach I somehow missed.

I'm also willing to change the database structure, but somehow doubt that I
could find an alternative structure which would be as easy to work with as the
current solution.

What would you do with a situation like this?

pgsql-general by date:

Previous
From: Sam Mason
Date:
Subject: Re: Array comparison & prefix search
Next
From: Peter Eisentraut
Date:
Subject: Re: pg_attribute.attnum - wrong column ordinal?