Thread: How to Implement Versioned Rows in PostgreSQL?

How to Implement Versioned Rows in PostgreSQL?

From
Alan Gutierrez
Date:
My application is gathering personel and patient data for a hospice. It must
keep track of all changes to patient stats, chart, and med sheet over time.
Therefore, I would like to keep versions of the rows in many of my tables.

This is my stab at a PostgreSQL implementation. I would greatly appreciate any
input, criticisms, dire warnings, etc.

I plan on adding a transaction table with a transaction_id column fed by a
sequence. The transaction_id will indicate the order of creation as well as
time of creation.

-- A row for each of our customers who subscribe to the application.

create table firm (                 -- firm => hospice, clinic, hospital
     firm_id integer not null,
     name varchar(32),               -- Just one example data column
     primary key (firm_id)
)
\g

-- A table to keep row of a transaction.

create table transaction (
     firm_id integer not null,
     transaction_id integer not null, -- Fed by sequence, one for each firm so
                                      -- we can part and merge databases by
                                      -- firm without collision!
     modified timestamp not null,
     modified_by person_id not null,
     primary key (firm_id, transaction_id),
     foreign key (firm_id) references firm
)
\g

-- Example versioned table.

create table person_history (           -- Base for patient and employee
     firm_id integer not null,
     person_id integer not null,
     transaction_id integer not null,
     first_name varchar(32),             -- Just two example data columns
     last_name varchar(32) not null,
     deleted boolean not null,
     primary key (firm_id, person_id, transaction_id)
)
\g

-- Show latest row view.

create view person as
select *
   from person_history
  where transaction_id  = ( -- In explain this subselect appears to use index!
            select max(transaction_id)
              from person_history as ph1
             where firm_id = ph1.firm_id
               and person_id = ph1.firm_id
        )
    and deleted = 0
\g
-- Time travel view.
create view person_as_of as
select *
   from person_history
  where transction_id = (
            select max(transaction_id)
              from person_history as ph1
             where firm_id = ph1.firm_id
               and person_id = ph1.firm_id
               and transaction_id <= (
                       select transaction_id
                         from past_transaction
                        limit 1
                   )
        )
    and deleted = 0
\g

In my application I can travel in time thus:

create temporary table past_transaction as
select transaction_id
   from transaction
  where modified <= '2002/12/2 17:59:00' -- the minute I turned 31
  order by firm_id desc, transaction_id desc,
  limit 1
\g
-- If only I could pass the view a parameter!
select * from person_as_of
\g

Thoughts:

* I can write a query and have it travel back in time by setting one variable.
   Neeto.

* No archive tables or such means no copying, new version is a simple insert.
   Good.

* With expliain the sub selects appear to use the indexes with aggregates, and
   if not I can alsways sort descending limit 1. Good.

* Even with complex joins on the latest view tables the query plans use
   the primary index for the sub select. Good.

* There is little need for vacuuming, since no updates are made to the
   busy tables of the application. Does this matter?

* Referenital integrity goes away from what I can see, since it won't
   understand the deleted column. Pity.

Questions:

* Is this viable or overly clever?

* Should I have a boolean latest column on a versioned table? This would mean
   update and vacuum, but potentially a faster query.

* Is there a penalty for long (how do you say?) concatenated keys in
   PostgreSQL?

* Any reason why this won't work with the coming distrubuted PostgreSQL?

* Where can I read about alternative implemenations for history/versioning?
   Nuances?

Thank you all for any input whatsoever.

Alan Gutierrez


Re: How to Implement Versioned Rows in PostgreSQL?

From
Tom Lane
Date:
Alan Gutierrez <ajglist@izzy.net> writes:
> * Where can I read about alternative implemenations for history/versioning?

Have you looked at contrib/spi/timetravel.* ?

            regards, tom lane

Typcasting

From
"Chris Boget"
Date:
I have a field in my table that is of type varchar.  I'm trying
to do a bit of tidying up and since that field contains only
number, I want to convert the field to an integer type.  So I
create a new table and modify/remove the fields as necessary
and then move the relevant data over to the new table.  But
when I try to insert the field that I modified from varchar to
integer, I get an error because the type of data doesn't match
up.  I'm told that I need to cast the value manually.
I've looked up the CAST function and tried everything that I
could think of but nothing seems to work.  I can't, for the life
of me, move the data over.
I can I cast a field value?  Is there any other way to do what
I'm trying to do?

Chris


Re: Typcasting

From
Richard Huxton
Date:
On Monday 06 Jan 2003 4:40 pm, Chris Boget wrote:
> I have a field in my table that is of type varchar.  I'm trying
> to do a bit of tidying up and since that field contains only
> number, I want to convert the field to an integer type.

> I've looked up the CAST function and tried everything that I
> could think of but nothing seems to work.

I take it you're getting something like:

richardh=> select cast('123'::varchar as int4);
ERROR:  Cannot cast type 'character varying' to 'integer'

You'll need to cast to text first, then to integer.

richardh=> select cast(cast('123'::varchar as text) as int4);
 int4
------
  123

So you'll have something like:

cast(cast(my_varchar_field as text) as int4)

The developers have reduced the number of casts to make expression behaviour
more predictable. This means you need to be more explicit about these things.
Can't help thinking that some form of varchar<=>text equivalence detection
would be useful IMHO.

--
  Richard Huxton

Re: How to Implement Versioned Rows in PostgreSQL?

From
Alan Gutierrez
Date:
Tom Lane wrote:
> Alan Gutierrez <ajglist@izzy.net> writes:
>
>>* Where can I read about alternative implemenations for history/versioning?
>
>
> Have you looked at contrib/spi/timetravel.* ?
>
>             regards, tom lane

Thank you.

The timetravel.* uses dates on each table. The nice thing about the transaction
table scheme I've concocted, I think, is that I get a nice round sequential
number. Dates seem so fiddly. I also get to see what changed during a
transaction. I can also store who changed it with the transaction record.

 From looking at timetravel.*, at least I know I am not too ambitious. It seems
acceptable to have a view on the tables filtering out records. I've not had
experience with *large* databases yet. My big concern is that this versioning is
not going to scale. That the extra query for each row will be a real dog.

But, hey, the application needs versioning, gotta pay to play, right?

Maybe I can hack timetravel.* to do my bidding?

If anyone wants to caution (or reassure) this newbie, I would be very grateful.

Thank you.

Alan Gutierrez