Re: temporal support patch - Mailing list pgsql-hackers

From Vlad Arkhipov
Subject Re: temporal support patch
Date
Msg-id 4FE8256C.3050208@dc.baikal.ru
Whole thread Raw
In response to Re: temporal support patch  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: temporal support patch
List pgsql-hackers
On 05/31/2012 11:52 AM, Jeff Davis wrote:
> On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:
>> Hi all,
>>
>>
>> as a part of my master's thesis I have created temporal support patch
>> for PostgreSQL. It enables the creation of special temporal tables
>> with entries versioning. Modifying operations (UPDATE, DELETE,
>> TRUNCATE) on these tables don't cause permanent changes to entries,
>> but create new versions of them. Thus user can easily get to the past
>> states of the table.
>>
> I would be very interested to see this, thank you for working on it.
>
> There are quite a few aspects to a temporal database system, and you are
> working on a system-maintained transaction-time historical table, right?
> Or are there other aspects to your proposal?
>
> Some general comments:
>
> * I'd very much like to see you make use of Range Types from 9.2; in
> particular, TSTZRANGE would be much better than holding two timestamps.
> If a standard requires you to display two timestamps in certain
> situations, perhaps you could use ranges internally and display the
> boundaries as timestamps when needed.
It's not sufficient to store only a period of validity for a row. If two
transactions started in the same time change the same record, you have a
problem with TSTZRANGE type because it's normalized to empty interval.
The other issue is how to handle multiple changes of the same record
within the transaction. Should they be stored or not?
Also it's necessary to store some kind of operation type that was
applied to the record (insert/update/delete). For example, there is a
table with one record with validity period [0, ) and value 'A'.

First way
1. Delete this record in time 1, now there is [0, 1), A in the history
table.
2. Insert a new record in time 1, now there is [0, 1), A in the history
table and [1, ), B record in the current data table.

Second way
1. Update this record in time 1, now there is [0, 1), A in the history
table and [1, ), B record in the current data table.

So you have the same data in the tables but the actions that led to this
configuration were different and the history has been lost partly.

> * There is other useful information that could be recorded, such as the
> user who inserted/updated/deleted the record.
I'm not sure that the database user is the proper thing to be stored in
the history table. Many applications usually connect to a database using
some virtual user and have their own users/roles tables to handle with
privileges. There should be some way to substitute the stored user in
the history table with the application's one. It's also helpful to store
transaction id that inserted/updated/deleted the record.

> * For some purposes, it's very useful to keep track of the columns that
> changed. For instance, a query like "show me any time a salary was
> changed over the last month" (or some other rare event) would be very
> slow to run if there was not some explicit annotation on the historical
> records (e.g. a "columns changed" bitmap or something).
It's a great proposal but seems to be impossible to implement with
triggers only solution, isn't it? Is there any kind of hooks on ALTER
TABLE ... in PostgreSQL to update changed columns bitmaps when table
structure changes?
> * In general, I'm not fond of adorning queries with TRANSACTION TIME AS
> OF... kinds of things. Those constructs are redundant with a WHERE
> clause (on a range type, you'd use the "contains" operator). If a
> standard requires that, maybe it would be OK to allow such things as
> syntactic sugar.
In SQL2011 there is only one table with the all data, historical and
current. So it's not very convenient to specifiy WHERE condition on
system time everywhere and for all tables in the query. By default only
the current data is selected with a query like SELECT * FROM table.
> * As Jim mentioned, it might make sense to use something resembling
> inheritance so that selecting from the historical table includes the
> current data (but with no upper bound for the range).
We have a success experience with inheritance with our trigger-based
solution. It's completely transparent for the existing applications and
does not have any impact on performance.


pgsql-hackers by date:

Previous
From: Pavel Golub
Date:
Subject: Re: pg_tablespace.spclocation column removed in 9.2
Next
From: Peter Eisentraut
Date:
Subject: compiler warnings on mingw