Re: temporal support patch - Mailing list pgsql-hackers
From | Vlad Arkhipov |
---|---|
Subject | Re: temporal support patch |
Date | |
Msg-id | 4FE8256C.3050208@dc.baikal.ru Whole thread Raw |
In response to | Re: temporal support patch (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: temporal support patch
|
List | pgsql-hackers |
On 05/31/2012 11:52 AM, Jeff Davis wrote: > On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote: >> Hi all, >> >> >> as a part of my master's thesis I have created temporal support patch >> for PostgreSQL. It enables the creation of special temporal tables >> with entries versioning. Modifying operations (UPDATE, DELETE, >> TRUNCATE) on these tables don't cause permanent changes to entries, >> but create new versions of them. Thus user can easily get to the past >> states of the table. >> > I would be very interested to see this, thank you for working on it. > > There are quite a few aspects to a temporal database system, and you are > working on a system-maintained transaction-time historical table, right? > Or are there other aspects to your proposal? > > Some general comments: > > * I'd very much like to see you make use of Range Types from 9.2; in > particular, TSTZRANGE would be much better than holding two timestamps. > If a standard requires you to display two timestamps in certain > situations, perhaps you could use ranges internally and display the > boundaries as timestamps when needed. It's not sufficient to store only a period of validity for a row. If two transactions started in the same time change the same record, you have a problem with TSTZRANGE type because it's normalized to empty interval. The other issue is how to handle multiple changes of the same record within the transaction. Should they be stored or not? Also it's necessary to store some kind of operation type that was applied to the record (insert/update/delete). For example, there is a table with one record with validity period [0, ) and value 'A'. First way 1. Delete this record in time 1, now there is [0, 1), A in the history table. 2. Insert a new record in time 1, now there is [0, 1), A in the history table and [1, ), B record in the current data table. Second way 1. Update this record in time 1, now there is [0, 1), A in the history table and [1, ), B record in the current data table. So you have the same data in the tables but the actions that led to this configuration were different and the history has been lost partly. > * There is other useful information that could be recorded, such as the > user who inserted/updated/deleted the record. I'm not sure that the database user is the proper thing to be stored in the history table. Many applications usually connect to a database using some virtual user and have their own users/roles tables to handle with privileges. There should be some way to substitute the stored user in the history table with the application's one. It's also helpful to store transaction id that inserted/updated/deleted the record. > * For some purposes, it's very useful to keep track of the columns that > changed. For instance, a query like "show me any time a salary was > changed over the last month" (or some other rare event) would be very > slow to run if there was not some explicit annotation on the historical > records (e.g. a "columns changed" bitmap or something). It's a great proposal but seems to be impossible to implement with triggers only solution, isn't it? Is there any kind of hooks on ALTER TABLE ... in PostgreSQL to update changed columns bitmaps when table structure changes? > * In general, I'm not fond of adorning queries with TRANSACTION TIME AS > OF... kinds of things. Those constructs are redundant with a WHERE > clause (on a range type, you'd use the "contains" operator). If a > standard requires that, maybe it would be OK to allow such things as > syntactic sugar. In SQL2011 there is only one table with the all data, historical and current. So it's not very convenient to specifiy WHERE condition on system time everywhere and for all tables in the query. By default only the current data is selected with a query like SELECT * FROM table. > * As Jim mentioned, it might make sense to use something resembling > inheritance so that selecting from the historical table includes the > current data (but with no upper bound for the range). We have a success experience with inheritance with our trigger-based solution. It's completely transparent for the existing applications and does not have any impact on performance.
pgsql-hackers by date: