Re: temporal support patch - Mailing list pgsql-hackers
From | Kevin Grittner
Subject | Re: temporal support patch
Date |
Msg-id | 5033C06902000025000499AE@gw.wicourts.gov
In response to | Re: temporal support patch (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Responses | Re: temporal support patch
List | pgsql-hackers
Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

> So if I understand correctly...
>
> If there is a very long running transaction, say 1 hour, then all
> (or just some? - depending) transactions that nominally start and
> finish within that time, can not have definitive start times until
> the very long running transaction finishes, even if they are
> successfully committed?

That's not correct. Any transaction which started can certainly have a start time. Any transaction which completed can certainly have a commit or rollback time. What they *can't* have is a known position in the apparent order of execution for serializable transactions, which might differ from the order of start and commit times. The fact that a transaction has an unknown sequence number or timestamp for purposes of ordering the visibility of transactions doesn't mean you can't show in an audit log that it completed. In other words, I think the needs of a temporal database are significantly different from the needs of an auditing system.

And keep in mind, we are only talking about seeing read-write serializable transactions which might yet conflict with other read-write serializable transactions when choosing to look at the prior state within a temporal serializable transaction. That's easy enough to avoid if you want to do so.

> So if someone looks at the audit log they might not see all the
> transactions they expect to see.

I would assume an audit log has very different needs from tracking changes for a temporal database view. It even seems possible that you might want to see what people *looked* at, versus just changes. You might want to see transactions which were rolled back, which are irrelevant for a temporal view. If we're talking about an auditing system, we're talking about an almost completely different animal from a temporal view of the database.
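As an illustrative sketch of the distinction above (table and column names are hypothetical, not from any patch under discussion), here is the classic way commit order and apparent serial order can diverge under SERIALIZABLE:

```sql
-- Hypothetical example: two concurrent SERIALIZABLE transactions.
-- Session A                                   Session B
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT sum(amount) FROM receipts;           -- A takes its snapshot first
--                                             BEGIN ISOLATION LEVEL SERIALIZABLE;
--                                             INSERT INTO receipts (amount) VALUES (100);
--                                             COMMIT;  -- B commits first in wall-clock time
COMMIT;
-- Yet in the apparent order of execution A comes *before* B, because A's
-- snapshot did not include B's insert.  A's position in that order cannot
-- be pinned down until A completes, even though B already has a commit
-- timestamp -- which is exactly why commit time alone cannot order
-- visibility for a temporal serializable view.
```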
> So, if I had an automatic query A which updated statistics based
> on transactions committed over the last 10 minutes, then many
> (all?) transactions starting and successfully completing during
> the time of the very long running transaction will never show up!

A statistics capture process like that doesn't seem like a place where you care about the apparent order of execution of serializable transactions.

> Here I am envisioning a query fired off every ten minutes looking
> for audit records with timestamps within the previous ten minutes.

Which timestamp would make sense for that?

> However, if I ran a query B looking at audit record numbers within
> 10 minute intervals for a week, but kicked off 24 hours after
> the week finished -- then I would see the records I did not see in
> query A.
>
> Hmm... if I am at all right, then probably best to have some
> suitably worded 'government health warning' prominent in the
> documentation!

We're clearly talking at cross purposes. I'm discussing what is needed to see a past state of the database in a transaction which would only see states of the database consistent with some serial execution of the serializable transactions which modified it; you're talking about an audit table. If we're actually talking about an audit system, I have a whole different set of concerns, and I would not be bringing this one up.

The whole point of my concern is this: suppose you have a business rule, enforced by database triggers, that rows in a certain table carry a batch identifier, and that no rows can be added to a batch after a database change flags that batch as closed. *Without* what I suggest, you could view a closed batch and see one set of rows, then view the same batch at a later point in time and magically see rows appear in violation of the enforced business rules. I'm talking about *preventing* surprising appearances of data "after the fact".
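For concreteness, here is a minimal sketch of the kind of trigger-enforced rule I mean (all table, column, and function names are hypothetical):

```sql
-- Hypothetical schema: rows may only be added to a batch while it is open.
CREATE TABLE batches (
  batch_id integer PRIMARY KEY,
  closed   boolean NOT NULL DEFAULT false
);
CREATE TABLE batch_rows (
  batch_id integer NOT NULL REFERENCES batches,
  payload  text
);

CREATE FUNCTION check_batch_open() RETURNS trigger AS $$
BEGIN
  IF EXISTS (SELECT 1 FROM batches
              WHERE batch_id = NEW.batch_id AND closed) THEN
    RAISE EXCEPTION 'batch % is closed', NEW.batch_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER enforce_batch_open
  BEFORE INSERT ON batch_rows
  FOR EACH ROW EXECUTE PROCEDURE check_batch_open();
```

Under SERIALIZABLE this rule holds for the live data; the concern is that a temporal view which orders visibility by commit timestamp, rather than by apparent serial order, could still show a closed batch acquiring rows "after the fact."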
You would need a big warning if you *don't* have what I suggest.

Now, it is quite possible that one table (or set of tables) could do double duty for both temporal queries and auditing, but the fact that something is not needed for one purpose doesn't mean it isn't needed for the other.

There are still some fuzzy areas around how things would look with a *mix* of serializable and other transactions updating the database; but I think in the long run we will find that people either want to do all of their modifications through SSI, or none of them. The guarantees get pretty weak if you don't know that all transactions were part of the review of "dangerous structures" which can cause anomalies.

If anyone is still unclear about serializable transactions in PostgreSQL and wants to invest the time needed to read 12 pages (including footnotes and pretty colored performance graphs) to learn about it -- the paper which is going to be presented at the VLDB conference next week approaches it from a different angle than I usually do, and it may "click" with many people where my discussions have fallen short.

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

-Kevin