RE: postgres 7.2 features. - Mailing list pgsql-hackers

From:           Mikheev, Vadim
Subject:        RE: postgres 7.2 features.
Date:
Msg-id:         8F4C99C66D04D4118F580090272A7A23018C50@SECTORBASE1
In response to: postgres 7.2 features. (Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>)
List:           pgsql-hackers
> The bottom line is that the original postgres time-travel
> implementation was totally cost-free.

I disagree. I can't consider an additional >8 bytes per tuple + pg_time
(4 bytes per transaction... please remember that people complain even
about pg_log - 2 bits per transaction) as totally cost-free for a
half-useful built-in feature used by 10% of users.

Note that I'm not talking about the overwriting/non-overwriting smgr at
all! That's not the issue. There is no problem with keeping dead tuples
in files as long as required. When I talked about a new smgr I meant the
ability to re-use space without vacuum and to store more than one table
per file. But I object to storing transaction commit times in the tuple
header and to the old-designed pg_time. If you want to do TT - welcome...
but make it optional, without affecting those who don't need TT.

> Actually it may have even speeded things up since vacuum would have
> less work to do.

This would make only *TT users* happy -:)

> Can you convince me that triggers can compare anywhere near for
> performance?

No, they can't. But this is bad only for *TT users* -:)

> I can't see how. All I'm asking is don't damage anything that is in
> postgres now that is relevant to time-travel in your quest for WAL....

It's not related to WAL! Though... with WAL, pg_log is not required to
be permanent: we could re-use transaction IDs after db restart...
Well, it seems we can handle this.

> > With the original TT:
> >
> > - you are not able to use indices to fetch tuples on time base;
>
> Sounds not very hard to fix..

Really? Commit time is unknown until commit - so you would have to
insert index tuples just before commit... how would you know what to
insert?

> > - you are not able to control tuples' life time;
>
> From the docs... "Applications that do not want to save
> historical data can specify a cutoff point for a relation.
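The trigger-based TT that Vadim prefers can be sketched with an ordinary
history table: instead of the storage manager stamping every tuple, a
trigger archives the superseded row version on each update. A minimal
sketch using SQLite's trigger syntax (table and column names are
illustrative, not from the original discussion):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER);

-- Only TT users pay for this table; others skip the trigger entirely.
CREATE TABLE emp_history (id INTEGER, salary INTEGER, archived_at TEXT);

-- On UPDATE, archive the old row version before it is overwritten.
CREATE TRIGGER emp_upd BEFORE UPDATE ON emp
BEGIN
    INSERT INTO emp_history VALUES (OLD.id, OLD.salary, datetime('now'));
END;
""")

conn.execute("INSERT INTO emp VALUES (1, 1000)")
conn.execute("UPDATE emp SET salary = 2000 WHERE id = 1")

# Current state and archived history are both queryable.
print(conn.execute("SELECT salary FROM emp WHERE id = 1").fetchone())          # (2000,)
print(conn.execute("SELECT salary FROM emp_history WHERE id = 1").fetchone())  # (1000,)
```

This illustrates Vadim's trade-off: TT users pay the trigger overhead on
every update, while tables without the trigger carry no extra per-tuple
cost at all.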
> Cutoff points are defined by the discard command"

I meant another thing: when I have to deal with history, I sometimes
need to change historical dates (c) -:))
Probably we can handle this as well, just some additional complexity -:)

> > - you have to store commit time somewhere;
>
> Ok, so?

Space.

> > - you have to store additional 8 bytes for each tuple;
>
> A small price for time travel.

Not for those who aren't going to use TT at all. The lower performance
of a trigger implementation is a smaller price for me.

> > - 1 sec could be tooo long a time interval for some uses of TT.
>
> So someone in the future can implement finer grains. If time travel
> disappears this option is not open.

It stays open, with triggers -:) As well as Colour-Travel and all other
travels -:)

> > And, btw, what could be *really* very useful is TT + a
> > referential integrity feature. How could that be implemented
> > without triggers?
>
> In what way does TT not have referential integrity? As long as the
> system assures that every transaction writes the same timestamp to all
> tuples then referential integrity continues to exist.

The same tuple of a table with a PK may be updated many times by many
transactions in 1 second. With a 1 sec grain you would read *many*
historical tuples with the same PK, all valid at the same time. So we
need "finer grains" right now...

> > Imho, triggers can give you much more flexible and useful TT...
> >
> > Also note that TT was removed from Illustra and the authors wrote
> > that built-in TT could be implemented without a non-overwriting smgr.
>
> Of course it can be, but can it be done anywhere near as efficiently?

But without losing efficiency where TT is not required.

> > > > It was mentioned here that triggers could be used for async
> > > > replication, as well as WAL.
> > >
> > > Same story. Major inefficiency. Replication is tough enough without
> > > mucking around with triggers.
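Vadim's 1-second-grain objection can be made concrete with a small
simulation (the data below is hypothetical): two versions of the same PK
committed within the same second are both "valid" at that timestamp, so
an as-of query cannot distinguish them, while any finer, strictly
increasing grain restores uniqueness.

```python
# History with whole-second commit times: (pk, value, commit_time)
history = [
    (1, "v1", 100),
    (1, "v2", 100),   # same PK updated again within the same second
]

# "As of second 100" returns TWO versions of PK 1 - ambiguous.
as_of = [row for row in history if row[2] <= 100]
print(len([r for r in as_of if r[0] == 1]))   # 2

# A finer grain (e.g. a per-commit sequence number) disambiguates:
history_fine = [(1, "v1", 100.001), (1, "v2", 100.002)]
latest = max((r for r in history_fine if r[2] <= 100.002), key=lambda r: r[2])
print(latest[1])   # v2 - exactly one current version per PK
```

With the coarse grain, a foreign key "pointing into history" would match
several rows at once, which is the referential-integrity problem Vadim
raises.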
> > > Once the trigger executes you've got to go and store the data in
> > > the database again anyway. Then figure out when to delete it.
> >
> > What about reading WAL to get and propagate changes? I don't think
> > that reading tables would be more efficient and, btw, how would you
> > know what to read (C) -:) ?
>
> Maybe that is a good approach, but it's not clear that it is the best.
> More research is needed. With the no-overwrite storage manager there
> exists a mechanism for deciding how long a tuple exists and this
> can easily be tapped into for replication purposes.

This "mechanism" (just an additional field in pg_class) can be used for
WAL-based replication as well.

> Vacuum could serve the two purposes of vacuum and replicate.

Vacuum is already slow; better to make it faster rather than even
slower... I see vacuum as an *optional* command someday... when we'll
be able to re-use space.

Vadim
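The WAL-reading replication Vadim suggests can be sketched as a toy
model: changes are appended to a log with increasing sequence numbers,
and a replica propagates everything after the last record it has
applied, so "what to read" is simply the log tail - no table scans and
no triggers. All names here are illustrative; this is not the actual
PostgreSQL WAL format.

```python
wal = []          # the primary's log of changes: (lsn, table, pk, new_value)

def record(table, pk, value):
    """Append a change record with the next log sequence number (LSN)."""
    wal.append((len(wal) + 1, table, pk, value))

record("emp", 1, 1000)
record("emp", 1, 2000)
record("dept", 7, "R&D")

replica = {}      # the replica's current state, keyed by (table, pk)
last_applied = 0  # highest LSN the replica has consumed so far

def catch_up():
    """Apply, in log order, every record newer than last_applied."""
    global last_applied
    for lsn, table, pk, value in wal[last_applied:]:
        replica[(table, pk)] = value
        last_applied = lsn

catch_up()
print(replica[("emp", 1)])    # 2000 - latest version propagated
```

The log itself answers "how to know what to read": the replica only ever
reads `wal[last_applied:]`, however large the base tables are.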