RE: postgres 7.2 features. - Mailing list pgsql-hackers

From Mikheev, Vadim
Subject RE: postgres 7.2 features.
Msg-id 8F4C99C66D04D4118F580090272A7A23018C50@SECTORBASE1
In response to postgres 7.2 features.  (Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>)
List pgsql-hackers
> The bottom line is that the original postgres time-travel 
> implementation was totally cost-free. 

I disagree. I can't consider an additional 8+ bytes per tuple plus
pg_time (4 bytes per transaction... remember that people complain
even about pg_log, which is just 2 bits per transaction) as
totally cost-free for a half-useful built-in feature used
by 10% of users.
Note that I'm not talking about the overwriting/non-overwriting smgr at all!
That's not the issue. There is no problem with keeping dead tuples in files
as long as required. When I talked about a new smgr I meant the ability to
re-use space without vacuum and to store more than one table per file.
But I will object to storing transaction commit times in the tuple header
and to the old pg_time design. If you want to do TT - welcome... but make
it optional, without affecting those who don't need TT.
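To put the space argument in concrete terms, here is a back-of-envelope
calculation (a sketch; the table and transaction counts are hypothetical,
only the per-item sizes come from the thread):

```python
# Hypothetical workload: 100M tuples, 10M transactions.
tuples = 100_000_000
transactions = 10_000_000

tuple_overhead = tuples * 8      # 8 extra bytes of commit-time data per tuple
pg_time = transactions * 4       # pg_time: 4 bytes per transaction
pg_log = transactions * 2 / 8    # pg_log: 2 bits per transaction, for scale

print(tuple_overhead / 2**20)  # ~763 MiB of per-tuple overhead
print(pg_time / 2**20)         # ~38 MiB for pg_time
print(pg_log / 2**20)          # ~2.4 MiB for pg_log
```

The per-tuple cost dominates by two orders of magnitude, which is why it is
the part Vadim objects to paying unconditionally.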

> Actually it may have even speeded things up since vacuum would have
> less work to do.

This would make only *TT users* happy -:)

> Can you convince me that triggers can compare anywhere near for
> performance?

No, they can't. But this is bad only for *TT users* -:)
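The trigger-based TT that Vadim advocates can be sketched with a toy model
(illustrative Python, not PostgreSQL code; the names and structures are
assumptions): an AFTER UPDATE action archives the old row version together
with its validity interval, and historical lookups scan the archive.

```python
import time

# Toy sketch of trigger-style time travel.
history = []  # archived versions: (key, value, valid_from, valid_to)
current = {}  # live rows: key -> (value, valid_from)

def update(key, value, now=None):
    """Apply an update; the 'trigger' archives the old version first."""
    now = time.time() if now is None else now
    if key in current:
        old_value, valid_from = current[key]
        history.append((key, old_value, valid_from, now))
    current[key] = (value, now)

def as_of(key, ts):
    """Fetch the value a key had at timestamp ts, or None."""
    value, valid_from = current.get(key, (None, None))
    if valid_from is not None and valid_from <= ts:
        return value
    for k, v, start, end in history:
        if k == key and start <= ts < end:
            return v
    return None
```

The point of contention in the thread is exactly the extra work in
`update`: built-in TT gets the old version "for free" from the
non-overwriting smgr, while a trigger must copy it somewhere.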

> I can't see how. All I'm asking is don't damage anything that is in
> postgres now that is relevant to time-travel in your quest for WAL....

It's not related to WAL!
Though... with WAL, pg_log is not required to be permanent: we could re-use
transaction IDs after a db restart... Well, it seems we can handle this.

> > With the original TT:
> > 
> > - you are not able to use indices to fetch tuples on time base;
> 
> Sounds not very hard to fix..

Really? Commit time is unknown until commit - so you would have to insert
index tuples just before commit... and how would you know what to insert?

> > - you are not able to control tuples life time;
> 
> From the docs... "Applications that do not want to save 
> historical data can specify a cutoff point for a relation.
> Cutoff points are defined by the discard command"

I meant another thing: when I have to deal with history,
I sometimes need to change historical dates (c) -:))
Probably we can handle this as well, just some additional
complexity -:)

> > - you have to store commit time somewhere;
> 
> Ok, so?

Space.

> > - you have to store additional 8 bytes for each tuple;
> 
> A small price for time travel.

Not for those who aren't going to use TT at all.
The lower performance of a trigger implementation is a smaller price for me.

> > - 1 sec could be tooo long time interval for some uses of TT.
> 
> So someone in the future can implement finer grains. If time travel
> disappears this option is not open.

It stays open, with triggers -:)
As do Colour-Travel and all the other travels -:)

> > And, btw, what could be *really* very useful is TT + 
> > referential integrity. How could that be implemented without
> > triggers?
> 
> In what way does TT not have referential integrity? As long as the
> system assures that every transaction writes the same timestamp to all
> tuples then referential integrity continues to exist.

The same tuple of a table with a PK may be updated many times by many
transactions within 1 second. With a 1 sec grain you would read *many*
historical tuples with the same PK, all valid at the same time. So we
need "finer grains" right now...
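The collision can be seen in a toy model (illustrative Python; the
second-granularity commit time matches the original TT design described in
the thread, the rest is made up):

```python
# If commit times are truncated to whole seconds, several versions of the
# same primary key updated within one second become indistinguishable.
def commit_second(t):
    return int(t)  # 1-second grain

# Three updates to the same PK, 0.1-0.9s apart.
versions = [(commit_second(t), val)
            for t, val in [(100.1, "v1"), (100.4, "v2"), (100.9, "v3")]]

valid_at_100 = [val for ct, val in versions if ct == 100]
print(valid_at_100)  # all three versions carry the same "commit time"
```

An as-of query at second 100 has no way to pick the right version, which is
why referential integrity over TT data needs finer timestamps.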

> > Imho, triggers can give you much more flexible and useful TT...
> > 
> > Also note that TT was removed from Illustra and authors wrote that
> > built-in TT could be implemented without non-overwriting smgr.
> 
> Of course it can be, but can it be done anywhere near as efficiently?

But it can be done without losing efficiency where TT is not required.

> > > > It was mentioned here that triggers could be used for async
> > > > replication, as well as WAL.
> > >
> > > Same story. Major inefficency. Replication is tough enough without
> > > mucking around with triggers. Once the trigger executes you've got
> > > to go and store the data in the database again anyway. Then figure
> > > out when to delete it.
> > 
> > What about reading WAL to get and propagate changes? I 
> > don't think that reading tables will be more efficient and, btw,
> > how to know what to read (C) -:) ?
> 
> Maybe that is a good approach, but it's not clear that it is the best.
> More research is needed. With the no-overwrite storage manager there
> exists a mechanism for deciding how long a tuple exists and this
> can easily be tapped into for replication purposes. Vacuum could 

This "mechanism" (just an additional field in pg_class) can be used
for WAL-based replication as well.
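A log-based replicator along the lines Vadim suggests might look like the
following toy sketch (illustrative Python, not PostgreSQL internals; `wal`,
`log_change`, and `replicate` are invented names): changes are read from the
change log starting at the last confirmed position, so there is no question
of which tables to scan.

```python
# Append-only change log standing in for the WAL.
wal = []  # records: (lsn, table, key, new_value)

def log_change(table, key, value):
    """Every committed change is appended to the log with its position."""
    wal.append((len(wal), table, key, value))

def replicate(replica, last_lsn):
    """Apply every record past last_lsn; return the new confirmed position."""
    for lsn, table, key, value in wal[last_lsn:]:
        replica.setdefault(table, {})[key] = value
    return len(wal)
```

Reading the log answers "how to know what to read": the replicator only ever
consumes records between its confirmed position and the end of the log.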

> serve two purposes of vacuum and replicate.

Vacuum is already slow; better to make it faster, not ever slower...
I see vacuum becoming an *optional* command someday... when we are able to
re-use space.

Vadim

