Thread: New feature request: FlashBack Query


From
RPK
Date:
PostgreSQL, already a mature database, needs to have more options for
recovery as compared to proprietary databases. I just worked with Oracle's
FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.

Future versions of PostgreSQL must have similar features, enabling users
to restore table(s) and/or database(s) to a desired timestamp.
-- 
View this message in context: http://www.nabble.com/New-feature-request%3A-FlashBack-Query-tf3245023.html#a9020502
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: New feature request: FlashBack Query

From
"Joshua D. Drake"
Date:
RPK wrote:
> PostgreSQL, already a mature database, needs to have more options for
> recovery as compared to proprietary databases. I just worked with Oracle's
> FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.
> 
> Future versions of PostgreSQL must have similar features which enable users
> to bring Table(s) and/or Database(s) to a desired Time Stamp.

We can do it with databases; we can't do it with tables. Nor should we
do it with tables, as it would require that all related tables are
flashed back as well.

Joshua D. Drake

-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: New feature request: FlashBack Query

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> RPK wrote:
>> Future versions of PostgreSQL must have similar features which enable users
>> to bring Table(s) and/or Database(s) to a desired Time Stamp.

> We can do it with databases, we can't do it with tables. Nor should we
> do it with tables as it would require that all tables in relation are
> also flashed backed.

AFAICT this is a request to re-instate Time Travel, which is a feature
we removed more than ten years ago because the overhead was utterly
unacceptable.  And the project's idea of acceptable performance then
was orders of magnitude weaker than it is now.  So it's not going to
happen, at least not in the general release.  You might take a look at
contrib/spi/README.timetravel, though, for a prototype of how something
similar can be achieved without any changes to the core system.  That
module is a bit unmaintained and could doubtless do with some updates
--- for starters, it should be using timestamptz instead of the old
deprecated abstime.  If you're interested, feel free to work on it.
No one else has taken an interest in a long time.
        regards, tom lane
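The timetravel approach Tom points to boils down to keeping a validity interval on every row and closing it on update instead of overwriting. A minimal sketch of the idea (in Python rather than the module's actual C trigger; the class and field names here are invented purely for illustration):

```python
# Minimal sketch of the timetravel idea: every row carries a validity
# interval, and an UPDATE closes the old version instead of destroying it.
# This mirrors contrib/spi/timetravel only loosely; the class and field
# names here are invented for illustration, not taken from the module.
INFINITY = float("inf")

class TimeTravelTable:
    def __init__(self):
        self.rows = []  # each row: a dict with data plus valid_from/valid_to

    def insert(self, key, value, now):
        self.rows.append({"key": key, "value": value,
                          "valid_from": now, "valid_to": INFINITY})

    def update(self, key, value, now):
        for row in self.rows:
            if row["key"] == key and row["valid_to"] == INFINITY:
                row["valid_to"] = now   # close the old version
        self.insert(key, value, now)    # and append the new one

    def as_of(self, ts):
        # a "flashback query": rows whose validity interval covers ts
        return [r for r in self.rows
                if r["valid_from"] <= ts < r["valid_to"]]

t = TimeTravelTable()
t.insert("a", 1, now=10)
t.update("a", 2, now=20)
print([r["value"] for r in t.as_of(15)])  # [1] -- the old version
print([r["value"] for r in t.as_of(25)])  # [2] -- the current version
```

Because old versions are never removed, queries "as of" any past point keep working; the cost is exactly the storage overhead the thread discusses.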


Re: New feature request: FlashBack Query

From
elein
Date:
On Sat, Feb 17, 2007 at 11:48:55AM -0500, Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> > RPK wrote:
> >> Future versions of PostgreSQL must have similar features which enable users
> >> to bring Table(s) and/or Database(s) to a desired Time Stamp.
> 
> > We can do it with databases, we can't do it with tables. Nor should we
> > do it with tables as it would require that all tables in relation are
> > also flashed backed.
> 
> AFAICT this is a request to re-instate Time Travel, which is a feature
> we removed more than ten years ago because the overhead was utterly
> unacceptable.  And the project's idea of acceptable performance then
> was orders of magnitude weaker than it is now.  So it's not going to
> happen, at least not in the general release.  You might take a look at
> contrib/spi/README.timetravel, though, for a prototype of how something
> similar can be achieved without any changes to the core system.  That
> module is a bit unmaintained and could doubtless do with some updates
> --- for starters, it should be using timestamptz instead of the old
> deprecated abstime.  If you're interested, feel free to work on it.
> No one else has taken an interest in a long time.
> 

For other recent time travel ideas see: http://www.varlena.com/GeneralBits/122.php
Time travel is not cheap, though.

--elein
elein@varlena.com


Re: New feature request: FlashBack Query

From
"Chad Wagner"
Date:
On 2/17/07, elein <elein@varlena.com> wrote:
For other recent time travel ideas see: http://www.varlena.com/GeneralBits/122.php
Time travel is not cheap, though.


I am sure this topic has probably been beaten to death in the past, but has anyone talked about the advantages of Oracle's MVCC model versus PostgreSQL's MVCC model?  Oracle achieves multiversioning by using rollback/undo segments, whereas PostgreSQL (essentially) keeps the undo in the same space as the table.

If I were to guess, this is probably a major thing to change.  Clearly there are advantages to both: with Oracle, the space consumed by a modified row is immediately available for reuse, and there is generally little row migration (assuming there is enough space on the block), so you should be able to avoid updates to the index and the bloating that seems to go along with vacuuming.

Are there any previous discussions that folks could point out here?

Re: New feature request: FlashBack Query

From
"Joshua D. Drake"
Date:
Chad Wagner wrote:
> On 2/17/07, elein <elein@varlena.com> wrote:
>>
>> For other recent time travel ideas see:
>> http://www.varlena.com/GeneralBits/122.php
>> Time travel is not cheap, though.
>>
> 
> 
> I am sure this topic has probably been beaten to death in the past, but has
> anyone talked about the advantages of Oracle's MVCC model versus
> PostgreSQL's MVCC model?  Oracle achieves multiversioning by using
> rollback/undo segments, where PostgreSQL appears to place (essentially) the
> undo in the same space as the table.

My understanding is that the main difference is that rollbacks are
inexpensive for us, but expensive for Oracle. Talk to an Oracle DBA
about their Rollback logs :0.

However, they don't have vacuum, we do.

Joshua D. Drake

> 
> If I were to guess this is probably a major thing to change.  Clearly there
> are advantages to both, with Oracle essentially the space consumed by a
> modified row is immediately available for reuse and generally there is
> little row migration assuming there is enough space on the block so you
> should be able to avoid updates to the index and the bloating that seems to
> go along with vacuuming.
> 
> Is there any previous discussions that folks could point out here?
> 





Re: New feature request: FlashBack Query

From
"Chad Wagner"
Date:
On 2/17/07, Joshua D. Drake <jd@commandprompt.com> wrote:
My understanding is that the main difference is that rollbacks are
inexpensive for us, but expensive for Oracle. Talk to an Oracle DBA
about their Rollback logs :0.

Yes, I have seen cases where undo segments are thrashed.  Generally it works well, and I agree it is likely much cheaper on PostgreSQL, as you would expect fewer scattered reads because the old version is inline with the rest of the data.

But if I recall, undo segments are cached in Oracle; usually where I see problems is where the DBA is completely incompetent and has undersized the buffer cache.  Oracle does direct reads (AFAIK), so undersizing the buffer cache can be brutal.  A very common mistake with Oracle was undersizing the buffer cache and oversizing the shared pool (where the shared plans are stored); with 9i and later they tried to have the management tools suggest the ideal values, or have them managed automatically by the database.  Probably a step in the right direction, but I know they still have bumps to iron out.  :)
 

However, they don't have vacuum, we do.

Right, and I think that is more or less because Oracle doesn't need it.  Vacuum's main purpose (correct me if I am wrong) is to recover/mark rows that are no longer used, whereas Oracle essentially reuses the space immediately.

Obviously, with Oracle, if you bloat out a table and delete a ton of rows, then you have to rebuild the table, but that is more or less the same problem that PostgreSQL has and where vacuum full comes into play.

The only benefit with the Oracle model is that you can achieve flashback, which is a very rarely used feature in my book.  The disadvantages are likely the overhead to perform the "rollback" and possibly more scattered reads.  I can say that I have used it, and it has come in handy, but it is hardly worth it.  The benefit with the PostgreSQL model is the likelihood of the old rows being inline with the rest of the table data, potentially reducing scattered reads.  The disadvantage is vacuuming; it seems to be often overlooked -- possibly solved by defaulting autovacuum to on?  (That seems to be the way Oracle is heading, defaulting statistics collection to on, among other management features.)

Re: New feature request: FlashBack Query

From
Tom Lane
Date:
"Chad Wagner" <chad.wagner@gmail.com> writes:
> I am sure this topic has probably been beaten to death in the past, but has
> anyone talked about the advantages of Oracle's MVCC model versus
> PostgreSQL's MVCC model?

Yes, we've been all through that.  We like ours.  See the archives.
        regards, tom lane


Re: New feature request: FlashBack Query

From
Warren Turkal
Date:
On Saturday 17 February 2007 07:49, RPK wrote:
> PostgreSQL, already a mature database, needs to have more options for
> recovery as compared to proprietary databases. I just worked with Oracle's
> FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.
>
> Future versions of PostgreSQL must have similar features which enable users
> to bring Table(s) and/or Database(s) to a desired Time Stamp.

Check out my proposal[1] for Temporal extensions. Ultimately, creating valid 
time and transaction time tables would be possible through my proposal. Please 
check it out.

[1]http://archives.postgresql.org/pgsql-hackers/2007-02/msg00540.php

wt
-- 
Warren Turkal (w00t)


Re: New feature request: FlashBack Query

From
Hannu Krosing
Date:
On Sat, 2007-02-17 at 22:49, Chad Wagner wrote:

> 
> 
>         However, they don't have vacuum, we do.
> 
> Right, and I think that is more or less because Oracle doesn't need
> it.  Vacuum's main purpose (correct me if I am wrong) is to
> recover/mark rows that are no longer used, and Oracle essentially
> reuses the space immediately. 
> 
> Obviously with Oracle if you bloat out a table and delete a ton of
> rows then you have to rebuild the table, but that is more or less the
> same problem that PostgreSQL has and where vacuum full comes into
> play.
> 
> The only benefit with the Oracle model is that you can achieve
> flashback, which is a very rarely used feature in my book.

We can have flashbacks up to the last vacuum. It is just not exposed.
Don't vacuum, and you have the whole history. (Actually, you can't go
back more than 2G transactions, or you get transaction id wraparound.)

To get a flashback query, you "just" have to construct a snapshot from
that time and you are done. We don't store transaction times anywhere, so
the flashback has to be by transaction id, but there is very little
extra work involved. We just don't have syntax for saying "SELECT ... AS
SEEN BY TRANSACTION XXX"
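
The visibility rule this describes can be sketched as a toy Python model. One simplifying assumption here (mine, not from the mail): xids are assigned and committed in xid order, so "committed before transaction X started" reduces to a simple comparison; real tuple visibility involves far more state.

```python
# Toy sketch of "visible as seen by transaction X". Simplifying assumption
# (invented for illustration): xids are assigned and committed in xid
# order, so "committed before X started" reduces to "xid < X and committed".
# Real visibility rules also handle in-progress xids, hint bits, aborts, etc.
def visible_as_of(xmin, xmax, target_xid, committed):
    if xmin not in committed or xmin >= target_xid:
        return False    # inserting transaction had not committed by then
    if xmax is not None and xmax in committed and xmax < target_xid:
        return False    # deleting transaction had already committed by then
    return True

committed = {100, 105}
print(visible_as_of(100, None, 103, committed))  # True: inserted by 100
print(visible_as_of(100, 105, 103, committed))   # True: delete came later
print(visible_as_of(100, 105, 110, committed))   # False: deleted by 105
```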

AFAIK, Oracle's flashbacks can also only go back as far as the rollback
segments reach.

Postgres' original design prescribed that VACUUM would not delete dead
tuples, but just move them to history tables on cheap(er) WORM storage.
Doing that would have very little overhead (except for writing the old
tuples) and would not need any fundamental changes to how we do things
currently.

>   The disadvantages is likely overhead to perform the "rollback" and
> possibly more scattered reads.  

I've also heard reports that doing concurrent data loading and big
analysis queries is a royal pain in Oracle.

> I can say that I have used it, and it has come in handy, but hardly
> worth it.  The benefit with the PostgreSQL model is the likelihood of
> the old rows being inline with the rest of the table data, potentially
> reducing scattered reads.  The disadvantage is vacuuming, it seems to
> be often overlooked -- possibly solved by defaulting autovacuum to on?
> (seems to be the way Oracle is heading, defaulting statistics
> collection to on and other management features). 
> 
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: New feature request: FlashBack Query

From
"Joshua D. Drake"
Date:
Hannu Krosing wrote:
> On Sat, 2007-02-17 at 22:49, Chad Wagner wrote:
> 
>>
>>         However, they don't have vacuum, we do.
>>
>> Right, and I think that is more or less because Oracle doesn't need
>> it.  Vacuum's main purpose (correct me if I am wrong) is to
>> recover/mark rows that are no longer used, and Oracle essentially
>> reuses the space immediately. 
>>
>> Obviously with Oracle if you bloat out a table and delete a ton of
>> rows then you have to rebuild the table, but that is more or less the
>> same problem that PostgreSQL has and where vacuum full comes into
>> play.
>>
>> The only benefit with the Oracle model is that you can achieve
>> flashback, which is a very rarely used feature in my book.
> 
> We can have flashbacks up to the last vacuum. It is just not exposed.
> Don't vacuum, and you have the whole history. (Actually you can't go for
> more than 2G transactions, or you get trx id rollover).
> 
> To get a flashback query, you "just" have to construct a snapshot from
> that time and you are done. We don't store transaction times anywere, so
> the flashback has to be by transaction id, but there is very little
> extra work involved. We just don't have syntax for saying "SELECT ... AS
> SEEN BY TRANSACTION XXX"

Well this is certainly interesting. What do we think it would take to
enable the functionality?

Joshua D. Drake






Re: New feature request: FlashBack Query

From
Andreas 'ads' Scherbaum
Date:
Hello,

On Sat, 17 Feb 2007 06:49:42 -0800 (PST)
RPK <rohitprakash123@indiatimes.com> wrote:

> PostgreSQL, already a mature database, needs to have more options for
> recovery as compared to proprietary databases. I just worked with Oracle's
> FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.
> 
> Future versions of PostgreSQL must have similar features which enable users
> to bring Table(s) and/or Database(s) to a desired Time Stamp.

There is a pgfoundry project which tries to achieve this:

http://pgfoundry.org/projects/tablelog/


Kind regards

--             Andreas 'ads' Scherbaum
German PostgreSQL Usergroup: http://www.pgug.de


Re: New feature request: FlashBack Query

From
Hannu Krosing
Date:
On Sun, 2007-02-18 at 14:27, Joshua D. Drake wrote:
> Hannu Krosing wrote:
> > On Sat, 2007-02-17 at 22:49, Chad Wagner wrote:
> > 
> >>
> >>         However, they don't have vacuum, we do.
> >>
> >> Right, and I think that is more or less because Oracle doesn't need
> >> it.  Vacuum's main purpose (correct me if I am wrong) is to
> >> recover/mark rows that are no longer used, and Oracle essentially
> >> reuses the space immediately. 
> >>
> >> Obviously with Oracle if you bloat out a table and delete a ton of
> >> rows then you have to rebuild the table, but that is more or less the
> >> same problem that PostgreSQL has and where vacuum full comes into
> >> play.
> >>
> >> The only benefit with the Oracle model is that you can achieve
> >> flashback, which is a very rarely used feature in my book.
> > 
> > We can have flashbacks up to the last vacuum. It is just not exposed.
> > Don't vacuum, and you have the whole history. (Actually you can't go for
> > more than 2G transactions, or you get trx id rollover).
> > 
> > To get a flashback query, you "just" have to construct a snapshot from
> > that time and you are done. We don't store transaction times anywere, so
> > the flashback has to be by transaction id, but there is very little
> > extra work involved. We just don't have syntax for saying "SELECT ... AS
> > SEEN BY TRANSACTION XXX"
> 
> Well this is certainly interesting. What do we think it would take to
> enable the functionality?

First, we must run the query in serializable mode and replace the
snapshot with a synthetic one which defines visibility at the start of
the desired transaction.

It is probably a good idea to take a lock on all tables involved, to
avoid a vacuum being started on them while the query is running.

Also, we can't trust the DELETED flags in index pages, so we should
forbid index scans, or else always re-check visibility in the heap.

Otherwise it would probably be enough to just scan tuples as usual and
check whether they were visible to the desired transaction, that is,
whether they were inserted before that transaction and not deleted
before it.

Of course this will no longer be true once we have HOT/WIP with in-page
vacuuming.




Re: New feature request: FlashBack Query

From
"Florian G. Pflug"
Date:
Hannu Krosing wrote:
> On Sun, 2007-02-18 at 14:27, Joshua D. Drake wrote:
>> Hannu Krosing wrote:
>>> On Sat, 2007-02-17 at 22:49, Chad Wagner wrote:
>>> To get a flashback query, you "just" have to construct a snapshot from
>>> that time and you are done. We don't store transaction times anywere, so
>>> the flashback has to be by transaction id, but there is very little
>>> extra work involved. We just don't have syntax for saying "SELECT ... AS
>>> SEEN BY TRANSACTION XXX"
>> Well this is certainly interesting. What do we think it would take to
>> enable the functionality?
> 
> First we must run the query in serializable mode and replace the
> snapshot with a synthetic one, which defines visibility at the start of
> the desired transaction
> 
> probably it is a good idea to take a lock on all tables involved to
> avoid a vacuum to be started on them when the query is running.
Would the xmin exported by that transaction prevent vacuum from removing
any tuples still needed for the flashback snapshot?

greetings, Florian Pflug


Re: New feature request: FlashBack Query

From
Alvaro Herrera
Date:
Florian G. Pflug wrote:
> Hannu Krosing wrote:
> >On Sun, 2007-02-18 at 14:27, Joshua D. Drake wrote:
> >>Hannu Krosing wrote:
> >>>On Sat, 2007-02-17 at 22:49, Chad Wagner wrote:
> >>>To get a flashback query, you "just" have to construct a snapshot from
> >>>that time and you are done. We don't store transaction times anywere, so
> >>>the flashback has to be by transaction id, but there is very little
> >>>extra work involved. We just don't have syntax for saying "SELECT ... AS
> >>>SEEN BY TRANSACTION XXX"
> >>Well this is certainly interesting. What do we think it would take to
> >>enable the functionality?
> >
> >First we must run the query in serializable mode and replace the
> >snapshot with a synthetic one, which defines visibility at the start of
> >the desired transaction
> >
> >probably it is a good idea to take a lock on all tables involved to
> >avoid a vacuum to be started on them when the query is running.
> Would the xmin exported by that transaction prevent vacuum from removing
> any tuples still needed for the flashback snapshot?

Sure, and that makes the mentioned lock unnecessary.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: New feature request: FlashBack Query

From
"Zeugswetter Andreas ADI SD"
Date:
> > Well this is certainly interesting. What do we think it would take to
> > enable the functionality?
>
> First we must run the query in serializable mode and replace
> the snapshot with a synthetic one, which defines visibility
> at the start of the desired transaction

We could use something that controls the "global xmin". It would ensure
that global xmin does not advance beyond what still needs to be visible.
This would probably be a sliding time window, or a fixed point in time
that is released by the DBA/user.

Then everything below is not really different from a situation where you
had a long-running transaction.

> probably it is a good idea to take a lock on all tables
> involved to avoid a vacuum to be started on them when the
> query is running.
>
> also, we can't trust the DELETED flags in index pages, so we
> should forbid index scans, or just always re-check the
> visibility in heap.
>
> Otherways it would probably be enough to just scan tuples as
> usual, and check if they were visible to desired transaction,
> that is they were inserted before that transaction and they
> are not deleted before that trx.
>
> Of course this will not be true, once we have HOT/WIP with
> in-page vacuuming.

Currently, I think HOT does honor the "global xmin". There is no lookup
of individual xids, so parts of an update chain where only a previous
tuple or a later tuple can be visible are reused. Otherwise HOT would
need to be told not to do this, in a scenario where a backend can choose
a snapshot at will.

Andreas


Re: New feature request: FlashBack Query

From
"Zeugswetter Andreas ADI SD"
Date:
> > >First we must run the query in serializable mode and replace the
> > >snapshot with a synthetic one, which defines visibility at the start
> > >of the desired transaction
> > >
> > >probably it is a good idea to take a lock on all tables involved to
> > >avoid a vacuum to be started on them when the query is running.
> > Would the xmin exported by that transaction prevent vacuum from
> > removing any tuples still needed for the flashback snapshot?
>
> Sure, and that makes the mentioned lock unnecessary.

The problem is that that transaction sets up a historic snapshot at a
later time, so it is not yet running when vacuum looks at the "global
xmin". So something else needs to hold up global xmin (see previous
post).

Andreas


Re: New feature request: FlashBack Query

From
"Florian G. Pflug"
Date:
Zeugswetter Andreas ADI SD wrote:
>>>> First we must run the query in serializable mode and replace the
>>>> snapshot with a synthetic one, which defines visibility at the start
>>>> of the desired transaction
>>>>
>>>> probably it is a good idea to take a lock on all tables involved to
>>>> avoid a vacuum to be started on them when the query is running.
>>> Would the xmin exported by that transaction prevent vacuum from 
>>> removing any tuples still needed for the flashback snapshot?
>> Sure, and that makes the mentioned lock unnecessary.
> 
> Problem is, that that transaction sets a historic snapshot at a later
> time, so it is not yet running when vacuum looks at "global xmin".
> So something else needs to hold up global xmin (see prev post).

I think to make this flashback stuff fly, you'd need to know the
earliest xmin that you can still flash back to. Vacuum would advance
that xmin as soon as it starts working. So the case you'd need to
protect against would be a race condition when you start a vacuum
and a flashback transaction at the same time. But for that, some simple
semaphore should suffice, along with a well-thought-out ordering of the
actions taken.

In the long run, you'd probably want to store the commit times of
transactions somewhere, and add some guc that makes a vacuum assume
that recently committed transactions (say, in the last hour) are still
considered active. That would allow the DBA to guarantee that he can
always flash back at least an hour.

greetings, Florian Pflug
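
The retention idea above amounts to clamping vacuum's cutoff so that anything committed within the retention window is treated as still running. A rough sketch (no such guc or function exists in PostgreSQL; every name below is invented for illustration, and xids are assumed to be in commit order):

```python
# Rough sketch of the retention idea: vacuum's cutoff xid is clamped so
# that transactions committed within the last `retention` seconds are
# treated as still active. No such guc or function exists in PostgreSQL;
# every name here is invented.
def vacuum_cutoff(commit_times, now, retention):
    # commit_times: xid -> commit timestamp (seconds); assumes xid order
    recent = [xid for xid, t in commit_times.items() if now - t < retention]
    if recent:
        return min(recent)        # oldest transaction still protected
    return max(commit_times) + 1  # nothing protected; vacuum freely

commits = {100: 0, 101: 1800, 102: 3500}
# With a one-hour window at t=3600, xids 101 and 102 are still protected:
print(vacuum_cutoff(commits, now=3600, retention=3600))  # 101
```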


Re: New feature request: FlashBack Query

From
Gregory Stark
Date:
"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

>> First we must run the query in serializable mode and replace 
>> the snapshot with a synthetic one, which defines visibility 
>> at the start of the desired transaction
>
> We could use something that controls "global xmin".
> It would ensure, that global xmin does not advance bejond
> what still needs to be visible. This would probably be a 
> sliding time window, or a fixed point in time that is
> released by the dba/user.

Well, there's another detail you have to cover aside from rolling back
your xmin: you have to reconstruct the rest of the snapshot, including
knowing which other transactions were in progress at the time you want
to flash back to.

If you just roll back xmin and set xmax to the same value, you'll get a
consistent view of the database, but it may not match a view that was
ever current. That is, some of the transactions after the target xmin may
have committed before that xmin, so there was never a time in the
database when they were invisible but your new xmin was visible.

I think to do this you'll need to periodically record a snapshot and
later restore one of those saved snapshots. I'm not sure where would be a
good place to record them. The WAL seems like a handy place, but digging
through the WAL would be annoying.
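
Gregory's point can be made concrete with a toy snapshot model (the fields only loosely mirror xmin/xmax/in-progress, and all the numbers are invented): commit order need not follow xid order, so collapsing the snapshot to a single xmin = xmax value can yield a view that never existed.

```python
# Toy snapshot model: (xmin, xmax, in_progress). A committed tuple written
# by `xid` is visible if xid < snap_xmax and xid was not in progress.
# Invented numbers below show the problem: collapsing the snapshot to
# xmin = xmax loses the in-progress list.
def in_snapshot(xid, snap_xmin, snap_xmax, in_progress):
    # snap_xmin is carried along for completeness but unused in this toy
    return xid < snap_xmax and xid not in in_progress

# Historically: xid 100 was still running while the later xid 101 had
# already committed.
historical = dict(snap_xmin=100, snap_xmax=102, in_progress={100})
print(in_snapshot(101, **historical))     # True:  101 had committed
print(in_snapshot(100, **historical))     # False: 100 was in progress

# Reconstructing naively with xmin = xmax = 101 inverts both answers:
print(in_snapshot(100, 101, 101, set()))  # True:  100 now looks visible
print(in_snapshot(101, 101, 101, set()))  # False: 101 disappears
```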

Incidentally this is one of the things that would be useful for read-only
access to PITR warm standby machines.


--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com


Re: New feature request: FlashBack Query

From
tomas@tuxteam.de
Date:

On Mon, Feb 19, 2007 at 04:00:09PM +0100, Florian G. Pflug wrote:
[...]
> In the long run, you'd probably want to store the commit-times of 
> transactions somewhere, and add some guc that makes a vacuum assume
> that recently comitted transaction (say, in the last hour) are still
> considered active [...]

Funny how some things recur:
 <http://archives.postgresql.org/pgsql-hackers/2007-01/msg01301.php>

(says I and seeks shelter beneath a big rock ;-)

Regards
-- tomás



Re: New feature request: FlashBack Query

From
August Zajonc
Date:
Gregory Stark wrote:
> "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:
> 
>>> First we must run the query in serializable mode and replace 
>>> the snapshot with a synthetic one, which defines visibility 
>>> at the start of the desired transaction
>> We could use something that controls "global xmin".
>> It would ensure, that global xmin does not advance bejond
>> what still needs to be visible. This would probably be a 
>> sliding time window, or a fixed point in time that is
>> released by the dba/user.
> 
> Well there's another detail you have to cover aside from rolling back your
> xmin. You have to find the rest of the snapshot including knowing what other
> transactions were in-progress at the time you want to flash back to.
> 
> If you just roll back xmin and set xmax to the same value you'll get a
> consistent view of the database but it may not match a view that was ever
> current. That is, some of the transactions after the target xmin may have
> committed before that xmin. So there was never a time in the database when
> they were invisible but your new xmin was visible.
> 
>[...]
> Incidentally this is one of the things that would be useful for read-only
> access to PITR warm standby machines.
> 

Couldn't you define things simply such that you get a consistent view
including all transactions started before transaction x? This is time
travel lite, but low-overhead, which I think is a key benefit of this
approach.

A huge value for this would be in the "oops, I deleted my data" category.
PostgreSQL rarely loses data, but clients seem to have a habit of doing
so, and then going "oops". This seems to happen most often when facing
something like a reporting deadline, where they are moving lots of stuff
around and making copies, and sometimes delete the wrong "company"
recordset or equivalent, even with confirmation dialogs at the app level.

This would give the client a quick and easy "oops" procedure. The DBA
sets the guc to 1hr and tells the client: if you make a big mistake, stop
the database server as follows and call. Frankly, it would bail a few
DBAs out as well.

The key is how lightweight the setup could be, which matters because
clients are not always willing to pay for a PITR setup. The low overhead
would mean you'd feel fine about setting guc to 1hr or so.

As a % of total installed instances, I suspect the % with PITR is small.
I've got stuff I snapshot nightly, but that's it. So there's no easy out
from the "oops" query either.

- August







Re: New feature request: FlashBack Query

From
"Florian G. Pflug"
Date:
August Zajonc wrote:
> Gregory Stark wrote:
>> "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:
>>
>>>> First we must run the query in serializable mode and replace 
>>>> the snapshot with a synthetic one, which defines visibility 
>>>> at the start of the desired transaction
>>> We could use something that controls "global xmin".
>>> It would ensure, that global xmin does not advance bejond
>>> what still needs to be visible. This would probably be a 
>>> sliding time window, or a fixed point in time that is
>>> released by the dba/user.
>> Well there's another detail you have to cover aside from rolling back your
>> xmin. You have to find the rest of the snapshot including knowing what other
>> transactions were in-progress at the time you want to flash back to.
>>
>> If you just roll back xmin and set xmax to the same value you'll get a
>> consistent view of the database but it may not match a view that was ever
>> current. That is, some of the transactions after the target xmin may have
>> committed before that xmin. So there was never a time in the database when
>> they were invisible but your new xmin was visible.
>>
>> [...]
>> Incidentally this is one of the things that would be useful for read-only
>> access to PITR warm standby machines.
>>
> 
> Couldn't you define things simply to be that you get a consistent view
> including all transactions started before x transaction? This is time
> travel lite, but low overhead which I think is a key benefit of this
> approach.

I was thinking along the same lines. Flashback is probably only really
useful on databases that are mostly read-only, but with a few users
who update data. You'd use flashback to undo catastrophic changes made
by accident, and would probably gladly accept undoing a little more work
than strictly necessary.

By contrast, if you're running an online shop where people buy stuff
24/7 and, say, somebody accidentally deletes some products, then you
won't want to lose the orders that happened during the last hour, but
will rather try to regenerate the products from your last backup.

So I don't think it's too important exactly what snapshot you get,
making the xmin=xmax idea feasible.

The same holds true for PITR warm standby (read-only queries on PITR
slaves). This would be used for reporting, or load-balancing of searches
in fairly static data, none of which depends on the exact snapshot
you get.

greetings, Florian Pflug



Re: New feature request: FlashBack Query

From
"August Zajonc"
Date:
On Mon, 19 Feb 2007 20:30:59 +0100, "Florian G. Pflug" <fgp@phlo.org>
said:
> August Zajonc wrote:
> > Gregory Stark wrote:
> > 
> > Couldn't you define things simply to be that you get a consistent view
> > including all transactions started before x transaction? This is time
> > travel lite, but low overhead which I think is a key benefit of this
> > approach.
> 
> I was thinking along the same line. Flashback is probably ony really
> usefull on databases that are mostly read-only, but with a few users
> who update data. You'd use flashback to undo catastrophic changes done
> by accident, and probably will gladly accept that you undo a little
> more work than strictly necessary.
> 
> On the contrary, if you're running a online shop were people buy stuff
> 24/7, and, say, somebody accidentally deletes some producs, than you
> won't want to loose the orders happened during that last hour, but will
> rather try to regenerate that products from your last backup.

Hopefully people doing order systems are using PITR or similar :) 

For the time-travel-lite case, it's just a matter of a clear definition.
You get all transactions that were *started* before and up to trx x. If
a transaction rolled back you still won't see it, so you're still
getting a consistent view. But if it committed after your marker you
will see it. That seems OK to me. In fact, I suspect folks think of
transactions as happening more or less when they get sent to the DB, so
this may map more directly to what people expect.

The one caveat is the long-running transaction: if you started one,
did the oops trx 5 minutes later, and then time traveled *before* the
long-running trx committed, you wouldn't see that long-running trx. So
the definition would need to be modified to something like: all trx
started before x that were no longer running when you time travel. I
don't know if it's worth a NOTICE in the logs if you time travel back
while there are ids of transactions from before your xmin still running
(which, if you waited a bit, might become visible in your time-travel
view).
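That modified definition can be written down as a small predicate
(illustrative Python sketch; the xids and names are made up):

```python
# "All trx started before marker x that were no longer running when you
# time travel": a transaction shows up only once it has committed by the
# time the time-travel query runs (toy sketch, invented xids).

def flashback_visible(xid, marker, committed_now):
    return xid <= marker and xid in committed_now

committed_now = {10, 20}   # committed by the time we time travel
running_now = {15}         # the long-running trx from the caveat
marker = 30

print(flashback_visible(10, marker, committed_now))  # True
print(flashback_visible(15, marker, committed_now))  # False, for now

# Candidates for the suggested NOTICE: old xids that are still running
# and would become visible if you waited a bit.
print(sorted(x for x in running_now if x <= marker))  # [15]
```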

If Jan gets his way with a timestamp on trx commit, then you can do
"started before time x", which may be more user-friendly.

For PITR I'd imagine you might actually be able to get the visibility
right, no? I haven't looked deeply enough into the WAL logs to understand
how the partial-playback scenario works. If the WAL logs are ordered by
trx commit time, then you'd get proper visibility.

- August


Re: New feature request: FlashBack Query

From
Tom Lane
Date:
August Zajonc <augustz@augustz.com> writes:
> The key is how lightweight the setup could be, which matters because
> clients are not always willing to pay for a PITR setup. The low overhead
> would mean you'd feel fine about setting guc to 1hr or so.

This would have exactly the same performance consequences as always
having an hour-old open transaction.  I'm afraid that describing it
as "low overhead" is mere wishful thinking: it would cripple vacuuming
of high-update tables and greatly increase the typical load on pg_clog
and pg_subtrans.  We already know that pg_subtrans contention can be a
source of context-swap storms, with the size of the window back to
GlobalXmin being the controlling factor for how bad it gets.

It's possible that this last could be addressed by separating the
concept of "old enough to be vacuumed" from GlobalXmin, but it's
certainly not a trivial thing.
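The vacuuming cost can be illustrated with a toy horizon calculation
(illustrative Python only, not PostgreSQL internals; the xids are
invented):

```python
# Toy model of the vacuum horizon (not PostgreSQL internals): dead row
# versions are reclaimable only if they were deleted before the oldest
# xmin any live snapshot still holds (GlobalXmin).

def global_xmin(snapshot_xmins, next_xid):
    # Oldest xmin among live snapshots; with none open, next_xid.
    return min(snapshot_xmins, default=next_xid)

def reclaimable(dead_xmaxes, horizon):
    return [x for x in dead_xmaxes if x < horizon]

dead = [100, 500, 900, 1300]     # xids that deleted four row versions

# Nothing held open: everything dead is reclaimable.
print(reclaimable(dead, global_xmin([], next_xid=1400)))     # all four

# A one-hour flashback window behaves like an hour-old snapshot
# pinning xmin=450: most dead versions must be kept around.
print(reclaimable(dead, global_xmin([450], next_xid=1400)))  # [100]
```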
        regards, tom lane


Re: New feature request: FlashBack Query

From
"Jonah H. Harris"
Date:
On 2/17/07, Joshua D. Drake <jd@commandprompt.com> wrote:
> My understanding is that the main difference is that rollbacks are
> inexpensive for us, but expensive for Oracle.

Yes, Oracle is optimized for COMMIT, we're optimized for ROLLBACK :)

In all seriousness, last time I checked Oracle's MVCC was covered by
two patents.

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: New feature request: FlashBack Query

From
Gregory Stark
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:

> On 2/17/07, Joshua D. Drake <jd@commandprompt.com> wrote:
>> My understanding is that the main difference is that rollbacks are
>> inexpensive for us, but expensive for Oracle.
>
> Yes, Oracle is optimized for COMMIT, we're optimized for ROLLBACK :)

I used to say that too but I've since realized it's not really true. It's more
like Oracle is optimized for data that's committed long in the past and we're
optimized for data that's been recently updated. 

In Oracle the data that's been committed long in the past requires no
transactional overhead but the data that's been recently updated requires lots
of work to fetch the right version. 

In Postgres it's the other way around. Data that was committed and
deleted long ago requires extra work to clean up, but data that's been
recently changed requires little additional work to see the correct
version.

In a sense then it's the opposite of what we usually say. Oracle is optimized
for mostly static data. Postgres is optimized for changing data.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: New feature request: FlashBack Query

From
RPK
Date:
I agree that the TimeStamp creates an overhead, but I just want to know:
if an accidental update happened to a table and this incident got traced
three days later, what facility does PGSQL provide to bring the table back
to its original condition? You can't sit there regretting that you did not
run ROLLBACK before COMMIT. (Correct me. I am only a user.)

When talking about Oracle's technology and that it creates overhead, it is
true: Oracle's database is not for ordinary machines. You can't expect
performance from Oracle on a normal 256 MB machine. But still, the more
recovery options, the better for mission-critical environments.

The feature of enabling/disabling TimeStamp logging is acceptable. A user
must be able to decide whether a FlashBack-type option is needed or not. In
Oracle 10g we can switch off the FlashBack feature if we are low on
resources. If PGSQL is to be used in a mission-critical situation then no
company will rely on low-end machines. For these types of situations the
best environment is chosen, and I think PGSQL must have this type of
recovery option. The PGSQL installer can ask the user during setup to
enable/disable TimeStamp logging.

Restoring the database from a backup file that was created three days ago
is not feasible. The changes in other tables and everything newly created
since then would have to be redone, all at the price of just undoing the
last update on a particular table.


Warren Turkal-5 wrote:
> 
> On Saturday 17 February 2007 07:49, RPK wrote:
>> PostgreSQL, already a mature database, needs to have more options for
>> recovery as compared to proprietary databases. I just worked with
>> Oracle's
>> FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.
>>
>> Future versions of PostgreSQL must have similar features which enable
>> users
>> to bring Table(s) and/or Database(s) to a desired Time Stamp.
> 
> Check out my proposal[1] for Temporal extensions. Ultimately, creating
> valid 
> time and transaction time tables would be possible through my proposal.
> Please 
> check it out.
> 
> [1]http://archives.postgresql.org/pgsql-hackers/2007-02/msg00540.php
> 
> wt
> -- 
> Warren Turkal (w00t)
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 
> 

-- 
View this message in context: http://www.nabble.com/New-feature-request%3A-FlashBack-Query-tf3245023.html#a9059865
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: New feature request: FlashBack Query

From
"Andrew Dunstan"
Date:
RPK wrote:
>
> I agree that TimeStamp creates an overhead, but I just want to know if an
> accidental update happened to a table and this incident got traced three
> days after, what facility PGSQL provide to bring the table to its original
> condition. You can't wait regretting on why you did not run ROLLBACK
> before
> COMMIT. (Correct me. I am only a user).
>

Why the heck can't you create a reversing transaction? That's what
ordinary mortals do. Demanding unlimited undo at some time that is
arbitrarily distant in the future strikes me as wholly unreasonable.

What do you mean by "accidental update"? What you really appear to mean is
that a program or a human operator has made an error, and incorrectly told
the database to commit a transaction. The answer surely is to correct the
behaviour of the program or human, rather than wanting the database to
provide an undo facility. Alternatively, this should be handled at the
application layer, using something like table_log.

Some things just don't work well with this sort of facility. Just ask your
bookie if you can undo a bet that you "accidentally" placed with him and
which, three days later, you discover (after the race) was a mistake.


cheers

andrew





Re: New feature request: FlashBack Query

From
"Jonah H. Harris"
Date:
On 2/20/07, Gregory Stark <stark@enterprisedb.com> wrote:
> I used to say that too but I've since realized it's not really true.

Heh, take a joke man... I was following up on Drake's email :)

But, since you want to discuss your view of the systems openly... I'll
gladly reply :)

> It's more like Oracle is optimized for data that's committed
> long in the past and we're optimized for data that's
> been recently updated.

Wrong.  When Oracle says it's committed, it's committed.  No
difference between when, where, and how.  In Oracle, the committed
version is *always* the first presented to the user... it takes time
to go back and look at older versions; but why shouldn't that be a bit
slower, it isn't common practice anyway.  Same with rollbacks... why
should they optimize for them when 97% of transactions commit?

> In Oracle the data that's been committed long in the past requires no
> transactional overhead but the data that's been recently updated requires lots
> of work to fetch the right version.

Wrong.  The same transactional overhead applies to *all* data in
Oracle no matter of when it was committed.  Similarly, the only
overhead required occurs when someone is querying in serializable
isolation or on read-committed data before or during a commit.  On
short OLTP-type transactions, Oracle has the most optimized solution.

> In Postgres it's the other way around. data that's been committed deleted long
> ago requires extra work to clean up but data that's been recently changed
> requires little additional work to see the correct version.

PostgreSQL has little additional work?  Like, checking the validity of
every tuple?  Oracle checks visibility at the block level, so there's
*much* less overhead.  Take most of the benchmarks which can hold ~200
tuples per block.  Tables in those benchmarks are 100+ million rows.
On a sequential scan, Oracle would perform 500K checks, PostgreSQL
would perform *all* 100M checks (not counting dead versions due to row
updates and the like).  On an index scan, Oracle not only has a
smaller index and less to check, but also knows the tuple will be
committed and will, in most cases, not have to perform additional
physical I/O to find the latest version of a row.
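The arithmetic behind those figures (using the row and block counts
assumed in the post) works out as follows:

```python
# Back-of-envelope: visibility checks on a sequential scan, per-block
# (Oracle-style SCN check) vs per-tuple (figures assumed in the post,
# ignoring dead versions from row updates).

rows = 100_000_000
tuples_per_block = 200

per_block_checks = rows // tuples_per_block  # one check per block
per_tuple_checks = rows                      # one check per tuple

print(per_block_checks)                      # 500000
print(per_tuple_checks // per_block_checks)  # 200x more checks per scan
```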

Of course, Oracle's design is much more complicated in its ability to
build read-committed versions of the blocks at runtime; something the
simplicity of PostgreSQL's MVCC design eliminates.

> In a sense then it's the opposite of what we usually say. Oracle is optimized
> for mostly static data. Postgres is optimized for changing data.

Care to share an example to prove it?

Like always, there are pros and cons with both designs, but denying
facts gets us nowhere.  We're off-topic now... so we should either
move this off line or to another thread.  I personally don't see much
of a reason to continue discussing MVCC designs anymore as Oracle's is
patented and PostgreSQL's is highly unlikely to change drastically.

As always, I'd suggest discussing improvements, not the status quo.
Likewise, discussing Oracle's design, drawbacks, and limitations
without having used it extensively is quite obvious to anyone familiar
with Oracle.  Don't get me wrong, it's fine to prefer one design to
another, but pushing discussion items comparing Oracle to PostgreSQL
because of things you've heard or read somewhere isn't the same as
understanding them because you've used them.

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: New feature request: FlashBack Query

From
Rod Taylor
Date:
>
> Wrong.  When Oracle says it's committed, it's committed.  No
> difference between when, where, and how.  In Oracle, the committed
> version is *always* the first presented to the user... it takes time
> to go back and look at older versions; but why shouldn't that be a bit
> slower, it isn't common practice anyway.  Same with rollbacks... why
> should they optimize for them when 97% of transactions commit?

Do 97% of transactions commit because Oracle has slow rollbacks and  
developers are working around that performance issue, or because they  
really commit?

I have watched several developers who would prefer to issue numerous
selects to verify things like foreign keys in the application in
order to avoid a rollback.

Anyway, I don't have experience with big Oracle applications, but I'm
not so sure that 97% of transactions would commit if rollbacks were
cheaper.




Re: New feature request: FlashBack Query

From
Hannu Krosing
Date:
On Tue, 2007-02-20 at 10:20, Jonah H. Harris wrote:
> On 2/20/07, Gregory Stark <stark@enterprisedb.com> wrote:
> > I used to say that too but I've since realized it's not really true.
> 
> Heh, take a joke man... I was following up on Drake's email :)
> 
> But, since you want to discuss your view of the systems openly... I'll
> gladly reply :)
> 
> > It's more like Oracle is optimized for data that's committed
> > long in the past and we're optimized for data that's
> > been recently updated.
> 
> Wrong.  When Oracle says it's committed, it's committed.  No
> difference between when, where, and how.  In Oracle, the committed
> version is *always* the first presented to the user... it takes time
> to go back and look at older versions; 

Older versions are also committed :)

He probably meant longer transactions and several versions visible to
different backends.

> but why shouldn't that be a bit slower, it isn't common practice anyway.

Not for pure OLAP, at least when you have fairly fast transactions. But
it can slow things down when you have some hotspot tables.

> Same with rollbacks... why
> should they optimize for them when 97% of transactions commit?

Or the other way around - you should write code where most transactions
commit ;)

> > In Oracle the data that's been committed long in the past requires no
> > transactional overhead but the data that's been recently updated requires lots
> > of work to fetch the right version.
> 
> Wrong.  The same transactional overhead applies to *all* data in
> Oracle no matter of when it was committed.  Similarly, the only
> overhead required occurs when someone is querying in serializable
> isolation or on read-committed data before or during a commit.  On
> short OLTP-type transactions, Oracle has the most optimized solution.
> 
> > In Postgres it's the other way around. data that's been committed deleted long
> > ago requires extra work to clean up but data that's been recently changed
> > requires little additional work to see the correct version.
> 
> PostgreSQL has little additional work?  Like, checking the validity of
> every tuple?  Oracle checks visibility at the block level, so there's
> *much* less overhead. 

Hmm. How can it check visibility at the block level and at the same time
do in-place updates on single tuples?

>  Take most of the benchmarks which can hold ~200
> tuples per block.  Tables in those benchmarks are 100+ million rows.
> On a sequential scan, Oracle would perform 500K checks, PostgreSQL
> would perform *all* 100M checks (not counting dead versions due to row
> updates and the like). 

My proposal of keeping visibility info in a separate heap would help to
get similar results, that is, mostly one check per page. That would also
cover many of the index-lookup cases below.

> On an index scan, Oracle not only has a
> smaller index and less to check, but also knows the tuple will be
> committed and will, in most cases, not have to perform additional
> physical I/O to find the latest version of a row.

It is also the reason why you can forget about doing simultaneous data
loading and queries on the same table. If you know to avoid doing that,
then it "will, in most cases, not have to perform additional physical
I/O to find the latest version of a row" ;)

> Of course, Oracle's design is much more complicated in its ability to
> build read-committed versions of the blocks at runtime; something the
> simplicity of PostgreSQL's MVCC design eliminates.
> 
> > In a sense then it's the opposite of what we usually say. Oracle is optimized
> > for mostly static data. Postgres is optimized for changing data.
> 
> Care to share an example to prove it?
> 
> Like always, there are pros and cons with both designs, but denying
> facts gets us nowhere.  We're off-topic now... so we should either
> move this off line or to another thread.  I personally don't see much
> of a reason to continue discussing MVCC designs anymore as Oracle's is
> patented and PostgreSQL's is highly unlikely to change drastically.

I don't think we will ever move to rollback segments, but for some
use-cases moving visibility to a separate heap could make sense.

And if we want to bring back time travel (see another thread about
"Flashback Queries"), then we may end up implementing the original
PostgreSQL design spec and make VACUUM siphon dead tuples over to
archive relations, which already starts looking a little like rollback
segments, only for other purposes :)

> As always, I'd suggest discussing improvements, not the status quo.
> Likewise, discussing Oracle's design, drawbacks, and limitations
> without having used it extensively is quite obvious to anyone familiar
> with Oracle.

Using a system extensively can also create blind spots about some of
that system's (mis)features. One learns to avoid doing some things
without consciously knowing it.

> Don't get me wrong, it's fine to prefer one design to
> another, but pushing discussion items comparing Oracle to PostgreSQL
> because of things you've heard or read somewhere isn't the same as
> understanding them because you've used them.

To get a really meaningful discussion we should involve someone who has
*designed* them, not merely used them.

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: New feature request: FlashBack Query

From
Gregory Stark
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:

> On 2/20/07, Gregory Stark <stark@enterprisedb.com> wrote:
>
>> It's more like Oracle is optimized for data that's committed
>> long in the past and we're optimized for data that's
>> been recently updated.
>
> Wrong.  When Oracle says it's committed, it's committed.  No
> difference between when, where, and how.  In Oracle, the committed
> version is *always* the first presented to the user... 

Sure, and if it was committed long in the past then you can use it. If it's
committed recently then you'll have to start looking up rollback data instead.

The rest of your post seems to all be predicated on the idea that if data is
committed then that's all you'll need to look at. But that's missing precisely
the point of what I was saying:

>> In a sense then it's the opposite of what we usually say. Oracle is optimized
>> for mostly static data. Postgres is optimized for changing data.

By "changing data" I meant "data in flux", not the action of making changes to
the data. 

Looking at data in flux in Oracle -- even other data that's unchanged but
lives on the same page as some record that's in flux -- will require you to
look up rollback data and possibly even have to follow many pages of chained
rollback data. Looking at data in Postgres has no additional overhead when
it's data in flux versus old static data.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: New feature request: FlashBack Query

From
August Zajonc
Date:
Tom Lane wrote:
> August Zajonc <augustz@augustz.com> writes:
>> The key is how lightweight the setup could be, which matters because
>> clients are not always willing to pay for a PITR setup. The low overhead
>> would mean you'd feel fine about setting guc to 1hr or so.
> 
> This would have exactly the same performance consequences as always
> having an hour-old open transaction.  I'm afraid that describing it
> as "low overhead" is mere wishful thinking: it would cripple vacuuming
> of high-update tables and greatly increase the typical load on pg_clog
> and pg_subtrans.  We already know that pg_subtrans contention can be a
> source of context-swap storms, with the size of the window back to
> GlobalXmin being the controlling factor for how bad it gets.
> 
> It's possible that this last could be addressed by separating the
> concept of "old enough to be vacuumed" from GlobalXmin, but it's
> certainly not a trivial thing.

Isn't GlobalXmin for open transactions? I thought the idea was that
everything goes as usual, but you can flip a knob and say that vacuum
doesn't vacuum anything more recent than GlobalXmin less x transactions.

Then you can look at your transactions per second and get a rough window
to work within. Or, if there are timestamps on commits, that would switch
to a time interval, which is more user-friendly.

You end up simply delaying when an hour's worth of transactions gets
vacuumed. For folks doing nightly cron-job vacuums, not too bad.
Autovacuum isn't on by default :)
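The knob described above amounts to a simple conversion (illustrative
Python; the function names and numbers are invented for the sketch):

```python
# Sketch of the proposed knob: convert a desired time-travel window into
# a number of transactions to hold back from vacuum, given an observed
# transaction rate (all names and numbers are hypothetical).

def xid_window(tps, window_seconds):
    return int(tps * window_seconds)

def delayed_vacuum_horizon(global_xmin, tps, window_seconds):
    # Vacuum only up to GlobalXmin minus the held-back window.
    return max(0, global_xmin - xid_window(tps, window_seconds))

print(xid_window(tps=50, window_seconds=3600))      # 180000 xids for 1 hour
print(delayed_vacuum_horizon(1_000_000, 50, 3600))  # 820000
```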

Of course, this will be clumsy if not per database.

But the thought might be to take advantage of the flashback data already
present under the MVCC model, as long as vacuum hasn't hit things (and
being willing to stop activity on a database, etc). Given that you are
delaying a vacuum rather than being more aggressive, and know you can
already vacuum up to a more recent transaction xmin, I dunno... Does
anything depend (other than performance) on vacuum actually vacuuming as
far as it can?

- August


Re: New feature request: FlashBack Query

From
RPK
Date:
Andrew,

>> Demanding unlimited undo at some time that is arbitrarilly distant in the
>> future strikes me as wholly unreasonable. 

I did not mean asking for undo from a life-time log. Since FlashBack
technology is already there, I just mean that the world's most advanced
database (PostgreSQL, as they say) must have an optimized way of undoing
at least a week of changes. A week's log is enough, and PostgreSQL can
keep removing old logs automatically.

Secondly, it must be left to the user to decide the number of days of
archive he wants to store. Again, up to a week max.


RPK wrote:
>
> I agree that TimeStamp creates an overhead, but I just want to know if an
> accidental update happened to a table and this incident got traced three
> days after, what facility PGSQL provide to bring the table to its original
> condition. You can't wait regretting on why you did not run ROLLBACK
> before
> COMMIT. (Correct me. I am only a user).
>

> Why the heck can't you create a reversing transaction? That's what
> ordinary mortals do. Demanding unlimited undo at some time that is
> arbitrarilly distant in the future strikes me as wholly unreasonable.
>
> What do you mean by "accidental update"? What you really appear to mean is
> that a program or a human operator has made an error, and incorrectly told
> the database to commit a transaction. The answer surely is to correct the
> behaviour of the program or human, rather than wanting the database to
> provide an undo facility. Alternatively, this should be handled at the
> application layer, using something like table_log.
>
> Some things just don't work well with this sort of facility. Just ask your
> bookie if you can undo a bet that you "accidentally" placed with him and
> which, three days later, you discover (after the race) was a mistake.
>
> cheers
>
> andrew




---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly



-- 
View this message in context: http://www.nabble.com/New-feature-request%3A-FlashBack-Query-tf3245023.html#a9067564
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: New feature request: FlashBack Query

From
Tom Lane
Date:
RPK <rohitprakash123@indiatimes.com> writes:
> I did not mean asking for undo from a life-time log. Since FlashBack
> Technology is already there, I just mean that world's most advanced database
> (PostgreSQL, as they say), must have an optimized way for undoing of at
> least a week changes.

You're living in a dream world.  Do you know any Oracle DBs who keep
enough rollback segments to go back a week?
        regards, tom lane


Re: New feature request: FlashBack Query

From
Theo Schlossnagle
Date:
On Feb 20, 2007, at 1:40 PM, Tom Lane wrote:

> RPK <rohitprakash123@indiatimes.com> writes:
>> I did not mean asking for undo from a life-time log. Since FlashBack
>> Technology is already there, I just mean that world's most  
>> advanced database
>> (PostgreSQL, as they say), must have an optimized way for undoing  
>> of at
>> least a week changes.
>
> You're living in a dream world.  Do you know any Oracle DBs who keep
> enough rollback segments to go back a week?

Ours go for a good 6 hours sometimes :-D

// Theo Schlossnagle
// Esoteric Curio: http://www.lethargy.org/~jesus/



Re: New feature request: FlashBack Query

From
"Jonah H. Harris"
Date:
On 2/20/07, Rod Taylor <rod.taylor@gmail.com> wrote:
> Do 97% of transactions commit because Oracle has slow rollbacks and
> developers are working around that performance issue, or because they
> really commit?

Again, off-topic, but 97% of all transactions commit according to Jim
Gray and his research... not anything related to Oracle.

> I have watched several developers that would prefer to issue numerous
> selects to verify things like foreign keys in the application in
> order to avoid a rollback.

That's just bad development.

> Anyway, I don't have experience with big Oracle applications but I'm
> not so sure that 97% of transactions would commit if rollbacks were
> cheaper.

Again, stats not related to Oracle, but databases in general.


-- 
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: New feature request: FlashBack Query

From
"Jonah H. Harris"
Date:
On 2/20/07, Hannu Krosing <hannu@skype.net> wrote:
> He probably meant longer transactions and several versions visible to
> different backends.

Yes, he may have... but I was responding to the statements he made.

> > but why shouldn't that be a bit slower, it isn't common practice anyway.
>
> Not for pure OLAP, at least when you have fairly fast transactions. But
> it can slow things down when you have some hotspot tables.

True, but hotspots are hotspots and no matter what caused them or
where they are, they slow down performance in one area or another.
Limiting hotspots is generally an application-level design decision
anyway.

> > Same with rollbacks... why
> > should they optimize for them when 97% of transactions commit?
>
> Or other way around, - you should write code, where most transactions
> commit ;)

That's what I said, Oracle shouldn't optimize for rollbacks when most
transactions commit.

> Hmm. How can it check visibility at block level and at the same time do
> in-place updates on single tuples ?

In most cases, the block-level SCN determines transaction-level
visibility.  Now, row locks can exist within that page, but they don't
determine visibility... they determine the UNDO location which
contains the data required to rebuild a read-consistent version of the
block.

> My proposal of keeping visibility info in a separate heap would help to
> get similar results, that is mostly 1 check per page. That would also
> cover much of the index lookup cases below.

Most definitely.

> I don't think we will ever move to rollback segments, but for some
> use-cases moving visibility to a separate heap could make sense.

Yes.

> And if we want to bring back time travel (see another thread about
> "Flashback Queries"), then we may end up implementing the original
> postgresql's design spec and make VACUUM spihon dead tuples over to
> archive relations, which already starts looking a little like rollback
> segments, only for other purposes :)

Yes.

> Using a system extensively can also create blind spots about some of the
> systems (mis)features. One learns to avoid doing some things without
> consciously knowing about it.

I've used 'em all and can certainly name issues with Oracle.  However,
we're discussing improving PostgreSQL, I was responding to Greg's
statements, and I don't see the need to bring up unrelated Oracle
implementation details which will just lead to a general anti-Oracle
discussion.

> To get a really meaningful discussion we should involve someone who has
> *designed* them, not merely used them .

True.  My comment is more along the lines of uninformed discussion
which leads to Oracle-bashing.  Those who have at least used and
administered Oracle in production tend to understand Oracle's design
decisions and related issues better than those who have just heard of
Oracle's issues.

I live in the real world and can admit certain failures of any
database system regardless of which I prefer.  No single database is
best for every task.  I just didn't want the discussion going where it
normally goes, to being one of, "we're right and they're wrong".

Can we move offline or to another thread if we want to continue
discussing Oracle-specifics; otherwise... let's focus on
flashback-like functionality in this thread.

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: New feature request: FlashBack Query

From
August Zajonc
Date:
RPK wrote:
> Andrew,
> 
>>> Demanding unlimited undo at some time that is arbitrarilly distant in the
>>> future strikes me as wholly unreasonable. 
> 
> I did not mean asking for undo from a life-time log. Since FlashBack
> Technology is already there, I just mean that world's most advanced database
> (PostgreSQL, as they say), must have an optimized way for undoing of at
> least a week changes. A week log is enough and PostgreSQL can keep on
> removing old logs automatically.
> 
> Secondly, it must be left to the user to decide for the number of days of
> archive he want to store. Again upto a week max.

You might look at storing deltas or similar (perhaps with a check table)
if you need to change data a week back. Then you can just find the row
representing the problematic change and delete it. If you really want to
track what happens, store deltas and then, instead of deleting them, put
a reversing delta in, keeping your entire audit trail.

You can put materialized views on top of this if you need performance.
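The delta approach above can be sketched roughly like this - a toy
SQLite example (all table, column, and function names are made up for
illustration, not taken from any real schema) where every change is
logged, and an undo is itself a new, reversing delta rather than a
deletion:

```python
import sqlite3

# Hypothetical schema: a balance column plus an append-only delta log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE account_delta (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    acct_id INTEGER NOT NULL,
    change  INTEGER NOT NULL,           -- signed delta applied to balance
    at      TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

def apply_delta(acct_id, change):
    """Apply a change and record it in the audit log."""
    conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                 (change, acct_id))
    conn.execute("INSERT INTO account_delta (acct_id, change) VALUES (?, ?)",
                 (acct_id, change))

def reverse_delta(seq):
    """Undo one logged change by appending a reversing delta,
    keeping the entire audit trail intact."""
    (acct_id, change), = conn.execute(
        "SELECT acct_id, change FROM account_delta WHERE seq = ?", (seq,))
    apply_delta(acct_id, -change)

conn.execute("INSERT INTO account VALUES (1, 100)")
apply_delta(1, 50)        # balance becomes 150, logged as seq 1
reverse_delta(1)          # the problematic change is reversed, not erased
balance, = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()
```

The log stays append-only, so the history of what happened (including
the undo itself) is never lost; a materialized summary per account could
be layered on top for performance, as suggested above.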

- August


Re: New feature request: FlashBack Query

From
Csaba Nagy
Date:
> Do 97% of transactions commit because Oracle has slow rollbacks and  
> developers are working around that performance issue, or because they  
> really commit?
> 
> I have watched several developers who would prefer to issue numerous
> selects to verify things like foreign keys in the application in
> order to avoid a rollback.

Most of our code cannot afford a rollback, because the statement can be
part of a much bigger transaction which would carry a much higher
performance penalty if retried than a simple rollback would. And you
know that in Postgres you can't roll back just the last insert: you
abort the whole transaction with it... and it's simply a performance
bottleneck to retry in a high-contention scenario (which is usually the
case for us).

So I would say we don't avoid rollbacks because of the cost of the
rollback, but because of the cost of the retry...

Cheers,
Csaba.




Re: New feature request: FlashBack Query

From
"Florian G. Pflug"
Date:
Theo Schlossnagle wrote:
> 
> On Feb 20, 2007, at 1:40 PM, Tom Lane wrote:
> 
>> RPK <rohitprakash123@indiatimes.com> writes:
>>> I did not mean asking for undo from a life-time log. Since FlashBack
>>> Technology is already there, I just mean that world's most advanced 
>>> database
>>> (PostgreSQL, as they say), must have an optimized way for undoing of at
>>> least a week changes.
>>
>> You're living in a dream world.  Do you know any Oracle DBs who keep
>> enough rollback segments to go back a week?
> 
> Ours go for a good 6 hours sometimes :-D

Even if it's just one hour, it's certainly better than nothing.
I fully agree that it'd not be acceptable to introduce performance
problems for _everyone_ by introducing flashback. But if you only
experience a drop in performance when you actually enable flashback
(by, let's say, setting vacuum_min_deadtime=1h), then I don't see
why anyone would object to having support for some kind of flashback.
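The pay-only-if-enabled idea can be sketched with a toy version store
(plain Python, not PostgreSQL internals; `min_deadtime` here stands in
for the hypothetical vacuum_min_deadtime setting, and all names are
illustrative): vacuum reclaims only versions that have been dead longer
than the retention window, so a flashback-style read "as of" a time
inside the window still sees them.

```python
class VersionedTable:
    """Toy MVCC-ish table: dead row versions are retained for a
    configurable window so past snapshots stay readable."""

    def __init__(self, min_deadtime):
        self.min_deadtime = min_deadtime
        self.versions = []          # [value, born, died]; died None = live

    def insert(self, value, now):
        self.versions.append([value, now, None])

    def delete(self, value, now):
        for v in self.versions:
            if v[0] == value and v[2] is None:
                v[2] = now          # mark dead, but keep the version around

    def vacuum(self, now):
        # Reclaim only versions dead for longer than the retention window;
        # with min_deadtime = 0 this degenerates to ordinary vacuum.
        self.versions = [v for v in self.versions
                         if v[2] is None or now - v[2] < self.min_deadtime]

    def select_asof(self, ts):
        # Flashback-style read: every version that was live at time ts.
        return [v[0] for v in self.versions
                if v[1] <= ts and (v[2] is None or v[2] > ts)]

t = VersionedTable(min_deadtime=3600)    # keep dead rows for one hour
t.insert("a", now=0)
t.insert("b", now=10)
t.delete("a", now=100)
t.vacuum(now=200)                        # "a" survives: dead for only 100s
past = t.select_asof(50)                 # the table as it looked at t=50
present = t.select_asof(150)
```

Users who leave the window at zero pay nothing extra; only those who
opt into a longer window carry the extra dead tuples around.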

However, I just realized that doing this is much harder than I initially
thought, because catalog access always happens with SnapshotNow, and
e.g. "drop table" deletes datafiles at commit time, and not during vacuum.

Supporting "begin; drop table mytable; commit; begin; set transaction 
flashback 1 hour; select * from mytable; commit" would therefore be
really hard...

greetings, Florian Pflug


Re: New feature request: FlashBack Query

From
Alvaro Herrera
Date:
Florian G. Pflug wrote:

> However, I just realized that doing this is much harder than I initially
> thought, because catalog access always happens with SnapshotNow, and
> e.g. "drop table" deletes datafiles at commit time, and not during vacuum.

Not to mention the likes of CLUSTER and TRUNCATE, which would need
to be taught to keep the old datafiles for an additional week/hour.

What I don't understand is why people aren't working on improving
contrib/spi/timetravel.
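For reference, the timetravel pattern looks roughly like this - a
sketch in SQLite for illustration (the real contrib/spi/timetravel is a
C trigger function for PostgreSQL; the schema, the fake clock, and all
names here are made up): a view exposes only current rows, and triggers
turn an UPDATE into "close the old version, append a new one", so the
history can be queried at any past timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- History table: every row version, with its validity interval.
CREATE TABLE emp_history (
    name       TEXT NOT NULL,
    salary     INTEGER NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_to   INTEGER               -- NULL = current version
);
-- Fake clock so the example is deterministic.
CREATE TABLE clock (now INTEGER NOT NULL);
INSERT INTO clock VALUES (0);

-- Applications see only the current rows...
CREATE VIEW emp AS
    SELECT name, salary FROM emp_history WHERE valid_to IS NULL;

-- ...and triggers maintain the history, timetravel-style.
CREATE TRIGGER emp_ins INSTEAD OF INSERT ON emp
BEGIN
    INSERT INTO emp_history VALUES
        (NEW.name, NEW.salary, (SELECT now FROM clock), NULL);
END;
CREATE TRIGGER emp_upd INSTEAD OF UPDATE ON emp
BEGIN
    UPDATE emp_history SET valid_to = (SELECT now FROM clock)
        WHERE name = OLD.name AND valid_to IS NULL;
    INSERT INTO emp_history VALUES
        (NEW.name, NEW.salary, (SELECT now FROM clock), NULL);
END;
""")

conn.execute("INSERT INTO emp VALUES ('wheel', 1000)")
conn.execute("UPDATE clock SET now = 10")
conn.execute("UPDATE emp SET salary = 1200 WHERE name = 'wheel'")

def salary_asof(name, ts):
    """Time-travel query: the salary as it stood at time ts."""
    row = conn.execute(
        "SELECT salary FROM emp_history "
        "WHERE name = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)", (name, ts, ts)).fetchone()
    return row[0] if row else None
```

Unlike flashback, this only covers DML on tables you set it up for;
DDL (drop table, truncate) is outside its reach, which is part of the
distinction drawn below.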

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: New feature request: FlashBack Query

From
"Florian G. Pflug"
Date:
Alvaro Herrera wrote:
> Florian G. Pflug wrote:
> 
>> However, I just realized that doing this is much harder than I initially
>> thought, because catalog access always happens with SnapshotNow, and
>> e.g. "drop table" deletes datafiles at commit time, and not during vacuum.
> 
> Not to mention the likes of CLUSTER and TRUNCATE, which would need
> to be taught to keep the old datafiles for an additional week/hour.
> 
> What I don't understand is why people aren't working on improving
> contrib/spi/timetravel.

Because it serves a different use case, I think - flashback is an
administrative tool, not something you design your application around.
Flashback is more similar to PITR than to contrib/spi/timetravel.

greetings, Florian Pflug


Re: New feature request: FlashBack Query

From
August Zajonc
Date:
Florian G. Pflug wrote:
> Alvaro Herrera wrote:
>> Florian G. Pflug wrote:
>>
>>> However, I just realized that doing this is much harder than I initially
>>> thought, because catalog access always happens with SnapshotNow, and
>>> e.g. "drop table" deletes datafiles at commit time, and not during
>>> vacuum.
>>
>> Not to mention the likes of CLUSTER and TRUNCATE, which would need
>> to be taught to keep the old datafiles for an additional week/hour.
>>
>> What I don't understand is why people aren't working on improving
>> contrib/spi/timetravel.
> 
> Because it serves a different use case, I think - flashback is an
> administrative tool, not something you design your application around.
> Flashback is more similar to PITR than to contrib/spi/timetravel.

Drat. I remember when truncate wasn't even transaction-safe, but I think
it has since been changed so that the non-rollbackable portion happens
after commit.

Ultimately, anything that changes data would need to be deferred into
the vacuum or some other cleanup cycle. Basically "super MVCC": a
truncate would do the tuple-style dead/live bookkeeping on the
underlying files. Catalog changes and HOT would all need those
semantics too. Not doable.

A lot of places that grab an AccessExclusiveLock are probably subject to
this issue.

Unless there were a bog-standard way of doing this - and I don't see a
good option - it's a no-go.

Fun to think about, and there are probably all sorts of neat things you
could do with super MVCC (TRUNCATE a table concurrently with open
transactions), but way too much work for the gain of this tiny feature...

contrib/timetravel has, I think, some of these same issues (i.e., drop
table - can you still time travel?), along with a fair bit of
trigger-based overhead...

- August