Re: snapshot too old, configured by time - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: snapshot too old, configured by time
Date
Msg-id CAPpHfds0Qey08fmu=fF_89do2FeEE2hpGOHYogn4m0+uomw_VQ@mail.gmail.com
Whole thread Raw
In response to Re: snapshot too old, configured by time  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Sat, Apr 23, 2016 at 5:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:
> On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > I kind of agreed with Tom about just aborting transactions that held
> > snapshots for too long, and liked the idea this could be set per
> > session, but the idea that we abort only if a backend actually touches
> > the old data is very nice.  I can see why the patch author worked hard
> > to do that.
> >
> > How does/did Oracle handle this?
> >
>
> IIRC then Oracle gives this error when the space in undo tablespace (aka
> rollback segment) is low.  When the rollback segment gets full, it overwrites
> the changed data which might be required by some old snapshot and when that old
> snapshot statement tries to access the data (which is already overwritten), it
> gets "snapshot too old" error.  Assuming there is enough space in rollback
> segment, Oracle seems to provide a way via Alter System set undo_retention =
> <time_in_secs>. 
>
> Now, if the above understanding of mine is correct, then I think the current
> implementation done by Kevin is closer to what Oracle provides.

But does the rollback only happen if the long-running Oracle transaction
tries to _access_ specific data that was in the undo segment, or _any_
data that potentially could have been in the undo segment?  If the
later, it seems Kevin's approach is better because you would have to
actually need to access old data that was there to be canceled, not just
any data that could have been overwritten based on the xid.
 
I'm not sure that we should rely that much on Oracle behavior.  It has very different MVCC model.
Thus we can't apply same features one-by-one: they would have different pro and cons for us.

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction.  Once the time expires, we apply the WAL.  Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied?  If the former, does
Kevin's new code allow us to do the later?

That makes sense for me.  If we could improve read-only queries on slaves this way, Kevin's new code becomes much more justified.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

pgsql-hackers by date:

Previous
From: Christian Ullrich
Date:
Subject: Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013
Next
From: Amit Kapila
Date:
Subject: Re: Move PinBuffer and UnpinBuffer to atomics