Re: Deriving Recovery Snapshots - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Deriving Recovery Snapshots
Msg-id 1224103863.3808.224.camel@ebony.2ndQuadrant
In response to Re: Deriving Recovery Snapshots  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, 2008-10-15 at 12:58 -0700, Jeff Davis wrote:
> On Tue, 2008-10-14 at 18:50 +0100, Simon Riggs wrote:
> > I've worked out what I think is a workable, efficient process for
> > deriving snapshots during recovery. I will be posting a patch to show
> > how this works tomorrow [Wed 15 Oct], just doing cleanup now.
> 
> How will this interact with an idea like this?:
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00400.php

pg_snapclone should work fine, since it is orthogonal to this work.

> > I've had to change the way XidInMVCCSnapshot() works. We search the
> > snapshot even if it has overflowed. This is actually a performance win
> > in cases where only a few xids have overflowed but most haven't. This is
> > essential because if we were forced to check in subtrans *and*
> > unobservedxids existed then the snapshot would be invalid. (I could have
> > made it this way *just* in recovery, but the change seems better both
> > ways).
> 
> I don't entirely understand this. Can you explain the situation that
> would result in an invalid snapshot?

In recovery the snapshot consists of two sets of xids:
* ones we have seen as running (e.g. xid=43)
* ones we know exist, but haven't seen yet (e.g. xid=42)
(I call this latter kind Unobserved Transactions).

Both kinds of xids *must* be in the snapshot for MVCC to work.

The current way of checking snapshots is to say "if *any* of the running
transactions has overflowed, check subtrans".

Unobserved transactions are not in subtrans, so if you checked for them
there you would fail to find them. Currently we assume that means it is
a top-level transaction and then check the top-level xids.
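
To make the failure concrete, here is a minimal sketch of the current
check as described above. It is a toy model, not the real
XidInMVCCSnapshot(): the ToySnapshot struct, the toy_* names and the
lookup callback are all hypothetical, and the xmin/xmax range tests are
left out. It assumes the unobserved xid 42 has been stored in the
snapshot's subxid cache and that the snapshot is marked overflowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy model of a snapshot (hypothetical; not PostgreSQL's SnapshotData). */
typedef struct ToySnapshot
{
    const TransactionId *xip;       /* top-level xids seen as running */
    size_t               xcnt;
    const TransactionId *subxip;    /* subxid cache */
    size_t               subxcnt;
    bool                 overflowed; /* some subxids did not fit in the cache */
} ToySnapshot;

/*
 * Hypothetical stand-in for a subtrans lookup: returns the topmost parent
 * of xid, or xid itself when no parent has been recorded.
 */
typedef TransactionId (*ToyTopmostFn) (TransactionId xid);

/* Models an xid with no subtrans entry, such as an unobserved xid. */
static TransactionId
toy_no_parent_recorded(TransactionId xid)
{
    return xid;
}

/* Linear search of an xid array. */
static bool
toy_xid_in_array(const TransactionId *xids, size_t n, TransactionId xid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (xids[i] == xid)
            return true;
    }
    return false;
}

/*
 * Current behaviour: once the snapshot has overflowed, the subxid cache
 * is skipped entirely and subtrans is consulted instead.
 */
static bool
toy_xid_in_snapshot_old(const ToySnapshot *snap, TransactionId xid,
                        ToyTopmostFn topmost)
{
    assert(snap != NULL && topmost != NULL);

    if (!snap->overflowed)
    {
        if (toy_xid_in_array(snap->subxip, snap->subxcnt, xid))
            return true;
    }
    else
        xid = topmost(xid);     /* finds no parent for an unobserved xid */

    return toy_xid_in_array(snap->xip, snap->xcnt, xid);
}
```

With xip = {43}, subxip = {42} and overflowed set, this reports xid 42
as not in the snapshot, which is the invalid result.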

Why are unobserved transactions not in subtrans? Because we haven't seen
them yet, so we have nothing from which to assign their parent xid.

There isn't always enough space in the snapshot to allow all the
unobserved xids to be added as if they were top-level transactions, so
we put them into the subtrans cache as a secondary location and then
change the algorithm in XidInMVCCSnapshot(). We don't want to increase
the size of the snapshot because it already contains wasted space in
subtrans cache, nor do we wish to throw errors when people try to take
snapshots. 
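
Roughly, the changed check looks like the sketch below. This is only an
illustration under assumptions, not the patch: the ToySnapshot struct,
the toy_* names and the lookup callback are hypothetical, the xmin/xmax
range tests are omitted, and the order of the two searches is my
simplification. The point is that the subxid cache is searched whether
or not the snapshot has overflowed, and subtrans is consulted only after
that search misses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy model of a snapshot (hypothetical; not PostgreSQL's SnapshotData). */
typedef struct ToySnapshot
{
    const TransactionId *xip;       /* top-level xids seen as running */
    size_t               xcnt;
    const TransactionId *subxip;    /* subxid cache, also holding unobserved xids */
    size_t               subxcnt;
    bool                 overflowed; /* some subxids did not fit in the cache */
} ToySnapshot;

/*
 * Hypothetical stand-in for a subtrans lookup: returns the topmost parent
 * of xid, or xid itself when no parent has been recorded.
 */
typedef TransactionId (*ToyTopmostFn) (TransactionId xid);

/* Models an xid with no subtrans entry, such as an unobserved xid. */
static TransactionId
toy_no_parent_recorded(TransactionId xid)
{
    return xid;
}

/* Linear search of an xid array. */
static bool
toy_xid_in_array(const TransactionId *xids, size_t n, TransactionId xid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (xids[i] == xid)
            return true;
    }
    return false;
}

/*
 * Changed behaviour: always search the cache first, then fall back to
 * subtrans only if the snapshot has overflowed.
 */
static bool
toy_xid_in_snapshot_new(const ToySnapshot *snap, TransactionId xid,
                        ToyTopmostFn topmost)
{
    assert(snap != NULL && topmost != NULL);

    /* Searched even when overflowed, so cached unobserved xids are found. */
    if (toy_xid_in_array(snap->subxip, snap->subxcnt, xid))
        return true;

    /* Only a cache miss on an overflowed snapshot needs subtrans. */
    if (snap->overflowed)
        xid = topmost(xid);

    return toy_xid_in_array(snap->xip, snap->xcnt, xid);
}
```

With xip = {43}, subxip = {42} and overflowed set, both 42 and 43 are
now reported as in the snapshot, and a hit in the cache avoids the
subtrans lookup altogether.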

The XidInMVCCSnapshot() changes make sense in themselves for most cases,
since we don't want one overflowed transaction to cause us to thrash
subtrans, as happened in 8.1.

This took me some time to think through...

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


