On Thu, 2008-09-25 at 12:34 +0200, Zeugswetter Andreas OSB sIT wrote:
> > > I wonder whether the cancel can be delayed until a tuple/page is actually accessed
> > > that shows a too new xid.
> >
> > Yes, it's feasible and is now part of the design.
> >
> > This is all about what happens *if* we need to remove rows that a query
> > can still see.
>
> I was describing a procedure for exactly that case.
Ok, I see, sorry.
> If a slave backend has a snapshot that we cannot guarantee any more
> (because max_slave_delay has been exceeded):
>
> > > Instead of cancel, the backend gets a message with a lsn_horizon.
> > > From there on, whenever the backend reads a page/tuple with a LSN > lsn_horizon it cancels.
>
> but not before (at the time max_slave_delay has been reached), as described earlier.
Like that.
OK, so in full:
Each WAL record that cleans tuples carries a latestxid. If latestxid
is later than the snapshot of a query running on the standby, then we
wait. When we stop waiting, we tell all at-risk queries the LSN of the
first WAL record that has potentially removed tuples they can see. If
they then read a block with a higher LSN, they cancel *themselves*.
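
To make the replay side concrete, here is a rough sketch in Python rather than backend C. It is only an illustration of the bookkeeping, not a proposed implementation: StandbyQuery, apply_cleanup_record and the "xmin <= latestxid" at-risk test are names and assumptions I made up for this example, and xid wraparound is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandbyQuery:
    # Hypothetical stand-in for a running standby query.
    xmin: int                          # oldest xid the snapshot may still need
    lsn_horizon: Optional[int] = None  # None means "not at risk"


def apply_cleanup_record(record_lsn: int, latest_xid: int,
                         queries: List[StandbyQuery]) -> List[StandbyQuery]:
    """Run once the wait is over and the cleanup record is being applied.

    Marks every query whose snapshot could still see the removed tuples
    (assumed here to mean xmin <= latest_xid) and returns those queries.
    A query keeps the LSN of the FIRST record that put it at risk.
    """
    at_risk = []
    for q in queries:
        if q.xmin <= latest_xid:
            if q.lsn_horizon is None:
                # WAL is replayed in LSN order, so the first record seen
                # is also the lowest LSN that matters for this query.
                q.lsn_horizon = record_lsn
            at_risk.append(q)
    return at_risk
```

A later cleanup record never moves an existing horizon, because the query is already exposed from the first one onwards.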
This works OK, since SeqScans never read blocks at end of file that
didn't exist when they started, so long queries need not be cancelled
when they access growing tables.
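
A rough sketch of the scan side, again in Python and only as an illustration. seq_scan, QueryCancelled and the list-of-page-LSNs model of a relation are hypothetical names and simplifications for this example; the only points taken from the description above are that the scan stops at the block count it saw at start and cancels itself on a block whose LSN is above its lsn_horizon.

```python
from typing import List, Optional


class QueryCancelled(Exception):
    """Raised by the query itself when it meets a too-new block."""


def seq_scan(block_lsns: List[int], nblocks_at_start: int,
             lsn_horizon: Optional[int]) -> List[int]:
    """Scan blocks 0..nblocks_at_start-1 and return the block numbers read.

    block_lsns[i] is the current page LSN of block i. Blocks appended after
    the scan started (index >= nblocks_at_start) are never read, so their
    high LSNs cannot trigger a cancel.
    """
    visited = []
    for blkno in range(min(nblocks_at_start, len(block_lsns))):
        if lsn_horizon is not None and block_lsns[blkno] > lsn_horizon:
            # The block may have lost tuples this snapshot can still see.
            raise QueryCancelled(f"block {blkno} is newer than lsn_horizon")
        visited.append(blkno)
    return visited
```

With a horizon of 100, a table whose original three blocks have LSNs 10, 20 and 30 scans cleanly even if a fourth block with LSN 900 has since been added, whereas an original block rewritten at LSN 500 makes the scan cancel itself.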
That combines all the suggested approaches into one. It still leaves the
possibility of passing the standby's OldestXmin to the primary, but does
not rely upon it.
-- Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support