On Mon, 2009-01-19 at 14:00 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > GetSnapshotData() sets
> > RecentGlobalXmin to the result of the snapshot's xmin.
>
> No. RecentGlobalXmin is set to the oldest *xmin* observed, across all
> running transactions. TransactionXmin is the xid of the oldest running
> transaction. IOW, RecentGlobalXmin is the xid of transaction that the
> oldest running transaction sees as running.
OK. That was fun.
These WAL records are annoying, no matter what the exact value of
latestRemovedXid is, and they seem likely to conflict with many queries
on the standby.
If we don't use RecentGlobalXmin then I can't see any easily derived
value that we can use in its place. It isn't worth the effort on the
master to derive a more exact value, not when we don't even know if it
matters yet.
I suggest we handle this on the recovery side, not on the master, by
deriving the xmin at the point the WAL record arrives. We would
calculate it by looking at recovery procs only. That will likely give us
a later value than we would get from the master, but that can't be
helped.
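To make that concrete, here is a rough sketch of the derivation, not code
from the patch. It assumes each recovery proc records the xid of one master
transaction that replay still considers running. RecoveryProcEntry,
xid_precedes() and derive_recovery_xmin() are names invented for this
example. The comparison is modulo 2^32 so that xid wraparound is handled,
but special xids are ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Hypothetical stand-in for a recovery proc: one entry per master
 * transaction that WAL replay still believes is running.
 */
typedef struct RecoveryProcEntry
{
    TransactionId xid;      /* InvalidTransactionId if the slot is unused */
} RecoveryProcEntry;

/*
 * Simplified wraparound-aware comparison: true if a is logically older
 * than b. Assumes both are normal xids.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Derive an xmin from the recovery procs alone, at the moment a cleanup
 * WAL record arrives. Returns InvalidTransactionId if no slot is in use.
 */
static TransactionId
derive_recovery_xmin(const RecoveryProcEntry *procs, size_t nprocs)
{
    TransactionId oldest = InvalidTransactionId;

    for (size_t i = 0; i < nprocs; i++)
    {
        TransactionId xid = procs[i].xid;

        if (xid == InvalidTransactionId)
            continue;       /* skip unused slots */
        if (oldest == InvalidTransactionId || xid_precedes(xid, oldest))
            oldest = xid;
    }
    return oldest;
}
```

The result would stand in for the value carried in the WAL record. How the
empty case should be treated is left open here.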
For me, this makes it essential now that I put in place the deferred
cancellation mechanism. Some refactoring in this area is also required
because we need to handle two other types of conflict to correctly
support drop database and drop user, which is now underway.
Btree deletes were an important optimisation when they first went in, but
now that we have HOT they are much less important. Another route might be to put
an option to turn off btree delete on the master, default = on. We
probably should consider turning it off entirely when it doesn't yield
significant benefit. Lots of scanning to remove the odd row is probably
pretty wasteful and likely adds contention at the very point we don't
want it - index splits.
Thoughts?
-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support