Re: Hot Standby dev build (v8) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Hot Standby dev build (v8) |
Date | |
Msg-id | 1232359728.2327.6.camel@ebony.2ndQuadrant |
In response to | Re: Hot Standby dev build (v8) (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Hot Standby dev build (v8) |
List | pgsql-hackers |
On Mon, 2009-01-19 at 09:16 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-16 at 22:09 +0200, Heikki Linnakangas wrote:
> >
> >>>> RecentGlobalXmin is just a hint, it lags behind the real oldest xmin
> >>>> that GetOldestXmin() would return. If another backend has a more recent
> >>>> RecentGlobalXmin value, and has killed more recent tuples on the page,
> >>>> the latestRemovedXid written here is too old.
> >>> What do you think we should do instead?
> >> Dunno. Maybe call GetOldestXmin().
> >
> > We are discussing btree deletes, not btree vacuums.
>
> Pardon my ignorance, but what's the difference?

In terms of current HEAD, not much. In terms of Hot Standby, there is a
significant difference: the two actions have been split, rather than
continuing to share the same WAL record.

XLOG_BTREE_VACUUM removes index tuples as a result of a vacuum. The
initial scan of the heap already generated an XLOG_HEAP2_CLEANUP_INFO
record, which gives the latestRemovedXid for that vacuum. So we don't
need to worry about putting a latestRemovedXid on XLOG_BTREE_VACUUM. The
WAL records also differ because XLOG_BTREE_VACUUM contains details of
blocks that need to be pinned but not otherwise touched.

XLOG_BTREE_DELETE is different in three ways. It isn't part of a vacuum, so:

* we don't need to take a cleanup lock
* it doesn't contain info about other blocks we need to scan beforehand
  for correctness purposes
* it wasn't preceded by an XLOG_HEAP2_CLEANUP_INFO record, so it must
  have a *correct* (even if too conservative) value for latestRemovedXid
  set.

So the only time we need to set latestRemovedXid correctly is during a
normal transaction, not during a vacuum.

> > If we are doing
> > btree delete then we have an unreleased snapshot therefore we also have
> > a non-zero xmin. How can another backend have a later RecentGlobalXmin
> > or result from GetOldestXmin() than we do?
>
> Sure it can, for example:
>
> 1. Transaction 1 begins in backend A
> 2. Transaction 2 begins in backend B, xmin = 1
> 3. Transaction 1 ends
> 4. Transaction 3 begins in backend C, xmin = 2
> 5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
> 6. Transaction 2 ends.
> 7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2,
> RecentGlobalXmin = 2
> 8. Transaction 4 kills tuple, using its RecentGlobalXmin of 1
> 9. Transaction 3 splits the page, emits a delete xlog record, setting
> latestRemovedXid to its RecentGlobalXmin of 2

Well, steps 7 and 8 don't make sense. Your earlier comment was that it
was possible for a WAL record to be written with a RecentGlobalXmin that
was lower than other backends' values. In step 9 the RecentGlobalXmin is
*not* lower than that of any other backend; it is the same. So if there
is a proof, this isn't it.

But I can't see how there can be one: two concurrent vacuums can have
different OldestXmin values, but two concurrent transactions cannot.

-- 
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
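A minimal Python sketch of the distinction drawn in the message between the two btree WAL records after the Hot Standby split. The record names XLOG_BTREE_VACUUM, XLOG_BTREE_DELETE and XLOG_HEAP2_CLEANUP_INFO come from the message; the `BtreeWalRecord` class and its property names are hypothetical. It models only the three listed differences, not PostgreSQL's actual record layouts or redo code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtreeWalRecord:
    """Hypothetical summary of what a btree WAL record implies during redo."""

    name: str
    part_of_vacuum: bool

    @property
    def needs_cleanup_lock(self) -> bool:
        # Only the vacuum path needs a cleanup lock.
        return self.part_of_vacuum

    @property
    def lists_blocks_to_pin(self) -> bool:
        # The vacuum record names blocks to pin (but not otherwise touch);
        # the delete record carries no info about other blocks.
        return self.part_of_vacuum

    @property
    def preceded_by_cleanup_info(self) -> bool:
        # A vacuum's initial heap scan already emitted XLOG_HEAP2_CLEANUP_INFO,
        # which gives latestRemovedXid for the whole vacuum.
        return self.part_of_vacuum

    @property
    def must_carry_latest_removed_xid(self) -> bool:
        # With no preceding cleanup-info record, the record itself must carry
        # a correct (even if too conservative) latestRemovedXid.
        return not self.preceded_by_cleanup_info


BTREE_VACUUM = BtreeWalRecord("XLOG_BTREE_VACUUM", part_of_vacuum=True)
BTREE_DELETE = BtreeWalRecord("XLOG_BTREE_DELETE", part_of_vacuum=False)

if __name__ == "__main__":
    for rec in (BTREE_VACUUM, BTREE_DELETE):
        print(
            f"{rec.name}: cleanup lock={rec.needs_cleanup_lock}, "
            f"blocks to pin={rec.lists_blocks_to_pin}, "
            f"needs latestRemovedXid={rec.must_carry_latest_removed_xid}"
        )
```

Every property is derived from `part_of_vacuum`, mirroring the message's "It isn't part of a vacuum, so:" reasoning: only the record that is not preceded by a cleanup-info record has to supply latestRemovedXid itself.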