Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Proposal for CSN based snapshots |
Date | |
Msg-id | 20140710212531.GW6390@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: Proposal for CSN based snapshots (Heikki Linnakangas <hlinnakangas@vmware.com>) |
List | pgsql-hackers |
Heikki Linnakangas wrote: > Attached is a new patch. It now keeps the current pg_clog unchanged, > but adds a new pg_csnlog besides it. pg_csnlog is more similar to > pg_subtrans than pg_clog: it's not WAL-logged, is reset at startup, > and segments older than GlobalXmin can be truncated. > > This addresses the disk space consumption, and simplifies pg_upgrade. > > There are no other significant changes in this new version, so it's > still very much WIP. But please take a look! Thanks. I've been looking at this. It needs a rebase; I had to apply to commit 89cf2d52030 (Wed Jul 2 13:11:05 2014 -0400) because of conflicts with the commit after that; it applies with some fuzz. I read the discussion in this thread and the README, to try to understand how this is supposed to work. It looks pretty good. One thing not quite clear to me is the now-static RecentXmin, GetRecentXmin(), AdvanceGlobalRecentXmin, and the like. I found the whole thing pretty confusing. I noticed that ShmemVariableCache->recentXmin is read without a lock in GetSnapshotData(), but exclusive ProcArrayLock is taken to write to it in GetRecentXmin(). I think we can write it without a lock also, since we assume that storing an xid is atomic. With that change, I think you can make the acquisition of ProcArray Lock to walk the pgproc list use shared mode, not exclusive. Another point is that RecentXmin is no longer used directly, except in TransactionIdIsActive(). But that routine is only used for an Assert() now, so I think it's fine to just have GetRecentXmin() return the value and not set RecentXmin as a side effect; my point is that ISTM RecentXmin can be removed altogether, which makes that business simpler. As far as GetOldestXmin() is concerned, that routine now ignores its arguments. Is that okay? I imagine it's just a natural consequence of how things work now. [ ... reads some more code ... ] Oh, now it's only used to determine a freeze limit. I think you should just remove the arguments and make that whole thing simpler. I was going to suggest that AdvanceRecentGlobalXmin() could receive a possibly-NULL Relation argument to pass down to GetOldestSnapshotGuts() which can make use of it for a tigther limit, but since OldestXmin is only for freezing, maybe there is no point in this extra complication. Regarding the pendingGlobalXmin stuff, I didn't find any documentation. I think you just need to write a very long comment on top of AdvanceRecentGlobalXmin() to explain what it's doing. After going over the code half a dozen times I think I understand what's idea, and it seems sound. Not sure about the advancexmin_counter thing. Maybe in some cases sessions will only run 9 transactions or less and then finish, in which case it will only get advanced at checkpoint time, which would be way too seldom. Maybe it would work to place the counter in shared memory: acquire(spinlock); if (ShmemVariableCache->counter++ >= some_number) { SI don't understand this:hmemVariableCache->counter = 0; do_update = true; } release(spinlock); if (do_update) AdvanceRecentGlobalXmin(); (Maybe we can forgo the spinlock altogether? If the value gets out of hand once in a while, it shouldn't really matter) Perhaps the "some number" should be scaled to some fraction of MaxBackends, but not less than X (32?) and no more than Y (128?). Or maybe just use constant 10, as the current code does. But it'd be total number of transactions, not transaction in the current backend. I think it'd be cleaner to have a distinct typedef to use when XLogRecPtr is used as a snapshot representation. Right now there is no clarity on whether we're interested in the xlog position itself or it's a snapshot value. HeapTupleSatisfiesVacuum[X]() and various callers needs update to their comments: when OldestXmin is mentioned, it should be OldestSnapshot. I noticed that SubTransGetTopmostTransaction() is now only called from predicate.c, and it's pretty constrained in what it wants. I'm not implying that we want to do anything in your patch about it, other than perhaps add a note to it that we may want to examine it later for possible changes. I haven't gone over the transam.c, clog.c changes yet. I attach a couple of minor tweaks to the README, mostly for clarity (but also update clog -> csnlog in a couple of places). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: