Re: GetOldestXmin going backwards is dangerous after all - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: GetOldestXmin going backwards is dangerous after all
Date:
Msg-id: CA+TgmobX8fO1mXXWMbr_TpvHyVH9K2i5xjz=Xf48CxcWjD1UrA@mail.gmail.com
In response to: Re: GetOldestXmin going backwards is dangerous after all (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Fri, Feb 1, 2013 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Having said that, I agree that a fix in GetOldestXmin() would be nice
>> if we could find one, but since the comment describes at least three
>> different ways the value can move backwards, I'm not sure that there's
>> really a practical solution there, especially if you want something we
>> can back-patch.
>
> Actually, wait a second.  As you say, the comment describes three known
> ways to make it go backwards.  It strikes me that all three are fixable:
>
> * if allDbs is FALSE and there are no transactions running in the current
> * database, GetOldestXmin() returns latestCompletedXid. If a transaction
> * begins after that, its xmin will include in-progress transactions in other
> * databases that started earlier, so another call will return a lower value.
>
> The reason this is a problem is that GetOldestXmin ignores XIDs of
> processes that are connected to other DBs.  It now seems to me that this
> is a flat-out bug.  It can ignore their xmins, but it should include
> their XIDs, because the point of considering those XIDs is that they may
> contribute to the xmins of snapshots computed in the future by processes
> in our own DB.  And snapshots never exclude any XIDs on the basis of
> which DB they're in.  (They can't really, since we can't know when the
> snap is taken whether it might be used to examine shared catalogs.)

This idea is superficially appealing, but I can't get excited about it, because it will cause GetOldestXmin() to compute older (that is, more conservative) XIDs than it does today, which means we'll vacuum away less.  Considering the amount of pain we have around vacuum today, that's surely got to be going the wrong direction.

I'm inclined to think that the root of the problem is that the computation is too conservative already.  I mean, what do we really need to include in the computation in order not to vacuum away any data that's still needed?  There are basically two concerns:

- You can't remove data that any snapshot in the current database might try to read.  So you have to include the xmins of those snapshots in your oldest-xmin value.  But, not really, because those xmins were computed by looking at all running XIDs across the whole cluster, and if their oldest xmin value came from some XID in another database, it's not relevant.  What you want (but of course, can't get) is the db-specific xmin horizon for each other snapshot.

- You have to worry about new snapshots that are about to appear but haven't yet.  But, really, how big of a concern is this?  No transactions can exit the running set while we hold ProcArrayLock, so it seems to me that there ought to be a pretty tight limit on what can show up here.  For example, if we've already taken a snapshot, surely a subsequently taken snapshot can't really need an older xmin horizon than the one we computed for our own snapshot.

In theory, it seems like we might be able to solve these problems by having each backend advertise TWO xmin horizons: one representing the oldest xmin it might be able to see in a system catalog, and the other representing the oldest xmin it might be able to see in its own database.  This would allow significantly more precise tracking of what is really needed in the current database, but there's an obvious implementation concern, which is that GetSnapshotData is hot enough that individual instructions matter.
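To make the two-horizon idea concrete, here is a minimal sketch, assuming per-backend fields that do not exist today.  Every struct, field, and function name in it is invented for illustration (it is not a patch), it ignores in-progress XIDs and XID wraparound, and it only shows how the two horizons would be consumed: the shared-catalog horizon folds in every backend, while the per-database horizon only has to look at backends connected to that database.

/*
 * Illustration only, not PostgreSQL source: a sketch of the "two xmin
 * horizons" idea.  Each backend is assumed to advertise both a horizon for
 * shared catalogs and a horizon for its own database.  All names are
 * invented, and XID wraparound is ignored to keep the comparisons trivial.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct BackendXmins
{
    uint32_t      databaseId;     /* database the backend is connected to */
    TransactionId catalog_xmin;   /* oldest xmin it might use on shared catalogs */
    TransactionId database_xmin;  /* oldest xmin it might use on its own database */
} BackendXmins;

/* Horizon for vacuuming shared catalogs: every backend's catalog horizon counts. */
static TransactionId
SharedCatalogHorizon(const BackendXmins *b, int n, TransactionId latestCompletedXid)
{
    TransactionId result = latestCompletedXid;

    for (int i = 0; i < n; i++)
        if (b[i].catalog_xmin != InvalidTransactionId && b[i].catalog_xmin < result)
            result = b[i].catalog_xmin;
    return result;
}

/*
 * Horizon for vacuuming plain tables in one database: only backends connected
 * to that database can read its rows, so only their local horizons count.
 */
static TransactionId
DatabaseHorizon(const BackendXmins *b, int n, uint32_t dbId,
                TransactionId latestCompletedXid)
{
    TransactionId result = latestCompletedXid;

    for (int i = 0; i < n; i++)
        if (b[i].databaseId == dbId &&
            b[i].database_xmin != InvalidTransactionId &&
            b[i].database_xmin < result)
            result = b[i].database_xmin;
    return result;
}

int
main(void)
{
    BackendXmins backends[] = {
        {.databaseId = 1, .catalog_xmin = 90, .database_xmin = 100},
        {.databaseId = 2, .catalog_xmin = 85, .database_xmin = 95},
    };

    /* Database 1 could be vacuumed up to 100 even though a backend in
     * database 2 still holds the shared-catalog horizon back at 85. */
    printf("shared catalogs: %u\n", SharedCatalogHorizon(backends, 2, 120));
    printf("database 1:      %u\n", DatabaseHorizon(backends, 2, 1, 120));
    return 0;
}

The implementation concern mentioned above is on the producer side: keeping a second advertised value current would add work to GetSnapshotData, which is exactly the hot path.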
> * There are also replication-related effects: a walsender
> * process can set its xmin based on transactions that are no longer running
> * in the master but are still being replayed on the standby, thus possibly
> * making the GetOldestXmin reading go backwards.  In this case there is a
> * possibility that we lose data that the standby would like to have, but
> * there is little we can do about that --- data is only protected if the
> * walsender runs continuously while queries are executed on the standby.
> * (The Hot Standby code deals with such cases by failing standby queries
> * that needed to access already-removed data, so there's no integrity bug.)
>
> This is just bogus.  Why don't we make it a requirement on walsenders
> that they never move their advertised xmin backwards (or initially set
> it to less than the prevailing global xmin)?  There's no real benefit to
> allowing them to try to move the global xmin backwards, because any data
> that they might hope to protect that way could be gone already.

I think there is already some code that aims to do this, but I might be misremembering.  IOW, this comment might already be incorrect.

> * The return value is also adjusted with vacuum_defer_cleanup_age, so
> * increasing that setting on the fly is another easy way to make
> * GetOldestXmin() move backwards, with no consequences for data integrity.
>
> And as for that, it's been pretty clear for awhile that allowing
> vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
> eventually have to undo.  The day of reckoning has arrived: it needs
> to be PGC_POSTMASTER.

We might even want to consider getting rid of it altogether at this point, but neither that nor a change to PGC_POSTMASTER seems a likely candidate for back-patching.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
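Regarding the walsender rule quoted above (never move the advertised xmin backwards, and never start it below the prevailing global xmin), a minimal sketch of such a clamp could look like the following; the function and parameter names are invented for illustration and are not the actual walsender code, and XID wraparound is again ignored.

/*
 * Illustration only, not PostgreSQL source: clamp the xmin a walsender wants
 * to advertise so that it never starts below the prevailing global xmin and
 * never moves backwards afterwards.  Names are invented; wraparound ignored.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

TransactionId
ClampWalSenderXmin(TransactionId requested,
                   TransactionId already_advertised,
                   TransactionId global_xmin)
{
    TransactionId result = requested;

    if (already_advertised == InvalidTransactionId)
    {
        /* First advertisement: data older than global_xmin may be gone already. */
        if (result < global_xmin)
            result = global_xmin;
    }
    else if (result < already_advertised)
    {
        /* Later updates: never move the advertised value backwards. */
        result = already_advertised;
    }
    return result;
}

int
main(void)
{
    /* A request of 50 against an already-advertised 60 stays at 60. */
    printf("%u\n", ClampWalSenderXmin(50, 60, 40));
    return 0;
}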