Re: snapshot too old issues, first around wraparound and then more. - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: snapshot too old issues, first around wraparound and then more. |
Date | |
Msg-id | 20200401150914.p5amnha5qfbthdrz@alap3.anarazel.de Whole thread Raw |
In response to | Re: snapshot too old issues, first around wraparound and then more. (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: snapshot too old issues, first around wraparound and then more.
Re: snapshot too old issues, first around wraparound and then more. |
List | pgsql-hackers |
Hi, On 2020-04-01 10:01:07 -0400, Robert Haas wrote: > On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <andres@anarazel.de> wrote: > > The problem is that there's no protection again the xids in the > > ringbuffer getting old enough to wrap around. Given that practical uses > > of old_snapshot_threshold are likely to be several hours to several > > days, that's not particularly hard to hit. > > Presumably this could be fixed by changing it to use FullTransactionId. That doesn't exist in all the back branches. Think it'd be easier to add code to explicitly prune it during MaintainOldSnapshotTimeMapping(). > > The problem, as far as I can tell, is that > > oldSnapshotControl->head_timestamp appears to be intended to be the > > oldest value in the ring. But we update it unconditionally in the "need > > a new bucket, but it might not be the very next one" branch of > > MaintainOldSnapshotTimeMapping(). > > I agree, that doesn't look right. It's correct, I think, for the "if > (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)" case, but not in the > "else" case. In the "else" case, it should advance by 1 (wrapping if > needed) each time we take the "if (oldSnapshotControl->count_used == > OLD_SNAPSHOT_TIME_MAP_ENTRIES)" branch, and should remain unchanged in > the "else" branch for that if statement. Yea. > > As far as I can tell, this code has been wrong since the feature has > > been committed. The tests don't show a problem, because none of this > > code is reached when old_snapshot_threshold = 0 (which has no real world > > use, it's purely for testing). > > I'm pretty sure I complained about the fact that only the > old_snapshot_threshold = 0 case was tested at the time this went in, > but I don't think Kevin was too convinced that we needed to do > anything else, and realistically, if he'd tried for a regression test > that ran for 15 minutes, Tom would've gone ballistic. I think it's not just Tom that'd have gone ballistic. I think it's the reason why, as I think is pretty clear, the feature was *never* actually tested. The results of whats being removed are not quite random, but it's not far from it. And there's long stretches of time where it never removes things. It's also a completely self-made problem. There's really no reason at all to have bins of one minute. As it's a PGC_POSTMASTER GUC, it should just have didided time into bins of (old_snapshot_threshold * USEC_PER_SEC) / 100 or such. For a threshold of a week there's no need to keep 10k bins, and the minimum threshold of 1 minute obviously is problematic. > > I really don't know what to do here. The feature never worked and will > > silently cause wrong query results. Fixing it seems like a pretty large > > task - there's a lot more bugs. But ripping out a feature in stable > > branches is pretty bad too. > > I don't know what other bugs there are, but the two you mention above > look fixable. They probably are fixable. But there's a lot more, I think: Looking at TransactionIdLimitedForOldSnapshots() I think the ts == update_ts threshold actually needs to be ts >= update_ts, right now we don't handle being newer than the newest bin correctly afaict (mitigated by autovacuum=on with naptime=1s doing a snapshot more often). It's hard to say, because there's no comments. The whole lock nesting is very hazardous. Most (all?) TestForOldSnapshot() calls happen with locks on on buffers held, and can acquire lwlocks itself. In some older branches we do entire *catalog searches* with the buffer lwlock held (for RelationHasUnloggedIndex()). GetSnapshotData() using snapshot->lsn = GetXLogInsertRecPtr(); as the basis to detect conflicts seems dangerous to me. Isn't that ignoring inserts that are already in progress? > Even if we decide that the feature can't be salvaged, I would vote > against ripping it out in back branches. I would instead argue for > telling people not to use it and ripping it out in master. It currently silently causes wrong query results. There's no infrastructure to detect / protect against that (*). I'm sure we can fix individual instances of problems. But I don't know how one is supposed to verify that the fixes actually work. There's currently no tests for the actual feature. And manual tests are painful due to the multi-minute thresholds needed, and it's really hard to manually verify that only the right rows are removed due to the feature, and that all necessary errors are thrown. Given e.g. the bugs in my email upthread, there's periods of several minutes where we'd not see any row removed and then periods where the wrong ones would be removed, so the manual tests would have to be repeated numerous times to actually ensure anything. If somebody wants to step up to the plate and fix these, it'd perhaps be more realistic to say that we'll keep the feature. But even if somebody does, I think it'd require a lot of development in the back branches. On a feature whose purpose is to eat data that is still required. I think even if we decide that we do not want to rip the feature out, we should seriously consider hard disabling it in the backbranches. At least I don't see how the fixed code is tested enough to be entrusted with users data. Do we actually have any evidence of this feature ever beeing used? I didn't find much evidence for that in the archives (except Thomas finding a problem). Given that it currently will switch between not preventing bloat and causing wrong query results, without that being noticed... (*) perhaps I just am not understanding the protection however. To me it's not at all clear what: /* * Failsafe protection against vacuuming work of active transaction. * * This is not an assertion because we avoid the spinlock for * performance, leaving open the possibility that xlimit could advance * and be more current; but it seems prudent to apply this limit. It * might make pruning a tiny bit less aggressive than it could be, but * protects against data loss bugs. */ if (TransactionIdIsNormal(latest_xmin) && TransactionIdPrecedes(latest_xmin, xlimit)) xlimit = latest_xmin; if (NormalTransactionIdFollows(xlimit, recentXmin)) return xlimit; actually provides in the way of a protection. > However, > much as I'm not in love with all of the complexity this feature adds, > I don't see the problems you've reported here as serious enough to > justify ripping it out. > > What exactly is the interaction of this patch with your snapshot > scalability work? Post my work there's no precise RecentOldestXmin anymore (since accessing the frequently changing xmin of other backends is what causes a good chunk of the scalability issues). But heap_page_prune_opt() has to determine what to use as the threshold for being able to prune dead rows. Without snapshot_too_old we can initially rely on the known boundaries to determine whether we can prune, and only determine an "accurate" boundary when encountering a prune xid (or a tuple, but that's an optimization) that falls in the range where we don't know for certain we can prune. But that's not easy to do with the way the old_snapshot_threshold stuff currently works. It's not too hard to implement a crude version that just determines an accurate xmin horizon whenever pruning with old_snapshot_threshold set. But that seems like gimping performance for old_snapshot_threshold, which didn't seem nice. Additionally, the current implementation of snapshot_too_old is pretty terrible about causing unnecessary conflicts when hot pruning. Even if there was no need at all for the horizon to be limited to be able to prune the page, or if there was nothing to prune on the page (note that the limiting happens before checking if the space on the page even makes pruning useful), we still cause a conflict for future accesses, because TransactionIdLimitedForOldSnapshots() will SetOldSnapshotThresholdTimestamp() to a recent timestamp. Greetings, Andres Freund
pgsql-hackers by date: