Re: snapshot too old issues, first around wraparound and then more. - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: snapshot too old issues, first around wraparound and then more. |
Date | |
Msg-id | 20200403190309.xltgvtkcn5bug5dx@alap3.anarazel.de Whole thread Raw |
In response to | Re: snapshot too old issues, first around wraparound and then more. (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: snapshot too old issues, first around wraparound and then more.
|
List | pgsql-hackers |
Hi, On 2020-04-03 14:32:09 +0530, Amit Kapila wrote: > On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > > On Thu, Apr 2, 2020 at 5:17 PM Andres Freund <andres@anarazel.de> wrote: > > > Since this is a feature that can result in wrong query results (and > > > quite possibly crashes / data corruption), I don't think we can just > > > leave this unfixed. But given the amount of code / infrastructure > > > changes required to get this into a working feature, I don't see how we > > > can unleash those changes onto the stable branches. > > > > As per my initial understanding, the changes required are (a) There > seem to be multiple places where TestForOldSnapshot is missing, (b) > TestForOldSnapshot itself need to be reviewed carefully to see if it > has problems, (c) Some of the members of OldSnapshotControlData like > head_timestamp and xid_by_minute are not maintained accurately, (d) > handling of wraparound for xids in the in-memory data-structure for > this feature is required, (e) test infrastructure is not good enough > to catch bugs or improve this feature. And a bunch more correctness issues. But basically, yes. When you say "(c) Some of the members of OldSnapshotControlData like head_timestamp and xid_by_minute are not maintained accurately)" - note that that's the core state for the whole feature. With regards to test: "not good enough" is somewhat of an understatement. Not a *single* tuple is removed in the tests due to old_snapshot_threshold - and removing tuples is the entire point. > Now, this sounds like a quite of work but OTOH, if we see most of the > critical changes required will be in only a few functions like > TransactionIdLimitedForOldSnapshots(), > MaintainOldSnapshotTimeMapping(), TestForOldSnapshot(). I don't think that's really the case. Every place reading a buffer needs to be inspected, and new calls added. They aren't free, and I'm not sure all of them have the relevant snapshot available. To fix the issue of spurious errors, we'd likely need changes outside of those, and it'd quite possibly have performance / bloat implications. > I don't deny the possibility that we might need much more work or we > need to come up with quite a different design to address all these > problems but unless Kevin or someone else doesn't come up with a > solution to address all of these problems, we can't be sure of that. > > > I don't think that the feature can be allowed to remain in anything > > like its current form. The current design is fundamentally unsound. > > > > Agreed, but OTOH, not giving time to Kevin or others who might be > interested to support this work is also not fair. I think once > somebody comes up with patches for problems we can decide whether this > feature can be salvaged in back-branches or we need to disable it in a > hard-way. Now, if Kevin himself is not interested in fixing or nobody > shows up to help, then surely we can take the decision sooner but > giving time for a couple of weeks (or even till we are near for PG13 > release) in this case doesn't seem like a bad idea. It'd certainly be great if somebody came up with fixes, yes. Even if we had to disable it in the back branches, that'd allow us to keep the feature around, at least. The likelihood of regressions even when the feature is not on does not seem that low. But you're right, we'll be able to better judge it with fixes to look at. Greetings, Andres Freund
pgsql-hackers by date: