From: Andres Freund
Subject: Re: snapshot too old issues, first around wraparound and then more.
Msg-id: 20200401153947.lchk4yf3g63rw7wm@alap3.anarazel.de
In response to: Re: snapshot too old issues, first around wraparound and then more. (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Hi,

On 2020-04-01 11:15:14 -0400, Robert Haas wrote:
> On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <andres@anarazel.de> wrote:
> > I added some debug output to print the mapping before/after changes by
> > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative
> > to the server start in minutes/seconds to make it easier to interpret).
> >
> > And the output turns out to be something like:
> >
> > WARNING:  old snapshot mapping at "before update" with head ts: 7, current entries: 8 max entries: 15, offset: 0
> >   entry 0 (ring 0): min 7: xid 582921233
> >   entry 1 (ring 1): min 8: xid 654154155
> >   entry 2 (ring 2): min 9: xid 661972949
> >   entry 3 (ring 3): min 10: xid 666899382
> >   entry 4 (ring 4): min 11: xid 644169619
> >   entry 5 (ring 5): min 12: xid 644169619
> >   entry 6 (ring 6): min 13: xid 644169619
> >   entry 7 (ring 7): min 14: xid 644169619
> >
> > WARNING:  head 420 s: updating existing bucket 4 for sec 660 with xmin 666899382
> >
> > WARNING:  old snapshot mapping at "after update" with head ts: 7, current entries: 8 max entries: 15, offset: 0
> >   entry 0 (ring 0): min 7: xid 582921233
> >   entry 1 (ring 1): min 8: xid 654154155
> >   entry 2 (ring 2): min 9: xid 661972949
> >   entry 3 (ring 3): min 10: xid 666899382
> >   entry 4 (ring 4): min 11: xid 666899382
> >   entry 5 (ring 5): min 12: xid 644169619
> >   entry 6 (ring 6): min 13: xid 644169619
> >   entry 7 (ring 7): min 14: xid 644169619
> >
> > It's pretty obvious that the xids don't make a ton of sense, I think:
> > They're not monotonically ordered. The same values exist multiple times,
> > despite xids being constantly used. Also, despite the ringbuffer
> > supposedly having 15 entries (that's old_snapshot_threshold = 5min +
> > the 10 we always add), and the workload having run for 14min, we only
> > have 8 entries.
> 
> The function header comment for MaintainOldSnapshotTimeMapping could
> hardly be more vague, as it's little more than a restatement of the
> function name. However, it looks to me like the idea is that this
> function might get called multiple times for the same or similar
> values of whenTaken. I suppose that's the point of this code:

Right. We enforce whenTaken to be monotonic
(cf. GetSnapshotCurrentTimestamp()), but since the timestamp's
granularity is subsequently reduced to one minute (the
AlignTimestampToMinuteBoundary() call), it's obviously possible for a
new snapshot to end up in the same bin as a previous one.
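
For reference, that alignment rounds up to the next minute boundary;
from memory the helper looks roughly like this (paraphrased, not copied
verbatim from snapmgr.c):

    static TimestampTz
    AlignTimestampToMinuteBoundary(TimestampTz ts)
    {
        /* round ts up to the next full minute */
        TimestampTz retval = ts + (USECS_PER_MINUTE - 1);

        return retval - (retval % USECS_PER_MINUTE);
    }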


> What I interpret this to be doing is saying - if we got a new call to
> this function with a rounded-to-the-minute timestamp that we've seen
> previously and for which we still have an entry, and if the XID passed
> to this function is newer than the one passed by the previous call,
> then advance the xid_by_minute[] bucket to the newer value. Now that
> begs the question - what does this XID actually represent? The
> comments don't seem to answer that question, not even the comments for
> OldSnapshotControlData, which say that we should "Keep one xid per
> minute for old snapshot error handling." but don't say which XIDs we
> should keep or how they'll be used. However, the only call to
> MaintainOldSnapshotTimeMapping() is in GetSnapshotData(). It appears
> that we call this function each time a new snapshot is taken and pass
> the current time (modulo some fiddling) and snapshot xmin. Given that,
> one would expect that any updates to the map would be tight races,
> i.e. a bunch of processes that all took their snapshots right around
> the same time would all update the same map entry in quick succession,
> with the newest value winning.

Right.
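
For anyone following along, that branch looks roughly like this
(paraphrasing the "existing mapping" case in snapmgr.c from memory, so
treat the details as approximate):

    /* aligned timestamp falls within the buckets we already track */
    int    bucket = (oldSnapshotControl->head_offset
                     + ((ts - oldSnapshotControl->head_timestamp)
                        / USECS_PER_MINUTE))
                    % OLD_SNAPSHOT_TIME_MAP_ENTRIES;

    /* let the newest xmin win the race described above */
    if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket], xmin))
        oldSnapshotControl->xid_by_minute[bucket] = xmin;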


> And that make the debugging output which I quoted from your message
> above really confusing. At this point, the "head timestamp" is 7
> minutes after this facility started up. The first entry we have is
> for minute 7, and the last is for minute 14. But the one we're
> updating is for minute 11. How the heck can that happen?

If I understand what you're referring to correctly, I think that is
because, due to the bug, the "need a new bucket" branch doesn't just
extend the mapping by one bucket, it extends it by many in common
cases, basically filling buckets "into the future".

The advance = ... variable in that branch will not always be 1, even
when we continually call Maintain*. Here's some debug output showing
that (slightly modified from the patch I previously sent):

WARNING:  old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 15, offset: 0
  entry 0 (ring 0): min 1: xid 1089371384
  entry 1 (ring 1): min 2: xid 1099553206

WARNING:  head 1 min: filling 2 buckets starting at 0 for whenTaken 3 min, with xmin 1109840204

WARNING:  old snapshot mapping at "after update" with head ts: 3, current entries: 4 max entries: 15, offset: 0
  entry 0 (ring 0): min 3: xid 1089371384
  entry 1 (ring 1): min 4: xid 1099553206
  entry 2 (ring 2): min 5: xid 1109840204
  entry 3 (ring 3): min 6: xid 1109840204

Note how the two new buckets have the same xid, and how we're inserting
for "whenTaken 3 min", but we've filled the mapping up to minute 6.


I don't think the calculation of the 'advance' variable is correct as
is, even if we ignore the wrong setting of the head_timestamp variable.
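
For reference, the shape of that branch (again paraphrasing snapmgr.c
from memory, eliding the wraparound handling):

    /* We need a new bucket, but it might not be the very next one. */
    int    advance = ((ts - oldSnapshotControl->head_timestamp)
                      / USECS_PER_MINUTE);

    oldSnapshotControl->head_timestamp = ts;

The distance is measured from head_timestamp, i.e. from the oldest
entry rather than the newest one, and head_timestamp (nominally the
oldest bucket's timestamp) is then set to the newest time.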

Greetings,

Andres Freund


