Thread: wraparound dangers in snapshot too old

wraparound dangers in snapshot too old

From
Andres Freund
Date:
Hi,

I am trying to change the snapshot too old infrastructure so it
cooperates with my snapshot scalability patch. While trying to
understand the code sufficiently, I think I found a fairly serious
issue:

To map the time-based old_snapshot_threshold to an xid that can be used
as a cutoff for heap_page_prune(), we maintain a ringbuffer of
old_snapshot_threshold + 10 entries in
oldSnapshotControlData->xid_by_minute[]. TransactionIdLimitedForOldSnapshots()
uses that to (at least that's the goal) increase the horizon used for
pruning.

The problem is that there's no protection again the xids in the
ringbuffer getting old enough to wrap around. Given that practical uses
of old_snapshot_threshold are likely to be several hours to several
days, that's not particularly hard to hit.

That then has the consequence that we can use an xid that's either from
"from the future" (if bigger than the current xid), or more recent than
appropriate (if it wrapped far enough to be below nextxid, but not yet
older than OldestXmin) as the OldestXmin argument to heap_page_prune().

Which in turn means that we can end up pruning much more recently
removed rows than intended.


While that'd be bad on its own, the big problem is that we won't detect
that case on access, in contrast to the way old_snapshot_threshold is
intended to work. The reason for that is detecting these errors happens
on the basis of timestamps - which obviously do not wrap around.

Greetings,

Andres Freund



Re: wraparound dangers in snapshot too old

From
Andres Freund
Date:
Hi,

On 2020-03-31 21:53:04 -0700, Andres Freund wrote:
> I am trying to change the snapshot too old infrastructure so it
> cooperates with my snapshot scalability patch. While trying to
> understand the code sufficiently, I think I found a fairly serious
> issue:

I accidentally sent this email, I was intending to instead only send
https://www.postgresql.org/message-id/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de

It started to happen because of some default keybinding changes between
mutt / emacs / magit, leading to emacs sending a buffer as an email,
which I didn't intend.

Sorry for that,

Andres Freund