Re: snapshot too old issues, first around wraparound and then more. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: snapshot too old issues, first around wraparound and then more.
Date
Msg-id 20200401150914.p5amnha5qfbthdrz@alap3.anarazel.de
Whole thread Raw
In response to Re: snapshot too old issues, first around wraparound and then more.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: snapshot too old issues, first around wraparound and then more.
Re: snapshot too old issues, first around wraparound and then more.
List pgsql-hackers
Hi,

On 2020-04-01 10:01:07 -0400, Robert Haas wrote:
> On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <andres@anarazel.de> wrote:
> > The problem is that there's no protection again the xids in the
> > ringbuffer getting old enough to wrap around. Given that practical uses
> > of old_snapshot_threshold are likely to be several hours to several
> > days, that's not particularly hard to hit.
> 
> Presumably this could be fixed by changing it to use FullTransactionId.

That doesn't exist in all the back branches. Think it'd be easier to add
code to explicitly prune it during MaintainOldSnapshotTimeMapping().


> > The problem, as far as I can tell, is that
> > oldSnapshotControl->head_timestamp appears to be intended to be the
> > oldest value in the ring. But we update it unconditionally in the "need
> > a new bucket, but it might not be the very next one" branch of
> > MaintainOldSnapshotTimeMapping().
> 
> I agree, that doesn't look right. It's correct, I think, for the "if
> (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)" case, but not in the
> "else" case. In the "else" case, it should advance by 1 (wrapping if
> needed) each time we take the "if (oldSnapshotControl->count_used ==
> OLD_SNAPSHOT_TIME_MAP_ENTRIES)" branch, and should remain unchanged in
> the "else" branch for that if statement.

Yea.


> > As far as I can tell, this code has been wrong since the feature has
> > been committed.  The tests don't show a problem, because none of this
> > code is reached when old_snapshot_threshold = 0 (which has no real world
> > use, it's purely for testing).
> 
> I'm pretty sure I complained about the fact that only the
> old_snapshot_threshold = 0 case was tested at the time this went in,
> but I don't think Kevin was too convinced that we needed to do
> anything else, and realistically, if he'd tried for a regression test
> that ran for 15 minutes, Tom would've gone ballistic.

I think it's not just Tom that'd have gone ballistic. I think it's the
reason why, as I think is pretty clear, the feature was *never* actually
tested. The results of whats being removed are not quite random, but
it's not far from it. And there's long stretches of time where it never
removes things.

It's also a completely self-made problem.

There's really no reason at all to have bins of one minute. As it's a
PGC_POSTMASTER GUC, it should just have didided time into bins of
(old_snapshot_threshold * USEC_PER_SEC) / 100 or such. For a threshold
of a week there's no need to keep 10k bins, and the minimum threshold of
1 minute obviously is problematic.


> > I really don't know what to do here. The feature never worked and will
> > silently cause wrong query results. Fixing it seems like a pretty large
> > task - there's a lot more bugs. But ripping out a feature in stable
> > branches is pretty bad too.
> 
> I don't know what other bugs there are, but the two you mention above
> look fixable.

They probably are fixable. But there's a lot more, I think:

Looking at TransactionIdLimitedForOldSnapshots() I think the ts ==
update_ts threshold actually needs to be ts >= update_ts, right now we
don't handle being newer than the newest bin correctly afaict (mitigated
by autovacuum=on with naptime=1s doing a snapshot more often). It's hard
to say, because there's no comments.

The whole lock nesting is very hazardous. Most (all?)
TestForOldSnapshot() calls happen with locks on on buffers held, and can
acquire lwlocks itself. In some older branches we do entire *catalog
searches* with the buffer lwlock held (for RelationHasUnloggedIndex()).

GetSnapshotData() using snapshot->lsn = GetXLogInsertRecPtr(); as the
basis to detect conflicts seems dangerous to me. Isn't that ignoring
inserts that are already in progress?


> Even if we decide that the feature can't be salvaged, I would vote
> against ripping it out in back branches. I would instead argue for
> telling people not to use it and ripping it out in master.

It currently silently causes wrong query results. There's no
infrastructure to detect / protect against that (*).

I'm sure we can fix individual instances of problems. But I don't know
how one is supposed to verify that the fixes actually work. There's
currently no tests for the actual feature. And manual tests are painful
due to the multi-minute thresholds needed, and it's really hard to
manually verify that only the right rows are removed due to the feature,
and that all necessary errors are thrown. Given e.g. the bugs in my
email upthread, there's periods of several minutes where we'd not see
any row removed and then periods where the wrong ones would be removed,
so the manual tests would have to be repeated numerous times to actually
ensure anything.

If somebody wants to step up to the plate and fix these, it'd perhaps be
more realistic to say that we'll keep the feature. But even if somebody
does, I think it'd require a lot of development in the back branches. On
a feature whose purpose is to eat data that is still required.

I think even if we decide that we do not want to rip the feature out, we
should seriously consider hard disabling it in the backbranches. At
least I don't see how the fixed code is tested enough to be entrusted
with users data.

Do we actually have any evidence of this feature ever beeing used? I
didn't find much evidence for that in the archives (except Thomas
finding a problem). Given that it currently will switch between not
preventing bloat and causing wrong query results, without that being
noticed...

(*) perhaps I just am not understanding the protection however. To me
it's not at all clear what:

        /*
         * Failsafe protection against vacuuming work of active transaction.
         *
         * This is not an assertion because we avoid the spinlock for
         * performance, leaving open the possibility that xlimit could advance
         * and be more current; but it seems prudent to apply this limit.  It
         * might make pruning a tiny bit less aggressive than it could be, but
         * protects against data loss bugs.
         */
        if (TransactionIdIsNormal(latest_xmin)
            && TransactionIdPrecedes(latest_xmin, xlimit))
            xlimit = latest_xmin;

        if (NormalTransactionIdFollows(xlimit, recentXmin))
            return xlimit;

actually provides in the way of a protection.


> However,
> much as I'm not in love with all of the complexity this feature adds,
> I don't see the problems you've reported here as serious enough to
> justify ripping it out.
> 
> What exactly is the interaction of this patch with your snapshot
> scalability work?

Post my work there's no precise RecentOldestXmin anymore (since
accessing the frequently changing xmin of other backends is what causes
a good chunk of the scalability issues). But heap_page_prune_opt() has
to determine what to use as the threshold for being able to prune dead
rows. Without snapshot_too_old we can initially rely on the known
boundaries to determine whether we can prune, and only determine an
"accurate" boundary when encountering a prune xid (or a tuple, but
that's an optimization) that falls in the range where we don't know for
certain we can prune.  But that's not easy to do with the way the
old_snapshot_threshold stuff currently works.

It's not too hard to implement a crude version that just determines an
accurate xmin horizon whenever pruning with old_snapshot_threshold
set. But that seems like gimping performance for old_snapshot_threshold,
which didn't seem nice.

Additionally, the current implementation of snapshot_too_old is pretty
terrible about causing unnecessary conflicts when hot pruning. Even if
there was no need at all for the horizon to be limited to be able to
prune the page, or if there was nothing to prune on the page (note that
the limiting happens before checking if the space on the page even makes
pruning useful), we still cause a conflict for future accesses, because
TransactionIdLimitedForOldSnapshots() will
SetOldSnapshotThresholdTimestamp() to a recent timestamp.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: control max length of parameter values logged
Next
From: Robert Haas
Date:
Subject: Re: snapshot too old issues, first around wraparound and then more.