Re: snapshot too old issues, first around wraparound and then more. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: snapshot too old issues, first around wraparound and then more.
Date
Msg-id CAH2-Wz=dOHuefXH_weCKjO6sq9TcZoaPJsTuKLyW7m_R5EsThg@mail.gmail.com
Whole thread Raw
In response to Re: snapshot too old issues, first around wraparound and then more.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jun 15, 2021 at 12:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > My general point here is that I would like to know whether we have a
> > finite number of reasonably localized bugs or a three-ring disaster
> > that is unrecoverable no matter what we do. Andres seems to think it
> > is the latter, and I *think* Peter Geoghegan agrees, but I think that
> > the point might be worth a little more discussion.
>
> TBH, I am not clear on that either.

I don't know for sure which it is, but that in itself isn't actually
what matters to me. The most concerning thing is that I don't really
know how to *assess* the design now. The clear presence of at least
several very severe bugs doesn't necessarily prove anything (it just
*hints* at major design problems).

If I could make a very clear definitive statement on this then I'd
probably have to do ~1/3 of the total required work -- that'd be my
guess. If it was easy to be quite sure here then we wouldn't still be
here 12 months later. In any case I don't think that the feature
deserves to be treated all that differently to something that was
committed much more recently, given what we know. Frankly it took me
about 5 minutes to find a very serious bug in the feature, pretty much
without giving it any thought. That is not a good sign.

> I think it's a klugy, unprincipled solution to a valid real-world
> problem.  I suspect the implementation issues are not unrelated to
> the kluginess of the concept.  Thus, I would really like to see us
> throw this away and find something better.  I admit I have nothing
> to offer about what a better solution to the problem would look like.
> But I would really like it to not involve random-seeming query failures.

I would be very happy to see somebody take this up, because it is
important. The reality is that anybody that undertakes this task
should start with the assumption that they're starting from scratch,
at least until they learn otherwise. So ISTM that it might as well be
true that it needs a total rewrite, even if it turns out to not be
strictly true.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Improving the isolationtester: fewer failures, less delay
Next
From: Peter Geoghegan
Date:
Subject: Re: snapshot too old issues, first around wraparound and then more.