Re: disfavoring unparameterized nested loops - Mailing list pgsql-hackers

From Mike Klaas
Subject Re: disfavoring unparameterized nested loops
Date
Msg-id CABOs6N2=cZo42rd5sXC5Ao1MuG4fCqL1Xw080y4E6=ZeHhonyg@mail.gmail.com
In response to Re: disfavoring unparameterized nested loops  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
> I think that it is worth paying more than nothing to avoid plans that are so far from optimal that they might as well take forever to execute.

I recently came across this article from 2016 describing a bad plan of the sort discussed in this thread: https://heap.io/blog/when-to-avoid-jsonb-in-a-postgresql-schema

(The proximate cause in that case was that PostgreSQL does not collect statistics for fields inside a JSONB column; the planner estimated a row count of 1 and chose a pathologically slow plan.)
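
To make that failure mode concrete, here is a rough sketch (hypothetical table and query, not the schema from the article): PostgreSQL keeps no per-field statistics for a jsonb column, so a filter on an expression like data->>'type' falls back on a default selectivity, and the resulting row estimate can be tiny no matter what the data actually looks like.

    CREATE TABLE events (id bigint, data jsonb);
    -- ...load a few million rows, most of them with data->>'type' = 'click'...
    EXPLAIN SELECT * FROM events WHERE data->>'type' = 'click';
    -- The planner has no statistics for the expression, so the row estimate
    -- for the filter is a default guess rather than anything data-driven.
    -- Once that badly underestimated scan is joined to another table, the
    -- low estimate is exactly what pushes the planner toward a nested loop.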

–Mike


On Tue, Jun 22, 2021 at 7:37 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jun 22, 2021 at 2:53 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

> Yeah, I like the insurance analogy - it gets to the crux of the problem, because insurance is pretty much exactly about managing risk.

The user's exposure to harm is what truly matters. I admit that that's very hard to quantify, but we should at least try to do so.

We sometimes think about a plan that is 10x slower as if it's infinitely slow, or might as well be. But it's usually not like that
-- it is generally meaningfully much better than the plan being 100x slower, which is itself sometimes appreciably better than 1000x slower. And besides, users often don't get anything like the optimal plan, even on what they would consider to be a good day (which is most days). So maybe 10x slower is actually the baseline good case already, without anybody knowing it. Most individual queries are not executed very often, even on the busiest databases. The extremes really do matter a lot.

If a web app or OLTP query is ~10x slower than optimal then it might be the practical equivalent of an outage that affects the query alone
(i.e. "infinitely slow") -- but probably not. I think that it is worth paying more than nothing to avoid plans that are so far from optimal that they might as well take forever to execute. This is not meaningfully different from a database outage affecting one particular query. It kind of is in a category of its own that surpasses "slow plan", albeit one that is very difficult or impossible to define formally.

There may be a huge amount of variation in risk tolerance among basically reasonable people. For example, if somebody chooses to engage in some kind of extreme sport, to me it seems understandable. It's just not my cup of tea. Whereas if somebody chooses to never wear a seatbelt while driving, then to me they're simply behaving foolishly. They're not willing to incur the tiniest inconvenience in order to get a huge reduction in potential harm -- including a very real risk of approximately the worst thing that can happen to you. Sure, refusing to wear a seatbelt can theoretically be classified as just another point on the risk tolerance spectrum, but that seems utterly contrived to me. Some things (maybe not that many) really are like that, or can at least be assumed to work that way as a practical matter.

> But making everything slower will be a hard sell, because the vast majority of workloads already running on Postgres don't have this issue at all, so for them it's not worth the expense.

I think that we're accepting too much risk here. But I bet it's also true that we're not taking enough risk in other areas. That was really my point with the insurance analogy -- we can afford to take lots of individual risks as long as they don't increase our exposure to truly disastrous outcomes -- by which I mean queries that might as well take forever to execute as far as the user is concerned. (Easier said than done, of course.)

A simple trade-off between fast and robust doesn't seem like a universally helpful thing. Sometimes it's a very unhelpful way of looking at the situation. If you make something more robust to extreme bad outcomes, then you may have simultaneously made it *faster* (not slower) for all practical purposes. This can happen when the increase in robustness allows the user to tune the system aggressively, and only take on new risks that they can truly live with (which wouldn't have been possible without the increase in robustness). For example, I imagine that the failsafe mechanism added to VACUUM will actually make it possible to tune VACUUM much more aggressively -- it might actually end up significantly improving performance for all practical purposes, even though technically it has nothing to do with performance.
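
(For concreteness: the backstop here is the vacuum_failsafe_age GUC. Once a table's relfrozenxid age crosses it, VACUUM ignores the cost-based delay and skips index vacuuming so that it can finish as quickly as possible. A sketch of the kind of tuning posture this might enable -- values are illustrative only, not a recommendation:)

    # postgresql.conf -- illustrative only
    vacuum_failsafe_age = 1600000000   # the backstop; this is the default
    # With that backstop in place, the other autovacuum settings (cost
    # limits, freeze ages, naptime) can be tuned to the workload more
    # aggressively, because a runaway table will hit the failsafe long
    # before wraparound becomes an emergency.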

Having your indexes a little more bloated because the failsafe kicked in is a survivable event -- the DBA lives to fight another day, and *learns* to tune vacuum/the app so it doesn't happen again and again. An anti-wraparound failure is perhaps not a survivable event -- the DBA gets fired. This really does seem like a fundamental difference to me.

> Following the insurance analogy, selling tornado insurance in Europe is mostly pointless.

Principled skepticism of this kind of thing is of course necessary and welcome. It *could* be taken too far.

> And the lack of data also plays a role - the insurance company will ask for higher rates when it does not have enough accurate data about the phenomenon, or when there's a lot of unknowns. Maybe this would allow some basic measure of uncertainty, based on the number and type of restrictions, joins, etc.

I don't think that you can really model uncertainty. But you can have true certainty (or close to it) about a trade-off that makes the system fundamentally more robust over time. You can largely be certain about both the cost of the insurance, as well as how it ameliorates the problem in at least some cases.

> So maybe some fairly rough measure of uncertainty might work, and the user might specify how much risk they're willing to tolerate.

I think that most or all of the interesting stuff is where you have this extreme asymmetry -- places where it's much more likely to be true that basically everybody wants that. Kind of like wearing seatbelts -- things that we really can claim are a universal good without too much controversy. There might be as few as one or two things in the optimizer that this could be said of. But they matter.

--
Peter Geoghegan

