Re: disfavoring unparameterized nested loops - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: disfavoring unparameterized nested loops
Date
Msg-id CAH2-WznOY=SkO4L=DZAZbEC3Jo4fbG_DCY7QOSt7mF=N+eLSjA@mail.gmail.com
Whole thread Raw
In response to Re: disfavoring unparameterized nested loops  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Oct 2, 2022 at 3:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 30, 2022 at 2:24 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > It's not just that the risks are ludicrously high, of course. The
> > potential benefits must *also* be very low. It's both factors,
> > together.
>
> Hmm, maybe. But it also wouldn't surprise me very much if someone can
> come up with a test case where a nested loop with a single row (or
> maybe no rows) on one side or the other and it's significantly faster
> than any alternative plan.

That's certainly possible, but wouldn't the difference all come from
fixed startup costs? If we're talking about a single row, with a
minimal test case, then the potential downside of this more
conservative strategy might indeed amount to something like a 2x or 3x
slowdown, if we look at it in isolation. But why measure it that way?
I think that absolute differences like milliseconds of execution time
are much more relevant.

Real production databases have many queries with very diverse
characteristics -- there is a lot going on at any given moment. The
proportion of queries that will be affected either way by avoiding
unparamaterized nested loop joins is probably going to be very small.
Nobody is going to notice if only a small subset or all queries are
maybe 1 ms or 2 ms slower. As long as it's well within the margin of
noise in 100% of all cases, it really shouldn't matter.

AFAICT the choice is one of "bounded, low upside versus unbounded,
high downside".

> I believe, though, that even if such cases
> exist, they are probably relatively few in number compared to the
> cases where parameterized nested loops hurt, and the savings are
> probably small compared to the multiple-orders-of-magnitude slowdowns
> that you can get when a nested loop goes bad. But they might still be
> relatively large -- 2x, 3x? -- in absolute terms.

I suspect it won't even matter if disallowing unparamaterized nested
loop joins loses on average.

I am reminded of this:
https://en.wikipedia.org/wiki/St._Petersburg_paradox#Expected_utility_theory

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: [RFC] building postgres with meson - v13
Next
From: Tom Lane
Date:
Subject: Re: Question: test "aggregates" failed in 32-bit machine