Re: disfavoring unparameterized nested loops - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: disfavoring unparameterized nested loops
Msg-id CAH2-WzmDEFUXLFWLn1FuxCA28=HU7ZVLVrqCZiHxOas5Xyur_g@mail.gmail.com
In response to Re: disfavoring unparameterized nested loops  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Jun 15, 2021 at 12:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Yes, I think it is. Reading the paper really helped me crystallize my
> thoughts about this, because when I've studied the problems myself, I
> came, as you postulate here, to the conclusion that there's a lot of
> stuff the planner does where there is risk and uncertainty, and thus
> that a general framework would be necessary to deal with it.

It is an example (perhaps the only example in the optimizer) of an
oasis of certainty in an ocean of uncertainty. As uncertain as
everything is, we can seemingly make strong, robust statements about
the relative merits of each strategy *in general*, in this one
particular instance. It's just not reasonable to make such a reckless
choice, no matter what your general risk tolerance is.

Goetz Graefe is interviewed here, and goes into his philosophy on
robustness -- it seems really interesting to me:

https://sigmodrecord.org/publications/sigmodRecord/2009/pdfs/05_Profiles_Graefe.pdf

> In defense of that approach, note that this is a
> case where we know both that the Nested Loop is risky and that Hash
> Join is a similar alternative with probably similar cost. I am not
> sure there are any other cases where we can say quite so generally
> both that a certain thing is risky and what we could do instead.

I tend to think of a hash join as being like a nested loop join with
an inner index scan where you build the index yourself, dynamically.
That might be why I find it easy to make this mental leap. In theory
you could do this by giving the nestloop join runtime smarts, so that
it turns into a hash join adaptively, as in Graefe's G-Join design.
That way you could do this in a theoretically pure way.
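A minimal Python sketch of that mental model, assuming simple in-memory lists of (key, payload) tuples. The function names are hypothetical and this is not how PostgreSQL's executor is implemented. It contrasts an unparameterized nested loop, which rescans the whole inner input for every outer row, with a hash join written as an index nested loop whose "index" (a hash table) is built on the fly in one pass over the inner input.

```python
from collections import defaultdict

# Toy rows are (join_key, payload) tuples; purely illustrative.


def unparameterized_nestloop(outer, inner):
    """Rescan the entire inner input for every outer row.

    Work grows as len(outer) * len(inner), which is what makes it risky
    when the outer row estimate turns out to be far too low.
    """
    result, work = [], 0
    for okey, oval in outer:
        for ikey, ival in inner:
            work += 1  # every inner tuple is visited for every outer tuple
            if okey == ikey:
                result.append((okey, oval, ival))
    return result, work


def index_nestloop(outer, index):
    """Probe an existing index on the inner side once per outer row."""
    result, probes = [], 0
    for okey, oval in outer:
        probes += 1
        for ival in index.get(okey, ()):  # .get() does not insert missing keys
            result.append((okey, oval, ival))
    return result, probes


def build_index(inner):
    """Hash join build phase: build a hash 'index' over the inner input."""
    index = defaultdict(list)
    for ikey, ival in inner:
        index[ikey].append(ival)
    return index


def hash_join(outer, inner):
    """A hash join viewed as an index nested loop over a dynamically built index."""
    index = build_index(inner)                # one pass over the inner input
    result, probes = index_nestloop(outer, index)
    return result, probes + len(inner)        # build work plus probe work


if __name__ == "__main__":
    outer = [(k, f"o{k}") for k in range(1000)]
    inner = [(k, f"i{k}") for k in range(0, 2000, 2)]
    nl_rows, nl_work = unparameterized_nestloop(outer, inner)
    hj_rows, hj_work = hash_join(outer, inner)
    assert nl_rows == hj_rows  # same join result either way
    print(nl_work, hj_work)    # prints "1000000 2000"
```

Both strategies produce the same rows; the difference is that the nested loop's work is multiplicative in the two input sizes, while the hash join pays for one build pass and then does a single probe per outer row.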

I don't think that that's actually necessary just to deal with this
case -- it probably really is as simple as it seems. I point this out
because perhaps it's useful to have that theoretical anchoring.

-- 
Peter Geoghegan