Thread: apply outer->inner join optimisation to OR clauses

apply outer->inner join optimisation to OR clauses

From
Bradley Baetz
Date:
The attached patch applies the optimisation translating outer joins to
inner joins (where safe) to the cases where the WHERE clause has OR bits
in it too, if the column is present (and not null) in all of the OR
bits.

This allows, for example:

bbaetz=# explain analyze select bugs.bug_id from bugs left join
longdescs using (bug_id) where who IN (1,2);
                                                            QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=149629.84..172021.09 rows=100000 width=12)
(actual time=20102.71..23694.30 rows=105 loops=1)
   Merge Cond: ("outer".bug_id = "inner".bug_id)
   Filter: (("inner".who = 1) OR ("inner".who = 2))
   ->  Index Scan using bugs_pkey on bugs  (cost=0.00..2142.00
rows=100000 width=4) (actual time=6.17..310.94 rows=100000 loops=1)
   ->  Sort  (cost=149629.84..152129.84 rows=1000000 width=8) (actual
time=19969.66..21317.62 rows=1000000 loops=1)
         Sort Key: longdescs.bug_id
         ->  Seq Scan on longdescs  (cost=0.00..14902.00 rows=1000000
width=8) (actual time=0.03..4225.04 rows=1000000 loops=1)
 Total runtime: 23739.90 msec
(8 rows)

to become:

bbaetz=# explain analyze select bugs.bug_id from bugs left join
longdescs using (bug_id) where who IN (1,2);

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..691.64 rows=99 width=8) (actual
time=42.54..1289.34 rows=105 loops=1)
   ->  Index Scan using longdescs_who_idx, longdescs_who_idx on
longdescs  (cost=0.00..395.09 rows=98 width=4) (actual time=7.31..547.09
rows=105 loops=1)
         Index Cond: ((who = 1) OR (who = 2))
   ->  Index Scan using bugs_pkey on bugs  (cost=0.00..3.01 rows=1
width=4) (actual time=7.06..7.06 rows=1 loops=105)
         Index Cond: (bugs.bug_id = "outer".bug_id)
 Total runtime: 1289.60 msec
(6 rows)

I wanted to add a regression test, but it doesn't look like theres
infrastructure to test that an optimisation is being applied.

Thanks,

Bradley

Attachment

Re: apply outer->inner join optimisation to OR clauses

From
Tom Lane
Date:
Bradley Baetz <bbaetz@acm.org> writes:
> The attached patch applies the optimisation translating outer joins to
> inner joins (where safe) to the cases where the WHERE clause has OR bits
> in it too, if the column is present (and not null) in all of the OR
> bits.

Your change for AND is obviously incorrect, and I don't think I believe
the OR case either.  Why is it safe to pass down a TRUE top_level flag?

            regards, tom lane


Re: apply outer->inner join optimisation to OR clauses

From
Bradley Baetz
Date:
On Sat, May 03, 2003 at 01:22:11PM -0400, Tom Lane wrote:
> Bradley Baetz <bbaetz@acm.org> writes:
> > The attached patch applies the optimisation translating outer joins to
> > inner joins (where safe) to the cases where the WHERE clause has OR bits
> > in it too, if the column is present (and not null) in all of the OR
> > bits.
>
> Your change for AND is obviously incorrect, and I don't think I believe
> the OR case either.  Why is it safe to pass down a TRUE top_level flag?

Its safe because

WHERE (foo=1) OR (foo=2) OR (foo IS NOT NULL)

still means that foo is non_nullable. Without my patch, this is done for
AND anyway, via the List node. I don't see the difference in this
context between

(List (Op = foo 1) (Op = foo 2) (NullTest isNotNull foo))

and

(BoolExpr And (Op = foo 1) (Op = foo 2) (NullTest isNotNull foo))

since the second is simplified into the first earlier on.

As I understand it, the top_level flag is intended to prevent against
|NOT (foo IS NOT NULL)|, which does not make foo non_nullable, and |NOT
(foo IS True/False/etc)|, which also doesn't prove that foo isn't
nullable.

For the opExpr, we want to stop ((foo IS NOT NULL)=False) from making
foo nonnullable. For the BooleanExpr's OR and AND I don't see a
combination which can take place that way - none of the operands are
combined with the other so that they can 'interfere' with each other
like that.

I don't think that the fact that |SELECT foo>2 OR NULL| gives NULL, not
FALSE, if foo <=2 is an issue either, since they're both not a true
value.

>
>             regards, tom lane

Bradley


Re: apply outer->inner join optimisation to OR clauses

From
Tom Lane
Date:
Bradley Baetz <bbaetz@acm.org> writes:
> I don't think that the fact that |SELECT foo>2 OR NULL| gives NULL, not
> FALSE, if foo <=2 is an issue either, since they're both not a true
> value.

But that is exactly the distinction that we have to worry about when not
at top level.  The error in the AND part of the proposed patch is
exhibited by
        WHERE NOT ((a.a1 > 1) AND (b.b1 > 1))
When a.a1 is NULL, the AND can't yield TRUE --- but it can yield FALSE,
which will become TRUE at the top level.  So neither a nor b can be
considered non-nullable in this expression.

You do have an insight here, which is that if the same rel can be shown
to null *all* the arms of an OR, it nulls the OR.  The same holds true
of an AND, I think.

You might be right that passing top_level through the AND or OR would be
safe in the present state of the code, but it strikes me as fragile and
confusing to leave it that way, because it no longer means what its name
implies and the comments say.  ISTM what we need to do if we want to
extend this code is to come up with a stronger definition of the context
in which each recursive level gets invoked.  I think it might work to
define a three-way context value instead of a boolean:

SAFE_IF_NULL        a rel is nonnullable if nulling it forces this
            expression to return NULL
SAFE_IF_NULL_OR_FALSE    a rel is nonnullable if nulling it forces this
            expression to return NULL or FALSE
SAFE_IF_NULL_OR_TRUE    a rel is nonnullable if nulling it forces this
            expression to return NULL or TRUE

The present notion of top-level would be replaced by context
SAFE_IF_NULL_OR_FALSE.  Descent through a strict operator or function
always narrows the context to SAFE_IF_NULL (since we don't know what
the op/func might do with other values, and we can't even be sure we're
talking about booleans anymore anyway).  Descent through NOT exchanges
SAFE_IF_NULL_OR_FALSE and SAFE_IF_NULL_OR_TRUE.  You could make
appropriate extensions of the code for NullTest and BooleanTest, and
I think this would allow sane handling of the AND and OR cases: for
example, AND with SAFE_IF_NULL_OR_FALSE context really can union the
inputs' nonnullable sets (since forcing any one input to null or false
guarantees the AND result is null or false) but AND in the other
contexts has to intersect the inputs' sets.  I think also that AND and
OR might be able to pass down more liberal contexts than they get,
but am not sure about that part.

(BTW, I don't quite like the adjective SAFE here; can you think of a
better term?)

The SAFE_IF_NULL_OR_TRUE context might prove not to be worth the trouble
of implementing --- I'm not sure if either AND or OR can really exploit
it any better than they can exploit SAFE_IS_NULL.  If not, we'd be back
to just a boolean context value, but we'd have a clearer understanding
of what it really means.

I don't have time to slog through the details, but if you want to run
with that, go for it ...

            regards, tom lane


Re: apply outer->inner join optimisation to OR clauses

From
Bradley Baetz
Date:
On Sat, May 03, 2003 at 08:53:14PM -0400, Tom Lane wrote:
> Bradley Baetz <bbaetz@acm.org> writes:
> > I don't think that the fact that |SELECT foo>2 OR NULL| gives NULL, not
> > FALSE, if foo <=2 is an issue either, since they're both not a true
> > value.
>
> But that is exactly the distinction that we have to worry about when not
> at top level.  The error in the AND part of the proposed patch is
> exhibited by
>         WHERE NOT ((a.a1 > 1) AND (b.b1 > 1))
> When a.a1 is NULL, the AND can't yield TRUE --- but it can yield FALSE,
> which will become TRUE at the top level.  So neither a nor b can be
> considered non-nullable in this expression.

I'm about to run out for a bit, so I'll reply to the rest later.

I was considering WHERE NOT (a.a1>1), which was already acceptable, but
you're right in that your example can have either of them being null.

Its really just NOT which is the problem. I wonder if its safe to do
this for stuff inside a NULL iff there is only one nonnullable_rel from
the expression? I'll think about that a bit.

I'll read through your comments in more detail in a few hours.

> (BTW, I don't quite like the adjective SAFE here; can you think of a
> better term?)

NULL_STRICT, perhaps?

>
> The SAFE_IF_NULL_OR_TRUE context might prove not to be worth the trouble
> of implementing --- I'm not sure if either AND or OR can really exploit
> it any better than they can exploit SAFE_IS_NULL.  If not, we'd be back
> to just a boolean context value, but we'd have a clearer understanding
> of what it really means.

The main use (which got me to look into this) is IN.

Bradley


Re: apply outer->inner join optimisation to OR clauses

From
Bradley Baetz
Date:
On Sat, May 03, 2003 at 08:53:14PM -0400, Tom Lane wrote:
> Bradley Baetz <bbaetz@acm.org> writes:
> > I don't think that the fact that |SELECT foo>2 OR NULL| gives NULL, not
> > FALSE, if foo <=2 is an issue either, since they're both not a true
> > value.
>
> But that is exactly the distinction that we have to worry about when not
> at top level.  The error in the AND part of the proposed patch is
> exhibited by
>         WHERE NOT ((a.a1 > 1) AND (b.b1 > 1))
> When a.a1 is NULL, the AND can't yield TRUE --- but it can yield FALSE,
> which will become TRUE at the top level.  So neither a nor b can be
> considered non-nullable in this expression.

I tried testing this, but the problem is that this is rewritten as

WHERE (a.a1<=1 OR b.b1<=1)

so it doesn't trigger. (I was sure I'd tested that before submiting the
patch) Is there a way to turn that transformation off for testing
purposes? Or are we guaranteed that there wont' be OR/AND expressions
inside a NOT, because they will have always been expanded?

>
> You do have an insight here, which is that if the same rel can be shown
> to null *all* the arms of an OR, it nulls the OR.  The same holds true
> of an AND, I think.

Yes, but with an AND, you can use any of them (modulo the NOT issue, of
course)

> SAFE_IF_NULL        a rel is nonnullable if nulling it forces this
>             expression to return NULL
> SAFE_IF_NULL_OR_FALSE    a rel is nonnullable if nulling it forces this
>             expression to return NULL or FALSE
> SAFE_IF_NULL_OR_TRUE    a rel is nonnullable if nulling it forces this
>             expression to return NULL or TRUE

I'll have to think abotu this a bit more.

Do we agree that if the thing inside the NOT is a single expression,
then what I've doing is safe? IS it safe even if theres and/or involved
as the argument to a (strict) operator?

>             regards, tom lane

Bradley


Re: apply outer->inner join optimisation to OR clauses

From
Tom Lane
Date:
Bradley Baetz <bbaetz@acm.org> writes:
> Or are we guaranteed that there wont' be OR/AND expressions
> inside a NOT, because they will have always been expanded?

No, we're not, because the CNF-ifying code is optional.  If you study
the heuristics in prepqual.c you could doubtless devise a query that has
such an expression somewhere in it.

Even if that was a safe assumption today, writing code that would fail
in as subtle a way as this when presented with such a tree won't do.
The optimizer gets revamped constantly, and so I don't want to see any
assumptions that fragile embedded into pieces of it.

> Do we agree that if the thing inside the NOT is a single expression,
> then what I've doing is safe? IS it safe even if theres and/or involved
> as the argument to a (strict) operator?

I'm going to take the position "fix it right or don't touch it at all".
The routine as it stands does what it was intended to do.  If you want
to invest the effort to take it to the next level of intelligence, great
--- but let's not put in a half-baked attempt.

            regards, tom lane