Thread: [Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL

[Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL

From
Pierre Ducroquet
Date:
Hello

In several queries relying on views, I noticed that the optimizer miss a quite
simple to implement optimization. My views contain several branches, with
different paths that are simplified by the caller of the view. This
simplification is based on columns to be null or not.

Today, even with a single table, the following (silly) query is not optimized
away:
    SELECT * FROM test WHERE a IS NULL AND a IS NOT NULL;

In more complex cases, it of course isn't any better:
    SELECT * FROM (
        SELECT a, NULL::integer AS b FROM foo
      UNION ALL
        SELECT a, b FROM bar WHERE b IS NOT NULL
    ) WHERE a = 1 AND b IS NULL;

The attached patch handles both situations. When flattening and simplifying
the AND clauses, a list of the NullChecks is built, and subsequent NullChecks
are compared to the list. If opposite NullChecks on the same variable are
found, the whole AND is optimized away.
This lead to nice boosts, since instead of having 'never executed' branches,
the optimizer can go even further. Right now, the algorithmic complexity of
this optimization is not great: it is in O(n²), with n being the number of
NullCheck in a given AND clause. But compared to the possible benefits, and
the very low risk of n being high enough to have a real planification-time
impact, I feel this optimization would be worth it.


Regards

 Pierre

Attachment

Re: [Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL

From
Andrew Gierth
Date:
>>>>> "Pierre" == Pierre Ducroquet <p.psql@pinaraf.info> writes:

 Pierre> Hello

 Pierre> In several queries relying on views, I noticed that the
 Pierre> optimizer miss a quite simple to implement optimization. My
 Pierre> views contain several branches, with different paths that are
 Pierre> simplified by the caller of the view. This simplification is
 Pierre> based on columns to be null or not.

 Pierre> Today, even with a single table, the following (silly) query is
 Pierre> not optimized away:

 Pierre>     SELECT * FROM test WHERE a IS NULL AND a IS NOT NULL;

Actually it can be, but only if you set constraint_exclusion=on (rather
than the default, 'partition').

postgres=# explain select * from foo where id is null and id is not null;
                     QUERY PLAN                      
-----------------------------------------------------
 Seq Scan on foo  (cost=0.00..35.50 rows=13 width=4)
   Filter: ((id IS NULL) AND (id IS NOT NULL))
(2 rows)

postgres=# set constraint_exclusion=on;
SET

postgres=# explain select * from foo where id is null and id is not null;
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

In fact when constraint_exclusion=on, the planner should detect any case
where some condition in the query refutes another condition. There is
some downside, though, which is why it's not enabled by default:
planning may take longer.

 Pierre> The attached patch handles both situations. When flattening and
 Pierre> simplifying the AND clauses, a list of the NullChecks is built,
 Pierre> and subsequent NullChecks are compared to the list. If opposite
 Pierre> NullChecks on the same variable are found, the whole AND is
 Pierre> optimized away.

That's all very well but it's very specific to a single use-case. The
existing code, when you enable it, can detect a whole range of possible
refutations (e.g. foo > 1 AND foo < 1).

-- 
Andrew (irc:RhodiumToad)



Re: [Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Pierre" == Pierre Ducroquet <p.psql@pinaraf.info> writes:
>  Pierre> The attached patch handles both situations. When flattening and
>  Pierre> simplifying the AND clauses, a list of the NullChecks is built,
>  Pierre> and subsequent NullChecks are compared to the list. If opposite
>  Pierre> NullChecks on the same variable are found, the whole AND is
>  Pierre> optimized away.

> That's all very well but it's very specific to a single use-case. The
> existing code, when you enable it, can detect a whole range of possible
> refutations (e.g. foo > 1 AND foo < 1).

Yeah.  Just for the record, if we were interested in taking a patch
for this purpose, simplify_and_arguments is a poor choice of where
to do it anyway.  That would only find contradictions between clauses
that were in the same expression at eval_const_expressions time, which
is pretty early and will miss a lot of logically-equivalent situations
(e.g. if one clause is in a JOIN...ON and the other is in WHERE).
The constraint exclusion code looks for contradictions between clauses
that have been pushed down to the same relation during jointree
deconstruction, ie they have the same set of referenced relations.
That would be a much better place for this type of logic.

            regards, tom lane