Thread: "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

"WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

From
dmitry potapov
Date:
Hello,

I recently stumbled upon on what could be a planner bug or a corner case. If "<false condition> OR ..." is added to WHERE clause of SELECT query, then the planner chooses a very inefficient plan. Consider a query:

SELECT count(k0.id)
FROM k0
WHERE 1 = 2
    OR k0.id IN (
        SELECT k1.k0_id
        FROM k1
        WHERE k1.k1k2_id IN (
                SELECT k2.k1k2_id
                FROM k2
                WHERE k2.t = 2
                    AND (coalesce(k2.z, '')) LIKE '%12%'
                )
        );

EXPLAIN (ANALYZE, BUFFERS) for this query:
http://explain.depesz.com/s/tcn
Execution time: 2037872.420 ms (almost 34 minutes!!)

If I comment out "1=2 OR", then the plan changes dramatically:
http://explain.depesz.com/s/5rsW
Execution time: 617.778 ms


I know LEFT JOIN or EXISTS instead of NOT IN in this case will give better plans. What bothers me is not performance of this particular query, but the strange behavior of query planner. Is this behavior considered normal, or should I file a bug?

database schema: http://pgsql.privatepaste.com/b297e685c5
version: PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
package: postgresql91-server.x86_64 9.1.9-1PGDG.rhel6
os: Scientific Linux 6.3
postgresql
.conf: http://pgsql.privatepaste.com/e3e75bb789

--



Re: "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

From
Richard Huxton
Date:
On 18/04/13 15:20, dmitry potapov wrote:
> Hello,
>
> I recently stumbled upon on what could be a planner bug or a corner
> case. If "<false condition> OR ..." is added to WHERE clause of SELECT
> query, then the planner chooses a very inefficient plan. Consider a query:

> If I comment out "1=2 OR", then the plan changes dramatically:

What happens if you substitute:
1. 1=3   OR
2. false OR

--
   Richard Huxton
   Archonet Ltd


Re: "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

From
Tom Lane
Date:
dmitry potapov <potapov.dmitry@gmail.com> writes:
> I recently stumbled upon on what could be a planner bug or a corner case.
> If "<false condition> OR ..." is added to WHERE clause of SELECT query,
> then the planner chooses a very inefficient plan. Consider a query:

> SELECT count(k0.id)
> FROM k0
> WHERE 1 = 2
>     OR k0.id IN (
>         SELECT k1.k0_id
>         FROM k1
>         WHERE k1.k1k2_id IN (
>                 SELECT k2.k1k2_id
>                 FROM k2
>                 WHERE k2.t = 2
>                     AND (coalesce(k2.z, '')) LIKE '%12%'
>                 )
>         );

Perhaps you should fix your application to not generate such incredibly
silly SQL.  Figuring out that 1=2 is constant false and throwing it away
costs the server easily a thousand times as many instructions as it
would take for the client to not emit that in the first place.

The reason you don't get a nice semijoin plan when you do that is that
conversion of IN clauses to semijoins happens before
constant-subexpression simplification.  So the planner hasn't yet
figured out that the OR is useless when it would need to know that to
produce a good plan.  (And no, we can't just flip the order of those two
steps.  Doing two rounds of const-simplification wouldn't be a good
answer either, because it would penalize well-written queries to benefit
badly-written ones.)

            regards, tom lane


Re: "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

From
Simon Riggs
Date:
On 18 April 2013 15:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> dmitry potapov <potapov.dmitry@gmail.com> writes:
>> I recently stumbled upon on what could be a planner bug or a corner case.
>> If "<false condition> OR ..." is added to WHERE clause of SELECT query,
>> then the planner chooses a very inefficient plan. Consider a query:
>
>> SELECT count(k0.id)
>> FROM k0
>> WHERE 1 = 2
>>     OR k0.id IN (
>>         SELECT k1.k0_id
>>         FROM k1
>>         WHERE k1.k1k2_id IN (
>>                 SELECT k2.k1k2_id
>>                 FROM k2
>>                 WHERE k2.t = 2
>>                     AND (coalesce(k2.z, '')) LIKE '%12%'
>>                 )
>>         );
>
> Perhaps you should fix your application to not generate such incredibly
> silly SQL.  Figuring out that 1=2 is constant false and throwing it away
> costs the server easily a thousand times as many instructions as it
> would take for the client to not emit that in the first place.
>
> The reason you don't get a nice semijoin plan when you do that is that
> conversion of IN clauses to semijoins happens before
> constant-subexpression simplification.  So the planner hasn't yet
> figured out that the OR is useless when it would need to know that to
> produce a good plan.  (And no, we can't just flip the order of those two
> steps.  Doing two rounds of const-simplification wouldn't be a good
> answer either, because it would penalize well-written queries to benefit
> badly-written ones.)

The situation shown could be the result of SQL injection attack.

It would be nice to have a switch to do additional checks on SQL
queries to ensure such injections don't cause long runtimes to return
useless answers.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

From
Claudio Freire
Date:
On Thu, May 2, 2013 at 9:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> SELECT count(k0.id)
>>> FROM k0
>>> WHERE 1 = 2
>>>     OR k0.id IN (
>>>         SELECT k1.k0_id
>>>         FROM k1
>>>         WHERE k1.k1k2_id IN (
>>>                 SELECT k2.k1k2_id
>>>                 FROM k2
>>>                 WHERE k2.t = 2
>>>                     AND (coalesce(k2.z, '')) LIKE '%12%'
>>>                 )
>>>         );
>>
...
>
> The situation shown could be the result of SQL injection attack.
>
> It would be nice to have a switch to do additional checks on SQL
> queries to ensure such injections don't cause long runtimes to return
> useless answers.

How could that be the case without becoming much much worse than large runtimes?

I don't think it's the place of the database to worry about SQL injection.