Thread: "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan
Hello,
I recently stumbled upon on what could be a planner bug or a corner case. If "<false condition> OR ..." is added to WHERE clause of SELECT query, then the planner chooses a very inefficient plan. Consider a query:
SELECT count(k0.id)I recently stumbled upon on what could be a planner bug or a corner case. If "<false condition> OR ..." is added to WHERE clause of SELECT query, then the planner chooses a very inefficient plan. Consider a query:
FROM k0
WHERE 1 = 2
OR k0.id IN (
SELECT k1.k0_id
FROM k1
WHERE k1.k1k2_id IN (
SELECT k2.k1k2_id
FROM k2
WHERE k2.t = 2
AND (coalesce(k2.z, '')) LIKE '%12%'
)
);
http://explain.depesz.com/s/tcn
Execution time: 2037872.420 ms (almost 34 minutes!!)
If I comment out "1=2 OR", then the plan changes dramatically:http://explain.depesz.com/s/5rsW
Execution time: 617.778 ms
I know LEFT JOIN or EXISTS instead of NOT IN in this case will give better plans. What bothers me is not performance of this particular query, but the strange behavior of query planner. Is this behavior considered normal, or should I file a bug?
version: PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
package: postgresql91-server.x86_64 9.1.9-1PGDG.rhel6
os: Scientific Linux 6.3
postgresql.conf: http://pgsql.privatepaste.com/e3e75bb789
--
On 18/04/13 15:20, dmitry potapov wrote: > Hello, > > I recently stumbled upon on what could be a planner bug or a corner > case. If "<false condition> OR ..." is added to WHERE clause of SELECT > query, then the planner chooses a very inefficient plan. Consider a query: > If I comment out "1=2 OR", then the plan changes dramatically: What happens if you substitute: 1. 1=3 OR 2. false OR -- Richard Huxton Archonet Ltd
dmitry potapov <potapov.dmitry@gmail.com> writes: > I recently stumbled upon on what could be a planner bug or a corner case. > If "<false condition> OR ..." is added to WHERE clause of SELECT query, > then the planner chooses a very inefficient plan. Consider a query: > SELECT count(k0.id) > FROM k0 > WHERE 1 = 2 > OR k0.id IN ( > SELECT k1.k0_id > FROM k1 > WHERE k1.k1k2_id IN ( > SELECT k2.k1k2_id > FROM k2 > WHERE k2.t = 2 > AND (coalesce(k2.z, '')) LIKE '%12%' > ) > ); Perhaps you should fix your application to not generate such incredibly silly SQL. Figuring out that 1=2 is constant false and throwing it away costs the server easily a thousand times as many instructions as it would take for the client to not emit that in the first place. The reason you don't get a nice semijoin plan when you do that is that conversion of IN clauses to semijoins happens before constant-subexpression simplification. So the planner hasn't yet figured out that the OR is useless when it would need to know that to produce a good plan. (And no, we can't just flip the order of those two steps. Doing two rounds of const-simplification wouldn't be a good answer either, because it would penalize well-written queries to benefit badly-written ones.) regards, tom lane
On 18 April 2013 15:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > dmitry potapov <potapov.dmitry@gmail.com> writes: >> I recently stumbled upon on what could be a planner bug or a corner case. >> If "<false condition> OR ..." is added to WHERE clause of SELECT query, >> then the planner chooses a very inefficient plan. Consider a query: > >> SELECT count(k0.id) >> FROM k0 >> WHERE 1 = 2 >> OR k0.id IN ( >> SELECT k1.k0_id >> FROM k1 >> WHERE k1.k1k2_id IN ( >> SELECT k2.k1k2_id >> FROM k2 >> WHERE k2.t = 2 >> AND (coalesce(k2.z, '')) LIKE '%12%' >> ) >> ); > > Perhaps you should fix your application to not generate such incredibly > silly SQL. Figuring out that 1=2 is constant false and throwing it away > costs the server easily a thousand times as many instructions as it > would take for the client to not emit that in the first place. > > The reason you don't get a nice semijoin plan when you do that is that > conversion of IN clauses to semijoins happens before > constant-subexpression simplification. So the planner hasn't yet > figured out that the OR is useless when it would need to know that to > produce a good plan. (And no, we can't just flip the order of those two > steps. Doing two rounds of const-simplification wouldn't be a good > answer either, because it would penalize well-written queries to benefit > badly-written ones.) The situation shown could be the result of SQL injection attack. It would be nice to have a switch to do additional checks on SQL queries to ensure such injections don't cause long runtimes to return useless answers. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 2, 2013 at 9:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> SELECT count(k0.id) >>> FROM k0 >>> WHERE 1 = 2 >>> OR k0.id IN ( >>> SELECT k1.k0_id >>> FROM k1 >>> WHERE k1.k1k2_id IN ( >>> SELECT k2.k1k2_id >>> FROM k2 >>> WHERE k2.t = 2 >>> AND (coalesce(k2.z, '')) LIKE '%12%' >>> ) >>> ); >> ... > > The situation shown could be the result of SQL injection attack. > > It would be nice to have a switch to do additional checks on SQL > queries to ensure such injections don't cause long runtimes to return > useless answers. How could that be the case without becoming much much worse than large runtimes? I don't think it's the place of the database to worry about SQL injection.