Thread: Poor plan when using EXISTS in the expression list
Hello Our developpers ORM (Django's) sadly can not use EXISTS in the where clauses without having it in the expression part of the SELECT statement. I was expecting it to create queries performing a bit worse than queries without this useless expression, but it turns out this trigger an extremely poor planning, with an additional Seq Scan of the table referenced in EXISTS. Thus the query select a.*, exists (select * from b where a_id = a.id) from a where exists (select * from b where a_id = a.id); can be orders of magnitude slower than select a.* from a where exists (select * from b where a_id = a.id); This has been reproduced on PostgreSQL 9.6 and 11 beta4. Example : test=> create table a (id serial primary key, b text); CREATE TABLE test=> create table b (id serial primary key, a_id integer not null references a(id), c text); CREATE TABLE test=> explain select a.* from a where exists (select * from b where a_id = a.id); QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=29.50..62.60 rows=635 width=36) Hash Cond: (a.id = b.a_id) -> Seq Scan on a (cost=0.00..22.70 rows=1270 width=36) -> Hash (cost=27.00..27.00 rows=200 width=4) -> HashAggregate (cost=25.00..27.00 rows=200 width=4) Group Key: b.a_id -> Seq Scan on b (cost=0.00..22.00 rows=1200 width=4) (7 rows) test=> explain select a.*, exists (select * from b where a_id = a.id) from a; QUERY PLAN ----------------------------------------------------------------- Seq Scan on a (cost=0.00..5314.37 rows=1270 width=37) SubPlan 1 -> Seq Scan on b (cost=0.00..25.00 rows=6 width=0) Filter: (a_id = a.id) SubPlan 2 -> Seq Scan on b b_1 (cost=0.00..22.00 rows=1200 width=4) (6 rows) test=> explain select a.*, exists (select * from b where a_id = a.id) from a where exists (select * from b where a_id = a.id); QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=29.50..2708.43 rows=635 width=37) Hash Cond: (a.id = b.a_id) -> Seq Scan on a (cost=0.00..22.70 rows=1270 width=36) -> Hash (cost=27.00..27.00 rows=200 width=4) -> HashAggregate (cost=25.00..27.00 rows=200 width=4) Group Key: b.a_id -> Seq Scan on b (cost=0.00..22.00 rows=1200 width=4) SubPlan 1 -> Seq Scan on b b_1 (cost=0.00..25.00 rows=6 width=0) Filter: (a_id = a.id) SubPlan 2 -> Seq Scan on b b_2 (cost=0.00..22.00 rows=1200 width=4) (12 rows) Thanks Pierre
On Thu, 4 Oct 2018 at 13:11, Pierre Ducroquet <p.psql@pinaraf.info> wrote:
Our developpers ORM (Django's) sadly can not use EXISTS in the where clauses
without having it in the expression part of the SELECT statement.
I don't know if this will be helpful to you (and I appreciate there's still the underlying PG issue), but there's a suggestion here that you can work around this using .extra()
Geoff
On Thursday, October 4, 2018 4:46:26 PM CEST Geoff Winkless wrote: > On Thu, 4 Oct 2018 at 13:11, Pierre Ducroquet <p.psql@pinaraf.info> wrote: > > Our developpers ORM (Django's) sadly can not use EXISTS in the where > > clauses > > without having it in the expression part of the SELECT statement. > > I don't know if this will be helpful to you (and I appreciate there's still > the underlying PG issue), but there's a suggestion here that you can work > around this using .extra() > > https://stackoverflow.com/a/38880144/321161 Sure this helps when you know the trap and don't use the Exist support in Django, but this still mean any developer with Django may create a query that, on small volumes, will be a bit slow, and will blow up on big volumes. We sadly can not monitor every piece of code written by developers or imported in the dependencies.