Thread: Unusual slowdown using subselects

From: John Aughey
I'm stress testing my application by creating large data sets.  This
particular query selects the rows from the schedule table that belong to
specific owner_ids.  (I'll show you the EXPLAIN output for each query.)

calendar=# explain select * from schedule where schedule.owner_id=101 or
schedule.owner_id=102;
Index Scan using schedule_id_index, schedule_id_index on schedule  (cost=0.00..78.64 rows=20 width=40)

Looks great and executes very fast.

calendar=# explain select group_id from groups where
user_id=101;
NOTICE:  QUERY PLAN:
Index Scan using groups_id_index on groups  (cost=0.00..2.02 rows=1 width=4)

Again, very fast.  The groups table maps users to groups.

However, this next one is slow.

calendar=# explain select * from schedule where schedule.owner_id in
(select group_id from groups where user_id=101);
NOTICE:  QUERY PLAN:
Seq Scan on schedule  (cost=0.00..2039895.00 rows=1000000 width=40)
  SubPlan
    ->  Materialize  (cost=2.02..2.02 rows=1 width=4)
          ->  Index Scan using groups_id_index on groups  (cost=0.00..2.02 rows=1 width=4)

You'll see that where the first example did an index scan, this very
similar query does a seq scan over the whole schedule table (estimated
cost 2039895.00 versus 78.64).  The two queries should be nearly
identical, but this one runs very slowly.

Can anyone explain why this happens and/or how I can do a sub-select like
this and get fast results?

Thank you
John Aughey

Re: Unusual slowdown using subselects

From: Tom Lane
John Aughey <jha@washucsc.org> writes:
> However, this next one is slow.

> calendar=# explain select * from schedule where schedule.owner_id in
> (select group_id from groups where user_id=101);

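IN (subselect) is not very well implemented at present: as your plan
shows, the planner does a seq scan of schedule and tests each row against
the subplan's output, so schedule_id_index cannot be used.  You could try
joining against the subselect instead, something like this (this needs
7.1, which allows subselects in FROM):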

select schedule.* from schedule,
(select distinct group_id from groups where user_id=101) ss
where schedule.owner_id = ss.group_id;
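The DISTINCT in the subselect is what keeps the join equivalent to IN: IN only tests membership, so each schedule row comes back at most once, while a plain join returns a schedule row once per matching groups row. A minimal sketch that checks this, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL. It compares result sets only and says nothing about PostgreSQL's plans or speed. The table layouts and sample rows are hypothetical; only owner_id, user_id and group_id come from the thread.

```python
import sqlite3


def compare_rewrites():
    """Run the IN form, the join-with-DISTINCT form, and a join without
    DISTINCT against a tiny in-memory SQLite database (a stand-in for
    PostgreSQL, used here only to compare result sets)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical minimal schemas; "groups" is quoted defensively because
    # GROUPS is a keyword in some newer SQL dialects.
    cur.execute("CREATE TABLE schedule (id INTEGER PRIMARY KEY, owner_id INTEGER)")
    cur.execute('CREATE TABLE "groups" (user_id INTEGER, group_id INTEGER)')
    cur.executemany("INSERT INTO schedule (id, owner_id) VALUES (?, ?)",
                    [(1, 101), (2, 102), (3, 500), (4, 501), (5, 999)])
    # Hypothetical data: user 101 belongs to groups 500 and 501, and the
    # (101, 500) membership is listed twice to show why DISTINCT matters.
    cur.executemany('INSERT INTO "groups" (user_id, group_id) VALUES (?, ?)',
                    [(101, 500), (101, 500), (101, 501), (102, 999)])

    # Original form: membership test against the subselect.
    in_form = cur.execute(
        'SELECT schedule.id FROM schedule '
        'WHERE schedule.owner_id IN '
        '(SELECT group_id FROM "groups" WHERE user_id = 101) '
        'ORDER BY schedule.id').fetchall()

    # Suggested rewrite: join against a de-duplicated subselect.
    join_distinct = cur.execute(
        'SELECT schedule.id FROM schedule, '
        '(SELECT DISTINCT group_id FROM "groups" WHERE user_id = 101) ss '
        'WHERE schedule.owner_id = ss.group_id '
        'ORDER BY schedule.id').fetchall()

    # Same join without DISTINCT: duplicate memberships duplicate rows.
    join_plain = cur.execute(
        'SELECT schedule.id FROM schedule, '
        '(SELECT group_id FROM "groups" WHERE user_id = 101) ss '
        'WHERE schedule.owner_id = ss.group_id '
        'ORDER BY schedule.id').fetchall()

    conn.close()
    return in_form, join_distinct, join_plain


if __name__ == "__main__":
    in_form, join_distinct, join_plain = compare_rewrites()
    print(in_form)        # [(3,), (4,)]
    print(join_distinct)  # [(3,), (4,)]
    print(join_plain)     # [(3,), (3,), (4,)]
```

If each (user_id, group_id) pair is known to be unique in groups, the DISTINCT changes nothing in the result, but leaving it in keeps the rewrite safe when that assumption does not hold.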

            regards, tom lane