On 15 Září 2011, 9:53, Yang Zhang wrote:
> I have a simple query that's been running for a while, which is fine,
> but it seems to be running very slowly, which is a problem:
>
> mydb=# explain select user_id from den where user_id not in (select
> duid from user_mappings) and timestamp between '2009-04-01' and
> '2010-04-01';
>
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on den (cost=711.58..66062724212.74 rows=22634720 width=4)
> Filter: (("timestamp" >= '2009-04-01 00:00:00'::timestamp without
> time zone) AND ("timestamp" <= '2010-04-01 00:00:00'::timestamp
> without time zone) AND (NOT (SubPlan 1)))
> SubPlan 1
> -> Materialize (cost=711.58..1223.38 rows=36780 width=4)
> -> Seq Scan on user_mappings (cost=0.00..530.80 rows=36780
> width=4)
>
> user_mappings is fairly small:
The problem is that for each of the 22634720 rows in "den" a separate
uncorrelated subquery (a seq scan on user_mappings) has to be executed.
Althogh the subquery is not very expensive, multiplied by the number of
rows in "den" the total cost is extreme.
The only solution is to get rid of the "not in" subquery - try to turn it
to a join like this:
SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
AND (duid IS NULL)
That should give the same result I guess.
Tomas