different queries and their efficiencies - Mailing list pgsql-general
From | David Welton |
---|---|
Subject | different queries and their efficiencies |
Date | |
Msg-id | 9877cd600511240824i484d62f4t84af43faf085cf0b@mail.gmail.com Whole thread Raw |
Responses |
Re: different queries and their efficiencies
|
List | pgsql-general |
Hi, I'm curious as to what the general opinion is on these different queries and their relative merits, especially in terms of speed: ----- select count(*) from zstore_temp where product_code not in (select product_code from zstore) QUERY PLAN -------------------------------------------------------------------------------- Aggregate (cost=17200040774.42..17200040774.42 rows=1 width=0) -> Seq Scan on zstore_temp (cost=0.00..17200038808.85 rows=786226 width=0) Filter: (NOT (subplan)) SubPlan -> Seq Scan on zstore (cost=0.00..19747.74 rows=851574 width=21) (5 rows) ----- select count(foo) from (select product_code from zstore_temp except select product_code from zstore) as foo; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (cost=502201.07..502201.07 rows=1 width=32) -> Subquery Scan foo (cost=487050.91..501595.06 rows=242403 width=32) -> SetOp Except (cost=487050.91..499171.03 rows=242402 width=21) -> Sort (cost=487050.91..493110.97 rows=2424025 width=21) Sort Key: product_code -> Append (cost=0.00..79125.50 rows=2424025 width=21) -> Subquery Scan "*SELECT* 1" (cost=0.00..50862.02 rows=1572451 width=21) -> Seq Scan on zstore_temp (cost=0.00..35137.51 rows=1572451 width=21) -> Subquery Scan "*SELECT* 2" (cost=0.00..28263.48 rows=851574 width=21) -> Seq Scan on zstore (cost=0.00..19747.74 rows=851574 width=21) ---- select count(zstore_temp.product_code) from zstore_temp left join zstore on zstore_temp.product_code = zstore.product_code where zstore.product_code is null; QUERY PLAN ---------------------------------------------------------------------------------------- Aggregate (cost=100965813.69..100965813.69 rows=1 width=21) -> Merge Right Join (cost=448485.57..100882122.54 rows=33476460 width=21) Merge Cond: ("outer"."?column2?" = "inner"."?column2?") Filter: ("outer".product_code IS NULL) -> Sort (cost=154686.81..156815.74 rows=851574 width=21) Sort Key: (zstore.product_code)::text -> Seq Scan on zstore (cost=0.00..19747.74 rows=851574 width=21) -> Sort (cost=293798.76..297729.88 rows=1572451 width=21) Sort Key: (zstore_temp.product_code)::text -> Seq Scan on zstore_temp (cost=0.00..35137.51 rows=1572451 width=21) ----- It seems as if there are multiple order-of-magnitude differences between these queries given the above costs: 17200040774 100965813 502201 Is there anything that can be done to improve the performance of the subquery or left join strategies? Thanks, and happy Thanksgiving to all the USians, -- David N. Welton - http://www.dedasys.com/davidw/ Apache, Linux, Tcl Consulting - http://www.dedasys.com/
pgsql-general by date: