different queries and their efficiencies - Mailing list pgsql-general

From David Welton
Subject different queries and their efficiencies
Date
Msg-id 9877cd600511240824i484d62f4t84af43faf085cf0b@mail.gmail.com
Whole thread Raw
Responses Re: different queries and their efficiencies  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi,

I'm curious as to what the general opinion is on these different
queries and their relative merits, especially in terms of speed:

-----
select count(*) from zstore_temp where product_code not in (select
product_code from zstore)

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=17200040774.42..17200040774.42 rows=1 width=0)
   ->  Seq Scan on zstore_temp  (cost=0.00..17200038808.85 rows=786226 width=0)
         Filter: (NOT (subplan))
         SubPlan
           ->  Seq Scan on zstore  (cost=0.00..19747.74 rows=851574 width=21)
(5 rows)
-----

select count(foo) from (select product_code from zstore_temp except
select product_code from zstore) as foo;

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=502201.07..502201.07 rows=1 width=32)
   ->  Subquery Scan foo  (cost=487050.91..501595.06 rows=242403 width=32)
         ->  SetOp Except  (cost=487050.91..499171.03 rows=242402 width=21)
               ->  Sort  (cost=487050.91..493110.97 rows=2424025 width=21)
                     Sort Key: product_code
                     ->  Append  (cost=0.00..79125.50 rows=2424025 width=21)
                           ->  Subquery Scan "*SELECT* 1"
(cost=0.00..50862.02 rows=1572451 width=21)
                                 ->  Seq Scan on zstore_temp
(cost=0.00..35137.51 rows=1572451 width=21)
                           ->  Subquery Scan "*SELECT* 2"
(cost=0.00..28263.48 rows=851574 width=21)
                                 ->  Seq Scan on zstore
(cost=0.00..19747.74 rows=851574 width=21)

----

select count(zstore_temp.product_code) from zstore_temp left join
zstore on zstore_temp.product_code = zstore.product_code where
zstore.product_code is null;


                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=100965813.69..100965813.69 rows=1 width=21)
   ->  Merge Right Join  (cost=448485.57..100882122.54 rows=33476460 width=21)
         Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
         Filter: ("outer".product_code IS NULL)
         ->  Sort  (cost=154686.81..156815.74 rows=851574 width=21)
               Sort Key: (zstore.product_code)::text
               ->  Seq Scan on zstore  (cost=0.00..19747.74
rows=851574 width=21)
         ->  Sort  (cost=293798.76..297729.88 rows=1572451 width=21)
               Sort Key: (zstore_temp.product_code)::text
               ->  Seq Scan on zstore_temp  (cost=0.00..35137.51
rows=1572451 width=21)


-----

It seems as if there are multiple order-of-magnitude differences
between these queries given the above costs:

17200040774
100965813
502201

Is there anything that can be done to improve the performance of the
subquery or left join strategies?

Thanks, and happy Thanksgiving to all the USians,
--
David N. Welton
 - http://www.dedasys.com/davidw/

Apache, Linux, Tcl Consulting
 - http://www.dedasys.com/

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: "invalid page header in block 597621 of relation..."error
Next
From: Adam Witney
Date:
Subject: Re: "invalid page header in block 597621 of relation..."error