Thread: Re: [HACKERS] please help on query
[moving to pgsql-sql]
On Thu, 11 Jul 2002 17:22:14 +0200, "Luis Alberto Amigo Navarro"
<lamigo@atc.unican.es> wrote:
>I can't improve performance on this query:
>
>SELECT
>    supplier.name,
>    supplier.address
>FROM
>    supplier,
>    nation
>WHERE
>    supplier.suppkey IN(
>        SELECT partsupp.suppkey
>        FROM partsupp
>        WHERE partsupp.partkey IN(
>            SELECT part.partkey
>            FROM part
>            WHERE part.name like 'forest%'
>        )
>        AND partsupp.availqty>(
>            SELECT 0.5*(sum(lineitem.quantity)::FLOAT)
>            FROM lineitem
>            WHERE lineitem.partkey=partsupp.partkey
>            AND lineitem.suppkey=partsupp.partkey
                                          ^^^^^^^
                                          suppkey???
>            AND lineitem.shipdate>=('1994-01-01')::DATE
>            AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
>        )
>    )
>    AND supplier.nationkey=nation.nationkey
>    AND nation.name='CANADA'
>ORDER BY
>    supplier.name;

Luis, rules of thumb: "Avoid subselects; use joins!" and "If you have
to use subselects, avoid IN, use EXISTS!"

Let's try.  If partkey is unique in part, then

| FROM partsupp
| WHERE partsupp.partkey IN (SELECT part.partkey ...

can be replaced by

  FROM partsupp ps, part p
  WHERE ps.partkey = p.partkey

or

  partsupp ps INNER JOIN part p
    ON (ps.partkey = p.partkey AND p.name LIKE '...')

When we ignore "part" for now, your subselect boils down to

| SELECT partsupp.suppkey
| FROM partsupp
| WHERE partsupp.availqty > (
|     SELECT 0.5*(sum(lineitem.quantity)::FLOAT)
|     FROM lineitem
|     WHERE lineitem.partkey=partsupp.partkey
|     AND lineitem.suppkey=partsupp.suppkey
|     AND lineitem.shipdate BETWEEN ... AND ...
| )

which can be rewritten to (untested)

  SELECT ps.suppkey
  FROM partsupp ps, lineitem li
  WHERE li.partkey=ps.partkey
  AND li.suppkey=ps.suppkey
  AND li.shipdate BETWEEN ... AND ...
  GROUP BY ps.partkey, ps.suppkey
  HAVING min(ps.availqty) > 0.5*(sum(li.quantity)::FLOAT)
         ^^^

As all ps.availqty are equal in one group, you can as well use max()
or avg().
Now we have left only one IN:

| WHERE supplier.suppkey IN (
|     SELECT partsupp.suppkey FROM partsupp WHERE <condition> )

Being too lazy to find out whether this can be rewritten to a join,
let's apply rule 2 here:

  WHERE EXISTS (
      SELECT ...
      FROM partsupp ps
      WHERE supplier.suppkey = ps.suppkey AND <condition> )

HTH, but use with a grain of salt ...

>Sort (cost=2777810917708.17..2777810917708.17 rows=200 width=81)
            ^^^^^^^^^^^^^^^^
BTW, how many years are these? :-)

Servus
 Manfred
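[Editor's note: putting the two rules of thumb together, the whole
query might be sketched as below.  This is an untested sketch, not
Manfred's wording; it assumes part.partkey is unique and reuses the
column names and date bounds from the original query.]

```sql
-- Sketch combining both rules: the inner IN on part becomes a join,
-- the outer IN becomes a correlated EXISTS.  Untested.
SELECT supplier.name, supplier.address
FROM supplier, nation
WHERE EXISTS (
    SELECT 1
    FROM partsupp ps
         INNER JOIN part p ON (ps.partkey = p.partkey
                               AND p.name LIKE 'forest%'),
         lineitem li
    WHERE supplier.suppkey = ps.suppkey
      AND li.partkey = ps.partkey
      AND li.suppkey = ps.suppkey
      AND li.shipdate >= ('1994-01-01')::DATE
      AND li.shipdate <  (('1994-01-01')::DATE
                          + ('1 year')::INTERVAL)::DATE
    GROUP BY ps.partkey, ps.suppkey
    HAVING min(ps.availqty) > 0.5*(sum(li.quantity)::FLOAT)
)
AND supplier.nationkey = nation.nationkey
AND nation.name = 'CANADA'
ORDER BY supplier.name;
```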
On Thu, 11 Jul 2002 19:40:46 +0200, "Luis Alberto Amigo Navarro"
<lamigo@atc.unican.es> wrote:
>I've tried

[reformatted to fit on one page]

| SELECT supplier.name, supplier.address
| FROM supplier, nation, lineitem

You already found out that you do not need lineitem here.

| WHERE EXISTS(
|     SELECT partsupp.suppkey
|     FROM partsupp,lineitem
|     WHERE
|         lineitem.partkey=partsupp.partkey
|         AND lineitem.suppkey=partsupp.partkey

I still don't believe this suppkey=partkey ...

|         AND lineitem.shipdate [...]
|         AND EXISTS( SELECT part.partkey
|                     FROM part WHERE part.name like 'forest%')

This subselect gives either true or false, but in any case always the
same result.  You might want to add a condition

  AND part.partkey=partsupp.partkey

Are you sure partkey is not unique?  If it is unique you can replace
this subselect by a join.

|     GROUP BY partsupp.partkey,partsupp.suppkey
|     HAVING min(availqty)>(0.5*(sum(lineitem.quantity)::FLOAT))
| )
| AND supplier.nationkey=nation.nationkey
| AND nation.name='CANADA'
| ORDER BY supplier.name;

>as you said and something is wrong
>Sort (cost=1141741215.35..1141741215.35 rows=2400490000 width=81)

The cost is now only 1141741215.35 compared to 2777810917708.17
before; this is an improvement factor of more than 2000.  So what's
your problem? ;-)

Servus
 Manfred
> The cost is now only 1141741215.35 compared to 2777810917708.17
> before; this is an improvement factor of more than 2000.  So what's
> your problem? ;-)
>
> Servus
>  Manfred

In fact the planner is estimating incredibly badly; it took only 833
msecs and now runs perfectly.

I'm going to keep on asking about another query:

SELECT
    customer.name,
    customer.custkey,
    orders.orderkey,
    orders.orderdate,
    orders.totalprice,
    sum(lineitem.quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    exists(
        SELECT
            lineitem.orderkey
        FROM
            lineitem
        WHERE
            lineitem.orderkey=orders.orderkey
        GROUP BY
            lineitem.orderkey
        HAVING
            sum(lineitem.quantity)>300
    )
    AND customer.custkey=orders.custkey
    AND orders.orderkey=lineitem.orderkey
GROUP BY
    customer.name,
    customer.custkey,
    orders.orderkey,
    orders.orderdate,
    orders.totalprice
ORDER BY
    orders.totalprice DESC,
    orders.orderdate;

NOTICE:  QUERY PLAN:

Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
  ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
        ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
              ->  Sort  (cost=26851634.86..26851634.86 rows=3000612 width=66)
                    ->  Hash Join  (cost=26107574.81..26457309.10 rows=3000612 width=66)
                          ->  Seq Scan on lineitem  (cost=0.00..222208.25 rows=6001225 width=8)
                          ->  Hash  (cost=26105699.81..26105699.81 rows=750000 width=58)
                                ->  Hash Join  (cost=7431.00..26105699.81 rows=750000 width=58)
                                      ->  Seq Scan on orders  (cost=0.00..26083268.81 rows=750000 width=25)
                                            SubPlan
                                              ->  Aggregate  (cost=0.00..17.35 rows=1 width=8)
                                                    ->  Group  (cost=0.00..17.34 rows=5 width=8)
                                                          ->  Index Scan using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
                                      ->  Hash  (cost=7056.00..7056.00 rows=150000 width=33)
                                            ->  Seq Scan on customer  (cost=0.00..7056.00 rows=150000 width=33)

again:
orders 1500000 tuples
lineitem 6000000 tuples, there are 1 to 7 lineitems per orderkey
Customer 150000 tuples

select attname,n_distinct,correlation from pg_stats where
tablename='lineitem';
    attname    | n_distinct | correlation
---------------+------------+-------------
 orderkey      |  -0.199847 |           1
 partkey       |     196448 |   0.0223377
 suppkey       |       9658 | -0.00822751
 linenumber    |          7 |     0.17274
 quantity      |         50 |   0.0150153
 extendedprice |      25651 | -0.00790245
 discount      |         11 |    0.103761
 tax           |          9 |   0.0993771
 returnflag    |          3 |    0.391434
 linestatus    |          2 |    0.509791
 shipdate      |       2440 |   0.0072777
 commitdate    |       2497 |  0.00698162
 receiptdate   |       2416 |  0.00726686
 shipinstruct  |          4 |    0.241511
 shipmode      |          7 |    0.138432
 comment       |     275488 |   0.0188006
(16 rows)

select attname,n_distinct,correlation from pg_stats where
tablename='orders';
    attname    | n_distinct | correlation
---------------+------------+-------------
 orderkey      |         -1 |   -0.999925
 custkey       |      76309 |  0.00590596
 orderstatus   |          3 |    0.451991
 totalprice    |         -1 | -0.00768806
 orderdate     |       2431 |  -0.0211354
 orderpriority |          5 |    0.182489
 clerk         |       1009 |  0.00546939
 shippriority  |          1 |           1
 comment       |  -0.750125 |  -0.0123887

Customer
  attname   | n_distinct | correlation
------------+------------+-------------
 custkey    |         -1 |           1
 name       |         -1 |           1
 address    |         -1 | -0.00510274
 nationkey  |         25 |   0.0170533
 phone      |         -1 |  -0.0227816
 acctbal    |   -0.83444 | -0.00220958
 mktsegment |          5 |    0.205013
 comment    |         -1 |   0.0327827

This query takes 12 minutes to run and returns about 50 customers.
lineitem.quantity takes values from 1 to 50, so 300 per orderkey is
very restrictive.

May someone help on improving performance?
Again thanks in advance
Regards
hi,

avoid subselect: create a temp table and use join...

CREATE TEMP TABLE tmp AS
SELECT
    lineitem.orderkey
FROM
    lineitem
WHERE
    lineitem.orderkey=orders.orderkey
GROUP BY
    lineitem.orderkey
HAVING
    sum(lineitem.quantity)>300;

CREATE INDEX tmp_idx ON tmp (orderkey);

SELECT
    customer.name,
    customer.custkey,
    orders.orderkey,
    orders.orderdate,
    orders.totalprice,
    sum(lineitem.quantity)
FROM
    customer,
    orders,
    lineitem,
    tmp
WHERE
    orders.orderkey=tmp.orderkey
    AND customer.custkey=orders.custkey
    AND orders.orderkey=lineitem.orderkey
GROUP BY
    customer.name,
    customer.custkey,
    orders.orderkey,
    orders.orderdate,
    orders.totalprice
ORDER BY
    orders.totalprice DESC,
    orders.orderdate;

maybe the index is not necessary...

kuba

> I'm going to keep on asking about another query:
>
> SELECT
>     customer.name,
>     customer.custkey,
>     orders.orderkey,
>     orders.orderdate,
>     orders.totalprice,
>     sum(lineitem.quantity)
> FROM
>     customer,
>     orders,
>     lineitem
> WHERE
>     exists(
>         SELECT
>             lineitem.orderkey
>         FROM
>             lineitem
>         WHERE
>             lineitem.orderkey=orders.orderkey
>         GROUP BY
>             lineitem.orderkey
>         HAVING
>             sum(lineitem.quantity)>300
>     )
>     AND customer.custkey=orders.custkey
>     AND orders.orderkey=lineitem.orderkey
> GROUP BY
>     customer.name,
>     customer.custkey,
>     orders.orderkey,
>     orders.orderdate,
>     orders.totalprice
>
> ORDER BY
>     orders.totalprice DESC,
>     orders.orderdate;
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
>   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
>         ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
>               ->  Sort  (cost=26851634.86..26851634.86 rows=3000612 width=66)
>                     ->  Hash Join  (cost=26107574.81..26457309.10 rows=3000612 width=66)
>                           ->  Seq Scan on lineitem  (cost=0.00..222208.25 rows=6001225 width=8)
>                           ->  Hash  (cost=26105699.81..26105699.81 rows=750000 width=58)
>                                 ->  Hash Join  (cost=7431.00..26105699.81 rows=750000 width=58)
>                                       ->  Seq Scan on orders  (cost=0.00..26083268.81 rows=750000 width=25)
>                                             SubPlan
>                                               ->  Aggregate  (cost=0.00..17.35 rows=1 width=8)
>                                                     ->  Group  (cost=0.00..17.34 rows=5 width=8)
>                                                           ->  Index Scan using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
>                                       ->  Hash  (cost=7056.00..7056.00 rows=150000 width=33)
>                                             ->  Seq Scan on customer  (cost=0.00..7056.00 rows=150000 width=33)
>
> again:
> orders 1500000 tuples
> lineitem 6000000 tuples, there are 1 to 7 lineitems per orderkey
> Customer 150000 tuples
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='lineitem';
>     attname    | n_distinct | correlation
> ---------------+------------+-------------
>  orderkey      |  -0.199847 |           1
>  partkey       |     196448 |   0.0223377
>  suppkey       |       9658 | -0.00822751
>  linenumber    |          7 |     0.17274
>  quantity      |         50 |   0.0150153
>  extendedprice |      25651 | -0.00790245
>  discount      |         11 |    0.103761
>  tax           |          9 |   0.0993771
>  returnflag    |          3 |    0.391434
>  linestatus    |          2 |    0.509791
>  shipdate      |       2440 |   0.0072777
>  commitdate    |       2497 |  0.00698162
>  receiptdate   |       2416 |  0.00726686
>  shipinstruct  |          4 |    0.241511
>  shipmode      |          7 |    0.138432
>  comment       |     275488 |   0.0188006
> (16 rows)
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='orders';
>     attname    | n_distinct | correlation
> ---------------+------------+-------------
>  orderkey      |         -1 |   -0.999925
>  custkey       |      76309 |  0.00590596
>  orderstatus   |          3 |    0.451991
>  totalprice    |         -1 | -0.00768806
>  orderdate     |       2431 |  -0.0211354
>  orderpriority |          5 |    0.182489
>  clerk         |       1009 |  0.00546939
>  shippriority  |          1 |           1
>  comment       |  -0.750125 |  -0.0123887
>
> Customer
>   attname   | n_distinct | correlation
> ------------+------------+-------------
>  custkey    |         -1 |           1
>  name       |         -1 |           1
>  address    |         -1 | -0.00510274
>  nationkey  |         25 |   0.0170533
>  phone      |         -1 |  -0.0227816
>  acctbal    |   -0.83444 | -0.00220958
>  mktsegment |          5 |    0.205013
>  comment    |         -1 |   0.0327827
>
> This query takes 12 minutes to run and returns about 50 customers.
> lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
> restrictive
>
> May someone help on improving performance?
> Again thanks in advance
> Regards
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
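[Editor's note: as written, the CREATE TEMP TABLE above references
orders.orderkey, which is not in its FROM list and is not needed for
the filtering.  A standalone version, an untested sketch, would drop
that correlated predicate, since the HAVING clause alone selects the
qualifying orderkeys:]

```sql
-- Standalone temp table: the HAVING clause does all the filtering,
-- so no reference to orders is needed here.  Untested sketch.
CREATE TEMP TABLE tmp AS
SELECT lineitem.orderkey
FROM lineitem
GROUP BY lineitem.orderkey
HAVING sum(lineitem.quantity) > 300;

CREATE INDEX tmp_idx ON tmp (orderkey);
```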
I've tried

SELECT
    supplier.name,
    supplier.address
FROM
    supplier,
    nation,
    lineitem
WHERE
    EXISTS(
        SELECT partsupp.suppkey
        FROM partsupp,lineitem
        WHERE
            lineitem.partkey=partsupp.partkey
            AND lineitem.suppkey=partsupp.partkey
            AND lineitem.shipdate>=('1994-01-01')::DATE
            AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
            AND EXISTS(
                SELECT part.partkey
                FROM part
                WHERE part.name like 'forest%'
            )
        GROUP BY partsupp.partkey,partsupp.suppkey
        HAVING min(availqty)>(0.5*(sum(lineitem.quantity)::FLOAT))
    )
    AND supplier.nationkey=nation.nationkey
    AND nation.name='CANADA'
ORDER BY
    supplier.name;

as you said and something is wrong

Sort  (cost=1141741215.35..1141741215.35 rows=2400490000 width=81)
  InitPlan
    ->  Aggregate  (cost=0.00..921773.85 rows=48 width=24)
          InitPlan
            ->  Seq Scan on part  (cost=0.00..8561.00 rows=1 width=4)
          ->  Group  (cost=0.00..921771.44 rows=481 width=24)
                ->  Result  (cost=0.00..921769.04 rows=481 width=24)
                      ->  Merge Join  (cost=0.00..921769.04 rows=481 width=24)
                            ->  Index Scan using partsupp_pkey on partsupp  (cost=0.00..98522.75 rows=800000 width=12)
                            ->  Index Scan using lsupp_index on lineitem  (cost=0.00..821239.91 rows=145 width=12)
  ->  Result  (cost=1.31..112888690.31 rows=2400490000 width=81)
        ->  Nested Loop  (cost=1.31..112888690.31 rows=2400490000 width=81)
              ->  Hash Join  (cost=1.31..490.31 rows=400 width=81)
                    ->  Seq Scan on supplier  (cost=0.00..434.00 rows=10000 width=77)
                    ->  Hash  (cost=1.31..1.31 rows=1 width=4)
                          ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
              ->  Seq Scan on lineitem  (cost=0.00..222208.25 rows=6001225 width=0)

Where might my mistake be?
Thanks and regards

----- Original Message -----
From: "Manfred Koizar" <mkoi-pg@aon.at>
To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
Cc: <pgsql-sql@postgresql.org>
Sent: Thursday, July 11, 2002 6:47 PM
Subject: Re: [HACKERS] please help on query

> [moving to pgsql-sql]
> On Thu, 11 Jul 2002 17:22:14 +0200, "Luis Alberto Amigo Navarro"
> <lamigo@atc.unican.es> wrote:
> >I can't improve performance on this query:
> >
> >SELECT
> >    supplier.name,
> >    supplier.address
> >FROM
> >    supplier,
> >    nation
> >WHERE
> >    supplier.suppkey IN(
> >        SELECT partsupp.suppkey
> >        FROM partsupp
> >        WHERE partsupp.partkey IN(
> >            SELECT part.partkey
> >            FROM part
> >            WHERE part.name like 'forest%'
> >        )
> >        AND partsupp.availqty>(
> >            SELECT 0.5*(sum(lineitem.quantity)::FLOAT)
> >            FROM lineitem
> >            WHERE lineitem.partkey=partsupp.partkey
> >            AND lineitem.suppkey=partsupp.partkey
>                                           ^^^^^^^
>                                           suppkey ???
> >            AND lineitem.shipdate>=('1994-01-01')::DATE
> >            AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
> >        )
> >    )
> >    AND supplier.nationkey=nation.nationkey
> >    AND nation.name='CANADA'
> >ORDER BY
> >    supplier.name;
>
Lineitem is being modified at run time, so creating a temp table
doesn't solve my problem.  The time of creating this table is the same
as that of performing the subselect (or so I think).  It could be done
by creating a new table and a new trigger, but there are already
triggers to calculate
lineitem.extendedprice=part.retailprice*lineitem.quantity*(1+taxes)*(1-discount),
to calculate orderstatus in accordance with linestatus, and to
calculate orders.totalprice as sum(extendedprice) where
lineitem.orderkey=new.orderkey.  A new trigger in order to insert
orderkey if sum(quantity) where orderkey=new.orderkey might be
excessive.

Any other idea?
Thanks And Regards

----- Original Message -----
From: "Jakub Ouhrabka" <jakub.ouhrabka@comgate.cz>
To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
Cc: "Manfred Koizar" <mkoi-pg@aon.at>; <pgsql-sql@postgresql.org>
Sent: Friday, July 12, 2002 1:50 PM
Subject: Re: [SQL] [HACKERS] please help on query

> hi,
>
> avoid subselect: create a temp table and use join...
>
> CREATE TEMP TABLE tmp AS
> SELECT
>     lineitem.orderkey
> FROM
>     lineitem
> WHERE
>     lineitem.orderkey=orders.orderkey
> GROUP BY
>     lineitem.orderkey
> HAVING
>     sum(lineitem.quantity)>300;
>
> CREATE INDEX tmp_idx ON tmp (orderkey);
>
> SELECT
>     customer.name,
>     customer.custkey,
>     orders.orderkey,
>     orders.orderdate,
>     orders.totalprice,
>     sum(lineitem.quantity)
> FROM
>     customer,
>     orders,
>     lineitem,
>     tmp
> WHERE
>     orders.orderkey=tmp.orderkey
>     AND customer.custkey=orders.custkey
>     AND orders.orderkey=lineitem.orderkey
> GROUP BY
>     customer.name,
>     customer.custkey,
>     orders.orderkey,
>     orders.orderdate,
>     orders.totalprice
> ORDER BY
>     orders.totalprice DESC,
>     orders.orderdate;
>
> maybe the index is not necessary...
>
> kuba
>
> > I'm going to keep on asking about another query:
> >
> > SELECT
> >     customer.name,
> >     customer.custkey,
> >     orders.orderkey,
> >     orders.orderdate,
> >     orders.totalprice,
> >     sum(lineitem.quantity)
> > FROM
> >     customer,
> >     orders,
> >     lineitem
> > WHERE
> >     exists(
> >         SELECT
> >             lineitem.orderkey
> >         FROM
> >             lineitem
> >         WHERE
> >             lineitem.orderkey=orders.orderkey
> >         GROUP BY
> >             lineitem.orderkey
> >         HAVING
> >             sum(lineitem.quantity)>300
> >     )
> >     AND customer.custkey=orders.custkey
> >     AND orders.orderkey=lineitem.orderkey
> > GROUP BY
> >     customer.name,
> >     customer.custkey,
> >     orders.orderkey,
> >     orders.orderdate,
> >     orders.totalprice
> >
> > ORDER BY
> >     orders.totalprice DESC,
> >     orders.orderdate;
> >
> > NOTICE:  QUERY PLAN:
> >
> > Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
> >   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
> >         ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
> >               ->  Sort  (cost=26851634.86..26851634.86 rows=3000612 width=66)
> >                     ->  Hash Join  (cost=26107574.81..26457309.10 rows=3000612 width=66)
> >                           ->  Seq Scan on lineitem  (cost=0.00..222208.25 rows=6001225 width=8)
> >                           ->  Hash  (cost=26105699.81..26105699.81 rows=750000 width=58)
> >                                 ->  Hash Join  (cost=7431.00..26105699.81 rows=750000 width=58)
> >                                       ->  Seq Scan on orders  (cost=0.00..26083268.81 rows=750000 width=25)
> >                                             SubPlan
> >                                               ->  Aggregate  (cost=0.00..17.35 rows=1 width=8)
> >                                                     ->  Group  (cost=0.00..17.34 rows=5 width=8)
> >                                                           ->  Index Scan using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
> >                                       ->  Hash  (cost=7056.00..7056.00 rows=150000 width=33)
> >                                             ->  Seq Scan on customer  (cost=0.00..7056.00 rows=150000 width=33)
> >
> > again:
> > orders 1500000 tuples
> > lineitem 6000000 tuples, there are 1 to 7 lineitems per orderkey
> > Customer 150000 tuples
> >
> > select attname,n_distinct,correlation from pg_stats where
> > tablename='lineitem';
> >     attname    | n_distinct | correlation
> > ---------------+------------+-------------
> >  orderkey      |  -0.199847 |           1
> >  partkey       |     196448 |   0.0223377
> >  suppkey       |       9658 | -0.00822751
> >  linenumber    |          7 |     0.17274
> >  quantity      |         50 |   0.0150153
> >  extendedprice |      25651 | -0.00790245
> >  discount      |         11 |    0.103761
> >  tax           |          9 |   0.0993771
> >  returnflag    |          3 |    0.391434
> >  linestatus    |          2 |    0.509791
> >  shipdate      |       2440 |   0.0072777
> >  commitdate    |       2497 |  0.00698162
> >  receiptdate   |       2416 |  0.00726686
> >  shipinstruct  |          4 |    0.241511
> >  shipmode      |          7 |    0.138432
> >  comment       |     275488 |   0.0188006
> > (16 rows)
> >
> > select attname,n_distinct,correlation from pg_stats where
> > tablename='orders';
> >     attname    | n_distinct | correlation
> > ---------------+------------+-------------
> >  orderkey      |         -1 |   -0.999925
> >  custkey       |      76309 |  0.00590596
> >  orderstatus   |          3 |    0.451991
> >  totalprice    |         -1 | -0.00768806
> >  orderdate     |       2431 |  -0.0211354
> >  orderpriority |          5 |    0.182489
> >  clerk         |       1009 |  0.00546939
> >  shippriority  |          1 |           1
> >  comment       |  -0.750125 |  -0.0123887
> >
> > Customer
> >   attname   | n_distinct | correlation
> > ------------+------------+-------------
> >  custkey    |         -1 |           1
> >  name       |         -1 |           1
> >  address    |         -1 | -0.00510274
> >  nationkey  |         25 |   0.0170533
> >  phone      |         -1 |  -0.0227816
> >  acctbal    |   -0.83444 | -0.00220958
> >  mktsegment |          5 |    0.205013
> >  comment    |         -1 |   0.0327827
> >
> > This query takes 12 minutes to run and returns about 50 customers.
> > lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
> > restrictive
> >
> > May someone help on improving performance?
> > Again thanks in advance
> > Regards
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
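[Editor's note: if the trigger route were taken after all, a minimal
sketch of the idea Luis calls "excessive" could look like the
following.  All table and function names here are hypothetical, the
code is untested, and the syntax follows the PostgreSQL 7.2-era style
used elsewhere in this thread.]

```sql
-- Hypothetical sketch: maintain a table of orderkeys whose total
-- quantity exceeds 300, kept up to date as lineitem rows arrive.
-- Names are made up for illustration; untested.
CREATE TABLE big_orders (orderkey INTEGER PRIMARY KEY);

CREATE FUNCTION track_big_order() RETURNS OPAQUE AS '
BEGIN
    IF (SELECT sum(quantity) FROM lineitem
        WHERE orderkey = NEW.orderkey) > 300
       AND NOT EXISTS (SELECT 1 FROM big_orders
                       WHERE orderkey = NEW.orderkey) THEN
        INSERT INTO big_orders VALUES (NEW.orderkey);
    END IF;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER lineitem_big_order AFTER INSERT ON lineitem
    FOR EACH ROW EXECUTE PROCEDURE track_big_order();
```

The main query could then join big_orders instead of running the
correlated EXISTS subselect, at the price of one extra aggregate per
insert into lineitem.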
On Fri, 12 Jul 2002 17:32:50 +0200
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> wrote:

> Lineitem is being modified at run time, so creating a temp table
> doesn't solve my problem.  The time of creating this table is the same
> as that of performing the subselect (or so I think).  It could be done
> by creating a new table and a new trigger, but there are already
> triggers to calculate
> lineitem.extendedprice=part.retailprice*lineitem.quantity*(1+taxes)*(1-discount),
> to calculate orderstatus in accordance with linestatus, and to
> calculate orders.totalprice as sum(extendedprice) where
> lineitem.orderkey=new.orderkey.  A new trigger in order to insert
> orderkey if sum(quantity) where orderkey=new.orderkey might be
> excessive.
> Any other idea?
> Thanks And Regards
>
> ----- Original Message -----
> From: "Jakub Ouhrabka" <jakub.ouhrabka@comgate.cz>
> To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
> Cc: "Manfred Koizar" <mkoi-pg@aon.at>; <pgsql-sql@postgresql.org>
> Sent: Friday, July 12, 2002 1:50 PM
> Subject: Re: [SQL] [HACKERS] please help on query
>
> > avoid subselect: create a temp table and use join...
> >
> > CREATE TEMP TABLE tmp AS
> > SELECT
> >     lineitem.orderkey
> > FROM
> >     lineitem
> > WHERE
> >     lineitem.orderkey=orders.orderkey
> > GROUP BY
> >     lineitem.orderkey
> > HAVING
> >     sum(lineitem.quantity)>300;

Hi,

I'm not sure whether its performance can be improved or not.  But I
feel there is a slight chance of reducing the total number of tuples
the planner must consider.

BTW, how much time does the following query take in your situation,
and how many rows does it retrieve?

EXPLAIN ANALYZE
SELECT
    lineitem.orderkey
FROM
    lineitem
GROUP BY
    lineitem.orderkey
HAVING
    SUM(lineitem.quantity) > 300;

Regards,
Masaru Sugawara
----- Original Message -----
From: "Masaru Sugawara" <rk73@sea.plala.or.jp>
To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
Cc: <pgsql-sql@postgresql.org>
Sent: Sunday, July 14, 2002 2:23 PM
Subject: Re: [SQL] [HACKERS] please help on query

This is the output:

Aggregate  (cost=0.00..647161.10 rows=600122 width=8) (actual time=4959.19..347328.83 rows=62 loops=1)
  ->  Group  (cost=0.00..632158.04 rows=6001225 width=8) (actual time=10.79..274259.16 rows=6001225 loops=1)
        ->  Index Scan using lineitem_pkey on lineitem  (cost=0.00..617154.97 rows=6001225 width=8) (actual time=10.77..162439.11 rows=6001225 loops=1)
Total runtime: 347330.28 msec

It is returning all rows in lineitem.  Why is it using the index?
Thanks and regards

> On Fri, 12 Jul 2002 17:32:50 +0200
> "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> wrote:
>
> > Lineitem is being modified at run time, so creating a temp table
> > doesn't solve my problem.  The time of creating this table is the
> > same as that of performing the subselect (or so I think).  It could
> > be done by creating a new table and a new trigger, but there are
> > already triggers to calculate
> > lineitem.extendedprice=part.retailprice*lineitem.quantity*(1+taxes)*(1-discount),
> > to calculate orderstatus in accordance with linestatus, and to
> > calculate orders.totalprice as sum(extendedprice) where
> > lineitem.orderkey=new.orderkey.  A new trigger in order to insert
> > orderkey if sum(quantity) where orderkey=new.orderkey might be
> > excessive.
> > Any other idea?
> > Thanks And Regards
> >
> > ----- Original Message -----
> > From: "Jakub Ouhrabka" <jakub.ouhrabka@comgate.cz>
> > To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
> > Cc: "Manfred Koizar" <mkoi-pg@aon.at>; <pgsql-sql@postgresql.org>
> > Sent: Friday, July 12, 2002 1:50 PM
> > Subject: Re: [SQL] [HACKERS] please help on query
> >
> > > avoid subselect: create a temp table and use join...
> > >
> > > CREATE TEMP TABLE tmp AS
> > > SELECT
> > >     lineitem.orderkey
> > > FROM
> > >     lineitem
> > > WHERE
> > >     lineitem.orderkey=orders.orderkey
> > > GROUP BY
> > >     lineitem.orderkey
> > > HAVING
> > >     sum(lineitem.quantity)>300;
>
> Hi,
>
> I'm not sure whether its performance can be improved or not.  But I
> feel there is a slight chance of reducing the total number of tuples
> the planner must consider.
>
> BTW, how much time does the following query take in your situation,
> and how many rows does it retrieve?
>
> EXPLAIN ANALYZE
> SELECT
>     lineitem.orderkey
> FROM
>     lineitem
> GROUP BY
>     lineitem.orderkey
> HAVING
>     SUM(lineitem.quantity) > 300;
>
> Regards,
> Masaru Sugawara
On Mon, 15 Jul 2002 09:45:36 +0200
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> wrote:

> This is the output:
>
> Aggregate  (cost=0.00..647161.10 rows=600122 width=8) (actual
> time=4959.19..347328.83 rows=62 loops=1)
>   ->  Group  (cost=0.00..632158.04 rows=6001225 width=8) (actual
> time=10.79..274259.16 rows=6001225 loops=1)
>         ->  Index Scan using lineitem_pkey on lineitem
> (cost=0.00..617154.97 rows=6001225 width=8) (actual time=10.77..162439.11
> rows=6001225 loops=1)
> Total runtime: 347330.28 msec
>
> it is returning all rows in lineitem. Why is it using the index?

Sorry, I don't know the reason.  I need more info.  Can you show me
the outputs of EXPLAIN ANALYZE?

EXPLAIN ANALYZE
SELECT
    orders.orderkey
FROM
    lineitem LEFT OUTER JOIN
    orders USING(orderkey)
WHERE
    orders.orderkey IS NOT NULL
GROUP BY
    orders.orderkey
HAVING
    SUM(lineitem.quantity) > 300;

EXPLAIN ANALYZE
SELECT
    t2.*
FROM (SELECT
          orders.orderkey
      FROM
          lineitem LEFT OUTER JOIN
          orders USING(orderkey)
      WHERE
          orders.orderkey IS NOT NULL
      GROUP BY
          orders.orderkey
      HAVING
          SUM(lineitem.quantity) > 300
     ) AS t1 LEFT OUTER JOIN
     orders AS t2 USING(orderkey)
ORDER BY t2.custkey;

Regards,
Masaru Sugawara
----- Original Message -----
From: "Masaru Sugawara" <rk73@sea.plala.or.jp>
To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
Cc: <pgsql-sql@postgresql.org>
Sent: Monday, July 15, 2002 6:15 PM
Subject: Re: [SQL] [HACKERS] please help on query

> Sorry, I don't know the reason.
> I need more info.  Can you show me the outputs of EXPLAIN ANALYZE?

Here it is:

> EXPLAIN ANALYZE
> SELECT
>     orders.orderkey
> FROM
>     lineitem LEFT OUTER JOIN
>     orders USING(orderkey)
> WHERE
>     orders.orderkey IS NOT NULL
> GROUP BY
>     orders.orderkey
> HAVING
>     SUM(lineitem.quantity) > 300;

Aggregate  (cost=1257368.92..1287375.04 rows=600122 width=12) (actual time=1236941.71..1454824.56 rows=62 loops=1)
  ->  Group  (cost=1257368.92..1272371.98 rows=6001225 width=12) (actual time=1233968.87..1385034.91 rows=6001225 loops=1)
        ->  Sort  (cost=1257368.92..1257368.92 rows=6001225 width=12) (actual time=1233968.82..1276147.37 rows=6001225 loops=1)
              ->  Hash Join  (cost=166395.00..520604.08 rows=6001225 width=12) (actual time=59061.21..773997.08 rows=6001225 loops=1)
                    ->  Seq Scan on lineitem  (cost=0.00..195405.25 rows=6001225 width=8) (actual time=20.66..115511.34 rows=6001225 loops=1)
                    ->  Hash  (cost=162645.00..162645.00 rows=1500000 width=4) (actual time=59032.16..59032.16 rows=0 loops=1)
                          ->  Seq Scan on orders  (cost=0.00..162645.00 rows=1500000 width=4) (actual time=17.33..44420.10 rows=1500000 loops=1)
Total runtime: 1454929.11 msec

> EXPLAIN ANALYZE
> SELECT
>     t2.*
> FROM (SELECT
>           orders.orderkey
>       FROM
>           lineitem LEFT OUTER JOIN
>           orders USING(orderkey)
>       WHERE
>           orders.orderkey IS NOT NULL
>       GROUP BY
>           orders.orderkey
>       HAVING
>           SUM(lineitem.quantity) > 300
>      ) AS t1 LEFT OUTER JOIN
>      orders AS t2 USING(orderkey)
> ORDER BY t2.custkey

Sort  (cost=1739666.43..1739666.43 rows=600122 width=119) (actual time=1538897.23..1538897.47 rows=62 loops=1)
  ->  Merge Join  (cost=1344971.49..1682069.98 rows=600122 width=119) (actual time=1440886.58..1538886.03 rows=62 loops=1)
        ->  Index Scan using orders_pkey on orders t2  (cost=0.00..324346.65 rows=1500000 width=115) (actual time=32.80..87906.98 rows=1455276 loops=1)
        ->  Sort  (cost=1344971.49..1344971.49 rows=600122 width=12) (actual time=1439550.31..1439550.73 rows=62 loops=1)
              ->  Subquery Scan t1  (cost=1257368.92..1287375.04 rows=600122 width=12) (actual time=1222560.86..1439549.36 rows=62 loops=1)
                    ->  Aggregate  (cost=1257368.92..1287375.04 rows=600122 width=12) (actual time=1222560.84..1439548.42 rows=62 loops=1)
                          ->  Group  (cost=1257368.92..1272371.98 rows=6001225 width=12) (actual time=1219607.04..1369327.42 rows=6001225 loops=1)
                                ->  Sort  (cost=1257368.92..1257368.92 rows=6001225 width=12) (actual time=1219607.00..1261208.08 rows=6001225 loops=1)
                                      ->  Hash Join  (cost=166395.00..520604.08 rows=6001225 width=12) (actual time=65973.31..769253.41 rows=6001225 loops=1)
                                            ->  Seq Scan on lineitem  (cost=0.00..195405.25 rows=6001225 width=8) (actual time=20.07..115247.61 rows=6001225 loops=1)
                                            ->  Hash  (cost=162645.00..162645.00 rows=1500000 width=4) (actual time=65943.80..65943.80 rows=0 loops=1)
                                                  ->  Seq Scan on orders  (cost=0.00..162645.00 rows=1500000 width=4) (actual time=39.04..52049.90 rows=1500000 loops=1)
Total runtime: 1539010.00 msec

Thanks and regards
On Tue, 16 Jul 2002 10:51:03 +0200
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> wrote:

> Aggregate  (cost=1257368.92..1287375.04 rows=600122 width=12) (actual
> time=1236941.71..1454824.56 rows=62 loops=1)
>   ->  Group  (cost=1257368.92..1272371.98 rows=6001225 width=12) (actual
> time=1233968.87..1385034.91 rows=6001225 loops=1)
>         ->  Sort  (cost=1257368.92..1257368.92 rows=6001225 width=12)
> (actual time=1233968.82..1276147.37 rows=6001225 loops=1)
>               ->  Hash Join  (cost=166395.00..520604.08 rows=6001225
> width=12) (actual time=59061.21..773997.08 rows=6001225 loops=1)
>                     ->  Seq Scan on lineitem  (cost=0.00..195405.25
> rows=6001225 width=8) (actual time=20.66..115511.34 rows=6001225 loops=1)
>                     ->  Hash  (cost=162645.00..162645.00 rows=1500000
> width=4) (actual time=59032.16..59032.16 rows=0 loops=1)
>                           ->  Seq Scan on orders  (cost=0.00..162645.00
> rows=1500000 width=4) (actual time=17.33..44420.10 rows=1500000 loops=1)
> Total runtime: 1454929.11 msec

Hmm, does each of the three tables have some indices like the
following?  If not, could you execute EXPLAIN ANALYZE after creating
the indices?

create index idx_lineitem_orderkey on lineitem(orderkey);
create index idx_orders_orderkey on orders(orderkey);
create index idx_orders_custkey on orders(custkey);
create index idx_customer_custkey on customer(custkey);

Regards,
Masaru Sugawara