Re: [HACKERS] please help on query - Mailing list pgsql-sql
From | Jakub Ouhrabka |
---|---|
Subject | Re: [HACKERS] please help on query |
Date | |
Msg-id | Pine.LNX.4.44.0207121351390.8739-100000@server Whole thread Raw |
List | pgsql-sql |
hi, avoid subselect: create a temp table and use join... CREATE TEMP TABLE tmp AS SELECT lineitem.orderkey FROM lineitem WHERE lineitem.orderkey=orders.orderkey GROUP BY lineitem.orderkey HAVING sum(lineitem.quantity)>300; CREATE INDEX tmp_idx ON tmp (orderkey); SELECT customer.name, customer.custkey, orders.orderkey, orders.orderdate, orders.totalprice, sum(lineitem.quantity)FROMcustomer, orders, lineitem, tmpWHERE orders.orderkey=tmp.orderkey AND customer.custkey=orders.custkeyAND orders.orderkey=lineitem.orderkeyGROUP BY customer.name, customer.custkey, orders.orderkey,orders.orderdate, orders.totalpriceORDER BY orders.totalprice DESC, orders.orderdate; may be the index is not necessary... kuba > I'm going to keep on asking about another query: > > SELECT > customer.name, > customer.custkey, > orders.orderkey, > orders.orderdate, > orders.totalprice, > sum(lineitem.quantity) > FROM > customer, > orders, > lineitem > WHERE > exists( > SELECT > lineitem.orderkey > FROM > lineitem > WHERE > lineitem.orderkey=orders.orderkey > GROUP BY > lineitem.orderkey HAVING > sum(lineitem.quantity)>300 > ) > AND customer.custkey=orders.custkey > AND orders.orderkey=lineitem.orderkey > GROUP BY > customer.name, > customer.custkey, > orders.orderkey, > orders.orderdate, > orders.totalprice > > ORDER BY > orders.totalprice DESC, > orders.orderdate; > > NOTICE: QUERY PLAN: > > Sort (cost=26923941.97..26923941.97 rows=300061 width=66) > -> Aggregate (cost=26851634.86..26896644.05 rows=300061 width=66) > -> Group (cost=26851634.86..26889142.52 rows=3000612 width=66) > -> Sort (cost=26851634.86..26851634.86 rows=3000612 > width=66) > -> Hash Join (cost=26107574.81..26457309.10 > rows=3000612 width=66) > -> Seq Scan on lineitem (cost=0.00..222208.25 > rows=6001225 width=8) > -> Hash (cost=26105699.81..26105699.81 > rows=750000 width=58) > -> Hash Join (cost=7431.00..26105699.81 > rows=750000 width=58) > -> Seq Scan on orders > (cost=0.00..26083268.81 rows=750000 width=25) > SubPlan > -> Aggregate > (cost=0.00..17.35 rows=1 width=8) > -> Group > (cost=0.00..17.34 rows=5 width=8) > -> Index Scan > using lineitem_pkey on lineitem (cost=0.00..17.33 rows=5 width=8) > -> Hash (cost=7056.00..7056.00 > rows=150000 width=33) > -> Seq Scan on customer > (cost=0.00..7056.00 rows=150000 width=33) > > again: > orders 1500000 tuples > lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey > Customer 150000 tuples > > select attname,n_distinct,correlation from pg_stats where > tablename='lineitem'; > attname | n_distinct | correlation > ---------------+------------+------------- > orderkey | -0.199847 | 1 > partkey | 196448 | 0.0223377 > suppkey | 9658 | -0.00822751 > linenumber | 7 | 0.17274 > quantity | 50 | 0.0150153 > extendedprice | 25651 | -0.00790245 > discount | 11 | 0.103761 > tax | 9 | 0.0993771 > returnflag | 3 | 0.391434 > linestatus | 2 | 0.509791 > shipdate | 2440 | 0.0072777 > commitdate | 2497 | 0.00698162 > receiptdate | 2416 | 0.00726686 > shipinstruct | 4 | 0.241511 > shipmode | 7 | 0.138432 > comment | 275488 | 0.0188006 > (16 rows) > > select attname,n_distinct,correlation from pg_stats where > tablename='orders'; > attname | n_distinct | correlation > ---------------+------------+------------- > orderkey | -1 | -0.999925 > custkey | 76309 | 0.00590596 > orderstatus | 3 | 0.451991 > totalprice | -1 | -0.00768806 > orderdate | 2431 | -0.0211354 > orderpriority | 5 | 0.182489 > clerk | 1009 | 0.00546939 > shippriority | 1 | 1 > comment | -0.750125 | -0.0123887 > > Customer > attname | n_distinct | correlation > ------------+------------+------------- > custkey | -1 | 1 > name | -1 | 1 > address | -1 | -0.00510274 > nationkey | 25 | 0.0170533 > phone | -1 | -0.0227816 > acctbal | -0.83444 | -0.00220958 > mktsegment | 5 | 0.205013 > comment | -1 | 0.0327827 > > This query takes 12 minutes to run and returns about 50 customers. > lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very > restrictive > > May someone help on improving performance? > Again thanks in advance > Regards > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html >