Thread: Re: [HACKERS] please help on query

Re: [HACKERS] please help on query

From
Jakub Ouhrabka
Date:
12 July 2002, 07:52:37
hi,

avoid subselect: create a temp table and use join...

CREATE TEMP TABLE tmp AS  SELECT   lineitem.orderkey  FROM   lineitem  WHERE   lineitem.orderkey=orders.orderkey  GROUP
BY  lineitem.orderkey HAVING   sum(lineitem.quantity)>300;
 

CREATE INDEX tmp_idx ON tmp (orderkey);
SELECT customer.name, customer.custkey, orders.orderkey, orders.orderdate, orders.totalprice,
sum(lineitem.quantity)FROMcustomer, orders, lineitem, tmpWHERE orders.orderkey=tmp.orderkey AND
customer.custkey=orders.custkeyAND orders.orderkey=lineitem.orderkeyGROUP BY customer.name, customer.custkey,
orders.orderkey,orders.orderdate, orders.totalpriceORDER BY orders.totalprice DESC, orders.orderdate;
 


may be the index is not necessary...

kuba


> I'm going to keep on asking about another query:
>
> SELECT
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice,
>  sum(lineitem.quantity)
> FROM
>  customer,
>  orders,
>  lineitem
> WHERE
>  exists(
>   SELECT
>    lineitem.orderkey
>   FROM
>    lineitem
>   WHERE
>    lineitem.orderkey=orders.orderkey
>   GROUP BY
>    lineitem.orderkey HAVING
>    sum(lineitem.quantity)>300
>   )
>  AND customer.custkey=orders.custkey
>  AND orders.orderkey=lineitem.orderkey
> GROUP BY
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice
>
> ORDER BY
>  orders.totalprice DESC,
>  orders.orderdate;
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
>   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
>         ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
>               ->  Sort  (cost=26851634.86..26851634.86 rows=3000612
> width=66)
>                     ->  Hash Join  (cost=26107574.81..26457309.10
> rows=3000612 width=66)
>                           ->  Seq Scan on lineitem  (cost=0.00..222208.25
> rows=6001225 width=8)
>                           ->  Hash  (cost=26105699.81..26105699.81
> rows=750000 width=58)
>                                 ->  Hash Join  (cost=7431.00..26105699.81
> rows=750000 width=58)
>                                       ->  Seq Scan on orders
> (cost=0.00..26083268.81 rows=750000 width=25)
>                                             SubPlan
>                                               ->  Aggregate
> (cost=0.00..17.35 rows=1 width=8)
>                                                     ->  Group
> (cost=0.00..17.34 rows=5 width=8)
>                                                           ->  Index Scan
> using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
>                                       ->  Hash  (cost=7056.00..7056.00
> rows=150000 width=33)
>                                             ->  Seq Scan on customer
> (cost=0.00..7056.00 rows=150000 width=33)
>
> again:
> orders 1500000 tuples
> lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
> Customer 150000 tuples
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='lineitem';
>     attname    | n_distinct | correlation
> ---------------+------------+-------------
>  orderkey      |  -0.199847 |           1
>  partkey       |     196448 |   0.0223377
>  suppkey       |       9658 | -0.00822751
>  linenumber    |          7 |     0.17274
>  quantity      |         50 |   0.0150153
>  extendedprice |      25651 | -0.00790245
>  discount      |         11 |    0.103761
>  tax           |          9 |   0.0993771
>  returnflag    |          3 |    0.391434
>  linestatus    |          2 |    0.509791
>  shipdate      |       2440 |   0.0072777
>  commitdate    |       2497 |  0.00698162
>  receiptdate   |       2416 |  0.00726686
>  shipinstruct  |          4 |    0.241511
>  shipmode      |          7 |    0.138432
>  comment       |     275488 |   0.0188006
> (16 rows)
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='orders';
>     attname    | n_distinct | correlation
> ---------------+------------+-------------
>  orderkey      |         -1 |   -0.999925
>  custkey       |      76309 |  0.00590596
>  orderstatus   |          3 |    0.451991
>  totalprice    |         -1 | -0.00768806
>  orderdate     |       2431 |  -0.0211354
>  orderpriority |          5 |    0.182489
>  clerk         |       1009 |  0.00546939
>  shippriority  |          1 |           1
>  comment       |  -0.750125 |  -0.0123887
>
> Customer
>   attname   | n_distinct | correlation
> ------------+------------+-------------
>  custkey    |         -1 |           1
>  name       |         -1 |           1
>  address    |         -1 | -0.00510274
>  nationkey  |         25 |   0.0170533
>  phone      |         -1 |  -0.0227816
>  acctbal    |   -0.83444 | -0.00220958
>  mktsegment |          5 |    0.205013
>  comment    |         -1 |   0.0327827
>
> This query takes 12 minutes to run and returns about 50 customers.
> lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
> restrictive
>
> May someone help on improving performance?
> Again thanks in advance
> Regards
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>