Re: [HACKERS] please help on query - Mailing list pgsql-sql

From Luis Alberto Amigo Navarro
Subject Re: [HACKERS] please help on query
Date
Msg-id 019201c229b9$61acfa50$cab990c1@atc.unican.es
Whole thread Raw
In response to Re: [HACKERS] please help on query  (Jakub Ouhrabka <jakub.ouhrabka@comgate.cz>)
Responses Re: [HACKERS] please help on query  (Masaru Sugawara <rk73@sea.plala.or.jp>)
List pgsql-sql
Lineitem is being modified on run time, so creating a temp table don't
solves my problem
The time of creating this table is the same of performing the subselect (or
so I think), it could be done creating a new table, and a new trigger, but
there are already triggers to calculate
lineitem.extendedprice=part.retailprice*lineitem.quantity*(1+taxes)*(1-disco
unt) and to calculate orderstatus in order with linestatus and to calculate
orders.totalprice as sum(extendedprice) where
lineitem.orderkey=new.orderkey. A new trigger in order to insert orderkey if
sum(quantity) where orderkey=new.orderkey might be excessive.
Any other idea?
Thanks And Regards

----- Original Message -----
From: "Jakub Ouhrabka" <jakub.ouhrabka@comgate.cz>
To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
Cc: "Manfred Koizar" <mkoi-pg@aon.at>; <pgsql-sql@postgresql.org>
Sent: Friday, July 12, 2002 1:50 PM
Subject: Re: [SQL] [HACKERS] please help on query


> hi,
>
> avoid subselect: create a temp table and use join...
>
> CREATE TEMP TABLE tmp AS
>    SELECT
>     lineitem.orderkey
>    FROM
>     lineitem
>    WHERE
>     lineitem.orderkey=orders.orderkey
>    GROUP BY
>     lineitem.orderkey HAVING
>     sum(lineitem.quantity)>300;
>
> CREATE INDEX tmp_idx ON tmp (orderkey);
>
>  SELECT
>   customer.name,
>   customer.custkey,
>   orders.orderkey,
>   orders.orderdate,
>   orders.totalprice,
>   sum(lineitem.quantity)
>  FROM
>   customer,
>   orders,
>   lineitem,
>   tmp
>  WHERE
>   orders.orderkey=tmp.orderkey
>   AND customer.custkey=orders.custkey
>   AND orders.orderkey=lineitem.orderkey
>  GROUP BY
>   customer.name,
>   customer.custkey,
>   orders.orderkey,
>   orders.orderdate,
>   orders.totalprice
>  ORDER BY
>   orders.totalprice DESC,
>   orders.orderdate;
>
>
> may be the index is not necessary...
>
> kuba
>
>
> > I'm going to keep on asking about another query:
> >
> > SELECT
> >  customer.name,
> >  customer.custkey,
> >  orders.orderkey,
> >  orders.orderdate,
> >  orders.totalprice,
> >  sum(lineitem.quantity)
> > FROM
> >  customer,
> >  orders,
> >  lineitem
> > WHERE
> >  exists(
> >   SELECT
> >    lineitem.orderkey
> >   FROM
> >    lineitem
> >   WHERE
> >    lineitem.orderkey=orders.orderkey
> >   GROUP BY
> >    lineitem.orderkey HAVING
> >    sum(lineitem.quantity)>300
> >   )
> >  AND customer.custkey=orders.custkey
> >  AND orders.orderkey=lineitem.orderkey
> > GROUP BY
> >  customer.name,
> >  customer.custkey,
> >  orders.orderkey,
> >  orders.orderdate,
> >  orders.totalprice
> >
> > ORDER BY
> >  orders.totalprice DESC,
> >  orders.orderdate;
> >
> > NOTICE:  QUERY PLAN:
> >
> > Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
> >   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
> >         ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
> >               ->  Sort  (cost=26851634.86..26851634.86 rows=3000612
> > width=66)
> >                     ->  Hash Join  (cost=26107574.81..26457309.10
> > rows=3000612 width=66)
> >                           ->  Seq Scan on lineitem
(cost=0.00..222208.25
> > rows=6001225 width=8)
> >                           ->  Hash  (cost=26105699.81..26105699.81
> > rows=750000 width=58)
> >                                 ->  Hash Join
(cost=7431.00..26105699.81
> > rows=750000 width=58)
> >                                       ->  Seq Scan on orders
> > (cost=0.00..26083268.81 rows=750000 width=25)
> >                                             SubPlan
> >                                               ->  Aggregate
> > (cost=0.00..17.35 rows=1 width=8)
> >                                                     ->  Group
> > (cost=0.00..17.34 rows=5 width=8)
> >                                                           ->  Index Scan
> > using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
> >                                       ->  Hash  (cost=7056.00..7056.00
> > rows=150000 width=33)
> >                                             ->  Seq Scan on customer
> > (cost=0.00..7056.00 rows=150000 width=33)
> >
> > again:
> > orders 1500000 tuples
> > lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
> > Customer 150000 tuples
> >
> > select attname,n_distinct,correlation from pg_stats where
> > tablename='lineitem';
> >     attname    | n_distinct | correlation
> > ---------------+------------+-------------
> >  orderkey      |  -0.199847 |           1
> >  partkey       |     196448 |   0.0223377
> >  suppkey       |       9658 | -0.00822751
> >  linenumber    |          7 |     0.17274
> >  quantity      |         50 |   0.0150153
> >  extendedprice |      25651 | -0.00790245
> >  discount      |         11 |    0.103761
> >  tax           |          9 |   0.0993771
> >  returnflag    |          3 |    0.391434
> >  linestatus    |          2 |    0.509791
> >  shipdate      |       2440 |   0.0072777
> >  commitdate    |       2497 |  0.00698162
> >  receiptdate   |       2416 |  0.00726686
> >  shipinstruct  |          4 |    0.241511
> >  shipmode      |          7 |    0.138432
> >  comment       |     275488 |   0.0188006
> > (16 rows)
> >
> > select attname,n_distinct,correlation from pg_stats where
> > tablename='orders';
> >     attname    | n_distinct | correlation
> > ---------------+------------+-------------
> >  orderkey      |         -1 |   -0.999925
> >  custkey       |      76309 |  0.00590596
> >  orderstatus   |          3 |    0.451991
> >  totalprice    |         -1 | -0.00768806
> >  orderdate     |       2431 |  -0.0211354
> >  orderpriority |          5 |    0.182489
> >  clerk         |       1009 |  0.00546939
> >  shippriority  |          1 |           1
> >  comment       |  -0.750125 |  -0.0123887
> >
> > Customer
> >   attname   | n_distinct | correlation
> > ------------+------------+-------------
> >  custkey    |         -1 |           1
> >  name       |         -1 |           1
> >  address    |         -1 | -0.00510274
> >  nationkey  |         25 |   0.0170533
> >  phone      |         -1 |  -0.0227816
> >  acctbal    |   -0.83444 | -0.00220958
> >  mktsegment |          5 |    0.205013
> >  comment    |         -1 |   0.0327827
> >
> > This query takes 12 minutes to run and returns about 50 customers.
> > lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
> > restrictive
> >
> > May someone help on improving performance?
> > Again thanks in advance
> > Regards
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/users-lounge/docs/faq.html
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
>




pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: rules / triggers on insert. why after?
Next
From: Jie Liang
Date:
Subject: Re: Please, HELP! Why is the query plan so wrong???