Re: [HACKERS] please help on query - Mailing list pgsql-sql

From Luis Alberto Amigo Navarro
Subject Re: [HACKERS] please help on query
Date
Msg-id 012101c22998$8bbe5670$cab990c1@atc.unican.es
Whole thread Raw
In response to Re: [HACKERS] please help on query  (Manfred Koizar <mkoi-pg@aon.at>)
Responses Re: [HACKERS] please help on query  (Jakub Ouhrabka <jakub.ouhrabka@comgate.cz>)
List pgsql-sql


> The cost is now only 1141741215.35 compared to 2777810917708.17
> before;  this is an improvement factor of more than 2000.  So what's
> your problem? ;-)
>
> Servus
>  Manfred
>
In fact planner is estimating incredibly badly, it took only 833msecs now
runs perfectly

I'm going to keep on asking about another query:

SELECTcustomer.name,customer.custkey,orders.orderkey,orders.orderdate,orders.totalprice,sum(lineitem.quantity)
FROMcustomer,orders,lineitem
WHEREexists( SELECT  lineitem.orderkey FROM  lineitem WHERE  lineitem.orderkey=orders.orderkey GROUP BY
lineitem.orderkeyHAVING  sum(lineitem.quantity)>300 )AND customer.custkey=orders.custkeyAND
orders.orderkey=lineitem.orderkey
GROUP BYcustomer.name,customer.custkey,orders.orderkey,orders.orderdate,orders.totalprice

ORDER BYorders.totalprice DESC,orders.orderdate;

NOTICE:  QUERY PLAN:

Sort  (cost=26923941.97..26923941.97 rows=300061 width=66) ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061
width=66)      ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)             ->  Sort
(cost=26851634.86..26851634.86rows=3000612
 
width=66)                   ->  Hash Join  (cost=26107574.81..26457309.10
rows=3000612 width=66)                         ->  Seq Scan on lineitem  (cost=0.00..222208.25
rows=6001225 width=8)                         ->  Hash  (cost=26105699.81..26105699.81
rows=750000 width=58)                               ->  Hash Join  (cost=7431.00..26105699.81
rows=750000 width=58)                                     ->  Seq Scan on orders
(cost=0.00..26083268.81 rows=750000 width=25)                                           SubPlan
                   ->  Aggregate
 
(cost=0.00..17.35 rows=1 width=8)                                                   ->  Group
(cost=0.00..17.34 rows=5 width=8)                                                         ->  Index Scan
using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)                                     ->  Hash
(cost=7056.00..7056.00
rows=150000 width=33)                                           ->  Seq Scan on customer
(cost=0.00..7056.00 rows=150000 width=33)

again:
orders 1500000 tuples
lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
Customer 150000 tuples

select attname,n_distinct,correlation from pg_stats where
tablename='lineitem';   attname    | n_distinct | correlation
---------------+------------+-------------orderkey      |  -0.199847 |           1partkey       |     196448 |
0.0223377suppkey      |       9658 | -0.00822751linenumber    |          7 |     0.17274quantity      |         50 |
0.0150153extendedprice|      25651 | -0.00790245discount      |         11 |    0.103761tax           |          9 |
0.0993771returnflag   |          3 |    0.391434linestatus    |          2 |    0.509791shipdate      |       2440 |
0.0072777commitdate   |       2497 |  0.00698162receiptdate   |       2416 |  0.00726686shipinstruct  |          4 |
0.241511shipmode     |          7 |    0.138432comment       |     275488 |   0.0188006
 
(16 rows)

select attname,n_distinct,correlation from pg_stats where
tablename='orders';   attname    | n_distinct | correlation
---------------+------------+-------------orderkey      |         -1 |   -0.999925custkey       |      76309 |
0.00590596orderstatus  |          3 |    0.451991totalprice    |         -1 | -0.00768806orderdate     |       2431 |
-0.0211354orderpriority|          5 |    0.182489clerk         |       1009 |  0.00546939shippriority  |          1 |
       1comment       |  -0.750125 |  -0.0123887
 

Customer attname   | n_distinct | correlation
------------+------------+-------------custkey    |         -1 |           1name       |         -1 |
1address   |         -1 | -0.00510274nationkey  |         25 |   0.0170533phone      |         -1 |  -0.0227816acctbal
 |   -0.83444 | -0.00220958mktsegment |          5 |    0.205013comment    |         -1 |   0.0327827
 

This query takes 12 minutes to run and returns about 50 customers.
lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
restrictive

May someone help on improving performance?
Again thanks in advance
Regards




pgsql-sql by date:

Previous
From: frederik nietzsche
Date:
Subject: Re: how to inherits the references...
Next
From: Jakub Ouhrabka
Date:
Subject: Re: [HACKERS] please help on query