Thread: please help on query

please help on query

From
"Luis Alberto Amigo Navarro"
Date:
I can't improve performance on this query:
 
SELECT
 supplier.name,
 supplier.address
FROM
 supplier,
 nation
WHERE
 supplier.suppkey IN(
  SELECT
   partsupp.suppkey
  FROM
   partsupp
  WHERE
   partsupp.partkey IN(
    SELECT
     part.partkey
    FROM
     part
    WHERE
     part.name like 'forest%'
     )
   AND partsupp.availqty>(
    SELECT
     0.5*(sum(lineitem.quantity)::FLOAT)
    FROM
     lineitem
    WHERE
     lineitem.partkey=partsupp.partkey
     AND lineitem.suppkey=partsupp.partkey
     AND lineitem.shipdate>=('1994-01-01')::DATE
     AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
     )
  )
 AND supplier.nationkey=nation.nationkey
 AND nation.name='CANADA'
ORDER BY
 supplier.name;
 
explain results:
NOTICE:  QUERY PLAN:
 
Sort  (cost=2777810917708.17..2777810917708.17 rows=200 width=81)
  ->  Nested Loop  (cost=0.00..2777810917700.53 rows=200 width=81)
        ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)
        ->  Index Scan using snation_index on supplier  (cost=0.00..2777810917696.72 rows=200 width=77)
              SubPlan
                ->  Materialize  (cost=6944527291.72..6944527291.72 rows=133333 width=4)
                      ->  Seq Scan on partsupp  (cost=0.00..6944527291.72 rows=133333 width=4)
                            SubPlan
                              ->  Materialize  (cost=8561.00..8561.00 rows=1 width=4)
                                    ->  Seq Scan on part  (cost=0.00..8561.00 rows=1 width=4)
                              ->  Aggregate  (cost=119.61..119.61 rows=1 width=4)
                                    ->  Index Scan using lineitem_index on lineitem  (cost=0.00..119.61 rows=1 width=4)
partsupp::800000 tuples
            Table "partsupp"
   Column   |      Type      | Modifiers
------------+----------------+-----------
 partkey    | integer        | not null
 suppkey    | integer        | not null
 availqty   | integer        |
 supplycost | numeric(10,2)  |
 comment    | character(199) |
Primary key: partsupp_pkey
Triggers: RI_ConstraintTrigger_16597,
          RI_ConstraintTrigger_16603
tpch=# select attname,n_distinct,correlation from pg_stats where tablename='partsupp';
  attname   | n_distinct | correlation
------------+------------+-------------
 partkey    |  -0.195588 |           1
 suppkey    |       9910 |  0.00868363
 availqty   |       9435 | -0.00788662
 supplycost |  -0.127722 |  -0.0116864
 comment    |         -1 |   0.0170702
I accept query changes, reordering, indexes ideas and horizontal partitioning
thanks in advance.
Regards
 
 
 

Re: please help on query

From
"J. R. Nield"
Date:
On Thu, 2002-07-11 at 11:22, Luis Alberto Amigo Navarro wrote:
> I can't improve performance on this query:

Blame Canada!

-- 
J. R. Nield
jrnield@usol.com





Re: please help on query

From
Andy Kopciuch
Date:
On Thursday 11 July 2002 12:06, J. R. Nield wrote:
> On Thu, 2002-07-11 at 11:22, Luis Alberto Amigo Navarro wrote:
> > I can't improve performance on this query:
>
> Blame Canada!

Whatever ... 

How's that silver medal down there in the states?

;-)



Re: please help on query

From
Hannu Krosing
Date:
On Thu, 2002-07-11 at 17:22, Luis Alberto Amigo Navarro wrote:
> I can't improve performance on this query:

You could try rewriting the IN's into = joins
or even use explicit INNER JOIN syntax to force certain plans

with a select inside another and depending on value of partsupp.partkey
it is really hard for optimiser to do anything else than to perform the
query for each row.

But it may help to rewrite
 SELECT  partsupp.suppkey FROM  partsupp WHERE  partsupp.partkey IN (   SELECT    part.partkey   FROM    part   WHERE
part.name like 'forest%'    )  AND partsupp.availqty>(   SELECT    0.5*(sum(lineitem.quantity)::FLOAT)   FROM
lineitem  WHERE    lineitem.partkey=partsupp.partkey    AND lineitem.suppkey=partsupp.partkey    AND
lineitem.shipdate>=('1994-01-01')::DATE   AND lineitem.shipdate<(('1994-01-01')::DATE+('1
 
year')::INTERVAL)::DATE    ) )

into
 SELECT   partsupp.suppkey FROM  partsupp,  (SELECT part.partkey as partkey     FROM part    WHERE part.name like
'forest%'  ) fp,  (SELECT 0.5*(sum(lineitem.quantity)::FLOAT) as halfsum,          partkey     FROM lineitem   WHERE
lineitem.partkey=partsupp.partkey   AND lineitem.suppkey=partsupp.partkey    AND
lineitem.shipdate>=('1994-01-01')::DATE   AND lineitem.shipdate<(('1994-01-01')::DATE+('1
 
year')::INTERVAL)::DATE   ) li WHERE partsupp.partkey = fp.partkey   AND partsupp.partkey  = li.partkey   AND
partsupp.availqty> halfsum
 

if "lineitem" is significantly smaller than "partsupp"



But you really should tell us more, like how many lines does lineitem
and other tables have,  

----------
Hannu



Re: please help on query

From
Hannu Krosing
Date:
On Thu, 2002-07-11 at 17:22, Luis Alberto Amigo Navarro wrote:
> I can't improve performance on this query:

You may also want to rewrite

lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE

into

lineitem.shipdate<(('1995-01-01')::DATE

if you can, as probably the optimiser will not recognize it else as a
constant and won't use index on lineitem.shipdate.

----------------
Hannu




Re: please help on query

From
Hannu Krosing
Date:
On Thu, 2002-07-11 at 17:22, Luis Alberto Amigo Navarro wrote:
> I can't improve performance on this query:


This _may_ work.

SELECT   supplier.name,   supplier.address FROM   supplier,   nation,WHERE supplier.suppkey IN (   SELECT part.partkey
  FROM part    WHERE part.name like 'forest%'    INNER JOIN partsupp ON part.partkey=partsupp.partkey    INNER JOIN (
    SELECT 0.5*(sum(lineitem.quantity)::FLOAT) as halfsum          FROM lineitem         WHERE
lineitem.partkey=partsupp.partkey          AND shipdate >= '1994-01-01'           AND shipdate <  '1995-01-01'    ) li
ONpartsupp.availqty > halfsum ) AND supplier.nationkey=nation.nationkey AND nation.name='CANADA'
 
ORDER BY supplier.name;

---------------
Hannu