Thread: excessive performance difference

excessive performance difference

From
Luis Amigo
Date:
We're running a query on psql7.2b3, in which planner is unable to find
correct way to solve.
This is only informative message
If we run query19.old it takes more than 7 hours with this plan

Aggregate  (cost=17310310.04..17310310.04 rows=1 width=116)
  ->  Nested Loop  (cost=0.00..17310310.00 rows=15 width=116)
        ->  Seq Scan on part  (cost=0.00..140.00 rows=2000 width=36)
        ->  Seq Scan on lineitem  (cost=0.00..2325.79 rows=60279
width=80)

If we use query19.sql it takes 6 secs and plan is

Nested Loop  (cost=9770.00..9770.02 rows=1 width=96) (actual
time=6538.94..6539.09 rows=1 loops=1)
  ->  Nested Loop  (cost=6513.40..6513.41 rows=1 width=64) (actual
time=4472.32..4472.41 rows=1 loops=1)
        ->  Subquery Scan resultado  (cost=3256.82..3256.82 rows=1
width=30) (actual time=2283.08..2283.11 rows=1 loops=1)
              ->  Aggregate  (cost=3256.82..3256.82 rows=1 width=30)
(actual time=2283.03..2283.04 rows=1 loops=1)
                    ->  Hash Join  (cost=175.00..3256.81 rows=1
width=30) (actual time=688.18..2282.70 rows=1 loops=1)
                          ->  Seq Scan on lineitem  (cost=0.00..3079.28
rows=506 width=26) (actual time=6.29..2217.74 rows=492 loops=1)
                          ->  Hash  (cost=175.00..175.00 rows=1 width=4)
(actual time=55.20..55.20 rows=0 loops=1)
                                ->  Seq Scan on part  (cost=0.00..175.00
rows=1 width=4) (actual time=42.12..55.11 rows=2 loops=1)
        ->  Subquery Scan resultado2  (cost=3256.59..3256.59 rows=1
width=30) (actual time=2189.15..2189.18 rows=1 loops=1)
              ->  Aggregate  (cost=3256.59..3256.59 rows=1 width=30)
(actual time=2189.10..2189.11 rows=1 loops=1)
                    ->  Hash Join  (cost=175.00..3256.58 rows=1
width=30) (actual time=2188.99..2188.99 rows=0 loops=1)
                          ->  Seq Scan on lineitem  (cost=0.00..3079.28
rows=460 width=26) (actual time=2.53..2124.05 rows=519 loops=1)
                          ->  Hash  (cost=175.00..175.00 rows=1 width=4)
(actual time=54.97..54.97 rows=0 loops=1)
                                ->  Seq Scan on part  (cost=0.00..175.00
rows=1 width=4) (actual time=21.05..54.85 rows=3 loops=1)
  ->  Subquery Scan resultado3  (cost=3256.59..3256.59 rows=1 width=30)
(actual time=2066.41..2066.44 rows=1 loops=1)
        ->  Aggregate  (cost=3256.59..3256.59 rows=1 width=30) (actual
time=2066.36..2066.37 rows=1 loops=1)
              ->  Hash Join  (cost=175.01..3256.59 rows=1 width=30)
(actual time=2066.24..2066.24 rows=0 loops=1)
                    ->  Seq Scan on lineitem  (cost=0.00..3079.28
rows=460 width=26) (actual time=2.25..2001.36 rows=483 loops=1)
                    ->  Hash  (cost=175.00..175.00 rows=2 width=4)
(actual time=55.48..55.48 rows=0 loops=1)
                          ->  Seq Scan on part  (cost=0.00..175.00
rows=2 width=4) (actual time=17.21..55.34 rows=3 loops=1)


explain
SELECT
    sum(lineitem.extendedprice*(1-lineitem.discount))AS revenue
FROM
    part,
    lineitem
WHERE
    (
        part.partkey=lineitem.partkey
        AND part.brand='Brand#12'
        AND part.container IN('SM CASE','SM BOX','SM PACK','SM PKG')
        AND lineitem.quantity>=1 AND lineitem.quantity<=1+10
        AND part.size BETWEEN 1 AND 5
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    )
    OR
    (
        part.partkey=lineitem.partkey
        AND part.brand='Brand#23'
        AND part.container IN('MED BAG','MED BOX','MED PACK','MED PKG')
        AND lineitem.quantity>=10 AND lineitem.quantity<=10+10
        AND part.size BETWEEN 1 AND 10
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    )
    OR
    (
        part.partkey=lineitem.partkey
        AND part.brand='Brand#34'
        AND part.container IN('LG CASE','LG BOX','LG PACK','LG PKG')
        AND lineitem.quantity>=20 AND lineitem.quantity<=20+10
        AND part.size BETWEEN 1 AND 15
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    );
SELECT
    (suma+suma1+suma2)AS revenue

FROM
    (SELECT
    CASE
    WHEN sum(extendedprice*(1-discount))>0 THEN sum(extendedprice*(1-discount))
    ELSE 0
    END AS suma
    FROM
        part,
        lineitem
    WHERE
    (
        part.partkey=lineitem.partkey
        AND part.brand='Brand#12'
        AND part.container IN('SM CASE','SM BOX','SM PACK','SM PKG')
        AND lineitem.quantity>=1 AND lineitem.quantity<=1+10
        AND part.size BETWEEN 1 AND 5
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    ))AS resultado,
    (SELECT
    CASE
    WHEN sum(extendedprice*(1-discount))>0 THEN sum(extendedprice*(1-discount))
    ELSE 0
    END AS suma1
    FROM
        part,
        lineitem
    WHERE
    (
        part.partkey=lineitem.partkey
        AND part.brand='Brand#23'
        AND part.container IN('MED BAG','MED BOX','MED PACK','MED PKG')
        AND lineitem.quantity>=10 AND lineitem.quantity<=20
        AND part.size BETWEEN 1 AND 10
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    ))AS resultado2,
    (SELECT
    CASE
    WHEN sum(extendedprice*(1-discount))>0 THEN sum(extendedprice*(1-discount))
    ELSE 0
    END AS suma2
    FROM
        part,
        lineitem
    WHERE
    (
        part.partkey=lineitem.partkey
        AND part.brand='Brand#34'
        AND part.container IN('LG CASE','LG BOX','LG PACK','LG PKG')
        AND lineitem.quantity>=20 AND lineitem.quantity<=30
        AND part.size BETWEEN 1 AND 15
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    ))AS resultado3;

Re: excessive performance difference

From
Masaru Sugawara
Date:
On Thu, 13 Dec 2001 13:35:16 +0100
Luis Amigo <lamigo@atc.unican.es> wrote:


> We're running a query on psql7.2b3, in which planner is unable to find
> correct way to solve.
> This is only informative message
> If we run query19.old it takes more than 7 hours with this plan
>
> Aggregate  (cost=17310310.04..17310310.04 rows=1 width=116)
>   ->  Nested Loop  (cost=0.00..17310310.00 rows=15 width=116)
>         ->  Seq Scan on part  (cost=0.00..140.00 rows=2000 width=36)
>         ->  Seq Scan on lineitem  (cost=0.00..2325.79 rows=60279
> width=80)
>
> If we use query19.sql it takes 6 secs and plan is
>
> Nested Loop  (cost=9770.00..9770.02 rows=1 width=96) (actual
> time=6538.94..6539.09 rows=1 loops=1)


I found that the query, query19.old, has no hints to run the planner
effectively.  I would think you need to change a bit like this:


explain
SELECT
    sum(lineitem.extendedprice*(1-lineitem.discount))AS revenue
FROM
    part,
    lineitem
WHERE
        part.partkey=lineitem.partkey               -- change
    AND
        ((
            part.brand='Brand#12'
        AND part.container IN('SM CASE','SM BOX','SM PACK','SM PKG')
        AND lineitem.quantity>=1 AND lineitem.quantity<=1+10
        AND part.size BETWEEN 1 AND 5
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    )
    OR
    (
            part.brand='Brand#23'
        AND part.container IN('MED BAG','MED BOX','MED PACK','MED PKG')
        AND lineitem.quantity>=10 AND lineitem.quantity<=10+10
        AND part.size BETWEEN 1 AND 10
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    )
    OR
    (
            part.brand='Brand#34'
        AND part.container IN('LG CASE','LG BOX','LG PACK','LG PKG')
        AND lineitem.quantity>=20 AND lineitem.quantity<=20+10
        AND part.size BETWEEN 1 AND 15
        AND lineitem.shipmode IN ('AIR','AIR REG')
        AND lineitem.shipinstruct='DELIVER IN PERSON'
    ));

NOTICE:  QUERY PLAN:
Aggregate  (cost=344.66..344.66 rows=1 width=72)
  ->  Merge Join  (cost=139.66..344.66 rows=1 width=72)
        ->  Sort  (cost=69.83..69.83 rows=1000 width=32)
              ->  Seq Scan on part  (cost=0.00..20.00 rows=1000 width=32)
        ->  Sort  (cost=69.83..69.83 rows=1000 width=40)
              ->  Seq Scan on lineitem  (cost=0.00..20.00 rows=1000 width=40)




AFAICT, you probably might use indices to remove the two sorts
from the previous result of the query plan.

  create index idx_part_partkey on part (partkey);
  create index idx_lineitem_partkey on lineitem (partkey);


NOTICE:  QUERY PLAN:
Aggregate  (cost=323.01..323.01 rows=1 width=72)
  ->  Merge Join  (cost=0.00..323.01 rows=1 width=72)
        ->  Index Scan using idx_part_partkey on part
                       (cost=0.00..59.00 rows=1000 width=32)
        ->  Index Scan using idx_lineitem_partkey on lineitem
                       (cost=0.00..59.00 rows=1000 width=40)



Regards,
Masaru Sugawara


Re: excessive performance difference

From
Luis Amigo
Date:
Masaru Sugawara wrote:

>
> I found that the query, query19.old, has no hints to run the planner
> effectively.  I would think you need to change a bit like this:

Thank you for your answer Masaru, the only thing I wanted to communicate is that
query19.old works perfectly with pay bases, and someone could get stucked with the
same problem. What you are getting outside is same condition that is inside of
all, partkey is primary key, so it is not neccesary to create index.
query19.sql gives better performance than yours, it is pseudo intra-paralellism,
if anyone is interested here are the results

tpch=# \e /disco3/lamigo/tpch/consultas/query19.old
NOTICE:  QUERY PLAN:

Aggregate  (cost=6073.03..6073.03 rows=1 width=116) (actual time=6865.81..6865.81
rows=1 loops=1)
  ->  Hash Join  (cost=145.00..6073.03 rows=1 width=116) (actual
time=2057.06..6865.58 rows=1 loops=1)
        ->  Seq Scan on lineitem  (cost=0.00..2326.05 rows=60305 width=80) (actual
time=0.38..3244.77 rows=60279 loops=1)
        ->  Hash  (cost=140.00..140.00 rows=2000 width=36) (actual
time=132.55..132.55 rows=0 loops=1)
              ->  Seq Scan on part  (cost=0.00..140.00 rows=2000 width=36) (actual
time=13.66..98.65 rows=2000 loops=1)
Total runtime: 6867.08 msec

EXPLAIN
tpch=# \e /disco3/lamigo/tpch/consultas/query19.sql
NOTICE:  QUERY PLAN:

Nested Loop  (cost=9771.17..9771.19 rows=1 width=96) (actual time=6539.58..6539.74
rows=1 loops=1)
  ->  Nested Loop  (cost=6513.97..6513.98 rows=1 width=64) (actual
time=4254.88..4254.98 rows=1 loops=1)
        ->  Subquery Scan resultado2  (cost=3257.16..3257.16 rows=1 width=30)
(actual time=2185.23..2185.27 rows=1 loops=1)
              ->  Aggregate  (cost=3257.16..3257.16 rows=1 width=30) (actual
time=2185.19..2185.20 rows=1 loops=1)
                    ->  Hash Join  (cost=175.00..3257.16 rows=1 width=30) (actual
time=2185.07..2185.07 rows=0 loops=1)
                          ->  Seq Scan on lineitem  (cost=0.00..3079.86 rows=458
width=26) (actual time=2.50..2119.31 rows=519 loops=1)
                          ->  Hash  (cost=175.00..175.00 rows=1 width=4) (actual
time=55.40..55.40 rows=0 loops=1)
                                ->  Seq Scan on part  (cost=0.00..175.00 rows=1
width=4) (actual time=21.11..55.27 rows=3 loops=1)
        ->  Subquery Scan resultado3  (cost=3256.81..3256.81 rows=1 width=30)
(actual time=2069.55..2069.59 rows=1 loops=1)
              ->  Aggregate  (cost=3256.81..3256.81 rows=1 width=30) (actual
time=2069.51..2069.51 rows=1 loops=1)
                    ->  Hash Join  (cost=175.01..3256.81 rows=1 width=30) (actual
time=2069.39..2069.39 rows=0 loops=1)
                          ->  Seq Scan on lineitem  (cost=0.00..3079.86 rows=387
width=26) (actual time=2.24..2004.59 rows=483 loops=1)
                          ->  Hash  (cost=175.00..175.00 rows=2 width=4) (actual
time=55.00..55.00 rows=0 loops=1)
                                ->  Seq Scan on part  (cost=0.00..175.00 rows=2
width=4) (actual time=17.20..54.87 rows=3 loops=1)
  ->  Subquery Scan resultado  (cost=3257.19..3257.19 rows=1 width=30) (actual
time=2284.52..2284.55 rows=1 loops=1)
        ->  Aggregate  (cost=3257.19..3257.19 rows=1 width=30) (actual
time=2284.47..2284.48 rows=1 loops=1)
              ->  Hash Join  (cost=175.00..3257.19 rows=1 width=30) (actual
time=688.98..2284.13 rows=1 loops=1)
                    ->  Seq Scan on lineitem  (cost=0.00..3079.86 rows=464
width=26) (actual time=6.31..2219.45 rows=492 loops=1)
                    ->  Hash  (cost=175.00..175.00 rows=1 width=4) (actual
time=54.99..54.99 rows=0 loops=1)
                          ->  Seq Scan on part  (cost=0.00..175.00 rows=1 width=4)
(actual time=42.02..54.89 rows=2 loops=1)
Total runtime: 6541.95 msec

EXPLAIN
tpch=# create index index_partkey on lineitem (partkey int4_ops);
CREATE
tpch=# analyze;
ANALYZE
tpch=# \e /disco3/lamigo/tpch/consultas/query19.old
ANALYZE
tpch=# \i /disco3/lamigo/tpch/consultas/query19.old
psql:/disco3/lamigo/tpch/consultas/query19.old:35: NOTICE:  QUERY PLAN:

Aggregate  (cost=6183.50..6183.50 rows=1 width=116) (actual time=6816.22..6816.22
rows=1 loops=1)
  ->  Hash Join  (cost=145.00..6183.50 rows=1 width=116) (actual
time=2013.52..6815.98 rows=1 loops=1)
        ->  Seq Scan on lineitem  (cost=0.00..2326.05 rows=60305 width=80) (actual
time=0.38..3220.05 rows=60279 loops=1)
        ->  Hash  (cost=140.00..140.00 rows=2000 width=36) (actual
time=108.44..108.44 rows=0 loops=1)
              ->  Seq Scan on part  (cost=0.00..140.00 rows=2000 width=36) (actual
time=1.02..76.31 rows=2000 loops=1)
Total runtime: 6817.15 msec

Thank you and regards.
Luis Amigo
Universidad de Cantabria