Thread: please help on query
I can't improve performance on this query:
SELECT
supplier.name,
supplier.address
FROM
supplier,
nation
WHERE
supplier.suppkey IN(
SELECT
partsupp.suppkey
FROM
partsupp
WHERE
partsupp.partkey IN(
SELECT
part.partkey
FROM
part
WHERE
part.name like 'forest%'
)
AND partsupp.availqty>(
SELECT
0.5*(sum(lineitem.quantity)::FLOAT)
FROM
lineitem
WHERE
lineitem.partkey=partsupp.partkey
AND lineitem.suppkey=partsupp.partkey
AND lineitem.shipdate>=('1994-01-01')::DATE
AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
)
)
AND supplier.nationkey=nation.nationkey
AND nation.name='CANADA'
ORDER BY
supplier.name;
supplier.name,
supplier.address
FROM
supplier,
nation
WHERE
supplier.suppkey IN(
SELECT
partsupp.suppkey
FROM
partsupp
WHERE
partsupp.partkey IN(
SELECT
part.partkey
FROM
part
WHERE
part.name like 'forest%'
)
AND partsupp.availqty>(
SELECT
0.5*(sum(lineitem.quantity)::FLOAT)
FROM
lineitem
WHERE
lineitem.partkey=partsupp.partkey
AND lineitem.suppkey=partsupp.partkey
AND lineitem.shipdate>=('1994-01-01')::DATE
AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
)
)
AND supplier.nationkey=nation.nationkey
AND nation.name='CANADA'
ORDER BY
supplier.name;
explain results:
NOTICE: QUERY PLAN:
Sort (cost=2777810917708.17..2777810917708.17 rows=200 width=81)
-> Nested Loop (cost=0.00..2777810917700.53 rows=200 width=81)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4)
-> Index Scan using snation_index on supplier (cost=0.00..2777810917696.72 rows=200 width=77)
SubPlan
-> Materialize (cost=6944527291.72..6944527291.72 rows=133333 width=4)
-> Seq Scan on partsupp (cost=0.00..6944527291.72 rows=133333 width=4)
SubPlan
-> Materialize (cost=8561.00..8561.00 rows=1 width=4)
-> Seq Scan on part (cost=0.00..8561.00 rows=1 width=4)
-> Aggregate (cost=119.61..119.61 rows=1 width=4)
-> Index Scan using lineitem_index on lineitem (cost=0.00..119.61 rows=1 width=4)
-> Nested Loop (cost=0.00..2777810917700.53 rows=200 width=81)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4)
-> Index Scan using snation_index on supplier (cost=0.00..2777810917696.72 rows=200 width=77)
SubPlan
-> Materialize (cost=6944527291.72..6944527291.72 rows=133333 width=4)
-> Seq Scan on partsupp (cost=0.00..6944527291.72 rows=133333 width=4)
SubPlan
-> Materialize (cost=8561.00..8561.00 rows=1 width=4)
-> Seq Scan on part (cost=0.00..8561.00 rows=1 width=4)
-> Aggregate (cost=119.61..119.61 rows=1 width=4)
-> Index Scan using lineitem_index on lineitem (cost=0.00..119.61 rows=1 width=4)
partsupp::800000 tuples
Table "partsupp"
Column | Type | Modifiers
------------+----------------+-----------
partkey | integer | not null
suppkey | integer | not null
availqty | integer |
supplycost | numeric(10,2) |
comment | character(199) |
Primary key: partsupp_pkey
Triggers: RI_ConstraintTrigger_16597,
RI_ConstraintTrigger_16603
Column | Type | Modifiers
------------+----------------+-----------
partkey | integer | not null
suppkey | integer | not null
availqty | integer |
supplycost | numeric(10,2) |
comment | character(199) |
Primary key: partsupp_pkey
Triggers: RI_ConstraintTrigger_16597,
RI_ConstraintTrigger_16603
tpch=# select attname,n_distinct,correlation from pg_stats where tablename='partsupp';
attname | n_distinct | correlation
------------+------------+-------------
partkey | -0.195588 | 1
suppkey | 9910 | 0.00868363
availqty | 9435 | -0.00788662
supplycost | -0.127722 | -0.0116864
comment | -1 | 0.0170702
attname | n_distinct | correlation
------------+------------+-------------
partkey | -0.195588 | 1
suppkey | 9910 | 0.00868363
availqty | 9435 | -0.00788662
supplycost | -0.127722 | -0.0116864
comment | -1 | 0.0170702
I accept query changes, reordering, indexes ideas and horizontal partitioning
thanks in advance.
Regards
On Thu, 2002-07-11 at 11:22, Luis Alberto Amigo Navarro wrote: > I can't improve performance on this query: Blame Canada! -- J. R. Nield jrnield@usol.com
On Thursday 11 July 2002 12:06, J. R. Nield wrote: > On Thu, 2002-07-11 at 11:22, Luis Alberto Amigo Navarro wrote: > > I can't improve performance on this query: > > Blame Canada! Whatever ... How's that silver medal down there in the states? ;-)
On Thu, 2002-07-11 at 17:22, Luis Alberto Amigo Navarro wrote: > I can't improve performance on this query: You could try rewriting the IN's into = joins or even use explicit INNER JOIN syntax to force certain plans with a select inside another and depending on value of partsupp.partkey it is really hard for optimiser to do anything else than to perform the query for each row. But it may help to rewrite SELECT partsupp.suppkey FROM partsupp WHERE partsupp.partkey IN ( SELECT part.partkey FROM part WHERE part.name like 'forest%' ) AND partsupp.availqty>( SELECT 0.5*(sum(lineitem.quantity)::FLOAT) FROM lineitem WHERE lineitem.partkey=partsupp.partkey AND lineitem.suppkey=partsupp.partkey AND lineitem.shipdate>=('1994-01-01')::DATE AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE ) ) into SELECT partsupp.suppkey FROM partsupp, (SELECT part.partkey as partkey FROM part WHERE part.name like 'forest%' ) fp, (SELECT 0.5*(sum(lineitem.quantity)::FLOAT) as halfsum, partkey FROM lineitem WHERE lineitem.partkey=partsupp.partkey AND lineitem.suppkey=partsupp.partkey AND lineitem.shipdate>=('1994-01-01')::DATE AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE ) li WHERE partsupp.partkey = fp.partkey AND partsupp.partkey = li.partkey AND partsupp.availqty> halfsum if "lineitem" is significantly smaller than "partsupp" But you really should tell us more, like how many lines does lineitem and other tables have, ---------- Hannu
On Thu, 2002-07-11 at 17:22, Luis Alberto Amigo Navarro wrote: > I can't improve performance on this query: You may also want to rewrite lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE into lineitem.shipdate<(('1995-01-01')::DATE if you can, as probably the optimiser will not recognize it else as a constant and won't use index on lineitem.shipdate. ---------------- Hannu
On Thu, 2002-07-11 at 17:22, Luis Alberto Amigo Navarro wrote: > I can't improve performance on this query: This _may_ work. SELECT supplier.name, supplier.address FROM supplier, nation,WHERE supplier.suppkey IN ( SELECT part.partkey FROM part WHERE part.name like 'forest%' INNER JOIN partsupp ON part.partkey=partsupp.partkey INNER JOIN ( SELECT 0.5*(sum(lineitem.quantity)::FLOAT) as halfsum FROM lineitem WHERE lineitem.partkey=partsupp.partkey AND shipdate >= '1994-01-01' AND shipdate < '1995-01-01' ) li ONpartsupp.availqty > halfsum ) AND supplier.nationkey=nation.nationkey AND nation.name='CANADA' ORDER BY supplier.name; --------------- Hannu