Re: TPC-R benchmarks - Mailing list pgsql-performance

From Josh Berkus
Subject Re: TPC-R benchmarks
Date
Msg-id 200310071659.17863.josh@agliodbs.com
Whole thread Raw
In response to TPC-R benchmarks  (Oleg Lebedev <oleg.lebedev@waterford.org>)
Responses Re: TPC-R benchmarks
List pgsql-performance
Tom,

I've found the problem with TPC-R query #19.  And it, unfortunately, appears
to be a problem in the PostgreSQL query planner.

To sum up the below:  it appears that whenever a set of WHERE conditions
exceeds a certain level of complexity, the planner just ignores all
applicable indexes and goes for a seq scan.   While this may be unavoidable
to some degree, it seems to me that we need to raise the threshold of
complexity at which it does this.

tpcr=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226
(prerelease) (SuSE Linux)
(1 row)

I've tested a number of indexes on the query, and found the two most efficient
on subsets of the query.  Thus:

explain analyze
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#33'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 8 and l_quantity <= 8 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10380.70..10380.70 rows=1 width=30) (actual
time=161.61..161.61 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..10380.67 rows=13 width=30) (actual
time=81.54..161.47 rows=17 loops=1)
         ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=62
width=4) (actual time=81.21..137.24 rows=98 loops=1)
               Index Cond: (p_brand = 'Brand#33'::bpchar)
               Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container =
'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM
PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5))
         ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.84
rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98)
               Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND
(lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric))
               Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 161.71 msec



This also works for a similar query:

explain analyze
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#52'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 14 and l_quantity <= 14 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

 Aggregate  (cost=11449.36..11449.36 rows=1 width=30) (actual
time=195.72..195.72 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11449.29 rows=28 width=30) (actual
time=56.42..195.39 rows=48 loops=1)
         ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=139
width=4) (actual time=56.15..153.17 rows=166 loops=1)
               Index Cond: (p_brand = 'Brand#52'::bpchar)
               Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container =
'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container =
'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10))
         ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.29
rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166)
               Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND
(lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <=
24::numeric))
               Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 195.82 msec
(9 rows)


If, however, I combine the two where clauses with an OR, the planner gets
confused and insists on loading the entire tables into memory (even though I
don't have that much memory):

explain
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#33'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 8 and l_quantity <= 8 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#52'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 14 and l_quantity <= 14 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

 Aggregate  (cost=488301096525.25..488301096525.25 rows=1 width=146)
   ->  Nested Loop  (cost=0.00..488301096525.15 rows=42 width=146)
         Join Filter: ((("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM BOX'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM PACK'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM PKG'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED BAG'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
BAG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED BOX'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED PKG'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED PACK'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)))
         ->  Seq Scan on lineitem  (cost=0.00..235620.15 rows=6001215
width=95)
         ->  Seq Scan on part  (cost=0.00..7367.00 rows=200000 width=51)


You'll pardon me for not doing an "ANALYZE", but I didn't want to wait
overnight.   Manually disabling Seqscan and Nestloop did nothing to affect
this query plan; neither did removing the aggregate.

Tommorrow I will test 7.4 Beta 4.

How can we fix this?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


pgsql-performance by date:

Previous
From: "Timothy D. Witham"
Date:
Subject: Re: TPC-R benchmarks
Next
From: Tom Lane
Date:
Subject: Re: TPC-R benchmarks