Re: TPC-R benchmarks - Mailing list pgsql-performance
From | Josh Berkus |
---|---|
Subject | Re: TPC-R benchmarks |
Date | |
Msg-id | 200310071659.17863.josh@agliodbs.com Whole thread Raw |
In response to | TPC-R benchmarks (Oleg Lebedev <oleg.lebedev@waterford.org>) |
Responses |
Re: TPC-R benchmarks
|
List | pgsql-performance |
Tom, I've found the problem with TPC-R query #19. And it, unfortunately, appears to be a problem in the PostgreSQL query planner. To sum up the below: it appears that whenever a set of WHERE conditions exceeds a certain level of complexity, the planner just ignores all applicable indexes and goes for a seq scan. While this may be unavoidable to some degree, it seems to me that we need to raise the threshold of complexity at which it does this. tpcr=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226 (prerelease) (SuSE Linux) (1 row) I've tested a number of indexes on the query, and found the two most efficient on subsets of the query. Thus: explain analyze select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#33' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=10380.70..10380.70 rows=1 width=30) (actual time=161.61..161.61 rows=1 loops=1) -> Nested Loop (cost=0.00..10380.67 rows=13 width=30) (actual time=81.54..161.47 rows=17 loops=1) -> Index Scan using idx_part_1 on part (cost=0.00..9466.33 rows=62 width=4) (actual time=81.21..137.24 rows=98 loops=1) Index Cond: (p_brand = 'Brand#33'::bpchar) Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container = 'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5)) -> Index Scan using idx_lineitem_3 on lineitem (cost=0.00..14.84 rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98) Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND (lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric)) Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) Total runtime: 161.71 msec This also works for a similar query: explain analyze select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#52' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 14 and l_quantity <= 14 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Aggregate (cost=11449.36..11449.36 rows=1 width=30) (actual time=195.72..195.72 rows=1 loops=1) -> Nested Loop (cost=0.00..11449.29 rows=28 width=30) (actual time=56.42..195.39 rows=48 loops=1) -> Index Scan using idx_part_1 on part (cost=0.00..9466.33 rows=139 width=4) (actual time=56.15..153.17 rows=166 loops=1) Index Cond: (p_brand = 'Brand#52'::bpchar) Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container = 'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container = 'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10)) -> Index Scan using idx_lineitem_3 on lineitem (cost=0.00..14.29 rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166) Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND (lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <= 24::numeric)) Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) Total runtime: 195.82 msec (9 rows) If, however, I combine the two where clauses with an OR, the planner gets confused and insists on loading the entire tables into memory (even though I don't have that much memory): explain select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#33' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 8 and l_quantity <= 8 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#52' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 14 and l_quantity <= 14 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Aggregate (cost=488301096525.25..488301096525.25 rows=1 width=146) -> Nested Loop (cost=0.00..488301096525.15 rows=42 width=146) Join Filter: ((("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'SM BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'SM PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'SM PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'MED BAG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED BAG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'MED BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'MED PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND ("inner".p_container = 'MED PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar))) -> Seq Scan on lineitem (cost=0.00..235620.15 rows=6001215 width=95) -> Seq Scan on part (cost=0.00..7367.00 rows=200000 width=51) You'll pardon me for not doing an "ANALYZE", but I didn't want to wait overnight. Manually disabling Seqscan and Nestloop did nothing to affect this query plan; neither did removing the aggregate. Tommorrow I will test 7.4 Beta 4. How can we fix this? -- -Josh Berkus Aglio Database Solutions San Francisco
pgsql-performance by date: