Thread: TPCH 100GB - need some help
Hi folks, I need some help in a TPCH 100GB benchmark. I described our settings in: http://archives.postgresql.org/pgsql-performance/2004-04/msg00377.php Some queries are taking to long to finish (4, 8, 9, 10, 19,20 and 22) and I need some help to increase the system performance. Here I put the query #19, the explain and the "top" for it. This query is running since yesterday 10 AM. Query text is: select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 2 and l_quantity <= 2 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#42' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 11 and l_quantity <= 11 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#54' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 27 and l_quantity <= 27 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Tasks: 57 total, 2 running, 55 sleeping, 0 stopped, 0 zombie Cpu(s): 16.5% user, 1.8% system, 0.0% nice, 59.2% idle, 22.5% IO-wait Mem: 4036184k total, 4025008k used, 11176k free, 4868k buffers Swap: 4088500k total, 13204k used, 4075296k free, 3770208k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28118 postgres 25 0 372m 354m 335m R 99.4 9.0 1724:45 postmaster Aggregate (cost=6825900228313539.00..6825900228313539.00 rows=1 width=22) -> Nested Loop (cost=887411.00..6825900228313538.00 rows=325 width=22) -> Seq Scan on lineitem (cost=0.00..21797716.88 rows=600037888 width=79) -> Materialize (cost=887411.00..1263193.00 rows=20000000 width=36) -> Seq Scan on part (cost=0.00..711629.00 rows=20000000 width=36) __________________________________ Do you Yahoo!? SBC Yahoo! - Internet access at a great low price. http://promo.yahoo.com/sbc/
On Fri, 2004-05-14 at 14:00, Eduardo Almeida wrote: > Hi folks, > > I need some help in a TPCH 100GB benchmark. Performance with 7.5 is much improved over 7.4 for TPCH due to efforts of Tom Lane and OSDL. Give it a try with a recent snapshot of PostgreSQL. Otherwise, disable nested loops for that query. set enable_nestloop = off;
Eduardo Almeida <edalmeida@yahoo.com> writes: > I need some help in a TPCH 100GB benchmark. > Here I put the query #19, the explain and the "top" > for it. IIRC, this is one of the cases that inspired the work that's been done on the query optimizer for 7.5. I don't think you will be able to get 7.4 to generate a good plan for it (at least not without changing the query, which is against the TPC rules). How do you feel about running CVS tip? BTW, are you aware that OSDL has already done a good deal of work with running TPC benchmarks for Postgres (and some other OS databases)? regards, tom lane
Mr. Tom Lane --- Tom Lane <tgl@sss.pgh.pa.us> wrote: > Eduardo Almeida <edalmeida@yahoo.com> writes: > > I need some help in a TPCH 100GB benchmark. > > Here I put the query #19, the explain and the > "top" > > for it. > > IIRC, this is one of the cases that inspired the > work that's been done > on the query optimizer for 7.5. I don't think you > will be able to get > 7.4 to generate a good plan for it (at least not > without changing the > query, which is against the TPC rules). How do you > feel about running > CVS tip? We are testing the postgre 7.4.2 to show results to some projects here in Brazil. We are near the deadline for these projects and we need to show results with a stable version. ASAP I want and I will help the PG community testing the CVS with VLDB. > > BTW, are you aware that OSDL has already done a good > deal of work with > running TPC benchmarks for Postgres (and some other > OS databases)? No! Now I'm considering the use of OSDL because of query rewrite. Yesterday the query #19 that I describe runs in the OSDL way. We found some interesting patterns in queries that take to long to finish in the 100 GB test. � Sub-queries inside other sub-queries (Q20 and Q22); � Exists and Not exists selection (Q4, Q21 and Q22); � Aggregations with in-line views, that is queries inside FROM clause (Q7, Q8, Q9 and Q22); In fact these queries were aborted by timeout statement_timeout = 25000000 I took off the timeout to Q20 and it finished in 23:53:49 hs. tks a lot, Eduardo ps. sorry about my english > > regards, tom lane __________________________________ Do you Yahoo!? SBC Yahoo! - Internet access at a great low price. http://promo.yahoo.com/sbc/