Re: TPC-R benchmarks - Mailing list pgsql-performance
From | Oleg Lebedev |
---|---|
Subject | Re: TPC-R benchmarks |
Date | |
Msg-id | 993DBE5B4D02194382EC8DF8554A52731D7624@postoffice.waterford.org Whole thread Raw |
In response to | TPC-R benchmarks (Oleg Lebedev <oleg.lebedev@waterford.org>) |
Responses |
Re: TPC-R benchmarks
|
List | pgsql-performance |
As Scott recommended, I did the following: # set enable_nestloop = false; # vacuum full analyze; After this I re-ran the query and its execution time went down from 2 hours to 2 minutes. I attached the new query plan to this posting. Is there any way to optimize it even further? What should I do to make this query run fast without hurting the performance of the other queries? Thanks. Oleg -----Original Message----- From: scott.marlowe [mailto:scott.marlowe@ihs.com] Sent: Wednesday, October 01, 2003 4:00 PM To: Oleg Lebedev Cc: Josh Berkus; pgsql-performance@postgresql.org Subject: Re: [PERFORM] TPC-R benchmarks For troubleshooting, can you try it with "set enable_nestloop = false" and rerun the query and see how long it takes? It looks like the estimates of rows returned is WAY off (estimate is too low compared to what really comes back.) Also, you might try to alter the table.column to have a higher target on the rows p_partkey and ps_partkey and any others where the estimate is so far off of the reality. On Wed, 1 Oct 2003, Oleg Lebedev wrote: > All right, my query just finished running with EXPLAIN ANALYZE. I show > the plan below and also attached it as a file. Any ideas? > > -> Sort (cost=54597.49..54597.50 rows=1 width=121) (actual > time=6674562.03..6674562.15 rows=175 loops=1) > Sort Key: nation.n_name, date_part('year'::text, > orders.o_orderdate) > -> Aggregate (cost=54597.45..54597.48 rows=1 width=121) > (actual time=6668919.41..6674522.48 rows=175 loops=1) > -> Group (cost=54597.45..54597.47 rows=3 width=121) > (actual time=6668872.68..6672136.96 rows=348760 loops=1) > -> Sort (cost=54597.45..54597.46 rows=3 > width=121) (actual time=6668872.65..6669499.95 rows=348760 loops=1) > Sort Key: nation.n_name, > date_part('year'::text, orders.o_orderdate) > -> Hash Join (cost=54596.00..54597.42 > rows=3 > width=121) (actual time=6632768.89..6650192.67 rows=348760 loops=1) > Hash Cond: ("outer".n_nationkey = > "inner".s_nationkey) > -> Seq Scan on nation > (cost=0.00..1.25 rows=25 width=33) (actual time=6.75..7.13 rows=25 > loops=1) > -> Hash (cost=54596.00..54596.00 > rows=3 > width=88) (actual time=6632671.96..6632671.96 rows=0 loops=1) > -> Nested Loop > (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46 > rows=348760 loops=1) > Join Filter: > ("inner".s_suppkey = "outer".l_suppkey) > -> Nested Loop > (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40 > rows=348760 loops=1) > -> Nested Loop > (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07 > rows=348760 loops=1) > Join Filter: > ("outer".p_partkey = "inner".ps_partkey) > -> Nested > Loop (cost=0.00..22753.33 rows=9343 width=49) (actual > time=146.85..3541433.10 rows=348760 loops=1) > -> Seq > Scan on part (cost=0.00..7868.00 rows=320 width=4) (actual > time=33.64..15651.90 rows=11637 loops=1) > > Filter: (p_name ~~ '%green%'::text) > -> > Index Scan using i_l_partkey on lineitem (cost=0.00..46.15 rows=29 > width=45) (actual time=10.71..302.67 rows=30 loops=11637) > > Index > Cond: ("outer".p_partkey = lineitem.l_partkey) > -> Index > Scan using pk_partsupp on partsupp (cost=0.00..3.39 rows=1 width=19) > (actual time=0.09..0.09 rows=1 loops=348760) > Index > Cond: ((partsupp.ps_partkey = "outer".l_partkey) AND > (partsupp.ps_suppkey = > "outer".l_suppkey)) > -> Index Scan > using pk_orders on orders (cost=0.00..3.01 rows=1 width=12) (actual > time=8.62..8.62 rows=1 loops=348760) > Index Cond: > (orders.o_orderkey = "outer".l_orderkey) > -> Index Scan using > pk_supplier on supplier (cost=0.00..3.01 rows=1 width=8) (actual > time=0.08..0.08 rows=1 loops=348760) > Index Cond: > ("outer".ps_suppkey = supplier.s_suppkey) Total runtime: 6674724.23 > msec (28 rows) > > > -----Original Message----- > From: Oleg Lebedev > Sent: Wednesday, October 01, 2003 12:00 PM > To: Josh Berkus; scott.marlowe > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] TPC-R benchmarks > Importance: Low > > > Sure, below is the query. I attached the plan to this posting. > > select > nation, > o_year, > sum(amount) as sum_profit > from > ( > select > n_name as nation, > extract(year from o_orderdate) as o_year, > l_extendedprice * (1 - l_discount) - > ps_supplycost * l_quantity as amount > from > part, > supplier, > lineitem, > partsupp, > orders, > nation > where > s_suppkey = l_suppkey > and ps_suppkey = l_suppkey > and ps_partkey = l_partkey > and p_partkey = l_partkey > and o_orderkey = l_orderkey > and s_nationkey = n_nationkey > and p_name like '%green%' > ) as profit > group by > nation, > o_year > order by > nation, > o_year desc; > > > -----Original Message----- > From: Josh Berkus [mailto:josh@agliodbs.com] > Sent: Wednesday, October 01, 2003 11:42 AM > To: Oleg Lebedev; scott.marlowe > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] TPC-R benchmarks > > > Oleg, > > > The output of the query should contain about 200 rows. So, I guess > > the > > > planer is off assuming that the query should return 1 row. > > Oh, also did you post the query before? Can you re-post it with the > planner > results? > > ************************************* This e-mail may contain privileged or confidential material intended for the named recipient only. If you are not the named recipient, delete this message and all attachments. Unauthorized reviewing, copying, printing, disclosing, or otherwise using information in this e-mail is prohibited. We reserve the right to monitor e-mail sent through our network. *************************************
Attachment
pgsql-performance by date: