Re: TPC-R benchmarks - Mailing list pgsql-performance

From scott.marlowe
Subject Re: TPC-R benchmarks
Date
Msg-id Pine.LNX.4.33.0310021027500.24503-100000@css120.ihs.com
Whole thread Raw
In response to Re: TPC-R benchmarks  (Oleg Lebedev <oleg.lebedev@waterford.org>)
List pgsql-performance
Have you tried increasing the statistics target for those columns that are
getting bad estimates yet and then turning back on enable_nestloop and
rerunning analyze and seeing how the query does?

The idea being to try and get a good enough estimate of your statistics so
the planner stops using nestloops on its own rather than forcing it to
with enable_nestloop = false.

On Thu, 2 Oct 2003, Oleg Lebedev wrote:

> As Scott recommended, I did the following:
> # set enable_nestloop = false;
> # vacuum full analyze;
>
> After this I re-ran the query and its execution time went down from 2
> hours to 2 minutes. I attached the new query plan to this posting.
> Is there any way to optimize it even further?
> What should I do to make this query run fast without hurting the
> performance of the other queries?
> Thanks.
>
> Oleg
>
> -----Original Message-----
> From: scott.marlowe [mailto:scott.marlowe@ihs.com]
> Sent: Wednesday, October 01, 2003 4:00 PM
> To: Oleg Lebedev
> Cc: Josh Berkus; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] TPC-R benchmarks
>
>
> For troubleshooting, can you try it with "set enable_nestloop = false"
> and
> rerun the query and see how long it takes?
>
> It looks like the estimates of rows returned is WAY off (estimate is too
>
> low compared to what really comes back.)
>
> Also, you might try to alter the table.column to have a higher target on
>
> the rows p_partkey and ps_partkey and any others where the estimate is
> so
> far off of the reality.
>
> On Wed, 1 Oct 2003, Oleg Lebedev wrote:
>
> > All right, my query just finished running with EXPLAIN ANALYZE. I show
>
> > the plan below and also attached it as a file. Any ideas?
> >
> >    ->  Sort  (cost=54597.49..54597.50 rows=1 width=121) (actual
> > time=6674562.03..6674562.15 rows=175 loops=1)
> >          Sort Key: nation.n_name, date_part('year'::text,
> > orders.o_orderdate)
> >          ->  Aggregate  (cost=54597.45..54597.48 rows=1 width=121)
> > (actual time=6668919.41..6674522.48 rows=175 loops=1)
> >                ->  Group  (cost=54597.45..54597.47 rows=3 width=121)
> > (actual time=6668872.68..6672136.96 rows=348760 loops=1)
> >                      ->  Sort  (cost=54597.45..54597.46 rows=3
> > width=121) (actual time=6668872.65..6669499.95 rows=348760 loops=1)
> >                            Sort Key: nation.n_name,
> > date_part('year'::text, orders.o_orderdate)
> >                            ->  Hash Join  (cost=54596.00..54597.42
> > rows=3
> > width=121) (actual time=6632768.89..6650192.67 rows=348760 loops=1)
> >                                  Hash Cond: ("outer".n_nationkey =
> > "inner".s_nationkey)
> >                                  ->  Seq Scan on nation
> > (cost=0.00..1.25 rows=25 width=33) (actual time=6.75..7.13 rows=25
> > loops=1)
> >                                  ->  Hash  (cost=54596.00..54596.00
> > rows=3
> > width=88) (actual time=6632671.96..6632671.96 rows=0 loops=1)
> >                                        ->  Nested Loop
> > (cost=0.00..54596.00 rows=3 width=88) (actual time=482.41..6630601.46
> > rows=348760 loops=1)
> >                                              Join Filter:
> > ("inner".s_suppkey = "outer".l_suppkey)
> >                                              ->  Nested Loop
> > (cost=0.00..54586.18 rows=3 width=80) (actual time=383.87..6594984.40
> > rows=348760 loops=1)
> >                                                    ->  Nested Loop
> > (cost=0.00..54575.47 rows=4 width=68) (actual time=199.95..3580882.07
> > rows=348760 loops=1)
> >                                                          Join Filter:
> > ("outer".p_partkey = "inner".ps_partkey)
> >                                                          ->  Nested
> > Loop (cost=0.00..22753.33 rows=9343 width=49) (actual
> > time=146.85..3541433.10 rows=348760 loops=1)
> >                                                                ->  Seq
>
> > Scan on part  (cost=0.00..7868.00 rows=320 width=4) (actual
> > time=33.64..15651.90 rows=11637 loops=1)
> >
> > Filter: (p_name ~~ '%green%'::text)
> >                                                                ->
> > Index Scan using i_l_partkey on lineitem  (cost=0.00..46.15 rows=29
> > width=45) (actual time=10.71..302.67 rows=30 loops=11637)
> >
> > Index
> > Cond: ("outer".p_partkey = lineitem.l_partkey)
> >                                                          ->  Index
> > Scan using pk_partsupp on partsupp  (cost=0.00..3.39 rows=1 width=19)
> > (actual time=0.09..0.09 rows=1 loops=348760)
> >                                                                Index
> > Cond: ((partsupp.ps_partkey = "outer".l_partkey) AND
> > (partsupp.ps_suppkey =
> > "outer".l_suppkey))
> >                                                    ->  Index Scan
> > using pk_orders on orders  (cost=0.00..3.01 rows=1 width=12) (actual
> > time=8.62..8.62 rows=1 loops=348760)
> >                                                          Index Cond:
> > (orders.o_orderkey = "outer".l_orderkey)
> >                                              ->  Index Scan using
> > pk_supplier on supplier  (cost=0.00..3.01 rows=1 width=8) (actual
> > time=0.08..0.08 rows=1 loops=348760)
> >                                                    Index Cond:
> > ("outer".ps_suppkey = supplier.s_suppkey)  Total runtime: 6674724.23
> > msec (28 rows)
> >
> >
> > -----Original Message-----
> > From: Oleg Lebedev
> > Sent: Wednesday, October 01, 2003 12:00 PM
> > To: Josh Berkus; scott.marlowe
> > Cc: pgsql-performance@postgresql.org
> > Subject: Re: [PERFORM] TPC-R benchmarks
> > Importance: Low
> >
> >
> > Sure, below is the query. I attached the plan to this posting.
> >
> > select
> >     nation,
> >     o_year,
> >     sum(amount) as sum_profit
> > from
> >     (
> >         select
> >             n_name as nation,
> >             extract(year from o_orderdate) as o_year,
> >             l_extendedprice * (1 - l_discount) -
> > ps_supplycost * l_quantity as amount
> >         from
> >             part,
> >             supplier,
> >             lineitem,
> >             partsupp,
> >             orders,
> >             nation
> >         where
> >             s_suppkey = l_suppkey
> >             and ps_suppkey = l_suppkey
> >             and ps_partkey = l_partkey
> >             and p_partkey = l_partkey
> >             and o_orderkey = l_orderkey
> >             and s_nationkey = n_nationkey
> >             and p_name like '%green%'
> >     ) as profit
> > group by
> >     nation,
> >     o_year
> > order by
> >     nation,
> >     o_year desc;
> >
> >
> > -----Original Message-----
> > From: Josh Berkus [mailto:josh@agliodbs.com]
> > Sent: Wednesday, October 01, 2003 11:42 AM
> > To: Oleg Lebedev; scott.marlowe
> > Cc: pgsql-performance@postgresql.org
> > Subject: Re: [PERFORM] TPC-R benchmarks
> >
> >
> > Oleg,
> >
> > > The output of the query should contain about 200 rows. So, I guess
> > > the
> >
> > > planer is off assuming that the query should return 1 row.
> >
> > Oh, also did you post the query before?   Can you re-post it with the
> > planner
> > results?
> >
> >
>
> *************************************
>
> This e-mail may contain privileged or confidential material intended for the named recipient only.
> If you are not the named recipient, delete this message and all attachments.
> Unauthorized reviewing, copying, printing, disclosing, or otherwise using information in this e-mail is prohibited.
> We reserve the right to monitor e-mail sent through our network.
>
> *************************************
>


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: TPC-R benchmarks
Next
From: "Rong Wu"
Date:
Subject: low cardinality column