Home > mailing lists

Re: mis-estimation on data-warehouse aggregate creation - Mailing list pgsql-performance

From	Simon Riggs
Subject	Re: mis-estimation on data-warehouse aggregate creation
Date	November 16, 2004 21:33:28
Msg-id	1100640596.4113.5680.camel@localhost.localdomain Whole thread Raw
In response to	mis-estimation on data-warehouse aggregate creation (Kris Jurka <books@ejurka.com>)
Responses	Re: mis-estimation on data-warehouse aggregate creation
List	pgsql-performance

Tree view

On Tue, 2004-11-16 at 09:10, Kris Jurka wrote:
>  By rewriting the JOIN
> conditions to LEFT JOIN we force the planner to recognize that there will
> be a match for every row in the sales table:
>

You realise that returns a different answer (or at least it potentially
does, depending upon your data?

>                      ->  Hash Join  (cost=4.70..194.23 rows=1 width=12) (actual time=2.675..74.693 rows=3288 loops=1)
>                            Hash Cond: (("outer".monthnumber = "inner".monthnumber) AND ("outer".monthname =
"inner".monthname)AND ("outer"."year" = "inner"."year") AND ("outer".monthyear = "inner".monthyear) AND
("outer".quarter= "inner".quarter) AND ("outer".quarteryear = "inner".quarteryear)) 
>                            ->  Seq Scan on period  (cost=0.00..90.88 rows=3288 width=54) (actual time=0.118..12.126
rows=3288loops=1) 
>                            ->  Hash  (cost=3.08..3.08 rows=108 width=58) (actual time=1.658..1.658 rows=0 loops=1)
>                                  ->  Seq Scan on shd_month  (cost=0.00..3.08 rows=108 width=58) (actual
time=0.081..0.947rows=108 loops=1) 

ISTM your trouble starts here ^^^
estimate=1, but rows=3288

The join condition has so many ANDed predicates that we assume that this
will reduce the selectivity considerably. It does not, and so you pay
the cost dearly later on.

In both plans, the trouble starts at this point.

If you pre-build tables that have only a single join column between the
full.oldids and shrunken.renumberedids then this will most likely work
correctly, since the planner will be able to correctly estimate the join
selectivity. i.e. put product.id onto shd_productline ahead of time, so
you can avoid the complex join.

Setting join_collapse_limit lower doesn't look like it would help, since
the plan already shows joining the sub-queries together first.

--
Best Regards, Simon Riggs

pgsql-performance by date:

From: Oleg Bartunov
Date: 16 November 2004, 21:13:37
Subject: Re: Tsearch2 really slower than ilike ?

From: "David Parker"
Date: 17 November 2004, 03:55:18
Subject: query plan question

Re: mis-estimation on data-warehouse aggregate creation - Mailing list pgsql-performance

Previous

Next