Re: estimates for nested loop very wrong? - Mailing list pgsql-sql

From Antti Haapala
Subject Re: estimates for nested loop very wrong?
Date
Msg-id Pine.GSO.4.44.0304102335170.10376-100000@paju.oulu.fi
In response to Re: estimates for nested loop very wrong?  (joostje@komputilo.org)
List pgsql-sql
On Thu, 10 Apr 2003 joostje@komputilo.org wrote:

> On 2003/04/10 (Thu), 12:04, Tom Lane wrote:
> > joostje@komputilo.org writes:
> > > Unless I'm mistaken, pg_stats.n_distinct should be (approximately) the same as
> > > count(distinct(id)), but it obviously isn't. Also, the most_common_freqs
> > > values are about 100 times higher than in reality, and, even though about
> > > 900 values of id occur more often than 40 times, the 'most_common_vals'
> > > list contains 7 (of the 10) values that occur fewer than 40 times, and the
> > > real top two aren't represented at all.
> >
> > Please try increasing the statistics target (see ALTER TABLE) for db.id, then
> > re-analyze and see if the estimates get better.  The default setting is
> > 10 --- try 20, 50, 100 to see what happens.
>
> Well, the n_distinct estimates get better, but the cost estimates still
> don't quite add up: `actual cost' is 23.24, cost estimate never gets
> below 49930.
>
> stats target | n_distinct | correlation | cost estimate
>            5 |       1917 |     0.43189 |    3621794.92
>           10 |       1998 |      0.3909 |    3618363.33
>           20 |       4330 |   -0.247617 |    1981594.38
>           50 |       9708 |   0.0762642 |     975847.15
>          100 |      14604 |    0.030706 |     657631.41
>          200 |      21855 |   0.0446929 |     204335.70
>          500 |      39980 |  -0.0497829 |     121000.31
>         1000 |      29468 |   0.0366528 |      49930.08
>         1000 |      29453 |   0.0367673 |      49954.08
>
>   Table 1: various estimates as a function of the statistics target
>            actual distinct values: 42226
>            actual cost: varies from 5.0 to 27.8
>
> So, the planner still prefers the mergejoin and hashjoin plans, causing
> the select to take tens of seconds (60 for the mergejoin, I believe), whereas
> the Nested Loop takes only 0.024 seconds:
>
> For example, for the stat.targ=500 run:
> => explain analyse SELECT  id  from db, tmp0 WHERE valida AND poseda='uea'  AND tab='pers' AND tmp0.v0=id ;
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1)
>   ->  Seq Scan on tmp0  (cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1)
>   ->  Index Scan using db_id_idx on db  (cost=0.00..120.63 rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29)
> Total runtime: 23.92 msec
>
> In the above example, tmp0 had 29 values, which correspond to 415 rows in table db.
> Table db has 586157 rows.
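
For reference, the pg_stats comparison and the ALTER TABLE suggestion quoted
above correspond to something like the following sketch (db.id is the column
from the thread; the target value 100 is only an example, not necessarily the
value that was actually used):

    -- compare the planner's idea of db.id with the real distribution
    SELECT n_distinct, correlation, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 'db' AND attname = 'id';

    SELECT count(DISTINCT id) FROM db;   -- the actual number of distinct values

    -- raise the per-column statistics target, then gather fresh statistics
    ALTER TABLE db ALTER COLUMN id SET STATISTICS 100;
    ANALYZE db;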
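
The thread doesn't say how the nested-loop timing above was obtained, but the
usual way to time a plan the planner doesn't pick on its own is to disable the
preferred join methods for the session and re-run the query (a sketch, reusing
the query from the quoted EXPLAIN):

    -- steer the planner away from merge and hash joins for this session only
    SET enable_mergejoin = off;
    SET enable_hashjoin = off;

    EXPLAIN ANALYZE
    SELECT id FROM db, tmp0
     WHERE valida AND poseda = 'uea' AND tab = 'pers' AND tmp0.v0 = id;

    -- restore the defaults afterwards
    SET enable_mergejoin = on;
    SET enable_hashjoin = on;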

Just a thought: has tmp0 been analyzed? (the plan estimates 1000 rows where it
really has 29) :) Wouldn't that alone divide the cost estimate by at least 30?
(OK, it would still be high...)
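
(The "rows=1000" on the tmp0 seq scan is presumably the default estimate for a
table that has never been vacuumed or analyzed, so analyzing it should bring the
estimate down close to the real 29 rows. A minimal sketch:)

    ANALYZE tmp0;   -- collect row-count and column statistics for tmp0

    -- the stored row-count estimate the planner will now use
    SELECT reltuples FROM pg_class WHERE relname = 'tmp0';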


-- 
Antti Haapala


