On Thu, 10 Apr 2003 joostje@komputilo.org wrote:
> Je 2003/04/10(4)/12:04, Tom Lane skribis:
> > joostje@komputilo.org writes:
> > > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
> > > count(distinct(id)), but it obviously isn't. Also the most_common_freqs
> > > values are about a 100 times higher than in reality, and, even tough about
> > > 900 values of id occur more often than 40 times, in the 'most_common_vals'
> > > list are 7 (of the 10) vals that occur less than 40 times, and the real
> > > top two isn't even represented.
> >
> > Please try increasing the statistics target (see ALTER TABLE) for db.id, then
> > re-analyze and see if the estimates get better. The default setting is
> > 10 --- try 20, 50, 100 to see what happens.
>
> Well, the n_distinct estimates get better, but the cost estimates still
> don't quite add up: `actual cost' is 23.24, cost estimate never gets
> below 49930.
>
> stat.targ n_distinct| correlation cost estimate
> 5 1917 | 0.43189 3621794.92
> 10 1998 | 0.3909 3618363.33
> 20 4330 | -0.247617 1981594.38
> 50 9708 | 0.0762642 975847.15
> 100 14604 | 0.030706 657631.41
> 200 21855 | 0.0446929 204335.70
> 500 39980 | -0.0497829 121000.31
> 1000 29468 | 0.0366528 49930.08
> 1000 29453 | 0.0367673 49954.08
> Table 1: various estimates as a function of statistical target
> actual distinct values: 42226
> actual cost: varies from 5.0 to 27.8
>
> So, the planner still prefers the mergejoin and hashjoin plans, causing
> the select to take tens of seconds (60 for the mergejoin, I beleve), wheras
> the Nested Loop takes only 0.024 seconds:
>
> For example, for the stat.targ=500 run:
> => explain analyse SELECT id from db, tmp0 WHERE valida AND poseda='uea' AND tab='pers' AND tmp0.v0=id ;
> NOTICE: QUERY PLAN:
>
> Nested Loop (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1)
> -> Seq Scan on tmp0 (cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1)
> -> Index Scan using db_id_idx on db (cost=0.00..120.63 rows=28 width=7) (actual time=0.27..0.75 rows=14
loops=29)
> Total runtime: 23.92 msec
>
> In the above example, tmp0 had 29 values, that correspond to 415 rows in table db.
> Table db has 586157 rows.
Just a thought: is tmp0 analyzed? (1000 rows vs 29). :) Wouldn't this
divide cost estimate by at least 30? (ok... it's still high...).
--
Antti Haapala