Re: estimates for nested loop very wrong? - Mailing list pgsql-sql

From joostje@komputilo.org
Subject Re: estimates for nested loop very wrong?
Date
Msg-id 20030410194553.GB28537@co.uea.org
Whole thread Raw
In response to Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: estimates for nested loop very wrong?  (Antti Haapala <antti.haapala@iki.fi>)
List pgsql-sql
Je 2003/04/10(4)/12:04, Tom Lane skribis:
> joostje@komputilo.org writes:
> > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
> > count(distinct(id)), but it obviously isn't. Also the most_common_freqs
> > values are about a 100 times higher than in reality, and, even tough about
> > 900 values of id occur more often than 40 times, in the 'most_common_vals'
> > list are 7 (of the 10) vals that occur less than 40 times, and the real
> > top two isn't even represented.
> 
> Please try increasing the statistics target (see ALTER TABLE) for db.id, then
> re-analyze and see if the estimates get better.  The default setting is
> 10 --- try 20, 50, 100 to see what happens.

Well, the n_distinct estimates get better, but the cost estimates still
don't quite add up: `actual cost' is 23.24, cost estimate never gets
below 49930.

stat.targ n_distinct| correlation   cost estimate  5         1917   |     0.43189    3621794.92 10         1998   |
0.3909    3618363.33 20         4330   |   -0.247617    1981594.38 50         9708   |   0.0762642     975847.15100
  14604   |    0.030706     657631.41200        21855   |   0.0446929     204335.70500        39980   |  -0.0497829
121000.31
1000        29468   |   0.0366528      49930.08
1000        29453   |   0.0367673      49954.08 Table 1: various estimates as a function of statistical target
actualdistinct values: 42226   actual cost: varies from 5.0 to 27.8   
 
So, the planner still prefers the mergejoin and hashjoin plans, causing
the select to take tens of seconds (60 for the mergejoin, I beleve), wheras
the Nested Loop takes only 0.024 seconds:

For example, for the stat.targ=500 run:
=> explain analyse SELECT  id  from db, tmp0 WHERE valida AND poseda='uea'  AND tab='pers' AND tmp0.v0=id ;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1) ->  Seq Scan on tmp0
(cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1) ->  Index Scan using db_id_idx on db
(cost=0.00..120.63rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29)
 
Total runtime: 23.92 msec

In the above example, tmp0 had 29 values, that correspond to 415 rows in table db.
Table db has 586157 rows.


The shown select statement is the one used to get all cost estimates
in table 1.

postgresql:   7.2.1 (debian release 3)



pgsql-sql by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Trigger
Next
From: Tom Lane
Date:
Subject: Re: estimates for nested loop very wrong?