Re: estimates for nested loop very wrong? - Mailing list pgsql-sql

From joostje@komputilo.org
Subject Re: estimates for nested loop very wrong?
Date
Msg-id 20030410162902.GA28537@co.uea.org
Whole thread Raw
In response to Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: estimates for nested loop very wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Je 2003/04/10(4)/10:04, Tom Lane skribis:

> Have you done an ANALYZE or VACUUM ANALYZE recently?

Jes, both, actually, and the `analyse' quite a few times.

> > Nested Loop  (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1)

> The planner is evidently estimating that each row of tmp1 will match 2600+
> rows of db, whereas in reality there is only one match.  Rather than
> mess with enable_hashjoin, you need to find out why that estimate is so
> badly off.  Are the entries in tmp1 specially selected to correspond to
> unique rows of db?

Well, each entry in tmp1 matches with about 7-80 entries in db, but yes
the problem indeed seems to be that the estimate is so far off.
And no, the entries in tmp1 are not specially selected, they correspond
to `normal' values of id in db (values that are about as frequent as
other values).

I have done VACUUM ANALYSE on the table (and drop index; create index db_id_idx on db(id);).
=> analyse db;
=> select n_distinct from pg_stats where tablename='db' and attname='id';       1996
=> select count(distinct(id)) from db;      42225

Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as
count(distinct(id)), but it obviously isn't. Also the most_common_freqs
values are about a 100 times higher than in reality, and, even tough about
900 values of id occur more often than 40 times, in the 'most_common_vals'
list are 7 (of the 10) vals that occur less than 40 times, and the real
top two isn't even represented.


(BTW, the table I'm using now is a little smaller, as it turned out that
a few (75%) of the entries in db had only 3 different id values. This
didn't have any effect on the accurateness of the estimates, though).

BTW,
=> select count(id) from db;    586035


Thanks!
joostje

-- what pg_stat thinks about db.id:

=> select n_distinct, most_common_vals, most_common_freqs from pg_stats where tablename='db' and
attname='id';n_distinct|                  most_common_vals                   |
most_common_freqs                                           
 

------------+-----------------------------------------------------+-------------------------------------------------------------------------------------------------------
    1907 | {subo,smys,raha,sjbo,sdai,roal,sooi,stsw,rmwi,snuw} |
{0.00733333,0.007,0.00633333,0.00633333,0.006,0.00566667,0.00566667,0.00566667,0.00533333,0.00533333}


--these estimates are far off:

=> select id, count (id)/586035.0 from db where id='subo' or id='smys' or id='raha' or id='sjbo' or id='sdai' or
id='roal'or id='sooi' or id='stsw' or id='rmwi' or id='snuw' group by id; id  |       ?column?       
 
------+----------------------raha | 0.000156987210661479rmwi | 3.24212717670446e-05roal |  6.3136160809508e-05sdai |
8.70255189536461e-05sjbo|  6.3136160809508e-05smys |  7.5080839881577e-05snuw | 4.26595681145324e-05sooi |
0.000114327642546947stsw| 6.14297780849267e-05subo | 5.11914817374389e-05
 


--and these would be the real most_common_freqs:

=> select id, count(id), count(id)/586035.0 from db group by id order by - count(id) limit 10;  id   | count |
?column?      
 
--------+-------+----------------------indmem |   194 | 0.000331038248568771hton   |    97 | 0.000165519124284386raha
|   92 | 0.000156987210661479simo   |    87 | 0.000148455297038573sugn   |    87 | 0.000148455297038573rjgl   |    85 |
0.00014504253158941hroy   |    84 | 0.000143336148864829jrgv   |    84 | 0.000143336148864829tojo   |    83 |
0.000141629766140248lucy  |    82 | 0.000139923383415666
 


-- the above all done after a 'vacuum analyse';



pgsql-sql by date:

Previous
From: Franco Bruno Borghesi
Date:
Subject: Re: Trigger
Next
From: Tom Lane
Date:
Subject: Re: estimates for nested loop very wrong?