Fernando Schapachnik <fpscha@ns1.via-net-works.net.ar> writes:
>> It'd be useful to double-check my theory that the system is
>> misestimating the selectivity of the WHERE (u.activa) clause.
>> You could try this:
>> SELECT count(*) FROM usarios WHERE activa;
> 10571
>> EXPLAIN SELECT count(*) FROM usarios WHERE activa;
>> and see how far off the row count estimate in the EXPLAIN is
>> from reality.
> NOTICE: QUERY PLAN:
> Aggregate (cost=498.84 rows=1 width=4)
> -> Seq Scan on usuarios (cost=498.84 rows=1 width=4)
Well, it's sure confused about the selectivity of WHERE activa,
all right.
I tried to duplicate this here, by duplicating the table definition you
sent and filling it with some junk data --- about 1800 rows, 1500 of
which had activa = 't'. I found that after loading the table and
running a plain "vacuum", the system indeed estimated one row out, just
as you show above. But after "vacuum analyze", it estimated 1360 rows
out, which is a lot closer to reality (and would make a big difference
in the plan selected for a join).
Now I know you said you did a "vacuum analyze" on the table, but
I am wondering if maybe you got confused about what you did.
Please try it again just to make sure.
The only other explanation I can think of is that I am not running this
test on a pristine 6.5.2 release, but on a recent CVS update from the
REL6_5 branch. I don't see any indication that anything has been
changed in the selectivity code since 6.5 in that branch, but maybe I
missed something. You might need to update to almost-6.5.3. (I am not
sure if there is a beta-test tarball for 6.5.3 or not; if not, you could
pull the sources from the CVS server, or wait for 6.5.3 which should be
out very soon.)
BTW, current sources (7.0-to-be) get the estimate spot-on after "vacuum
analyze", though without it they are not much better than 6.5. The
current system is estimating 1% of the rows will match, because it's
treating the WHERE condition like "WHERE activa = 't'" and the default
estimate for "=" selectivity is 1% in the absence of VACUUM ANALYZE
statistics. Probably we ought to special-case boolean columns to
default to a 50% estimate if no statistics are available...
regards, tom lane