Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1
Date
Msg-id 2359.940720039@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1  (Fernando Schapachnik <fpscha@ns1.via-net-works.net.ar>)
Responses Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1
List pgsql-hackers
Fernando Schapachnik <fpscha@ns1.via-net-works.net.ar> writes:
>> It'd be useful to double-check my theory that the system is
>> misestimating the selectivity of the WHERE (u.activa) clause.
>> You could try this:
>> SELECT count(*) FROM usarios WHERE activa;

> 10571

>> EXPLAIN SELECT count(*) FROM usarios WHERE activa;
>> and see how far off the row count estimate in the EXPLAIN is
>> from reality.

> NOTICE:  QUERY PLAN:

> Aggregate  (cost=498.84 rows=1 width=4)
>   -> Seq Scan on usuarios  (cost=498.84 rows=1 width=4)

Well, it's sure confused about the selectivity of WHERE activa,
all right.

I tried to duplicate this here, by duplicating the table definition you
sent and filling it with some junk data --- about 1800 rows, 1500 of
which had activa = 't'.  I found that after loading the table and
running a plain "vacuum", the system indeed estimated one row out, just
as you show above.  But after "vacuum analyze", it estimated 1360 rows
out, which is a lot closer to reality (and would make a big difference
in the plan selected for a join).

Now I know you said you did a "vacuum analyze" on the table, but
I am wondering if maybe you got confused about what you did.
Please try it again just to make sure.

The only other explanation I can think of is that I am not running this
test on a pristine 6.5.2 release, but on a recent CVS update from the
REL6_5 branch.  I don't see any indication that anything has been
changed in the selectivity code since 6.5 in that branch, but maybe I
missed something.  You might need to update to almost-6.5.3.  (I am not
sure if there is a beta-test tarball for 6.5.3 or not; if not, you could
pull the sources from the CVS server, or wait for 6.5.3 which should be
out very soon.)

BTW, current sources (7.0-to-be) get the estimate spot-on after "vacuum
analyze", though without it they are not much better than 6.5.  The
current system is estimating 1% of the rows will match, because it's
treating the WHERE condition like "WHERE activa = 't'" and the default
estimate for "=" selectivity is 1% in the absence of VACUUM ANALYZE
statistics.  Probably we ought to special-case boolean columns to
default to a 50% estimate if no statistics are available...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Path-length follies
Next
From: "Hiroshi Inoue"
Date:
Subject: System indexes are never unique indexes( was RE: [HACKERS] mdnblocks is an amazing time sink in huge relations)