Re: appendum: Re: *really* simple select doesn't use indices ... - Mailing list pgsql-hackers

From The Hermit Hacker
Subject Re: appendum: Re: *really* simple select doesn't use indices ...
Date
Msg-id Pine.BSF.4.33.0105291110360.82504-100000@mobile.hub.org
Whole thread Raw
In response to Re: appendum: Re: *really* simple select doesn't use indices ...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 29 May 2001, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > The column 'zip' is of type text. As such, indices will not be used except
> > in the case when the where clause is WHERE zip ~ '^<text>' for btree
> > indices.
>
> Uh ... nonsense.

Oh good, I was worried there for a sec ... :)

> > On Tue, 29 May 2001, Marc G. Fournier wrote:
> >> globalmatch=# vacuum verbose analyze locations;
> >> NOTICE:  --Relation locations--
> >> NOTICE:  Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
 
>
> >> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
> >> NOTICE:  QUERY PLAN:
> >>
> >> Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
>
> Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
> rows, which is more than enough to drive it to a sequential scan
> (with an average of more than three matched rows on every page of the
> relation, there'd be no I/O savings at all from consulting the index).
>
> Since the real number of matches is only 1, this estimate is obviously
> way off.  In 7.1 the estimate is being driven by the frequency of the
> most common value in the column --- what is the most common value? If
> you're lucky, the most common value is a dummy (empty string, maybe)
> that you could replace by NULL with a few simple changes in
> application logic.  7.1 is smart enough to distinguish NULL from real
> data values in its estimates.  If you're not lucky, there really are a
> few values that are far more common than average, in which case you're
> stuck unless you want to run development sources.  Current sources
> should do a lot better on that kind of data distribution.

Hit it right on the mark:
 zip  |  cnt
-------+-------      | 8140300210 |     100211 |     1

Will look at the code and see what I can do abuot that NULL issue ...
thanks :)



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: appendum: Re: *really* simple select doesn't use indices ...
Next
From: Tom Lane
Date:
Subject: Re: /contrib/unixdate: Broke in cvs tip.