Thread: *really* simple select doesn't use indices ...

*really* simple select doesn't use indices ...

From
"Marc G. Fournier"
Date:
First, this is still a v7.1 system ... its totally possible that this is
long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
accept as a response ...

That said ... seems like a very painful way to arrive at 1 row ... :)

table structure:

globalmatch=# \d locations                             Table "locations"Attribute |  Type   |
Modifier
-----------+---------+--------------------------------------------------------gid       | integer | not null default
nextval('locationstmp_gid_seq'::text)city     | text    |state     | text    |country   | text    |zip       | text
|location | point   |
 
Indices: locations_zip,        locationstmp_gid_key

globalmatch=# \d locations_zip
Index "locations_zip"Attribute | Type
-----------+------zip       | text
btree

globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
NOTICE:  QUERY PLAN:

Aggregate  (cost=2950.18..2950.18 rows=1 width=16) ->  Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN

globalmatch=# SELECT count(location) from locations WHERE zip = '80012';count
-------    1
(1 row)

globalmatch=# SELECT count(location) from locations;count
--------123571
(1 row)





appendum: Re: *really* simple select doesn't use indices ...

From
"Marc G. Fournier"
Date:
Okay, just bit the bullet, upgraded to v7.1.2, and the problem still
persists:

globalmatch=# vacuum verbose analyze locations;
NOTICE:  --Relation locations--
NOTICE:  Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen
76,MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
 
NOTICE:  Index locationstmp_gid_key: Pages 272; Tuples 123571. CPU 0.01s/0.15u sec.
NOTICE:  Index locations_zip: Pages 320; Tuples 123571. CPU 0.02s/0.14u sec.
NOTICE:  Index locations_country: Pages 342; Tuples 123571. CPU 0.03s/0.13u sec.
NOTICE:  --Relation pg_toast_9373225--
NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen
0;Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
 
NOTICE:  Index pg_toast_9373225_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE:  Analyzing...
VACUUM
globalmatch=# explain SELECT location from locations WHERE zip = '80012';
NOTICE:  QUERY PLAN:

Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN
globalmatch=# select version();                              version
---------------------------------------------------------------------PostgreSQL 7.1.2 on i386-unknown-freebsd4.3,
compiledby GCC 2.95.3
 
(1 row)


On Mon, 28 May 2001, Marc G. Fournier wrote:

>
> First, this is still a v7.1 system ... its totally possible that this is
> long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
> accept as a response ...
>
> That said ... seems like a very painful way to arrive at 1 row ... :)
>
> table structure:
>
> globalmatch=# \d locations
>                               Table "locations"
>  Attribute |  Type   |                        Modifier
> -----------+---------+--------------------------------------------------------
>  gid       | integer | not null default nextval('locationstmp_gid_seq'::text)
>  city      | text    |
>  state     | text    |
>  country   | text    |
>  zip       | text    |
>  location  | point   |
> Indices: locations_zip,
>          locationstmp_gid_key
>
> globalmatch=# \d locations_zip
> Index "locations_zip"
>  Attribute | Type
> -----------+------
>  zip       | text
> btree
>
> globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=2950.18..2950.18 rows=1 width=16)
>   ->  Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
>
> EXPLAIN
>
> globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
>  count
> -------
>      1
> (1 row)
>
> globalmatch=# SELECT count(location) from locations;
>  count
> --------
>  123571
> (1 row)
>
>
>
>

Marc G. Fournier                               scrappy@hub.org
Systems Administrator @ hub.org
scrappy@{postgresql|isc}.org                       ICQ#7615664



Re: appendum: Re: *really* simple select doesn't use indices ...

From
Gavin Sherry
Date:
Marc,

The column 'zip' is of type text. As such, indices will not be used except
in the case when the where clause is WHERE zip ~ '^<text>' for btree
indices.

Gavin

On Tue, 29 May 2001, Marc G. Fournier wrote:

> 
> Okay, just bit the bullet, upgraded to v7.1.2, and the problem still
> persists:
> 
> globalmatch=# vacuum verbose analyze locations;
> NOTICE:  --Relation locations--
> NOTICE:  Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen
76,MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
 
> NOTICE:  Index locationstmp_gid_key: Pages 272; Tuples 123571. CPU 0.01s/0.15u sec.
> NOTICE:  Index locations_zip: Pages 320; Tuples 123571. CPU 0.02s/0.14u sec.
> NOTICE:  Index locations_country: Pages 342; Tuples 123571. CPU 0.03s/0.13u sec.
> NOTICE:  --Relation pg_toast_9373225--
> NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0,
MaxLen0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
 
> NOTICE:  Index pg_toast_9373225_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
> NOTICE:  Analyzing...
> VACUUM
> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
> NOTICE:  QUERY PLAN:
> 
> Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
> 
> EXPLAIN
> globalmatch=# select version();
>                                version
> ---------------------------------------------------------------------
>  PostgreSQL 7.1.2 on i386-unknown-freebsd4.3, compiled by GCC 2.95.3
> (1 row)
> 
> 
> On Mon, 28 May 2001, Marc G. Fournier wrote:
> 
> >
> > First, this is still a v7.1 system ... its totally possible that this is
> > long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
> > accept as a response ...
> >
> > That said ... seems like a very painful way to arrive at 1 row ... :)
> >
> > table structure:
> >
> > globalmatch=# \d locations
> >                               Table "locations"
> >  Attribute |  Type   |                        Modifier
> > -----------+---------+--------------------------------------------------------
> >  gid       | integer | not null default nextval('locationstmp_gid_seq'::text)
> >  city      | text    |
> >  state     | text    |
> >  country   | text    |
> >  zip       | text    |
> >  location  | point   |
> > Indices: locations_zip,
> >          locationstmp_gid_key
> >
> > globalmatch=# \d locations_zip
> > Index "locations_zip"
> >  Attribute | Type
> > -----------+------
> >  zip       | text
> > btree
> >
> > globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
> > NOTICE:  QUERY PLAN:
> >
> > Aggregate  (cost=2950.18..2950.18 rows=1 width=16)
> >   ->  Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
> >
> > EXPLAIN
> >
> > globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
> >  count
> > -------
> >      1
> > (1 row)
> >
> > globalmatch=# SELECT count(location) from locations;
> >  count
> > --------
> >  123571
> > (1 row)
> >
> >
> >
> >
> 
> Marc G. Fournier                               scrappy@hub.org
> Systems Administrator @ hub.org
> scrappy@{postgresql|isc}.org                       ICQ#7615664
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 



Re: appendum: Re: *really* simple select doesn't use indices ...

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> The column 'zip' is of type text. As such, indices will not be used except
> in the case when the where clause is WHERE zip ~ '^<text>' for btree
> indices.

Uh ... nonsense.

> On Tue, 29 May 2001, Marc G. Fournier wrote:
>> globalmatch=# vacuum verbose analyze locations;
>> NOTICE:  --Relation locations--
>> NOTICE:  Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen
76,MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
 

>> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
>> NOTICE:  QUERY PLAN:
>> 
>> Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)

Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
rows, which is more than enough to drive it to a sequential scan
(with an average of more than three matched rows on every page of the
relation, there'd be no I/O savings at all from consulting the index).

Since the real number of matches is only 1, this estimate is obviously
way off.  In 7.1 the estimate is being driven by the frequency of the
most common value in the column --- what is the most common value?
If you're lucky, the most common value is a dummy (empty string, maybe)
that you could replace by NULL with a few simple changes in application
logic.  7.1 is smart enough to distinguish NULL from real data values
in its estimates.  If you're not lucky, there really are a few values
that are far more common than average, in which case you're stuck unless
you want to run development sources.  Current sources should do a lot
better on that kind of data distribution.
        regards, tom lane


Re: appendum: Re: *really* simple select doesn't use indices ...

From
The Hermit Hacker
Date:
On Tue, 29 May 2001, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > The column 'zip' is of type text. As such, indices will not be used except
> > in the case when the where clause is WHERE zip ~ '^<text>' for btree
> > indices.
>
> Uh ... nonsense.

Oh good, I was worried there for a sec ... :)

> > On Tue, 29 May 2001, Marc G. Fournier wrote:
> >> globalmatch=# vacuum verbose analyze locations;
> >> NOTICE:  --Relation locations--
> >> NOTICE:  Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0,
MinLen76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
 
>
> >> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
> >> NOTICE:  QUERY PLAN:
> >>
> >> Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
>
> Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
> rows, which is more than enough to drive it to a sequential scan
> (with an average of more than three matched rows on every page of the
> relation, there'd be no I/O savings at all from consulting the index).
>
> Since the real number of matches is only 1, this estimate is obviously
> way off.  In 7.1 the estimate is being driven by the frequency of the
> most common value in the column --- what is the most common value? If
> you're lucky, the most common value is a dummy (empty string, maybe)
> that you could replace by NULL with a few simple changes in
> application logic.  7.1 is smart enough to distinguish NULL from real
> data values in its estimates.  If you're not lucky, there really are a
> few values that are far more common than average, in which case you're
> stuck unless you want to run development sources.  Current sources
> should do a lot better on that kind of data distribution.

Hit it right on the mark:
 zip  |  cnt
-------+-------      | 8140300210 |     100211 |     1

Will look at the code and see what I can do abuot that NULL issue ...
thanks :)



Re: *really* simple select doesn't use indices ...

From
mlw
Date:
This is one my top two problem with Postgres, the seemingly braindead index
selection mechanism.

First, of course try "VACUUM ANALYZE'
Then if the fails, try

set ENABLE_SEQSCAN = off;

Then try your query.


"Marc G. Fournier" wrote:

> First, this is still a v7.1 system ... its totally possible that this is
> long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
> accept as a response ...
>
> That said ... seems like a very painful way to arrive at 1 row ... :)
>
> table structure:
>
> globalmatch=# \d locations
>                               Table "locations"
>  Attribute |  Type   |                        Modifier
> -----------+---------+--------------------------------------------------------
>  gid       | integer | not null default nextval('locationstmp_gid_seq'::text)
>  city      | text    |
>  state     | text    |
>  country   | text    |
>  zip       | text    |
>  location  | point   |
> Indices: locations_zip,
>          locationstmp_gid_key
>
> globalmatch=# \d locations_zip
> Index "locations_zip"
>  Attribute | Type
> -----------+------
>  zip       | text
> btree
>
> globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=2950.18..2950.18 rows=1 width=16)
>   ->  Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
>
> EXPLAIN
>
> globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
>  count
> -------
>      1
> (1 row)
>
> globalmatch=# SELECT count(location) from locations;
>  count
> --------
>  123571
> (1 row)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)