Thread: *really* simple select doesn't use indices ...
First, this is still a v7.1 system ... its totally possible that this is long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly accept as a response ... That said ... seems like a very painful way to arrive at 1 row ... :) table structure: globalmatch=# \d locations Table "locations"Attribute | Type | Modifier -----------+---------+--------------------------------------------------------gid | integer | not null default nextval('locationstmp_gid_seq'::text)city | text |state | text |country | text |zip | text |location | point | Indices: locations_zip, locationstmp_gid_key globalmatch=# \d locations_zip Index "locations_zip"Attribute | Type -----------+------zip | text btree globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012'; NOTICE: QUERY PLAN: Aggregate (cost=2950.18..2950.18 rows=1 width=16) -> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) EXPLAIN globalmatch=# SELECT count(location) from locations WHERE zip = '80012';count ------- 1 (1 row) globalmatch=# SELECT count(location) from locations;count --------123571 (1 row)
Okay, just bit the bullet, upgraded to v7.1.2, and the problem still persists: globalmatch=# vacuum verbose analyze locations; NOTICE: --Relation locations-- NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76,MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec. NOTICE: Index locationstmp_gid_key: Pages 272; Tuples 123571. CPU 0.01s/0.15u sec. NOTICE: Index locations_zip: Pages 320; Tuples 123571. CPU 0.02s/0.14u sec. NOTICE: Index locations_country: Pages 342; Tuples 123571. CPU 0.03s/0.13u sec. NOTICE: --Relation pg_toast_9373225-- NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0;Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec. NOTICE: Index pg_toast_9373225_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec. NOTICE: Analyzing... VACUUM globalmatch=# explain SELECT location from locations WHERE zip = '80012'; NOTICE: QUERY PLAN: Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) EXPLAIN globalmatch=# select version(); version ---------------------------------------------------------------------PostgreSQL 7.1.2 on i386-unknown-freebsd4.3, compiledby GCC 2.95.3 (1 row) On Mon, 28 May 2001, Marc G. Fournier wrote: > > First, this is still a v7.1 system ... its totally possible that this is > long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly > accept as a response ... > > That said ... seems like a very painful way to arrive at 1 row ... :) > > table structure: > > globalmatch=# \d locations > Table "locations" > Attribute | Type | Modifier > -----------+---------+-------------------------------------------------------- > gid | integer | not null default nextval('locationstmp_gid_seq'::text) > city | text | > state | text | > country | text | > zip | text | > location | point | > Indices: locations_zip, > locationstmp_gid_key > > globalmatch=# \d locations_zip > Index "locations_zip" > Attribute | Type > -----------+------ > zip | text > btree > > globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012'; > NOTICE: QUERY PLAN: > > Aggregate (cost=2950.18..2950.18 rows=1 width=16) > -> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) > > EXPLAIN > > globalmatch=# SELECT count(location) from locations WHERE zip = '80012'; > count > ------- > 1 > (1 row) > > globalmatch=# SELECT count(location) from locations; > count > -------- > 123571 > (1 row) > > > > Marc G. Fournier scrappy@hub.org Systems Administrator @ hub.org scrappy@{postgresql|isc}.org ICQ#7615664
Marc, The column 'zip' is of type text. As such, indices will not be used except in the case when the where clause is WHERE zip ~ '^<text>' for btree indices. Gavin On Tue, 29 May 2001, Marc G. Fournier wrote: > > Okay, just bit the bullet, upgraded to v7.1.2, and the problem still > persists: > > globalmatch=# vacuum verbose analyze locations; > NOTICE: --Relation locations-- > NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76,MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec. > NOTICE: Index locationstmp_gid_key: Pages 272; Tuples 123571. CPU 0.01s/0.15u sec. > NOTICE: Index locations_zip: Pages 320; Tuples 123571. CPU 0.02s/0.14u sec. > NOTICE: Index locations_country: Pages 342; Tuples 123571. CPU 0.03s/0.13u sec. > NOTICE: --Relation pg_toast_9373225-- > NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec. > NOTICE: Index pg_toast_9373225_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec. > NOTICE: Analyzing... > VACUUM > globalmatch=# explain SELECT location from locations WHERE zip = '80012'; > NOTICE: QUERY PLAN: > > Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) > > EXPLAIN > globalmatch=# select version(); > version > --------------------------------------------------------------------- > PostgreSQL 7.1.2 on i386-unknown-freebsd4.3, compiled by GCC 2.95.3 > (1 row) > > > On Mon, 28 May 2001, Marc G. Fournier wrote: > > > > > First, this is still a v7.1 system ... its totally possible that this is > > long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly > > accept as a response ... > > > > That said ... seems like a very painful way to arrive at 1 row ... :) > > > > table structure: > > > > globalmatch=# \d locations > > Table "locations" > > Attribute | Type | Modifier > > -----------+---------+-------------------------------------------------------- > > gid | integer | not null default nextval('locationstmp_gid_seq'::text) > > city | text | > > state | text | > > country | text | > > zip | text | > > location | point | > > Indices: locations_zip, > > locationstmp_gid_key > > > > globalmatch=# \d locations_zip > > Index "locations_zip" > > Attribute | Type > > -----------+------ > > zip | text > > btree > > > > globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012'; > > NOTICE: QUERY PLAN: > > > > Aggregate (cost=2950.18..2950.18 rows=1 width=16) > > -> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) > > > > EXPLAIN > > > > globalmatch=# SELECT count(location) from locations WHERE zip = '80012'; > > count > > ------- > > 1 > > (1 row) > > > > globalmatch=# SELECT count(location) from locations; > > count > > -------- > > 123571 > > (1 row) > > > > > > > > > > Marc G. Fournier scrappy@hub.org > Systems Administrator @ hub.org > scrappy@{postgresql|isc}.org ICQ#7615664 > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Gavin Sherry <swm@linuxworld.com.au> writes: > The column 'zip' is of type text. As such, indices will not be used except > in the case when the where clause is WHERE zip ~ '^<text>' for btree > indices. Uh ... nonsense. > On Tue, 29 May 2001, Marc G. Fournier wrote: >> globalmatch=# vacuum verbose analyze locations; >> NOTICE: --Relation locations-- >> NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76,MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec. >> globalmatch=# explain SELECT location from locations WHERE zip = '80012'; >> NOTICE: QUERY PLAN: >> >> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571 rows, which is more than enough to drive it to a sequential scan (with an average of more than three matched rows on every page of the relation, there'd be no I/O savings at all from consulting the index). Since the real number of matches is only 1, this estimate is obviously way off. In 7.1 the estimate is being driven by the frequency of the most common value in the column --- what is the most common value? If you're lucky, the most common value is a dummy (empty string, maybe) that you could replace by NULL with a few simple changes in application logic. 7.1 is smart enough to distinguish NULL from real data values in its estimates. If you're not lucky, there really are a few values that are far more common than average, in which case you're stuck unless you want to run development sources. Current sources should do a lot better on that kind of data distribution. regards, tom lane
On Tue, 29 May 2001, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > The column 'zip' is of type text. As such, indices will not be used except > > in the case when the where clause is WHERE zip ~ '^<text>' for btree > > indices. > > Uh ... nonsense. Oh good, I was worried there for a sec ... :) > > On Tue, 29 May 2001, Marc G. Fournier wrote: > >> globalmatch=# vacuum verbose analyze locations; > >> NOTICE: --Relation locations-- > >> NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec. > > >> globalmatch=# explain SELECT location from locations WHERE zip = '80012'; > >> NOTICE: QUERY PLAN: > >> > >> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) > > Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571 > rows, which is more than enough to drive it to a sequential scan > (with an average of more than three matched rows on every page of the > relation, there'd be no I/O savings at all from consulting the index). > > Since the real number of matches is only 1, this estimate is obviously > way off. In 7.1 the estimate is being driven by the frequency of the > most common value in the column --- what is the most common value? If > you're lucky, the most common value is a dummy (empty string, maybe) > that you could replace by NULL with a few simple changes in > application logic. 7.1 is smart enough to distinguish NULL from real > data values in its estimates. If you're not lucky, there really are a > few values that are far more common than average, in which case you're > stuck unless you want to run development sources. Current sources > should do a lot better on that kind of data distribution. Hit it right on the mark: zip | cnt -------+------- | 8140300210 | 100211 | 1 Will look at the code and see what I can do abuot that NULL issue ... thanks :)
This is one my top two problem with Postgres, the seemingly braindead index selection mechanism. First, of course try "VACUUM ANALYZE' Then if the fails, try set ENABLE_SEQSCAN = off; Then try your query. "Marc G. Fournier" wrote: > First, this is still a v7.1 system ... its totally possible that this is > long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly > accept as a response ... > > That said ... seems like a very painful way to arrive at 1 row ... :) > > table structure: > > globalmatch=# \d locations > Table "locations" > Attribute | Type | Modifier > -----------+---------+-------------------------------------------------------- > gid | integer | not null default nextval('locationstmp_gid_seq'::text) > city | text | > state | text | > country | text | > zip | text | > location | point | > Indices: locations_zip, > locationstmp_gid_key > > globalmatch=# \d locations_zip > Index "locations_zip" > Attribute | Type > -----------+------ > zip | text > btree > > globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012'; > NOTICE: QUERY PLAN: > > Aggregate (cost=2950.18..2950.18 rows=1 width=16) > -> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16) > > EXPLAIN > > globalmatch=# SELECT count(location) from locations WHERE zip = '80012'; > count > ------- > 1 > (1 row) > > globalmatch=# SELECT count(location) from locations; > count > -------- > 123571 > (1 row) > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)